librsync can be downloaded from http://rproxy.samba.org/download.html, and used, modified and redistributed under the terms of the GNU Lesser General Public License (version 2.1 or later).
librsync is a library for calculating and applying network deltas, with an interface designed to ease integration into diverse network applications. librsync is being developed as part of the rproxy <http://rproxy.samba.org/> and rsync <http://rsync.samba.org/> projects.
librsync encapsulates the core algorithms of the rsync protocol, which help with efficient calculation of the differences between two files. The rsync algorithm is different from most differencing algorithms because it does not require the presence of the two files to calculate the delta. Instead, it requires a set of checksums of each block of one file, which together form a signature for that file. Blocks at any position in the other file which have the same checksum are likely to be identical, and whatever remains is the difference.
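The idea of matching blocks against a signature can be sketched in a few lines of C. This is a toy illustration only, not librsync's implementation or signature format: the names weak_sum, make_signature and count_matched_bytes are hypothetical, the checksum is a simple rsync-style weak sum, and it is recomputed at every offset instead of being rolled. A real implementation would also confirm each weak match with a strong checksum, since equal weak sums only mean the blocks are likely to be identical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy weak checksum: a is the byte sum, b is the running sum of a. */
uint32_t weak_sum(const unsigned char *p, size_t len)
{
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a += p[i];
        b += a;
    }
    return (a & 0xffff) | ((b & 0xffff) << 16);
}

/* Store one checksum per complete block of old_data in sig[].
 * Returns the number of checksums written (at most sig_cap). */
size_t make_signature(const unsigned char *old_data, size_t old_len,
                      size_t block_len, uint32_t *sig, size_t sig_cap)
{
    assert(block_len > 0);
    size_t n = old_len / block_len;
    if (n > sig_cap)
        n = sig_cap;
    for (size_t i = 0; i < n; i++)
        sig[i] = weak_sum(old_data + i * block_len, block_len);
    return n;
}

/* Scan new_data at every byte offset, using only the signature (the old
 * file itself is not needed). Returns how many bytes of new_data are
 * covered by blocks whose checksum appears in sig[]; the rest would have
 * to be sent as literal data in a delta. */
size_t count_matched_bytes(const unsigned char *new_data, size_t new_len,
                           size_t block_len, const uint32_t *sig,
                           size_t n_sig)
{
    assert(block_len > 0);
    size_t matched = 0, pos = 0;
    while (pos + block_len <= new_len) {
        uint32_t s = weak_sum(new_data + pos, block_len);
        int hit = 0;
        for (size_t i = 0; i < n_sig; i++) {
            if (sig[i] == s) {
                hit = 1;
                break;
            }
        }
        if (hit) {
            matched += block_len;   /* whole block is probably identical */
            pos += block_len;
        } else {
            pos++;                  /* literal byte; try the next offset */
        }
    }
    return matched;
}
```

For instance, with 4-byte blocks, a signature made from "abcdefghijkl" and the new data "XXabcdYijklZ", the blocks "abcd" and "ijkl" are found at offsets 2 and 7 even though they have moved, so 8 of the 12 new bytes are matched.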
The library does not deal with file metadata or structure, such as filenames, permissions, or directories. To this library, a file is just a stream of bytes. Higher-level tools, such as rsync <http://rsync.samba.org>, can deal with such issues in a way appropriate to their users.
The library supports three basic operations:
The public interface to librsync (rsync.h) has functions in several main areas:
All external symbols have the prefix ``rs_'', or ``RS_'' in the case of preprocessor symbols.
A key design requirement for librsync is that it should handle data as and when the hosting application requires it. librsync can be used inside applications that do non-blocking IO or filtering of network streams, because it never does IO directly and never needs to block waiting for data.
The programming interface to librsync is similar to that of zlib and bzlib. Arbitrary-length input and output buffers are passed to the library by the application, through an instance of rs_buffers_t. The library proceeds as far as it can, and returns an rs_result value indicating whether it needs more data or space.
All the state needed by the library to resume processing when more data is available is kept in a small opaque rs_job_t structure. After creation of a job, repeated calls to rs_job_iter() in between filling and emptying the buffers keep data flowing through the stream. The rs_result values returned may indicate
Generating and applying deltas
All encoding operations are performed by using a *_begin function to create an rs_job_t object, passing in any necessary initialization parameters. The various jobs available are:
After creating a job, input and output buffers are passed to rs_job_iter() in an rs_buffers_t structure.
On input, the buffers structure must contain the address and length of the input and output buffers. The library updates these values to indicate the amount of remaining buffer. So, on return, avail_out is not the amount of output data produced, but rather the amount of output buffer space unfilled. This means that the values on return are consistent with the values on entry, but not necessarily what you would expect.
A similar system is used by libz and libbz2.
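The buffer convention is easy to get wrong, so here is a minimal self-contained sketch of it. It does not use librsync: toy_buffers_t, toy_result, toy_iter and toy_run are hypothetical stand-ins for rs_buffers_t, rs_result and rs_job_iter(), and apart from avail_out the field names are assumptions modelled on the zlib style. The toy "job" simply upper-cases its input. The point to notice is that the amount of output produced by one call is the space offered minus the avail_out left afterwards.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's buffers structure. */
typedef struct {
    const char *next_in;   /* next input byte to consume */
    size_t avail_in;       /* input bytes remaining */
    char *next_out;        /* where the next output byte goes */
    size_t avail_out;      /* output space remaining (NOT bytes produced) */
} toy_buffers_t;

typedef enum { TOY_DONE, TOY_BLOCKED } toy_result;

/* One step: proceed as far as possible, updating the buffer fields.
 * Returns TOY_BLOCKED if input remains because output space ran out. */
toy_result toy_iter(toy_buffers_t *b)
{
    while (b->avail_in > 0 && b->avail_out > 0) {
        *b->next_out++ = (char)toupper((unsigned char)*b->next_in++);
        b->avail_in--;
        b->avail_out--;
    }
    return b->avail_in == 0 ? TOY_DONE : TOY_BLOCKED;
}

/* Application-side loop: offer a small output buffer, call the step
 * function, drain what was produced, repeat until the job is done.
 * Returns the total number of bytes written to dst. */
size_t toy_run(const char *src, size_t src_len, char *dst, size_t dst_cap)
{
    char out[4];
    toy_buffers_t b = { src, src_len, out, sizeof out };
    size_t total = 0;
    toy_result r;

    do {
        b.next_out = out;            /* hand the emptied buffer back */
        b.avail_out = sizeof out;
        r = toy_iter(&b);
        size_t produced = sizeof out - b.avail_out;  /* offered - unfilled */
        assert(total + produced <= dst_cap);
        memcpy(dst + total, out, produced);
        total += produced;
    } while (r == TOY_BLOCKED);

    return total;
}
```

Calling toy_run("hello, rs", 9, dst, sizeof dst) with a 16-byte dst takes three steps through the 4-byte output buffer and leaves "HELLO, RS" in dst. The step function never reads or writes anything itself, which is what allows the same pattern to sit inside a non-blocking application.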
Some applications do not require fine-grained control over IO, but rather just want to process a whole file with a single call. librsync provides `whole-file' functionality to do exactly that.
Processing of a whole file begins with creation of an rs_job_t object for the appropriate operation, just as if the application were going to do buffering itself. After creation, the job may be passed to rs_whole_run(), which will feed it to and from two FILEs as necessary until end of file is reached or the operation completes.
Debugging trace and error logging
librsync can output trace or log messages as it proceeds. These follow a fairly standard priority-based filtering system (rs_trace_set_level()), using the same severity levels as UNIX syslog. Messages by default are sent to stderr, but may be passed to an application-provided callback (rs_trace_to(), rs_trace_fn_t).
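As a rough model of how such priority-based filtering with a replaceable sink works, consider the sketch below. It is not librsync code: every toy_ name is hypothetical and only mirrors the shape of rs_trace_set_level() and rs_trace_to(). The numeric levels follow the syslog convention, in which a lower number means a more severe message.

```c
#include <stdio.h>

/* syslog-style severities: lower value = more severe. */
typedef enum {
    TOY_LOG_ERR = 3,
    TOY_LOG_WARNING = 4,
    TOY_LOG_INFO = 6,
    TOY_LOG_DEBUG = 7
} toy_loglevel;

/* Signature of an application-provided message callback. */
typedef void toy_trace_fn(toy_loglevel level, const char *msg);

/* Default sink: write to stderr. */
static void toy_trace_stderr(toy_loglevel level, const char *msg)
{
    fprintf(stderr, "<%d> %s\n", (int)level, msg);
}

static toy_loglevel toy_level = TOY_LOG_INFO;
static toy_trace_fn *toy_sink = toy_trace_stderr;

/* Messages less severe than this level are discarded. */
void toy_trace_set_level(toy_loglevel level)
{
    toy_level = level;
}

/* Redirect messages to fn; passing NULL restores the stderr sink. */
void toy_trace_to(toy_trace_fn *fn)
{
    toy_sink = fn ? fn : toy_trace_stderr;
}

/* Returns 1 if the message was delivered, 0 if it was filtered out. */
int toy_log(toy_loglevel level, const char *msg)
{
    if (level > toy_level)
        return 0;
    toy_sink(level, msg);
    return 1;
}

/* Example application callback: count messages instead of printing. */
int toy_seen = 0;

void toy_count_sink(toy_loglevel level, const char *msg)
{
    (void)level;
    (void)msg;
    toy_seen++;
}
```

An application would typically install its callback once at start-up, for example toy_trace_to(toy_count_sink) followed by toy_trace_set_level(TOY_LOG_WARNING), after which debug and informational messages are dropped before the callback is ever invoked.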
Encoding and decoding routines accumulate compression performance statistics in an rs_stats_t structure as they run. These may be converted to human-readable form or written to the log file using rs_format_stats() or rs_log_stats() respectively.
Some additional functions are used internally and also exposed in the API: