Main Page Data Structures File List Data Fields Globals Related Pages

The librsync delta-encoding library

Author(s):: Martin Pool <mbp@samba.org>

Id:: main.dox,v 1.13 2001/03/12 01:30:50 mbp Exp

This document is also available in printable form:

More information about librsync can be found on the rproxy web site:

http://rproxy.samba.org/

librsync can be downloaded from http://rproxy.samba.org/download.html, and used, modified and redistributed under the terms of the GNU Lesser General Public License (version 2.1 or later).

Introduction

librsync is a library for calculating and applying network deltas, with an interface designed to ease integration into diverse network applications. librsync is being developed as part of the rproxy <http://rproxy.samba.org/> and rsync <http://rsync.samba.org/> projects.

librsync encapsulates the core algorithms of the rsync protocol, which help with efficient calculation of the differences between two files. The rsync algorithm is different from most differencing algorithms because it does not require the presence of the two files to calculate the delta. Instead, it requires a set of checksums of each block of one file, which together form a signature for that file. Blocks at any in the other file which have the same checksum are likely to be identical, and whatever remains is the difference.

The library does not deal with file metadata or structure, such as filenames, permissions, or directories. To this library, a file is just a stream of bytes. Higher-level tools, such as rsync <http://rsync.samba.org> can deal with such issues in a way appropriate to their users.

The library supports three basic operations:

Generating the signature S of a file A .
Calculating a delta D from S and a new file B.
Applying D to A to reconstruct B.

The library also provides the rdiff network delta tool command-line tool, which makes this functionality available to users and scripting languages.

Programming interface

The public interface to librsync (rsync.h) has functions in several main areas:

All external symbols have the prefix ``rs_'', or ``RS_'' in the case of preprocessor symbols.

Data streaming

A key design requirement for librsync is that it should handle data as and when the hosting application requires it. librsync can be used inside applications that do non-blocking IO or filtering of network streams, because it never does IO directly, or needs to block waiting for data.

The programming interface to librsync is similar to that of zlib and bzlib. Arbitrary-length input and output buffers are passed to the library by the application, through an instance of rs_buffers_t. The library proceeds as far as it can, and returns an rs_result value indicating whether it needs more data or space.

All the state needed by the library to resume processing when more data is available is kept in a small opaque rs_job_t structure. After creation of a job, repeated calls to rs_job_iter() in between filling and emptying the buffers keeps data flowing through the stream. The rs_result_t values returned may indicate

RS_DONE: processing is complete
RS_BLOCKED: processing has blocked pending more data
one of various possible errors in processing

These can be converted to a human-readable string by rs_strerror().

Generating and applying deltas

All encoding operations are performed by using a *_begin function to create a rs_job_t object, passing in any necessary initialization parameters. The various jobs available are:

rs_sig_begin(): Calculate the signature of a file.
rs_loadsig_begin(): Load a signature into memory.
rs_delta_begin(): Calculate the delta between a signature and a new file.
rs_patch_begin(): Apply a delta to a basis to recreate the new file.

Buffers

After creating a job, input and output buffers are passed to rs_job_iter() in an rs_buffers_t structure.

On input, the buffers structure must contain the address and length of the input and output buffers. The library updates these values to indicate the amount of remaining buffer. So, on return, avail_out is not the amount of output data produced, but rather the amount of output buffer space unfilled. This means that the values on return are consistent with the values on entry, but not necessarily what you would expect.

A similar system is used by libz and libbz2.

Warning:: The input may not be completely consumed by the iteration if there is not enough output space. The application must retain unused input data, and pass it in again when it is ready for more output.

Processing whole files

Some applications do not require fine-grained control over IO, but rather just want to process a whole file with a single call. librsync provides `whole-file' functionality to do exactly that.

Processing of a whole file begins with creation of a rs_job_t object for the appropriate operation, just as if the application was going to do buffering itself. After creation, the job may be passed to rs_whole_run(), which will feed it to and from two FILEs as necessary until end of file is reached or the operation completes.

Debugging trace and error logging

librsync can output trace or log messages as it proceeds. These follow a fairly standard priority-based filtering system (rs_trace_set_level()), using the same severity levels as UNIX syslog. Messages by default are sent to stderr, but may be passed to an application-provided callback (rs_trace_to(), rs_trace_fn_t).

Encoding statistics

Encoding and decoding routines accumulate compression performance statistics in a rs_stats_t structure as they run. These may be converted to human-readable form or written to the log file using rs_format_stats() or rs_log_stats() respectively.

Utility functions

Some additional functions are used internally and also exposed in the API:

encoding/decoding binary data: rs_base64(), rs_unbase64(), rs_hexify().
MD4 message digests: rs_mdfour(), rs_mdfour_begin(), rs_mdfour_update(), rs_mdfour_result().

Generated at Sun Mar 18 16:54:50 2001 for librsync by