Slide 0


About this show...

GIMP is cool

Scheme is cool

This presentation was brought
to you by PinPoint, the new GIMP
presentation tool.

Text -> GIMP -> JPEG

http://linuxcare.com.au/projects/pinpoint/


Slide 1



rproxy: dynamic web caching

Martin Pool
Linuxcare, Inc.

http://linuxcare.com.au/rproxy/


Slide 2



Problem Statement

People use web resources repeatedly

Therefore: cache recently-used resources
on client or proxy

On each request, check freshness: either
reload or reuse the cached copy

Increasingly, content is dynamic:
all-or-nothing caches are less effective


Slide 3



WIBNI

It would be nice if we could transfer
only differences

Must interoperate smoothly with HTTP

Must work on dynamic documents

Must fit into popular HTTP software


Slide 4



rsync

Fast file transfer protocol

Finds identical blocks between two
files, and from them the delta

Send per-block checksums

Search for matching blocks

Whatever's left is the difference
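
The three steps above can be sketched in a few lines of Python. This is a
minimal illustration, not rsync's actual code: the function names, the tiny
block size, and the use of zlib.adler32 and SHA-256 as the weak and strong
checksums are all assumptions made for the example. Real rsync uses a rolling
weak checksum so the search does not recompute a sum at every offset.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # unrealistically small, chosen so the example is easy to follow


def block_key(block: bytes) -> tuple:
    """Weak + strong checksum pair for one block (stand-in algorithms)."""
    return (zlib.adler32(block), hashlib.sha256(block).digest())


def make_signature(old: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Per-block checksums of the old file, mapped to block index."""
    sig = {}
    for offset in range(0, len(old), block_size):
        block = old[offset:offset + block_size]
        sig.setdefault(block_key(block), offset // block_size)
    return sig


def make_delta(sig: dict, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Search the new file for blocks listed in the signature.

    Emits ("copy", block_index) for matches and ("literal", bytes) for
    whatever is left over.
    """
    delta = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        key = block_key(window)
        if key in sig:
            if literal:
                delta.append(("literal", bytes(literal)))
                literal.clear()
            delta.append(("copy", sig[key]))
            pos += len(window)
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        delta.append(("literal", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for op, arg in delta:
        if op == "copy":
            out += old[arg * block_size:(arg + 1) * block_size]
        else:
            out += arg
    return bytes(out)
```

Usage: the side holding the old file calls make_signature and sends the
result; the side holding the new file calls make_delta; the first side then
calls apply_delta to reconstruct the new file.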


Slide 5



Integration with HTTP

Request/response protocol

Streaming

Proxies

Every response may be different


Slide 6



Protocol

Client transmits signature of cached
resource to server

Server computes & sends differences

Signature sent as new HTTP header

Delta as HTTP Transfer-Encoding

Ignored if not supported
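
A rough sketch of how this exchange might look, with HTTP messages modelled
as plain Python dicts. The header name X-Example-Signature, the coding token
x-example-delta, and all function names are hypothetical placeholders; the
slides do not give the real names. The delta encoder is passed in by the
caller and is not implemented here.

```python
import base64
import zlib

SIG_HEADER = "X-Example-Signature"  # hypothetical header name
DELTA_CODING = "x-example-delta"    # hypothetical transfer-coding token


def build_request_headers(cached_body, block_size=4):
    """Client side: attach a signature of the cached resource, if any."""
    headers = {"Host": "example.org"}
    if cached_body is not None:
        sums = b"".join(
            zlib.adler32(cached_body[i:i + block_size]).to_bytes(4, "big")
            for i in range(0, len(cached_body), block_size)
        )
        headers[SIG_HEADER] = base64.b64encode(sums).decode("ascii")
    return headers


def respond(request_headers, new_body, encode_delta=None):
    """Server side: send a delta if possible, else the full body.

    A server without delta support (encode_delta is None) never looks at
    the signature header, so the client just gets an ordinary response.
    """
    signature = request_headers.get(SIG_HEADER)
    if encode_delta is not None and signature is not None:
        sums = base64.b64decode(signature)
        return {"Transfer-Encoding": DELTA_CODING}, encode_delta(sums, new_body)
    return {}, new_body
```

The fallback path is the point of the design: an old server or an old client
sees nothing unusual, so the extension can be deployed incrementally.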


Slide 7



Standalone Proxy

Run one on client, one upstream

Compress across slow links

Already in Debian/Woody & Sid


Slide 8



libhsync

Integrate smoothly with many apps

Become the encoding library for
rsync 3.0

LGPL license for nonfree apps


Slide 9



Hosting Applications:

Mozilla: threaded

Apache: multi-process model

Squid: select/poll-based

Therefore: do no IO in library, caller
supplies buffer

State machine
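
To illustrate the "no IO, caller supplies buffer" idea, here is a small
sketch of a signature generator written as a state machine. It is not the
libhsync API: the class name, the step() method, and the output layout are
invented for this example. The object never reads or writes anything itself;
the caller hands it input in whatever chunks arrive and collects the output,
so the same code fits a threaded, multi-process, or select/poll-based host.

```python
import zlib
from enum import Enum, auto


class State(Enum):
    HEADER = auto()   # nothing emitted yet
    BLOCKS = auto()   # consuming input, emitting per-block checksums
    DONE = auto()     # final block flushed


class SignatureJob:
    """Hypothetical streaming job; all state lives in the object."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.state = State.HEADER
        self.pending = bytearray()  # input not yet forming a whole block

    def step(self, inbuf: bytes, final: bool = False) -> bytes:
        """Consume one caller-supplied chunk and return any output ready."""
        if self.state is State.DONE:
            raise ValueError("job already finished")
        out = bytearray()
        if self.state is State.HEADER:
            out += self.block_size.to_bytes(4, "big")
            self.state = State.BLOCKS
        self.pending += inbuf
        while len(self.pending) >= self.block_size:
            block = bytes(self.pending[:self.block_size])
            del self.pending[:self.block_size]
            out += zlib.adler32(block).to_bytes(4, "big")
        if final:
            if self.pending:  # short trailing block
                out += zlib.adler32(bytes(self.pending)).to_bytes(4, "big")
                self.pending.clear()
            self.state = State.DONE
        return out
```

Because the job only advances when step() is called, the output is the same
whether the input arrives in one piece or a few bytes at a time.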


Slide 10



Privacy problems?

Client holds server-supplied data &
retransmits

A "stealth cookie"?

No more so than normal Last-Modified

Client-generated signatures are even
safer


Slide 11



Tuning

Encode particular content-types

Fuzzy-matching of resources

Cache signatures

Choose block size

~90% saving
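
The block-size choice is a trade-off that a crude model makes visible. The
sketch below is only back-of-envelope arithmetic under stated assumptions: 8
bytes of checksum per block, and roughly one block of literal data resent
per isolated change. Neither figure comes from the slides, and the model
says nothing about the saving quoted above.

```python
import math


def signature_bytes(resource_len, block_size, sum_bytes=8):
    """Upstream cost: one checksum entry per block (sum_bytes is assumed)."""
    return math.ceil(resource_len / block_size) * sum_bytes


def rough_cost(resource_len, block_size, changes, sum_bytes=8):
    """Signature size plus about one block of literal data per change."""
    return signature_bytes(resource_len, block_size, sum_bytes) + changes * block_size


# Small blocks: large signature, small delta. Large blocks: the reverse.
for size in (100, 500, 2000):
    print(size, rough_cost(10_000, size, changes=3))
```

Smaller blocks inflate the signature sent with every request; larger blocks
make each change more expensive to resend.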


Slide 12



Other schemes

Explicit versioning

Client-side variable portions


Slide 13



Bonus slide: rsync 3.0

Scale to larger trees (1TB+ data, 10M+
files, 1000 machines)

Less hardcoded structure

Cached signatures, fuzzy matching

Multicast 1:m, n:m


Slide 14



rsync 3.0 (continued)

Scriptable (Perl/Python/...): filtering,
matching, reporting, ...

Simpler client-server architecture

Documented protocol

SSL?

rdiff tool: rsync-over-email?


Slide 15



http://linuxcare.com.au/rproxy/

Questions?

Come and see Linus's penguin at the
Canberra aquarium.


Produced by PinPoint at Sat Jan 20 14:15:59 2001 UTC