February 14, 2013 by thisisjedimike
Inside Canonical, we have projects under way and testers working on them all over the world. This means transferring daily builds, which can be gigabytes in size, to people who might not have a great deal of bandwidth.
We had been using rsync to do this, with a previous daily build as a seed so that rsync only transferred the data that had changed. However, that still meant transferring a lot of data, because rsync is not very good at calculating a delta between ISOs. Still, it did give a speedup of 25-30%.
Zsync is a project that does it better. The deltas are significantly smaller for our use cases, and it works over HTTP, which means we don’t have to set up shell access for users.
We tested both rsync and zsync against two of our projects that get a lot of activity and have large ISOs to transfer to testers. Here are the delta sizes that each tool transferred.
Project X (target ISO size: 1976.3 MB)
  Seed ISO 7 days old: rsync transfer 1486.7 MB, zsync transfer 531 MB
  Seed ISO 2 days old: rsync transfer 1479.9 MB, zsync transfer 375.9 MB
Project Y (target ISO size: 1104 MB)
  Seed ISO 7 days old: rsync transfer 758 MB, zsync transfer 66.1 MB
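To make the comparison easier to read, the figures above can be turned into the share of each download that was avoided. The following is only a quick illustrative sketch, not part of our tooling; the function and variable names are made up for the example, and the numbers are copied straight from the results above.

```python
# Figures from the results above, in MB:
# (project, seed age in days, target ISO size, rsync transfer, zsync transfer)
RESULTS = [
    ("Project X", 7, 1976.3, 1486.7, 531.0),
    ("Project X", 2, 1976.3, 1479.9, 375.9),
    ("Project Y", 7, 1104.0, 758.0, 66.1),
]


def percent_saved(target_mb, transferred_mb):
    """Percentage of the full ISO that did not need to be downloaded."""
    return 100.0 * (1.0 - transferred_mb / target_mb)


def summarize(results):
    """Build one human-readable line per benchmark run."""
    lines = []
    for project, seed_age, target, rsync_mb, zsync_mb in results:
        lines.append(
            f"{project}, {seed_age}-day-old seed: "
            f"rsync saved {percent_saved(target, rsync_mb):.1f}%, "
            f"zsync saved {percent_saved(target, zsync_mb):.1f}%"
        )
    return lines


if __name__ == "__main__":
    for line in summarize(RESULTS):
        print(line)
```

On these figures rsync avoids roughly a quarter to a third of each download, while zsync avoids between about 73% and 94%.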
So for our use case, zsync seemed the obvious choice. There were a couple of barriers to using it though.
Our image archives are served over HTTPS and protected by OpenID authentication. The zsync client supports neither, as it has its own internal HTTP client. The project itself has not seen any activity since 2010, so the chances of a new version using libcurl being released are pretty much zero.
There were a couple of projects attempting to update zsync, at various stages of completion, but they were either incomplete for our needs (i.e. missing authentication methods) or at the early stages of development.
So, we spun our own solution.
Zsync-curl is a fork of the zsync client that uses libcurl. To solve our OpenID problem, it allows you to set arbitrary cookies, so we have a script that authenticates against our OpenID provider and passes the authenticated session cookie to our libcurl-backed zsync binary when calling it.
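The flow described above could look roughly like the sketch below. This is an illustration only, not our actual script: the helper names, the cookie name and the URL are placeholders, and the cookie option is deliberately left as a `cookie_flag` argument because this post does not spell out zsync_curl's command-line options. The `-i` seed-file option is the one from the original zsync client and is assumed to be unchanged in the fork. Obtaining the session cookie from the OpenID provider is left out entirely.

```python
import subprocess


def build_zsync_curl_command(zsync_url, seed_iso, cookie_name, cookie_value,
                             cookie_flag):
    """Build the argument list for a zsync_curl call with a session cookie.

    cookie_flag is a placeholder for whichever option zsync_curl uses to
    accept a cookie; it is an assumption and must be supplied by the caller.
    """
    cookie = f"{cookie_name}={cookie_value}"
    return [
        "zsync_curl",
        cookie_flag, cookie,   # hypothetical cookie option and its value
        "-i", seed_iso,        # seed file, as in the original zsync client
        zsync_url,
    ]


def fetch_build(zsync_url, seed_iso, cookie_name, cookie_value, cookie_flag):
    """Run zsync_curl and raise CalledProcessError if it exits non-zero."""
    cmd = build_zsync_curl_command(
        zsync_url, seed_iso, cookie_name, cookie_value, cookie_flag
    )
    return subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a shell string means the cookie value is never interpreted by a shell, which matters when it comes from an authentication response.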
The zsync-curl packages install a new zsync_curl binary which sits alongside zsync and zsyncmake from the official zsync distribution.
All of our benchmarks show this will save a significant amount of data transfer, so we’re looking forward to getting it out to our testers around the world and seeing how much it saves in real world usage.