Replies: 4 comments 4 replies
-
Further suggestions.
-
Thanks a lot @stevenj for the detailed proposal! And my apologies for not having suggested writing a discussion in the first place :)
-
Thanks @stevenj for this idea! I think it could work and save substantial time when restoring a node. We had already been thinking about creating an incremental signature of the database, but it has not been implemented yet: the idea was to sign every immutable file in a Merkle tree, store the files in separate archives, and provide a download/verification mechanism for a range. Leveraging rsync looks like a very interesting alternative. I am not convinced that the aggregator should run the rsync server itself, but we could create another node with that responsibility. The main modifications would probably be on the client side.
Regarding the compression level, we could probably start a proof of concept in order to test an implementation 👍
-
I love the idea of supporting some form of delta snapshot download, transferring only what is needed to go from the last snapshot downloaded to the current one. I also love rsync and have used it to great effect for many years in multiple IT situations. I'm not 100% convinced that rsync is the correct approach to take here, though, or at least not a traditional rsync server/client model. It might align with plans for how aggregators will operate; I'm not fully aware of their current functionality or future plans, but my understanding is that they simply point to the locations where snapshots are available and the client picks one place to download from.

What comes to mind for me is the centralization that a traditional rsync client/server model produces, although that is similar to what the client does today. Once the blockchain grows by a few hundred GB more, or reaches multiple TB, I think this could become a limiting factor when a new relay or node does a complete sync of a snapshot.

What I have in mind is essentially a hybrid approach: a torrent-like protocol for decentralization and rsync-style deltas, something akin to BitTorrent Sync, where the client benefits from a decentralized download of the entire snapshot, or of the delta between the most recent snapshot it downloaded and the current one. The aggregator would act like a torrent tracker (which it kind of does now) and provide the list of "swarm" servers (places that have current snapshots available). Instead of the client picking a single server to download from, it would distribute the load over multiple (as many as possible?) snapshot providers while only requesting the chunks that it does not already have.
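Purely to illustrate the shape of that, a sketch where the aggregator endpoint, the mirror URLs, and the chunk split are all invented, and plain HTTP range requests stand in for a real swarm protocol:

```
# Everything here is hypothetical: the endpoint, the mirrors, and the chunk
# split are invented to illustrate a swarm-style download.
curl -s https://aggregator.example.org/snapshots/latest/locations
#   -> a list of providers currently hosting the same snapshot archive

# Pull different byte ranges of the same archive from different providers.
curl -s -r 0-1073741823  https://mirror-a.example/preprod-latest.tar.zst -o chunk0 &
curl -s -r 1073741824-   https://mirror-b.example/preprod-latest.tar.zst -o chunk1 &
wait
cat chunk0 chunk1 > preprod-latest.tar.zst
zstd -t preprod-latest.tar.zst   # verify the embedded checksum
```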
-
Mithril snapshots are large and getting larger, but the majority of data between consecutive snapshots is identical.
This means that fetching the latest snapshot when you already have a previous one wastes a lot of bandwidth and time re-downloading data you already have.
Solutions like epoch-sized chunks have been discussed, but that is a different packaging mechanism: it would require considerable tooling changes and would not be compatible with current processes.
The proposal here is simply to build the entire archive in a way that maximizes redundancy between successive archives, so that an aware downloader only needs to fetch the changes and can reconstruct the full file from that delta download.
This would save a lot of bandwidth on the download server and make it faster to sync updates to the Mithril snapshots; in my testing this method also seemed to produce smaller archives overall.
I have made a simple POC that could enable delta Mithril snapshots to be distributed, so that a later snapshot only requires downloading the differences from the previous one a user has.
This is fully symmetrical: if I had a snapshot made the way I propose and then tried to sync a snapshot that was 10 versions newer, I would only need to download the differences that occurred across those 10 snapshots from the latest snapshot. The TLDR for making the archive is this:
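A sketch of what that could look like; the `db` directory layout and archive name are assumptions, while the zstd flags are the ones explained in the notes below:

```
# Sketch only: directory layout and archive name are assumptions; the zstd
# flags are the ones discussed in the notes further down.
cd db
# Immutable files first, in sorted order, then everything else, so unchanged
# data always lands at the same offsets in every archive.
{ find immutable -type f | sort
  find . -type f -not -path './immutable/*' | sort
} > /tmp/archive-order.txt
tar -cf - --files-from=/tmp/archive-order.txt \
  | zstd -T0 -22 --ultra --check --rsyncable -B1048576 \
      -o ../preprod-latest.tar.zst
```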
The archive made this way is repeatable: in a test I ran locally, only 3 bytes in the first 1MB block of the latest snapshot changed, and the next change occurred after 1.3GB of data.
Technically, the easiest way to utilize this is with rsync: publish a "preprod-latest.tar.zst" and rsync will happily update a local copy to the latest version any time it changes.
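For example (a hedged sketch; the rsync host and module name are made up, only the usage pattern matters):

```
# Hypothetical host/module. rsync compares the published archive against the
# existing local copy and only transfers the blocks that differ.
rsync -av --partial --progress \
  rsync://snapshots.example.org/mithril/preprod-latest.tar.zst \
  ./preprod-latest.tar.zst
```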
Still, it would also be possible to build this functionality into the mithril CLI itself: the client could send the hash of the snapshot it has locally to the server over an API call, and the backend could compare that snapshot against the latest and return the list of 1MB blocks that need to be replaced (in this case [0, 1244+]).
Using HTTP range requests to download partial chunks of a file, we could then fetch block 0, fetch all blocks from 1244 onwards, reconstruct the new "latest" archive locally, and save ourselves roughly 1.3GB of transfer in the process.
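A sketch of that reconstruction, using the numbers from the example above (1MB blocks, changed blocks [0, 1244+]); the URL and file names are illustrative:

```
# Illustrative URL and file names; block size and block list follow the
# example above (1MB blocks, changed blocks [0, 1244+]).
BS=1048576
URL=https://snapshots.example.org/mithril/preprod-latest.tar.zst

# Block 0 changed: fetch its bytes.
curl -s -r 0-$((BS - 1)) "$URL" -o block0.bin
# Everything from block 1244 onwards changed: fetch from that offset to EOF.
curl -s -r $((1244 * BS))- "$URL" -o tail.bin

# Splice: new head + unchanged middle from the old local archive + new tail.
{ cat block0.bin
  dd if=old-latest.tar.zst bs=$BS skip=1 count=1243 status=none
  cat tail.bin
} > preprod-latest.tar.zst

# The --check flag means the result can be verified with zstd itself.
zstd -t preprod-latest.tar.zst
```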
This would be very efficient and very quick compared to fetching the otherwise 100% redundant 1.3GB of data every time.
On mainnet this would be a very large performance win.
It's also fully backward compatible with existing processes: you could still pull a named snapshot every time, and it doesn't matter that it has been constructed this way.
Notes:
- Use `find` to ensure the immutable data always appears first in the archive and is sorted, so that data that doesn't change is always in exactly the same place in the archive. This is critical to maximize overlap in the compressed data.
- In zstd I used `-22 --ultra`, because why not use the strongest compression available (this turned the latest snapshot from 1.7GB to 1.4GB in my testing on a late preprod snapshot).
- `--check` adds a zstd checksum at the end of the file, so once the new latest has been synced it is easy to check whether the archive still has integrity.
- `--rsyncable` is the flag that is necessary to ensure the file's blocks are orderly and there is maximum redundancy between archives.
- `-B1048576` sets the rsync block to 1MB, so it is efficient to just check for any change in a 1MB block and pull a copy of that block, rather than trying to do a delta diff on every byte.

Doing this could save a massive amount of time, and it could make it feasible to run a full local node simply on delta snapshots, especially if the node is just being used to populate dbsync or the like. New snapshots are posted every 1:15 hours, and fetching a delta snapshot would take minutes on any reasonable connection, which could be more than reasonable timeliness of data for many applications.
This could also be done in reverse: when a new Mithril archive is made, it could be sent to the aggregator as a delta from the previous archive. Saving this bandwidth on the aggregator could reduce its operational costs and load.
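A hedged sketch of that direction, assuming the aggregator (or a companion node) exposes a writable rsync module; the host, module, and file names are made up:

```
# Hypothetical destination. rsync uses the previous archive already present
# on the aggregator as the delta basis, so only changed blocks cross the
# wire; --inplace updates that copy rather than writing a temporary file.
rsync -av --inplace \
  ./preprod-latest.tar.zst \
  rsync://aggregator.example.org/snapshots/preprod-latest.tar.zst
```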