convert File History cache serialization to Smile #4268
Merged
This change converts the serialization format of the file history cache to Smile (already used for the annotation cache). As always, this is a trade-off: for a bigger cache there is a reduction of indexing speed. It also paves the way towards reducing the memory footprint of the webapp by fetching history in chunks rather than as a whole (#3541), and it makes certain features like #4087 possible.
For the benchmark, I used the Linux Git repository. Indexer options used for the test:
Tested on my trusty laptop with Intel Core i7 and built-in SSD.
Sizes reported by `du -sh`.

So for almost 3x the space, the history cache creation speed is reduced by ~40 %. The space increase may sound like a lot; however, one has to realize that not only was the serialization scheme switched, but there is also no compression now. The lack of compression together with the switch allows for stream-based access. Given that some modern file systems support compression (e.g. per dataset in ZFS), it should not be a big deal. After uncompressing all the `.gz` files in the history cache, it went from 696 MB to 2.6 GB, so uncompressed XML clearly has a bigger footprint than Smile.

As for the implementation, the tags are stored in separate files with the `.t` suffix. This makes it possible to deserialize the `HistoryEntry` objects as a stream.

This also fixes some unrelated problems with progress reporting found during testing.
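
To illustrate why storing entries back to back (with tags kept in a separate `.t` file) enables stream-based access, here is a minimal sketch. It is not the code from this change: it swaps Smile for plain `DataOutputStream` framing so that it runs with the JDK alone, and the class `HistoryStreamSketch`, its simplified `Entry` type, and the `tagsFileName` helper are hypothetical names used only for illustration. The exact naming of the tags file is an assumption as well.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.Consumer;

final class HistoryStreamSketch {

    // Hypothetical, heavily simplified stand-in for a history entry.
    static final class Entry {
        final String revision;
        final String message;

        Entry(String revision, String message) {
            this.revision = revision;
            this.message = message;
        }
    }

    // Write the entries one after another, with no enclosing container,
    // so a reader can consume them sequentially.
    static void writeEntries(OutputStream target, List<Entry> entries) {
        try (DataOutputStream out = new DataOutputStream(target)) {
            for (Entry e : entries) {
                out.writeUTF(e.revision);
                out.writeUTF(e.message);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Read entries one at a time and hand each to the consumer.
    // Only a single entry is held in memory at any moment.
    static int readEntries(InputStream source, Consumer<Entry> consumer) {
        int count = 0;
        try (DataInputStream in = new DataInputStream(source)) {
            while (true) {
                String revision;
                try {
                    revision = in.readUTF();
                } catch (EOFException endOfStream) {
                    break; // no more entries
                }
                consumer.accept(new Entry(revision, in.readUTF()));
                count++;
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return count;
    }

    // Assumption: the tags file name is the history file name plus ".t".
    static String tagsFileName(String historyFileName) {
        return historyFileName + ".t";
    }
}
```

Because the reader never needs the whole document before producing the first entry, a caller could stop early or process the history in chunks, which is the property the description refers to.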