convert File History cache serialization to Smile #4268

Merged
merged 9 commits into from
Apr 5, 2023

@vladak vladak commented Apr 5, 2023

This change converts the serialization format of the file history cache to Smile (already used for the annotation cache). As always, this is a trade-off: the cache gets bigger, but indexing time goes down. It also paves the way towards reducing the memory footprint of the webapp by fetching history in chunks rather than as a whole (#3541), and it makes certain features like #4087 possible.

For the benchmark, I used the Linux Git repository. Indexer options used for the test:

-P
-S
-H
-G
--progress
-s
/var/opengrok/src.linux
-d
/var/opengrok/data.linux
-W
/var/opengrok/etc/configuration.xml

| | size | time |
|---|---|---|
| before | 696 MB | 0:42:18 |
| after | 1.9 GB | 0:24:35 |

Tested on my trusty laptop with Intel Core i7 and built-in SSD.

Sizes are as reported by du -sh. So for almost 3x the space, the history cache creation time is reduced by ~40 %. The space increase may sound like a lot; however, not only was the format switched, there is also no compression now. The lack of compression together with the format switch allows for stream-based access. Given that some modern file systems support compression (e.g. per dataset in ZFS), it should not be a big deal. After uncompressing all the .gz files in the old history cache, it went from 696 MB to 2.6 GB, so uncompressed XML clearly has a bigger footprint than Smile.
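
To make the "almost 3x" and "~40 %" figures easy to re-derive, here is a small sketch of the arithmetic. The class and method names are made up for illustration, and it assumes that du -sh reports sizes in powers of 1024, so 1.9 GB is taken as 1.9 * 1024 MB.

```java
import java.time.Duration;

// Hypothetical helper, not part of OpenGrok: recomputes the benchmark ratios.
final class HistoryCacheBenchmark {
    /** Ratio of the new cache size to the old one (both in MB). */
    static double sizeRatio(double beforeMb, double afterMb) {
        return afterMb / beforeMb;
    }

    /** Fraction by which the wall-clock time went down. */
    static double timeReduction(Duration before, Duration after) {
        return 1.0 - (double) after.getSeconds() / before.getSeconds();
    }

    public static void main(String[] args) {
        Duration before = Duration.ofMinutes(42).plusSeconds(18); // 0:42:18
        Duration after = Duration.ofMinutes(24).plusSeconds(35);  // 0:24:35
        double afterMb = 1.9 * 1024; // assumption: du -sh uses powers of 1024

        System.out.printf("size ratio: %.2f%n", sizeRatio(696, afterMb));
        System.out.printf("time reduction: %.1f %%%n",
                100 * timeReduction(before, after));
    }
}
```

The size ratio comes out a bit under 3 and the time reduction a bit over 40 %, in line with the rounded figures quoted above.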

As for the implementation, the tags are stored in separate files with the .t suffix. This makes it possible to deserialize the HistoryEntry objects as a stream.
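
To illustrate why a plain sequence of entries lends itself to stream access, here is a minimal sketch. It is not the code from this change and it does not use Smile: java.io.DataOutputStream/DataInputStream stand in for the real serialization so that the example runs on the JDK alone, and the Entry class is a hypothetical stand-in for HistoryEntry. The point is only that the reader can stop after the first N entries without parsing the rest of the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stand-in format, not Smile and not OpenGrok code.
final class EntryStreamSketch {
    /** Hypothetical, simplified stand-in for a history entry. */
    static final class Entry {
        final String revision;
        final String message;

        Entry(String revision, String message) {
            this.revision = revision;
            this.message = message;
        }
    }

    /** Writes the entries back to back, with no enclosing container. */
    static byte[] write(List<Entry> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Entry e : entries) {
                out.writeUTF(e.revision);
                out.writeUTF(e.message);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads at most limit entries and leaves the rest of the stream unread. */
    static List<Entry> readFirst(InputStream in, int limit) {
        List<Entry> result = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        try {
            while (result.size() < limit) {
                String revision = data.readUTF();
                result.add(new Entry(revision, data.readUTF()));
            }
        } catch (EOFException eof) {
            // Fewer than limit entries were available; return what was read.
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] cache = write(List.of(
                new Entry("r3", "third"),
                new Entry("r2", "second"),
                new Entry("r1", "first")));
        for (Entry e : readFirst(new ByteArrayInputStream(cache), 2)) {
            System.out.println(e.revision + " " + e.message);
        }
    }
}
```

Keeping the tags out of this sequence, as the .t files do, means the entry stream stays a flat list of same-shaped records that can be consumed one at a time.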

This also fixes some unrelated problems with progress reporting found when testing.

@vladak vladak added the indexer label Apr 5, 2023
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Apr 5, 2023
@vladak vladak merged commit 55d7874 into oracle:master Apr 5, 2023
@vladak vladak deleted the history_cache_smile branch April 5, 2023 12:42