convert File History cache serialization to Smile #4268

vladak · 2023-04-05T10:52:51Z

This change converts the serialization format of the file history cache to Smile (already used for the annotation cache). As always, this is a trade-off: the cache gets bigger, but it takes less time to create during indexing. It also paves the way towards reducing the memory footprint of the webapp by fetching history in chunks rather than as a whole (#3541), and it makes certain features like #4087 possible.

For the benchmark, I used the Linux Git repository. Indexer options used for the test:

-P
-S
-H
-G
--progress
-s
/var/opengrok/src.linux
-d
/var/opengrok/data.linux
-W
/var/opengrok/etc/configuration.xml

_	size	time
before	696 MB	0:42:18
after	1.9 GB	0:24:35

Tested on my trusty laptop with Intel Core i7 and built-in SSD.

Sizes are as reported by du -sh. So for almost 3x the space, the history cache creation time is reduced by ~40 %. The space increase may sound like a lot; however, not only was the format switched, but there is also no compression now. The lack of compression together with the switch allows for stream-based access. Given that some modern file systems support compression (e.g. per dataset in ZFS), it should not be a big deal. After uncompressing all the .gz files in the old history cache, it went from 696 MB to 2.6 GB, so uncompressed XML clearly has a bigger footprint than Smile.
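The two ratios quoted above can be rechecked from the table with a few lines of Java. This is only a sketch of the arithmetic: the class and method names are made up for illustration, and 1.9 GB is assumed to mean 1.9 * 1024 MB, which du -sh has already rounded.

```java
import java.time.Duration;

public class BenchmarkRatios {
    // Fraction by which `after` is shorter than `before` (0.25 means 25 % less time).
    static double timeReduction(Duration before, Duration after) {
        return 1.0 - (double) after.getSeconds() / before.getSeconds();
    }

    // How many times larger afterMb is than beforeMb.
    static double sizeFactor(double beforeMb, double afterMb) {
        return afterMb / beforeMb;
    }

    public static void main(String[] args) {
        Duration before = Duration.ofMinutes(42).plusSeconds(18); // 0:42:18 from the table
        Duration after = Duration.ofMinutes(24).plusSeconds(35);  // 0:24:35 from the table
        System.out.println("time reduction: " + 100 * timeReduction(before, after) + " %");
        // Assumption: 1.9 GB is treated as 1.9 * 1024 MB.
        System.out.println("size factor: " + sizeFactor(696, 1.9 * 1024) + "x");
    }
}
```

With the numbers from the table this gives a time reduction of roughly 42 % and a size factor of about 2.8, which matches the "~40 %" and "almost 3x" above.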

As for the implementation, the tags are stored in separate files with the .t suffix. This makes it possible to deserialize the HistoryEntry objects as a stream.
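To illustrate what stream-based access buys, here is a minimal sketch of reading entries one at a time from a file that contains nothing but a sequence of records. It is not the code from this change: it uses plain java.io DataOutputStream/DataInputStream as a stand-in for Smile, and the Entry class and the method names are hypothetical simplifications, not OpenGrok's HistoryEntry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class EntryStreamSketch {
    // Hypothetical, heavily simplified stand-in for a history entry.
    static final class Entry {
        final String revision;
        final String message;

        Entry(String revision, String message) {
            this.revision = revision;
            this.message = message;
        }
    }

    // Writes the entries back to back, with no enclosing container object.
    static byte[] write(List<Entry> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Entry entry : entries) {
                out.writeUTF(entry.revision);
                out.writeUTF(entry.message);
            }
        }
        return bytes.toByteArray();
    }

    // Reads one entry at a time and hands it to the consumer, so the whole
    // history never has to be held in memory. Returns the number of entries read.
    static int readEach(InputStream in, Consumer<Entry> consumer) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int count = 0;
        while (true) {
            String revision;
            try {
                revision = data.readUTF();
            } catch (EOFException endOfStream) {
                return count; // clean end of input between two records
            }
            // A truncated record makes this readUTF throw, which is propagated.
            consumer.accept(new Entry(revision, data.readUTF()));
            count++;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] cache = write(Arrays.asList(
                new Entry("e18a043", "fix test"),
                new Entry("0637dd5", "bump year")));
        readEach(new ByteArrayInputStream(cache),
                entry -> System.out.println(entry.revision + " " + entry.message));
    }
}
```

The point of the layout is that the reader can stop early or process entries in chunks, which is much harder when the entries are wrapped in a single compressed object together with other data.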

This also fixes some unrelated problems with progress reporting found when testing.

vladak added 8 commits April 4, 2023 22:54

initial stab at changing the History cache serialization to Smile

84fbd1b

merging breaks the serialization

fix the merging of old and new history

35c24b3

support tags

992218f

remove unused imports

31f5681

remove unused import

87cc4e1

fix array out of bounds access

bc73187

fix test

2aa3976

fix test

e18a043

vladak added the indexer label Apr 5, 2023

oracle-contributor-agreement bot added the OCA Verified label Apr 5, 2023

bump year

0637dd5

vladak merged commit 55d7874 into oracle:master Apr 5, 2023

vladak deleted the history_cache_smile branch April 5, 2023 12:42

vladak mentioned this pull request Apr 5, 2023

remove leftover test/experiment #4270

Merged

vladak mentioned this pull request Oct 31, 2023

display first line of commit comment for each file in directory listing #1813

Open
