-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate using different Collections libraries for filter speed-up #230
Comments
Some early numbers show that for a GSet of 1M inodes, the lacuna.bifurcan.Map implementation is 1/4 the memory cost of a java.util.HashMap and also about 4x as fast at filtering and collecting files and directories. This could be pretty substantial. |
Unfortunately the current lacuna.bifurcan Map, List, and Set implementations, and their derivatives, do not implement the java.util.Collection interface and therefore supporting them would involve re-writing the majority of NNA. Something I am not too keen on doing given the QueryEngine re-write I just went through. I think I will hold off on this for now. Here's hoping that I find either a similar library that maps to Collection or that the author of lacuna.bifurcan's repo adds such implementations. 🤞 |
I've been experimenting with microbenchmarks and a couple libraries. I think the path forward is going to be allowing a configuration option to experiment with different Collections. I think a simple path forward is, by default, to use ConcurrentHashMap (the current Collection). Then to allow regular HashMap. Then Trove. Trove has shown to take about as long as ConcurrentHashMap for loading while using alot less memory. It make take longer to run queries on however; need to experiment at scale. Meanwhile, HashMap is the fastest for loading - uses less memory but close to Trove. However I expect to see issues when iterating over HashMap for queries. |
I have been reading up on and looking into the following repo for a while now: https://github.com/lacuna/bifurcan/.
There are some interesting properties here similar to the optimization done in the NameNode's LightweightGSet implementation; namely the use of contiguous memory regions to reduce memory size and speed up iteration.
I believe it may be possible to use this library to speedup:
(1) Bootstrap time.
(2) General collection time for all queries.
(3) Reduce memory usage overall from intermittent Map collections.
I believe #224 has done enough optimization for now and is not necessary for the next immediate release, but perhaps this can be for a 1.6.4 or 1.7.x release.
The text was updated successfully, but these errors were encountered: