Hashcons NormalizedFilePath values for efficient heap usage #340
The Perhaps you can store the filepath chunks as a |
You are right, it will end up branching on the characters, which is a bit silly. I implemented your suggestion here: pepeiborra@49120e8. However, the heap profile doesn't change. Either I've done something wrong, or it's likely that the heap cost is dominated by the interned |
@wz1000 I don't think there's a material perf difference, given that most dir names are ~10 characters and the cost of converting a String to Text is probably the same as 10 function calls in the Trie.
I tested several variations but didn't find any to be an overall improvement:
3 was a very minor improvement but not worth the extra complexity. For niceness, I have moved the |
This is ready to merge and upload to Hackage, please.
`NormalizedFilePath` is one of the biggest costs in terms of heap space. The solution adopted here relies on the `MemoTrie` package to create a global trie of normalised file paths. The trie is branched by dir name and not by character, which hopefully leads to a more compact representation.

The profiles below have been collected using an Edit experiment on a synthetic repo containing 300 empty modules with maximal imports (every module imports all the modules below it).
BEFORE
AFTER
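
For readers unfamiliar with hash-consing, here is a minimal sketch of the idea described above: a global trie, branched by directory name rather than by character, that hands back one shared copy of each path. This is not the code in this PR. The PR relies on `MemoTrie`, whereas the sketch swaps in a hand-rolled trie stored in a global `IORef` so that it only needs `base` and `containers`. All names (`PathTrie`, `splitDirs`, `insertPath`, `internTable`, `internPath`) are hypothetical, and splitting on `'/'` only is a simplifying assumption.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef)
import qualified Data.Map.Strict as Map
import System.IO.Unsafe (unsafePerformIO)

-- | Hypothetical trie keyed by whole directory names (not characters).
-- Each node optionally caches the canonical path that ends at it.
data PathTrie = PathTrie
  { nodePath :: Maybe FilePath
  , children :: Map.Map String PathTrie
  }

emptyTrie :: PathTrie
emptyTrie = PathTrie Nothing Map.empty

-- | Split a path into chunks on '/' (assumption: POSIX-style separators).
splitDirs :: FilePath -> [String]
splitDirs p = case break (== '/') p of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitDirs rest

-- | Walk the trie along the chunks. If the path is already present, return
-- the stored copy; otherwise store the given one and return it.
insertPath :: FilePath -> [String] -> PathTrie -> (PathTrie, FilePath)
insertPath path [] node = case nodePath node of
  Just shared -> (node, shared)
  Nothing     -> (node { nodePath = Just path }, path)
insertPath path (d : ds) node =
  let child            = Map.findWithDefault emptyTrie d (children node)
      (child', shared) = insertPath path ds child
  in  (node { children = Map.insert d child' (children node) }, shared)

-- | Global intern table. NOINLINE keeps a single shared IORef.
internTable :: IORef PathTrie
internTable = unsafePerformIO (newIORef emptyTrie)
{-# NOINLINE internTable #-}

-- | Return the canonical copy of a path, inserting it on first use.
internPath :: FilePath -> IO FilePath
internPath path =
  atomicModifyIORef' internTable (insertPath path (splitDirs path))
```

Interning the same path twice returns the copy stored on the first call, so later duplicates can be garbage collected instead of each occupying their own heap space. A memoisation-based version, as in the PR, gets the same sharing without the explicit `IORef`, because repeated lookups return the value already held in the memo trie.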