Intern strings #811
Conversation
@layus Although a 10% speedup is nice, I have a few concerns:
The answers to these questions weren't obvious to me from a quick look at the…
@ryantrinkle Thanks for your review :-)
It should. There is a possibility of introducing incorrect behavior by sorting on the InternedText values rather than on the Texts themselves. I had to reorder operations to sort after uninterning. That is mostly because Nix has sorted attrsets, which are not sorted internally.
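To illustrate why the reordering matters, here is a minimal sketch, assuming this PR uses the `intern` package's `InternedText`, whose Ord instance compares intern ids (allocation order) rather than the text itself:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Interned      (intern, unintern)
import           Data.Interned.Text (InternedText)
import           Data.List          (sort)
import           Data.Text          (Text)

-- The safe pattern: unintern first, then sort the Texts, because
-- sorting InternedText directly would follow interning order.
sortedKeys :: [InternedText] -> [Text]
sortedKeys = sort . map unintern

main :: IO ()
main = print (sortedKeys (map intern ["out", "src", "builder"]))
-- ["builder","out","src"], regardless of the order the names were interned
```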
No, by design. We could imagine something like ref-counting, but it does not really matter. Most of the Texts are held in memory for the whole lifetime of the program, and you get better space savings by interning strings than by garbage collecting them. Try to guess how many times the text "out" appears. Or "src" :-)
Not at all, if you trust…
By the way, that's what Nix does. I also trust them to have properly benchmarked it and balanced the pros and cons of prevented garbage collection.
```diff
@@ -172,3 +181,39 @@ alterF
 alterF f k m = f (M.lookup k m) <&> \case
   Nothing -> M.delete k m
   Just v -> M.insert k v m
+
+type VarName = InternedText
```
Great idea to hash-cons. I had forgotten about, and only remotely knew of, such a thing. Kudos for the type synonym.

Please explain (maybe in the code) why hash-cons VarName in particular. Hash-consing VarName looks like minor stuff when there are elephants in the heap. From what I have read, hash consing is computationally heavy and only pays off in some situations.

There should be things in the project that are definitely great to hash-cons; so far my mind has not landed on any particular one, nor do I have practice with doing it. You probably have a better idea of what to say, and people would want to ask this anyway.
Looking at the library: it is indeed mostly used just for plain text types.
> Although a 10% speedup is nice

It is a 10% speedup of the currently slow code, so the improvement is drastic. I am also interested in how interning VarName brings this degree of improvement.
It would be great if there is a way to dedupe (import ...).

It may indeed be a drastic improvement if the uses of pkgs and so on get deduped. with ... scopes are the ones that probably crave deduplication. with {pkgs, haskellPackages, lib, etc...} can be considered almost unique, big & static scopes; their match hash tables should be small. I wonder whether they are deduped as-is currently.
```diff
@@ -130,12 +132,12 @@ builtins = do
   where
     go b@(Builtin TopLevel _) = b
     go (Builtin Normal (name, builtin)) =
-      Builtin TopLevel ("__" <> name, builtin)
+      Builtin TopLevel (intern $ "__" <> unintern name, builtin)
```
Maybe add a coerce-style conversion for InternedText <-> Text. That way the code would read easily, and now there is HLS to show the type signature of the equivalent transformation.

The idea is to abstract this theme into the well-known coerce abstraction, so that the interned types get represented and read by people simply as Text, just in a different type. That frees other people, during their work, from the need to go read the specific library functions and read about hash consing to understand what intern <-> unintern does, if their work is not related to the hash consing.
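One caveat: if I recall the `intern` package's representation correctly, `InternedText` is a data type pairing an id with the Text, so a literal `Coercible` instance is not available. Plain conversion helpers are a sketch of the same readability goal (`toVarName`/`fromVarName` are hypothetical names, not part of the PR):

```haskell
import           Data.Interned      (intern, unintern)
import           Data.Interned.Text (InternedText)
import           Data.Text          (Text)

type VarName = InternedText

-- Hypothetical wrappers: call sites read as plain Text conversions,
-- with the hash consing hidden behind the names.
toVarName :: Text -> VarName
toVarName = intern

fromVarName :: VarName -> Text
fromVarName = unintern
```

With these, the builtins hunk above would read `Builtin TopLevel (toVarName $ "__" <> fromVarName name, builtin)`.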
@Anton-Latukha The implementation is indeed rough. I started by changing the type synonym for VarName, and ended up using the same name for a variety of things that do not really fit the "varname" name. I also had to implement many instances that were missing for InternedText. Then I had to move everything into an easily includable module, so I stuffed it all in Nix.Utils. There are definitely some interface improvements to be made.

Decisions still to be made / benchmarked
The heap usage shows a lot of tuples, most probably originating from the HashMap implementation. That makes me think it would be worth considering hashtables. Again, speedscope is very interesting for understanding what is going on.
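On the "many instances that were missing" above: one plausible shape is an orphan instance like the following sketch (the exact instance set added in the PR may differ; hashing via `unintern` keeps hashes independent of interning order):

```haskell
{-# OPTIONS_GHC -fno-warn-orphans #-}

import Data.Hashable      (Hashable (hashWithSalt))
import Data.Interned      (unintern)
import Data.Interned.Text (InternedText)

-- Orphan instance sketch: hash the underlying Text, so a value hashes
-- the same whether or not it was interned first.
instance Hashable InternedText where
  hashWithSalt salt = hashWithSalt salt . unintern
```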
@layus I put together an alternative way of doing interning here: obsidiansystems@0adb998#diff-3d76949b3ec463adc3cb7c86318020458995f4f80602365dc8e413c34f278caeR189-R229 The basic idea is that we do not intern globally; instead, when two values are compared, if they are equal, we opportunistically "intern" one of them by making it point at the other. This has a number of advantages.
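A minimal sketch of that idea with IORefs (`eqShare` is a hypothetical helper; the real change lives in the linked commit):

```haskell
import Data.IORef (IORef, readIORef, writeIORef)

-- Opportunistic "interning": if the two referenced values turn out to
-- be equal, point the second reference at the first value, so the
-- duplicate copy becomes garbage.
eqShare :: Eq a => IORef a -> IORef a -> IO Bool
eqShare ra rb = do
  a <- readIORef ra
  b <- readIORef rb
  if a == b
    then True <$ writeIORef rb a  -- share one copy from now on
    else pure False
```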
Now to the performance: I tested by running this, throwing away the first two samples, and taking the "real" time of the rest:
That gave me 8 results for each. For master (7e6cd97), I got: 15.57, 15.57, 15.57, 15.56, 15.58, 15.65, 15.63, 15.59 (average: 15.5900). From this test, it appears that global interning improves performance by 3.16% and local interning improves it by 3.04%. I'm not entirely sure how much of the difference is due to measurement error (maybe someone with stats knowledge can chime in!) but the two approaches seem pretty close!

Further Info

I decided I wanted to get some cleaner data, so I pinned the program to a single CPU and gave it… The code to run this was (run as root):
The machine I ran this on has an AMD EPYC 7702P.

Final Thoughts

Given that global interning seems to have a large impact on @layus's machine, I would be very interested to know whether my "local" interning also does.
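On the measurement-error question above: a quick back-of-envelope over the eight master timings quoted earlier (plain Haskell, sample standard deviation):

```haskell
samples :: [Double]
samples = [15.57, 15.57, 15.57, 15.56, 15.58, 15.65, 15.63, 15.59]

mean :: Double
mean = sum samples / fromIntegral (length samples)            -- 15.59

stddev :: Double
stddev = sqrt (sum [(x - mean) ^ (2 :: Int) | x <- samples]
                 / fromIntegral (length samples - 1))         -- ≈ 0.033 s
```

That is roughly 0.2% run-to-run noise, so the ~3% improvements sit well above it, even if the 0.12% gap between the two approaches does not.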
Here is a proper benchmarking of the three solutions:
In this case global interning adds a variable slowdown, and local interning has no effect whatsoever. I was looking for a solution to make attrset lookups and matching faster. In many places we extract, convert, compare and look up attribute sets. The flamechart at https://www.speedscope.app/#profileURL=https://gist.githubusercontent.com/layus/2dfb3f18b12c1daec65a69d34ee29639/raw/10d33407cbd78a4a2ea3ffe89ac5db3b91c31de7/hnix.eventlog.json spends a lot of time manipulating these attrsets, from lists to hashmaps and conversely. Attrset-related functions take about 11s out of 30. So local interning does not help, because it does not speed up these attrset queries. But global interning does not help either. I guess the…

Next attempt: use hashtables?
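If the hashtables experiment goes ahead, a minimal sketch of what a mutable attrset could look like with the hashtables package (`AttrSet` and the `Value` placeholder are hypothetical names, not hnix's actual types):

```haskell
import qualified Data.HashTable.IO as H
import           Data.Text         (Text)

type Value   = ()                           -- placeholder for hnix values
type AttrSet = H.BasicHashTable Text Value  -- mutable, lives in IO

-- Build an attrset once from an association list.
mkAttrSet :: [(Text, Value)] -> IO AttrSet
mkAttrSet = H.fromList

-- O(1) average lookup, without converting between lists and maps.
lookupAttr :: AttrSet -> Text -> IO (Maybe Value)
lookupAttr = H.lookup
```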
Hunches
Beautiful.
Tests are very unstable. I guess they are bound by some resource (RAM, cache?) because running with anything else in parallel (despite having multiple cores) changes the results. On a second run with Firefox, I get the same times for all the implementations. I am clueless ATM.
Okay, a final benchmark on a very still machine (read: coffee pause :-D) yields more stable results (see variance).
Interning does not look like a killer feature. It drives me crazy that we are 40× slower than Nix. I guess I could live with a factor of 10, but a factor of 2 would make me happy. Sadly, Christmas is already over 🎅
Tangential offtopic

Yeah, one probably should reload into the Linux console to run more or less stable benchmarks on a desktop. An empty, network-less single-board setup is probably a good-enough testing platform; it would punish hungry processes in the metrics and so distinguish between computations more sharply. But in most cases that means ARM, and ARM chips have different implementations and instruction sets with different optimizations, and GHC probably does not land perfectly on them; overall it should still give some results.

I personally am not bothered that HNix is slower. It could be 100 times slower and I would not care, because the first concern is bug-for-bug language coverage, and the second is a good language debugging story. People would use HNix because it is in Haskell, or because it can actually do what Nix can not: really check & debug the code. I just plan to do a lot of clean-up and refactoring, and we would find what the causes are.

--- that is, after the new Store release comes out (because I basically do not care that HNix is slow, especially before builtins and all the clean-up & refactors). The base needs to fully work before speed or memory become a concern. People would start using HNix not because of the speed, but because it does something that Nix can not do. When Haskell.Nix infra breaks, they would only care to find the place where and why; Nix can not deliver that, HNix can. idk the situation in…

But at the end of it all, we should arrive at a maximally lean and flexible implementation by code & design, since Haskell implements the language with the same paradigms. Because the languages share the same mathematical paradigms in their type systems, in the most theoretically efficient implementation the HNix language core is a functor from the Nix category into the Hask category. So I think in the end game, after our work is done, HNix should be performant enough for use.

Hashing the structures definitely should be effective in some places in the implementation. Maybe narrow it down to the most logically cost-effective case and let's roll.
Good catch! That's quite a good example of a counter-performance :-). Is it sitting on any branch we can merge?
Notifying on…
Dropping for now. Adds complexity for no real gain. |
Some speedup is gained:
avg (5 runs): 21.2s vs 23.4s on evaluating nixpkgs.hello
Nix also caches the strings' content. I guess we should just benchmark it. The only downside is that we encounter bigger texts, and compute a hash unnecessarily on some of them.
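A sketch of that hash-caching idea using the hashable package (`HashedText` is a hypothetical name): store the hash once next to the content, so equality checks can fail fast, at the cost of hashing every string up front.

```haskell
import Data.Hashable (Hashable (hashWithSalt), hash)
import Data.Text     (Text)

-- Cache the hash alongside the text.
data HashedText = HashedText !Int !Text

-- The one unconditional hash computation mentioned above.
mkHashedText :: Text -> HashedText
mkHashedText t = HashedText (hash t) t

-- Equality rejects unequal strings by comparing cached hashes first;
-- the full texts are only compared when the hashes agree.
instance Eq HashedText where
  HashedText h1 t1 == HashedText h2 t2 = h1 == h2 && t1 == t2

-- Hashing becomes a constant-time field read.
instance Hashable HashedText where
  hashWithSalt salt (HashedText h _) = hashWithSalt salt h
```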