irmin-pack: unify LRUs and add max memory config #2254
Conversation
let to_kinded t = Node t
let of_kinded = function Node n -> n | _ -> assert false
I think I would prefer if this extensible type were hidden in the LRU implementation. I think it should be possible to create the fresh `+= Node` type constructor behind the scenes by keeping the LRU a functor that is instantiated for each type of value we want to store in it.
Interesting idea! I'll see what I can do. I do agree it would be nicer to hide this bookkeeping.
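
For readers following along, here is a minimal sketch (not the PR's code) of the functorized approach being discussed: each application of the functor generates its own fresh extension constructor, so `to_kinded`/`of_kinded` stay hidden inside the LRU. The `Make` signature and the plain `Hashtbl` standing in for the real LRU are illustrative assumptions.

```ocaml
(* Extensible type shared by all caches, but never exposed to callers. *)
type kinded = ..

module Make (V : sig
  type t
end) : sig
  type t
  val create : capacity:int -> t
  val add : t -> int -> V.t -> unit
  val find_opt : t -> int -> V.t option
end = struct
  (* A fresh constructor is generated per functor application, behind the
     scenes, so distinct value types can share the same backing store. *)
  type kinded += K of V.t

  let to_kinded v = K v
  let of_kinded = function K v -> v | _ -> assert false

  (* A plain Hashtbl stands in for the real LRU so the sketch stays
     self-contained; eviction logic is elided. *)
  type t = (int, kinded) Hashtbl.t

  let create ~capacity = Hashtbl.create capacity
  let add t k v = Hashtbl.replace t k (to_kinded v)
  let find_opt t k = Option.map of_kinded (Hashtbl.find_opt t k)
end
```

Instantiating `Make` once per object store (e.g. `module Node_cache = Make (struct type t = node end)`, where `node` is a placeholder type) would then give each value type its own hidden constructor.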
let hash = Hashtbl.hash
end)

type key = int63
I don't think it's a problem atm, but just to be sure: this works because we only use the LRU once per kind of object, and the keys of the different kinds of objects are naturally distinct since they are file offsets?
Hmm, this is a good point to bring up!
A problem could arise if a particular offset were re-used for a different object type than what was originally cached, but that can't happen since keys are tied to the ever-growing file offsets of the pack file (as you say). The unified LRU, as it is currently written, is definitely tied to these implementation details. I was trying to avoid a functorized LRU for simplicity, but (thinking out loud) if we move to that, perhaps I can encode the type in the key as well.
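
To make the "encode the type in the key" thought concrete, here is a hedged sketch (not the PR's implementation): pairing the offset with the object kind removes any reliance on offsets never being reused across kinds. `int64` stands in for irmin's `Int63.t`, and all names here are illustrative.

```ocaml
type kind = Commit | Node | Contents

module Key = struct
  type t = { offset : int64; kind : kind }

  let equal a b = Int64.equal a.offset b.offset && a.kind = b.kind
  let hash t = Hashtbl.hash (t.offset, t.kind)
end

module Tbl = Hashtbl.Make (Key)

let demo () =
  let tbl = Tbl.create 16 in
  Tbl.replace tbl { Key.offset = 42L; kind = Node } "an inode";
  (* Same offset, different kind: no collision in the shared table. *)
  Tbl.find_opt tbl { Key.offset = 42L; kind = Commit }  (* = None *)
```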
Yeah hm, but having an extensible key type that supports custom hashing/equality is a whole new world of pain... which is how I ended up thinking about "not sharing the hashtables" to sidestep the issue :P
To expand a bit on my comments: did you consider keeping the hashtables separate (since they cause type issues) but sharing the … (lines 78 to 84 in 3222d26)? The type …

(Note that this is just a thought that popped into my head! I don't think there's a strong incentive to do things differently, as the current solution works nicely :) )
Ah, thanks for sharing your extra thoughts! I didn't think about changing the core LRU implementation too much since I wanted to keep the option of switching it in the future (maybe a modified …).
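
Since the referenced snippet (lines 78 to 84 in 3222d26) is not reproduced here, the following is only one possible reading of the "separate hashtables, shared bookkeeping" idea: each object kind keeps its own well-typed table, while a single shared budget keeps the overall limit unified. All names are hypothetical and eviction is elided.

```ocaml
(* Shared, mutable capacity counter used by every per-kind cache. *)
type budget = { mutable remaining : int }

module Cache (V : sig
  type t
  val weight : t -> int
end) = struct
  type t = { tbl : (int, V.t) Hashtbl.t; budget : budget }

  let create budget = { tbl = Hashtbl.create 16; budget }

  (* Admission is gated by the shared budget; real eviction logic elided. *)
  let add t k v =
    let w = V.weight v in
    if w <= t.budget.remaining then begin
      t.budget.remaining <- t.budget.remaining - w;
      Hashtbl.replace t.tbl k v
    end

  let find_opt t k = Hashtbl.find_opt t.tbl k
end
```

The point of the sketch is that the tables stay monomorphic (no extensible type), while the memory accounting remains a single shared value.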
Previously, an `irmin-pack` store had one LRU per `Pack_store`, resulting in three LRUs corresponding to the three object stores: Commit, Node, and Content. This commit changes an `irmin-pack` store to only have one LRU that is shared by each object store. The motivation of this change is to make configuring the LRU size more intuitive for users of `irmin-pack`.
…min-pack, irmin-pack-tools, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.8.0)

CHANGES:

### Added

- **irmin**
  - Change behavior of `Irmin.Conf.key` to disallow duplicate key names by default. Add `allow_duplicate` optional argument to override. (mirage/irmin#2252, @metanivek)
- **irmin-pack**
  - Add maximum memory as an alternative configuration option, `lru_max_memory`, for setting LRU capacity. (@metanivek, mirage/irmin#2254)

### Changed

- **irmin**
  - Lower bound for `mtime` is now `2.0.0` (mirage/irmin#2166, @patricoferris)
- **irmin-mirage-git**
  - Lower bound for `mirage-kv` is now `6.0.0` (mirage/irmin#2256, @metanivek)

### Fixed

- **irmin-cli**
  - Changed `--store irf` to `--store fs` to align the CLI with what is published on the Irmin website (mirage/irmin#2243, @wyn)
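
As a usage illustration of the `lru_max_memory` entry above: the config key name comes from the changelog, but its exposure as a labelled argument of `Irmin_pack.config` (and that argument's exact type) is an assumption here; consult the 3.8.0 API docs for the real signature.

```ocaml
(* Pre-3.8 style: cap the LRU by number of entries. *)
let config_by_entries root = Irmin_pack.config ~lru_size:15_000 root

(* 3.8 style: cap the LRU by approximate memory instead. The labelled
   argument and its [int option] type are assumptions based on the
   changelog, not a confirmed signature. *)
let config_by_memory root =
  Irmin_pack.config ~lru_max_memory:(Some 80_000_000) root
```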
Irmin 3.8 unifies 3 LRUs into 1, so we allow 3x more items into the unified LRU. See mirage/irmin#2254
At a high level, this PR:

- unifies the LRUs in `irmin-pack` into one
- adds `lru_max_memory` as an alternative way to configure LRU capacity

When deciding how to count the memory usage of objects in the LRU, two paths were evaluated: using `Obj.reachable_words` or using `repr`'s size function. The former was too slow in replay benchmarks, so the latter is used. The size is used as-is for commits and contents, and a correction factor, based on benchmark observations, is applied to inode sizes.

The definition of "large" for contents objects is 20kB. This seems like a reasonable value based on previous analysis of object sizes, which indicated that less than 0.1% of contents objects are larger than 512B. Note: the previous weight-based limit would allow any object < ~500kB into the LRU (based on Octez's default configuration of a 5000 entry cap), but lowering the cap allows more objects into the cache.
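
A sketch contrasting the two sizing approaches described above. `encoded_size` stands in for `repr`'s size function, and `inode_correction` is a made-up placeholder for the benchmark-derived factor; only the 20kB "large contents" threshold is taken from the description.

```ocaml
(* Approach 1: accurate but too slow in replay benchmarks — walks the heap
   reachable from the value and converts words to bytes. *)
let reachable_bytes v = Obj.reachable_words (Obj.repr v) * (Sys.word_size / 8)

(* Approach 2 (the one the description says is used): a serialisation-based
   size, taken as-is for commits and contents, corrected for inodes. *)
let inode_correction = 2.0 (* hypothetical value, not the PR's factor *)
let large_contents_limit = 20_000 (* bytes; the "large" threshold from the text *)

let weight ~encoded_size = function
  | `Commit c -> encoded_size c
  | `Contents c -> encoded_size c
  | `Node n -> int_of_float (inode_correction *. float_of_int (encoded_size n))

(* "Large" contents (over 20kB) are treated specially; here we just flag them. *)
let is_large_contents ~encoded_size c = encoded_size c > large_contents_limit
```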
Conclusions based on benchmarks (to be pushed to the benchmarks repo once I finish tidying the analysis and running a final bench):

- I did not use `cachecache`'s LRU for this, but with some modifications to its code, it may provide better scaling. I ran its benchmarks against the modified `irmin` LRU in this PR, and it does seem to scale better.

The final benchmark I want to run is an 80mb one, but here is some high-level analysis.

Snippet of benchmark stats:
Here are memory usage comparisons. You can see:

- `entry-100k` and `500mb` regress in performance (not shown is `entry-60k`, but it also regresses in perf)
- `entry-30k` has the best performance (but `80mb` is almost the same in perf and memory)
- `entry-15k` has almost equivalent memory usage to `3.7.1` but better performance