refactor: stream JLAP repodata writes #891
JLAP previously had up to three copies of repodata simultaneously: the read bytes, the in-memory Serde representation, and the to-be-written bytes. Avoid having multiple copies of repodata in-memory simultaneously by streaming repodata reads and writes, which should bring us down to just one in-memory Serde copy.
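To illustrate the streaming-write idea, here is a minimal std-only sketch, not the code from this PR. `Record` and `write_records` are hypothetical names, and the hand-rolled formatting stands in for a real serializer; in a serde_json-based codebase the equivalent calls would presumably be `serde_json::from_reader` and `serde_json::to_writer`. The point is that each record goes straight into the writer, so no full to-be-written byte buffer is built first.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for one repodata record (not rattler's real type).
pub struct Record {
    pub name: String,
    pub size: u64,
}

/// Streams records into `writer` one at a time instead of first building
/// the whole serialized document as a `Vec<u8>` or `String`.
pub fn write_records<W: Write>(writer: W, records: &[Record]) -> io::Result<()> {
    // BufWriter batches the many small writes into larger ones.
    let mut writer = BufWriter::new(writer);
    writer.write_all(b"[")?;
    for (i, record) in records.iter().enumerate() {
        if i > 0 {
            writer.write_all(b",")?;
        }
        // `{:?}` is only a stand-in for proper JSON string escaping.
        write!(
            writer,
            "{{\"name\":{:?},\"size\":{}}}",
            record.name, record.size
        )?;
    }
    writer.write_all(b"]")?;
    // Flush explicitly so write errors surface here rather than being
    // silently dropped when the BufWriter goes out of scope.
    writer.flush()
}
```

Passing a `std::fs::File` as `writer` would send the output to disk with only the `BufWriter`'s small internal buffer held in memory, on top of the already-parsed records.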
I have a tip for recreating a more heavyweight scenario! :) We have quite a lot of old repodata on GHCR. It lives at https://github.com/orgs/channel-mirrors/packages/container/package/conda-forge%2Flinux-64%2Frepodata.json, tagged by date. You can pull the repodata using oras (
Awesome stuff! Were you able to check whether this improves performance or memory consumption?
Still haven't actually tested this code besides […]. I suspect we may be consuming extra memory because of parallel patching of […]. I can proceed with my patches, but I don't believe they will be adequate for solving my personal memory issue. Purely from a selfish perspective, the best-value way for me to deal with this problem is probably to just disable JLAP.
I've decided to revert the memmap and buffered reader changes and switch to std::mem::drop, since it is the cheapest option to implement in the time I have left to spend on this particular issue. I doubt that my changes will make enough of a difference to unlock use of JLAP in a memory-constrained environment. I think it would be better to just wait for sharded repodata to make JLAP irrelevant than to invest more time here.
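For readers unfamiliar with the std::mem::drop approach, a rough sketch of the idea follows. It is an illustration under assumptions, not the PR's actual code: `Parsed`, `parse`, and `patch_and_serialize` are hypothetical names, and the "parsing" is a trivial line count. Each copy is released as soon as the next stage no longer needs it, so the raw bytes and the output bytes are never alive at the same time.

```rust
/// Hypothetical stand-in for the parsed (Serde-style) representation.
pub struct Parsed {
    pub record_count: usize,
}

/// Toy parser: counts non-empty lines. It copies nothing out of `bytes`,
/// so the input can be freed once this returns.
pub fn parse(bytes: &[u8]) -> Parsed {
    let record_count = bytes
        .split(|b| *b == b'\n')
        .filter(|line| !line.is_empty())
        .count();
    Parsed { record_count }
}

/// Frees each intermediate copy before the next one is allocated.
pub fn patch_and_serialize(raw: Vec<u8>) -> Vec<u8> {
    let parsed = parse(&raw);
    // The read bytes are no longer needed once parsing is done; release
    // them now instead of waiting for the end of the function.
    std::mem::drop(raw);

    let out = format!("records={}", parsed.record_count).into_bytes();
    // Likewise release the parsed form before handing back the output.
    std::mem::drop(parsed);
    out
}
```

Without the explicit drops, `raw` would stay allocated until the function returned, overlapping with `out`. Serialization in this sketch still needs `parsed` and `out` at once, which is the overlap that streaming writes would avoid.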
That makes sense. Thanks for doing this deep dive though!
I have one small suggestion but otherwise looks good to me!
Does anybody have tips for recreating a more heavyweight JLAP patching scenario for the purpose of testing this patch's impact? For now, I am going on green test-cases and good vibes...