Feature request: memmap support when loading models #225
Comments
Yep, that sounds reasonable. I haven't done this before, though. Could you give me a pointer, maybe to the place where you do this in joblib?
In joblib this is too complex because all the numpy buffers live in the main pickle stream. In skops this is simpler; it would be a matter of:
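The steps themselves were not spelled out in the comment, but since a `.skops` file is a zip archive whose numpy arrays are stored as separate `.npy` members, a rough sketch could look like the following. This is an illustrative assumption, not the actual skops API; the function and path names are hypothetical.

```python
import os
import tempfile
import zipfile

import numpy as np


def load_memmapped_arrays(skops_path):
    """Hypothetical sketch: extract a .skops zip archive to a temp
    directory and open each .npy member as a read-only memmap."""
    extract_dir = os.path.join(
        tempfile.gettempdir(), "skops_" + os.path.basename(skops_path)
    )
    os.makedirs(extract_dir, exist_ok=True)
    with zipfile.ZipFile(skops_path) as zf:
        zf.extractall(extract_dir)
    arrays = {}
    for name in os.listdir(extract_dir):
        if name.endswith(".npy"):
            # mmap_mode="r" maps the file read-only; the OS page cache
            # shares the physical pages between processes mapping it.
            arrays[name] = np.load(
                os.path.join(extract_dir, name), mmap_mode="r"
            )
    return arrays
```

The key point is that `np.load(..., mmap_mode="r")` returns `numpy.memmap` views instead of in-memory copies, so several processes opening the same extracted files share physical memory.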
Very good suggestion, thanks. I have no experience with memmapping, so if someone else takes a stab at this feature, feel free to ignore my questions :)
@ogrisel I have been thinking about this, and there's an issue that has made me put it aside. When loading from multiple processes, each process would try to unzip the file; we'd need to check whether another process has already done the extraction and reuse it, which we could probably figure out, but it isn't straightforward. Also, once we unzip the files onto disk, they stay there and we would never clean them up. In the long run this is rather bad on non-Linux machines (at least on Windows, I don't think temp files are ever cleaned up). This is the main issue I'm not sure how to solve.
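One way the "each process tries to unzip" race could be avoided is to extract into a uniquely named staging directory and then atomically rename it into place, so exactly one process wins and the others reuse its result. This is only a sketch of that idea under assumed, hypothetical names; it is not anything skops currently does.

```python
import os
import shutil
import tempfile
import zipfile


def extract_once(skops_path, cache_dir):
    """Hypothetical sketch: extract skops_path under cache_dir exactly
    once, even when several processes call this concurrently."""
    final_dir = os.path.join(cache_dir, os.path.basename(skops_path) + ".d")
    if os.path.isdir(final_dir):
        # Another process already finished extracting; reuse its copy.
        return final_dir
    staging = tempfile.mkdtemp(dir=cache_dir)
    with zipfile.ZipFile(skops_path) as zf:
        zf.extractall(staging)
    try:
        # os.rename of a directory is atomic on POSIX and fails if the
        # target already exists, so exactly one process succeeds here.
        os.rename(staging, final_dir)
    except OSError:
        # Lost the race: discard our staging copy, use the winner's.
        shutil.rmtree(staging)
    return final_dir
```

Note this solves the duplicate-extraction race but not the second concern above: the extracted directory still lingers on disk until something removes it.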
It would be possible to register those folders for automated garbage collection with the loky resource tracker. The standard library also has a resource tracker in the `multiprocessing` module, but if I remember correctly it does not come with a clean-up function for folders.
The class I had in mind is https://github.com/joblib/loky/blob/047d80623b7cf2d43500ed56ee0320b3e41c3f81/loky/backend/resource_tracker.py. However, this is not considered public API, so maybe it's a bad idea to rely on it for skops. It also exists in the `multiprocessing.resource_tracker` module of the standard library, but we would need to register a new cleanup function for folders. Maybe the safest solution is to not try to be too clever and let the user handle the clean-up explicitly by themselves, depending on their application lifecycle. For the original unzipping, maybe the
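The "let the user handle the clean-up explicitly" option mentioned above could be as simple as the application owning the extraction directory and registering its removal at interpreter exit. A minimal sketch, assuming a hypothetical helper name (nothing here is skops or loky API):

```python
import atexit
import shutil
import tempfile


def make_extraction_dir():
    """Hypothetical sketch: create a temp directory that this process
    removes when the interpreter exits."""
    path = tempfile.mkdtemp(prefix="skops_memmap_")
    # The application, not the library, decides the lifecycle here:
    # removal is tied to interpreter exit via atexit.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```

This avoids depending on the non-public loky resource tracker, at the cost of leaking the directory if the process is killed before `atexit` handlers run.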
Make it possible to unzip and memmap the numpy arrays in a `.skops` file, so that models with large parameter arrays can be loaded into memory shared between several Python processes running concurrently on a given host. This would make it possible to:
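The sharing behaviour the request is after can be seen with plain numpy today: processes that map the same array file read-only go through the shared OS page cache rather than each holding a private copy. A small POSIX-only illustration (it uses `os.fork`, so it does not run on Windows; the function name is made up for this sketch):

```python
import os
import tempfile

import numpy as np


def demo_shared_memmap():
    """POSIX-only sketch: a parent and a forked child map the same
    on-disk array file instead of duplicating it in memory."""
    path = os.path.join(tempfile.mkdtemp(), "weights.npy")
    np.save(path, np.full(1000, 2.0))
    pid = os.fork()
    if pid == 0:
        # Child: read-only mapping backed by the shared page cache.
        child_sum = float(np.load(path, mmap_mode="r").sum())
        os._exit(0 if child_sum == 2000.0 else 1)
    _, status = os.waitpid(pid, 0)
    parent_sum = float(np.load(path, mmap_mode="r").sum())
    return parent_sum, os.WEXITSTATUS(status) == 0
```

In a real deployment the processes would typically be independent workers (e.g. web server replicas) opening the same extracted `.skops` contents, not a fork pair, but the memory-sharing mechanism is the same.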