serialized module fails if module is not in PYTHONPATH #123
Well, the extreme case of this is session pickling a whole set of modules, shipping them to a machine without the modules and a different OS, and expecting it all to be there. We could do it, but pickles would be large, so it would have to be another option.
And if said module contains unpicklables, we are in trouble.
Yeah, I don't know if you read my reply to the question on SO. I'm not sure if it's a good idea or not…
Are we not pickling the full module for user modules anyway? I think we include module contents in the pickle.
Currently, yes, unless diff is enabled. Not all of the classes are pickled, though, and functions are pickled by reference, aren't they?
Yes, standardly defined functions inside a module are pickled by reference -- as you can see with the…
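For illustration, one way to observe this by-reference behavior is to disassemble the pickle stream (a minimal sketch, using an arbitrary standard-library function; this is not the demonstration referenced above):

import dill
import pickletools
from os.path import join  # a plain function defined in an importable stdlib module

payload = dill.dumps(join)
pickletools.dis(payload)  # the disassembly shows a GLOBAL/STACK_GLOBAL opcode naming the
                          # module and the function, i.e. only a reference is stored,
                          # not the function body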
Currently, it's only possible to disable pickling by reference for classes. It would be fairly easy to extend this to modules, functions, and other objects, I believe…
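For reference, the existing per-call switch for classes looks roughly like this (a sketch; the class is a throwaway example):

import dill

class Widget:  # a throwaway example class
    pass

payload = dill.dumps(Widget, byref=False)  # byref=False asks dill to pickle the class
                                           # definition itself rather than a named reference
WidgetCopy = dill.loads(payload)           # reconstructs an equivalent class object
# The same switch also exists globally as dill.settings['byref'].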
After some deliberation, I think I'd categorize this as a feature request as opposed to a bug…
Well, currently we have a sort of…
Yes, I agree… that's, in reality, where things seem to be headed. That could take some significant work.
@sanbales: I originally built…
@mmckerns: Thank you for the pointer to…
I've hit this (functions pickling by reference) as a minor annoyance in some tests. I'm also shipping off pickled functions to computing clusters, and I have some pytest test files where functions are defined. Effectively, I'm launching a separate (non-forked) process and attempting to unpickle the function there. However, pytest appears to load the test file without directly importing it or adding its path to the PYTHONPATH. So when the function is pickled, it references what appears to be a valid local module but actually isn't. Amusingly, this is only a problem with functions in the tests - dill appears to do the right thing in most other cases. I've worked around it just by adding the test directory to the remote instances launched as part of the test, but a "force reference=false" option would definitely be helpful for functions, as long as the module self-reference could be dropped.
Hello, I'm having what seems to be a very similar issue: "Can't pickle <class 'simulator_utilities.SimulationParameters'>: it's not found as simulator_utilities.SimulationParameters". I decided to check the __name__ attribute and the object's __module__ attribute, both in the debugger and in the application, and I get the exact same values; all four say 'simulator_utilities'. I'm at a loss; is there a known way to force this to work? I tried adding the directory containing simulator_utilities.py to my environment variables, but that didn't help.
This code is part of a class called GUI() that's in the simulator_utilities.py file, and the class SimulationParameters is also in the simulator_utilities.py file. EDIT: So I tried a hacky solution right after posting this, and I really don't like it because it makes my code not portable to other computers, but I guess it's okay for now. I just appended the directory that has simulator_utilities.py. Messy, but oh well.
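A minimal sketch of that workaround (the directory path and file names below are placeholders, not the poster's actual code):

import sys
# Make simulator_utilities importable by name before pickling, so the by-reference
# lookup of simulator_utilities.SimulationParameters can succeed.
sys.path.append(r"C:\placeholder\path\to\dir_containing_simulator_utilities")

import dill
import simulator_utilities

params = simulator_utilities.SimulationParameters()
with open("params.pkl", "wb") as file:
    dill.dump(params, file)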
May I propose a portable solution for this: let the user be responsible (at pickling time) for specifying which paths are to be included at unpickling time. For more flexibility, the user could pass a function that takes care of this path manipulation. The function could involve…
@thisismygitrepo: Do you want the user, during the…
@mmckerns I have an object that I want to unpickle, but I can't achieve that without first changing directory to the repo it came from, or equivalently adding that directory to the PYTHONPATH before unpickling. Thus, I, the user, am already responsible for doing this. Why can't we just formalize this contract between the user and the library at dump time? Lastly, I figured out a workaround…
Could be too much for an average user.
Your "package workaround" and the stated rules about extending the PYTHONPATH for "scripts" are exactly how I designed the interface to behave. So, this is good (because installing it as a package essentially puts the module on the PYTHONPATH). The design decision was that module dependencies need to be available somewhere on the PYTHONPATH. I think that's reasonable. I'm open to some setting that enables recursive module dependencies, or something similar, but it needs some discussion (and work, if deemed worthwhile).
After the recent changes to the…
Regarding the idea of doing some operation prior to loading a pickle, it's perfectly possible to create composed pickle files with two or more pickle streams. This may be used to implement some kind of "load hook". Using that early example:

import os
import dill
import foo

foo_path = os.path.dirname(foo.__spec__.origin)

def load_hook():
    import sys
    if foo_path not in sys.path:
        sys.path.append(foo_path)

my_file = "./foo.pkl"
with open(my_file, 'wb') as file:
    dill.dump(load_hook, file, recurse=True)
    dill.dump(foo, file)

At a different session, in which foo is not importable:

import dill

my_file = "./foo.pkl"
with open(my_file, 'rb') as file:
    load_hook = dill.load(file)
    load_hook()
    foo = dill.load(file)

print(foo)
I know I'm restating this a bit, but to be clear... My expected usage pattern has been that if a module is to be dumped and shipped to be loaded elsewhere, the module dependencies should either already be installed as packages, or the user can ship them with a service like…
I've just read the entire thread. Shipping whole modules is complicated because of dependencies and stuff. But there's already a partial solution for this in the Standard Library: zipped modules, which can hold entire packages in a single file. These can be imported directly with zipimport (a .zip entry on sys.path is handled transparently). There's even an interesting possibility of concatenating a pickle file (e.g. from…) with such an archive. About the "load_hook" idea, I'm thinking that the preload function's pickle stream should go after the object's stream, so that it would still read as a normal pickle file by pickle.load(). Such a file would follow some format like:
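A quick sketch of the zipped-module mechanism mentioned above (the archive "foo.zip" and package "foo" are hypothetical):

import sys

sys.path.insert(0, "foo.zip")  # a .zip entry on sys.path is served by zipimport
import foo                     # the package is imported straight from the archive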
To read the pickle and execute the preload function first, the reader must do roughly:
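A minimal sketch of one way such a reader could work, assuming a layout of <object stream><hook stream><8-byte little-endian size of the hook stream> (the layout and helper name are assumptions, not the final proposal):

import struct
import dill

def load_with_preload(path):
    with open(path, "rb") as file:
        data = file.read()
    hook_size = struct.unpack("<Q", data[-8:])[0]  # size of the trailing hook stream
    hook = dill.loads(data[-8 - hook_size:-8])     # unpickle the preload hook
    hook()                                         # e.g. fix up sys.path
    return dill.loads(data)                        # the object stream comes first, so a plain
                                                   # load still works and stops at its STOP opcode

Because the object stream comes first, a plain pickle.load() on the same file still returns the object and simply ignores the trailing hook data.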
The stored stream size could be that of stream (1) too. Maybe that's even better, because it leaves room to change the file format in the future.
Thinking about this "preload hook", I think I'll move the file-related stuff (…). And here is a draft of an API proposal for the "preload hook" mechanism:

import foo
import dill

# Create an object that needs some setup before unpickling.
obj = foo.SomeClass('example')

# Define the preload function.
def setup_foo():
    ...  # code: set up things in foo for loading obj

with open('obj.pkl', 'wb') as file:
    dill.dump(obj, file, preload_hook=setup_foo)

In a different session:

import dill

with open('obj.pkl', 'rb') as file:
    obj = dill.load(file, exec_preload=True)  # load and call setup_foo(), then load obj

Alternatively:

import dill

with open('obj.pkl', 'rb') as file:
    obj = dill.load(file)  # just load obj, as if it was a common pickle file

Some important design aspects are: …
Haha! Look at what the top-rated unanswered question on SO (https://stackoverflow.com/questions/44560416/pickle-a-dynamically-imported-class) is about. Edit: also the second top-most: https://stackoverflow.com/questions/42613964/serializing-custom-modules-together-with-object-in-python
@leogama: The reason we are still hung up is that we pulled an incomplete solution. When that's resolved or rolled back, we will release... we shouldn't be messing with anything else unless there's a good reason (e.g. the refactor is needed to keep a design flaw from leaking into a release).
- virtualobj are loaded to __main__ unless the source package exists - this fixes an open issue with dill, uqfoundation/dill#123
Question from SO: http://stackoverflow.com/questions/31884640/does-the-dill-python-module-handle-importing-modules-when-sys-path-differs
Can I use dill to serialize and then load that module in a different process that has a different sys.path which doesn't include that module? Right now I get import failures:
Here's an example. I run this script where the foo.py module's path is in my sys.path:
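A plausible minimal version of such a script (a sketch assuming the module is named foo and the pickle file foo.pkl, as elsewhere in the thread):

import dill
import foo  # importable here because foo.py's directory is on sys.path

with open("foo.pkl", "wb") as file:
    dill.dump(foo, file)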
Now, I run this script where I do not have foo.py's directory in my PYTHONPATH:
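And a plausible minimal version of the loading script (again an assumption, not the original code):

import dill

with open("foo.pkl", "rb") as file:
    foo = dill.load(file)  # raises an ImportError here when foo's directory is not
                           # on sys.path, because the module was pickled by reference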
It fails with this stack trace:
So, if I need to have the same PYTHONPATH in the two processes, then what's the point of serializing a Python module?