Serialization of ufuncs breaks depending on import order #392

Closed
renatolfc opened this issue Nov 19, 2020 · 5 comments · Fixed by python/cpython#23403

Comments

@renatolfc

While investigating why load_session failed when my session contained some ufuncs from scipy.special, I noticed that, depending on the order of the imports, dill fails to determine which module a function belongs to:

On a fresh interpreter session (Python 3.8.6), if you load dill prior to loading scipy.special, you get:

>>> import dill
>>> from scipy.special import gdtrix
>>> dill.dumps(gdtrix)
b'\x80\x04\x95\x1a\x00\x00\x00\x00\x00\x00\x00\x8c\x0b__mp_main__\x94\x8c\x06gdtrix\x94\x93\x94.'

Notice the incorrect __mp_main__.

Now, if you invert the ordering of the imports:

>>> from scipy.special import gdtrix
>>> import dill
>>> dill.dumps(gdtrix)
b'\x80\x04\x95$\x00\x00\x00\x00\x00\x00\x00\x8c\x15scipy.special._ufuncs\x94\x8c\x06gdtrix\x94\x93\x94.'
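To make the difference between the two payloads explicit, they can be decoded with the standard library's pickletools. This is a minimal sketch: the helper name string_args and the constant names are made up for illustration, and the byte strings are copied from the two sessions above.

```python
import pickletools

# Payloads copied verbatim from the two interpreter sessions above.
DILL_FIRST = (b'\x80\x04\x95\x1a\x00\x00\x00\x00\x00\x00\x00\x8c\x0b__mp_main__'
              b'\x94\x8c\x06gdtrix\x94\x93\x94.')
SCIPY_FIRST = (b'\x80\x04\x95$\x00\x00\x00\x00\x00\x00\x00\x8c\x15scipy.special._ufuncs'
               b'\x94\x8c\x06gdtrix\x94\x93\x94.')


def string_args(payload):
    """Return the string arguments of the pickle opcodes, in stream order.

    Hypothetical helper: for a pickle-by-reference payload these are the
    module name followed by the attribute name.
    """
    return [arg for _opcode, arg, _pos in pickletools.genops(payload)
            if isinstance(arg, str)]


dill_first_names = string_args(DILL_FIRST)
scipy_first_names = string_args(SCIPY_FIRST)
print(dill_first_names)
print(scipy_first_names)
```

Both payloads pickle gdtrix by reference; only the module name recorded in the stream differs.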

Oddly enough, if we call pickle.whichmodule, it finds the correct module irrespective of import order:

>>> import pickle
>>> from scipy.special import gdtrix
>>> pickle.whichmodule(gdtrix, gdtrix.__name__)
'scipy.special._ufuncs'

Or:

>>> from scipy.special import gdtrix
>>> import pickle
>>> pickle.whichmodule(gdtrix, gdtrix.__name__)
'scipy.special._ufuncs'

The only way I can think of getting an incorrect module is to pass name=None to whichmodule:

>>> pickle.whichmodule(gdtrix, None)
'__main__'

Might this indicate that some cache is initialized when the module is loaded and somehow fails to update for some types?

@renatolfc
Author

renatolfc commented Nov 19, 2020

This gets even more interesting. Even though dill fails to serialize that ufunc, pickle (augmented by dill) serializes it to something different, but equally incorrect:

>>> import dill
>>> from scipy.special import gdtrix
>>> dill.dumps(gdtrix)
b'\x80\x04\x95\x1a\x00\x00\x00\x00\x00\x00\x00\x8c\x0b__mp_main__\x94\x8c\x06gdtrix\x94\x93\x94.'
>>> import pickle
>>> pickle.dumps(gdtrix)
b'\x80\x04\x95@\x00\x00\x00\x00\x00\x00\x00\x8c\nnumpy.core\x94\x8c\x12_ufunc_reconstruct\x94\x93\x94\x8c\x0b__mp_main__\x94\x8c\x06gdtrix\x94\x86\x94R\x94.'
>>> dill.dumps(gdtrix)
b'\x80\x04\x95\x1a\x00\x00\x00\x00\x00\x00\x00\x8c\x0b__mp_main__\x94\x8c\x06gdtrix\x94\x93\x94.'

@renatolfc
Author

The cause seems to be the addition of the __mp_main__ module into sys.modules when importing dill.

This makes pickle.whichmodule return __mp_main__ (for which it doesn't have a guard) instead of the correct module. From a fresh interpreter session:

Python 3.8.6 (default, Sep 30 2020, 04:00:38)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> from scipy.special import gdtrix
>>> import sys
>>> list(sys.modules).index('scipy.special._ufuncs')
278
>>> list(sys.modules).index('__mp_main__')
101

Now, if we import dill after importing gdtrix, we get what seems to be the correct module:

Python 3.8.6 (default, Sep 30 2020, 04:00:38)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from scipy.special import gdtrix
>>> import dill
>>> import sys
>>> list(sys.modules).index('__mp_main__')
266
>>> import pickle
>>> pickle.whichmodule(gdtrix, gdtrix.__name__)
'scipy.special._ufuncs'
>>> list(sys.modules).index('scipy.special._ufuncs')
252

This is because pickle.whichmodule returns the first module in sys.modules that contains the function, and here scipy.special._ufuncs (index 252) comes before __mp_main__ (index 266).
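The first-match behavior described above can be reproduced without dill or scipy. The sketch below is a simplified stand-in for the scan, not pickle's actual code: first_match_module is a hypothetical helper, the modules are empty placeholders built with types.ModuleType, and a plain object() plays the role of the ufunc because, like the ufunc in this report, it has no __module__ attribute.

```python
import types


def first_match_module(obj, name, modules):
    """Return the first module in `modules` whose attribute `name` is `obj`.

    Hypothetical, simplified model of the scan: '__main__' is skipped,
    but '__mp_main__' is not.
    """
    for module_name, module in modules.items():
        if module_name == "__main__" or module is None:
            continue
        if getattr(module, name, None) is obj:
            return module_name
    return "__main__"


gdtrix = object()  # stand-in for the ufunc (no __module__ attribute)

main = types.ModuleType("__main__")
main.gdtrix = gdtrix  # `from scipy.special import gdtrix` binds the name in __main__
ufuncs = types.ModuleType("scipy.special._ufuncs")  # placeholder, not the real module
ufuncs.gdtrix = gdtrix

# dicts preserve insertion order, which models the order of sys.modules.
dill_first = {"__main__": main, "__mp_main__": main, "scipy.special._ufuncs": ufuncs}
scipy_first = {"__main__": main, "scipy.special._ufuncs": ufuncs, "__mp_main__": main}

result_dill_first = first_match_module(gdtrix, "gdtrix", dill_first)
result_scipy_first = first_match_module(gdtrix, "gdtrix", scipy_first)
print(result_dill_first, result_scipy_first)
```

In this model, __mp_main__ is the same module object as __main__, so it also exposes gdtrix and wins whenever it is registered before the module that really defines the function.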

@mmckerns
Member

Nice find @renatolfc. Just in case you weren't aware of it... for diagnostics I often use dill.detect.trace(True) to turn on pickle tracing in dill.

@renatolfc
Author

Thanks for the tip, @mmckerns. I wasn't aware of that.

@renatolfc
Author

I don't think this can be solved on the dill side without monkeypatching pickle's whichmodule or implementing dill's own version of it.

Because of that, and since the PR I submitted to the Python standard library was accepted, I'm closing this. The fix was merged into master and backported to 3.8 and 3.9.
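For illustration only, a dill-side whichmodule with the missing guard might look like the sketch below. The name guarded_whichmodule and its modules parameter are hypothetical, and the code is modeled on the behavior described in this thread rather than copied from the merged CPython patch.

```python
import sys
import types


def guarded_whichmodule(obj, name, modules=None):
    """Find the module `obj` belongs to, ignoring '__main__' and '__mp_main__'.

    Hypothetical sketch; `modules` defaults to sys.modules and exists only
    so the behavior can be exercised without importing dill or scipy.
    """
    modules = sys.modules if modules is None else modules
    module_name = getattr(obj, "__module__", None)
    if module_name is not None:
        return module_name
    # Iterate over a copy in case attribute access triggers further imports.
    for module_name, module in list(modules.items()):
        if module_name in ("__main__", "__mp_main__") or module is None:
            continue
        try:
            if getattr(module, name) is obj:
                return module_name
        except AttributeError:
            pass
    return "__main__"


# Placeholder modules that mimic the "dill imported first" ordering.
gdtrix = object()  # stand-in for the ufunc (no __module__ attribute)
main = types.ModuleType("__main__")
main.gdtrix = gdtrix
ufuncs = types.ModuleType("scipy.special._ufuncs")  # placeholder, not the real module
ufuncs.gdtrix = gdtrix

dill_first = {"__main__": main, "__mp_main__": main, "scipy.special._ufuncs": ufuncs}
guarded_result = guarded_whichmodule(gdtrix, "gdtrix", dill_first)
main_only_result = guarded_whichmodule(gdtrix, "gdtrix",
                                       {"__main__": main, "__mp_main__": main})
print(guarded_result, main_only_result)
```

With the guard in place the result no longer depends on where __mp_main__ sits in the module table, and objects defined only in the main module still fall back to '__main__'.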
