Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pickling a Sympy lambdified expression causes built-in functions to be replaced with Numpy functions #167

Closed
nickderobertis opened this issue May 19, 2016 · 9 comments
Milestone

Comments

@nickderobertis
Copy link

I found some very strange behavior where using dill.dump to store a Sympy lambdified expression causes built-in functions such as min, max, abs, etc. to be replaced with Numpy functions of the same name.

from sympy import symbols, lambdify
import dill, inspect

def check_if_builtin(func):
    try:
        file = inspect.getsourcefile(func) #will throw TypeError for builtin
        return file
    except TypeError:
        return True



dill.settings['recurse'] = True #without this option, throws PicklingError

a, b, c = symbols("a b c")
expr = a + b + c
lambda_expr = lambdify([a, b, c], expr)

print(check_if_builtin(min))

dill.dump(lambda_expr, open('test.p', 'wb'))

print(check_if_builtin(min))

returns:

True
C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py

Naturally this breaks my code and I'm currently using a hackish fix to replace the builtins after pickling. Here is my setup:

Python 3.5.1
Windows 10 64-bit
dill: 0.2.5
sympy: 0.7.6.1
numpy: 1.11.0

@mmckerns
Copy link
Member

The reason is that sympy is doing something weird. It's populating globals with the numpy functions.

>>> from sympy import symbols, lambdify
>>> a,b,c = symbols("a b c")
>>> expr = a + b + c
>>> l_expr = lambdify([a, b, c], expr)
>>> l_expr.__globals__['min']
<function amin at 0x10521c1b8>

I'll have to see what can be done with this, but that seem to be the root of the issue. Maybe dill will have to check globals as opposed to __globals__ or something like that.

Note also that if you don't use dill.settings['recurse'] = True, it throws a different but similar error to #186.

@mmckerns
Copy link
Member

It seems the trigger is in dill.detect.globalvars:

>>> import dill
>>> from sympy import symbols, lambdify
>>> a,b,c = symbols("a b c")
>>> expr = a + b + c
>>> l_expr = lambdify([a, b, c], expr)
>>> min
<built-in function min>
>>> dill.detect.globalvars(l_expr, recurse=True, builtin=True)
{}
>>> min
<function amin at 0x11165e398>
>>> 

I'll keep digging.

@mmckerns
Copy link
Member

So it looks like the "offending" line in dill is the latter of the following (in globalvars):

globs = vars(getmodule(sum)) if builtin else {}
...
globs.update(getattr(orig_func, func_globals) or {})

This doesn't play well with sympy messing with the expression's __globals__.

@mmckerns
Copy link
Member

Oh, interesting... this could be a big part of the issue why the expressions __globals__ are numpy functions:

>>> import dill
>>> from sympy import symbols, lambdify
>>> a,b,c = symbols("a b c")
>>> expr = a + b + c
>>> l_expr = lambdify([a, b, c], expr)
>>> l_expr.__module__
'numpy'
>>>

A workaround for functions resulting from lamdify might be able to be applied to dill, because they seem to be able to be easily identified:

>>> l_expr.__doc__[:22] == 'Created with lambdify.'
True

Although, I'm not a huge fan of adding code that's specific to 3rd-party objects in dill, there is precedent to do so (numpy.ndarray objects, for example)... and the objects in sympy are in the upper tier of objects that should be considered.

I'm going to ping @asmeurer on this, just to be sure that what I'm seeing was done with purpose... as opposed to convenience (or otherwise). I'll have a think on how to proceed, especially after any input from @asmeurer.

@mmckerns
Copy link
Member

The fix definitely should be in dill, as getting the globalvars should not affect the actual globals. That's bad.

@asmeurer
Copy link

That's how lambdify works. Take something like

import sympy
x = sympy.Symbol('x')
f = lambdify(x, sympy.sin(x), 'numpy')

This is (roughly) equivalent to

f = eval("lambda x: sin(x)", globals={'sin': numpy.sin})

(the globals dict will also contain tons of other things, but only sin is needed).

In other words, lambdify converts a SymPy expression into a lambda function where the globals come from the namespace of the module in question (usually NumPy).

Note that the globals dict is not exactly numpy.__dict__. lambdify adds a custom translation dictionary, and custom translations can be supplied by the user as well.

@asmeurer
Copy link

We have complete control over the object that is returned. If dill cannot detect this generically we can add some kind of flag to the object. I would rather not rely on the contents of the docstring, as that could change (and we wouldn't consider it to be API breaking).

@mmckerns
Copy link
Member

@asmeurer: Ok, good to know, thanks. I think dill should be able to handle it generically, which I'll try to do first. Failing there, it'd be beneficial to be able to identify an object created with lambdify somehow (the __doc__ is all I could find after a cursory attempt).

mmckerns added a commit that referenced this issue Jan 25, 2017
…167)

git-svn-id: svn+ssh://svn.mystic.cacr.caltech.edu/pathos/dill@951 8bfda07e-5b16-0410-ab1d-fd04ec2748df
@mmckerns
Copy link
Member

fixed in cc76dab

@mmckerns mmckerns modified the milestone: dill-0.2.6 Feb 1, 2017
troyharvey added a commit to brstrat/dill that referenced this issue Jun 15, 2017
* tag '0.2.6': (71 commits)
  release dill-0.2.6
  fix: use correct pickler for arrays in other objects (see uqfoundation#163)
  fix: make a copy of result from vars so doesn't pollute globals (see uqfoundation#167)
  revert scripts to use .py extension (primarily for windows)
  don't use .py extension for scripts; edit MANIFEST accordingly
  completely handle namedtuple pickling
  Add fallback option for badly named namedtuples
  remove .py extension from scripts (see uqfoundation#189)
  fix bug so obj class can point to main
  allow for rare case where module's name attribute has been deleted
  remove workaround required for oddity in pypy __doc__ descriptor __name__
  changes for issue uqfoundation#175, including workaround for pypy __doc__ descriptor
  check for dictproxy in _getattr
  added ClassMethodDescriptorType to dill.objects and register it in dill.dill; modified special handling in _getattr for __prepare__ and __weakref__; fix _create_function when obj.__globals__ is None; modified test_detect and test_selected to reflect the above
  use sys.hexversion not dill.PY3 for __qualname__ condition in namedtuple test
  updated test_classdef so rename of namedtuple uses __qualname__
  convert test_detect and test_diff for pytest
  prepare test_module, test_moduledict, test_source, test_temp for pytest move dill_bugs to test_selected; prepare test_selected for pytest
  update test_classdef as pytest compatible
  Remove pytest dependency and fix test teardown
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants