Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.3.7 incorrectly pickles the class definition for module/class with the same name #634

Closed
TD22057 opened this issue Dec 18, 2023 · 7 comments
Labels
Milestone

Comments

@TD22057
Copy link

TD22057 commented Dec 18, 2023

This is a different bug than #628 and #604. When a class is defined in a file with the same name but imported into the package under that name (hiding the file name), then it's pickled with the full code block instead of just referencing the class name. Note that regular pickle has no problem with this case and performs correctly.

foo/__init__.py
    from .Bar import Bar

foo/Bar.py:
   class Bar:
      pass

then run

import foo
import dill

b1 = foo.Bar()
print( " in ID:", id( b1.__class__ ) )

s = dill.dumps( b1 )
b2 = dill.loads( s )

print( "out ID:", id( b2.__class__ ) )

and you will get (ID's will be different):

 in ID: 27881968
out ID: 29513904

If I add byref=True to the dump call, then I get the correct class ID but also get a warning:

 in ID: 25867200
.../lib/python3.11/site-packages/dill/_dill.py:412: PicklingWarning: Cannot locate reference to <class 'foo.Bar.Bar'>.
  StockPickler.save(self, obj, save_persistent_id)
out ID: 25867200

I think the issue is in _dill.py in the _locate_function() which isn't finding the definition properly. I'm guess this is because it tries to do an import instead of looking in sys.modules. If that function checked sys.modules for the module name, it would find the module and then could call getattr() on that to see the class definition.

@TD22057
Copy link
Author

TD22057 commented Jan 2, 2024

FYI - possible fix for you to evaluate is the following. It appears to fix the issue on my end and I believe it passes tests. But I don't understand the rational for why it was done this way in the first place so it would be good for you to look at. If you'd rather have this as a PR, let me know and I'll do that.

def _import_module(import_name, safe=False):
    try:
        # attempt to fix git bug #634.
        m = sys.modules.get(import_name)
        if m:
           return m
        #if import_name.startswith('__runtime__.'):
        #    return sys.modules[import_name]
        elif '.' in import_name:
            items = import_name.split('.')
            module = '.'.join(items[:-1])
            obj = items[-1]
        else:
            return __import__(import_name)
        return getattr(__import__(module, None, None, [obj]), obj)
    except (ImportError, AttributeError, KeyError):
        if safe:
            return None
        raise

@mmckerns
Copy link
Member

mmckerns commented Jan 3, 2024

Thanks for reporting, and the follow-up on this. I'll have a look... and yes, I'll make a PR if you don't.

@mmckerns mmckerns added the bug label Jan 3, 2024
@mmckerns mmckerns removed the bug label Jan 14, 2024
@mmckerns
Copy link
Member

Slight modification:

foo/__init__.py
    from .bar import bar, zap

foo/bar.py:
   class bar:
      pass

   class zap:
      pass

With the above, I don't think there's a difference for foo.bar.zap versus foo.bar.bar. So, it would seem the module/class having the same name doesn't matter. Also, I think this is the expected behavior -- dill makes a copy of the class instead of referencing it like pickle would. The warning is a bit annoying, but innocuous.

Can you tell me what you think the behavior should be?

@TD22057
Copy link
Author

TD22057 commented Jan 14, 2024

I'm not at a computer where I can check this. But I assume "with the above" means my hack/fix. That could be true - I didn't test every combination. The fix I proposed may not be correct and/or complete.

However that doesn't mean this isn't a bug and IMO it's a serious bug. The problem is that I have code that unpickles something and then says:

obj = dill.loads( data )
if isinstance( obj, foo.bar ):
   ...  do something

And the isinstance check will fail because dill made a clone of the class. So the unpickled object is no longer the same type as the object that was pickled. BTW regular pickle handles this case just fine - I don't know what it's doing but it correctly finds the package and unpickles to the original type.

@mmckerns
Copy link
Member

mmckerns commented Jan 14, 2024

Currently, the expected behavior is that dill makes a copy of the class, and doesn't pickle a reference unless it's requested that the class is pickled by reference. I initially thought that you were reporting that the bug was the warning being thrown, but if it's that dill is not pickling by reference... then this is the expected behavior.

This enables dill to deal with all sorts of different cases, such as the class being dynamically modified.

@TD22057
Copy link
Author

TD22057 commented Jan 14, 2024

Sorry - I didn't realize that was the desired behavior. I doubt it matters at this point but FYI in every use case we have (saving data, sending functions and data to parallel engines) it's really bad to make a copy of the class unless there is no other way to pickle it. I guess we'll try to figure out how to suppress that error and wrap the library to set byRef=True for all of our calls.

@TD22057 TD22057 closed this as completed Jan 14, 2024
@mmckerns
Copy link
Member

No worries. Also, those are the most common dill use cases, I believe. If you want classes to be pickled by reference in all cases, you can adjust the global settings. dill.settings['byref'] = True. -- then you won't need to use the keyword in the dump.

@mmckerns mmckerns added this to the dill-0.3.8 milestone Jan 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants