Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Session: improvements to documentation + handle unpickleable objects #527

Open
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

leogama
Copy link
Contributor

@leogama leogama commented Jul 16, 2022

Edit: merged with #475.


Additional changes to session/module saving and restoring based on discussions at #525.

  • Created an _open() function that returns a file/stream context manager
  • Created an is_pickled_module() function to test whether a pickle file was created by dump_module() and if it's module is importable or not. (It currently returns True for modules saved with dill.dump(module, file).)
  • Added a refonfail option for dump_module() to save objects that fail to pickle by reference, rolling back the pickle stream and saving one of the parent objects by reference if necessary.

To-do

Documentation:

  • Explain the distinction between calling dill.dump(module, file) and dill.dump_module(file, module), and the assumptions and use cases for each alternative — probably in the package's main documentation, or in a separate session.py module's docstring
  • Describe the effects of passing the module argument to load_module()
  • Better describe the side effects and the usage of the returned value from load_module()
  • Document the function is_pickled_module(filename) (what about its name and signature?)

Code:

  • Make is_pickled_module() return False for modules saved by name (and for modules saved by value inside other objects)
  • Deal with unpickleable objects in modules (likely in a separate PR)
  • Deal with unpickleable variables in the module's namespace

@mmckerns ☝🏼 You may start drafting any of these if you feel inspired. Use the "suggested changes" in the review page or modify the code directly (adding new commits) if you are able to. I set the "allow editing by maintainers" option, although I don't know what it means...

@mmckerns mmckerns linked an issue Jul 16, 2022 that may be closed by this pull request
@mmckerns mmckerns added this to the dill-0.3.6 milestone Jul 16, 2022
@leogama
Copy link
Contributor Author

leogama commented Jul 17, 2022

@mmckerns: this docs page affirms that you can push commits directly to the PR branch on my fork's repo. Have you ever tried this procedure?

@mmckerns
Copy link
Member

Have you ever tried this procedure?

Yes. I'm fine with that approach, or fork-PR'ing your fork. Either way.

@leogama
Copy link
Contributor Author

leogama commented Jul 19, 2022

I may have successfully solved the problem with unpickleable objects. Added a refonfail option to dump_session(). I thought of calling it refunpickleable, but no one would get the name right. If you have any better idea...

I've overridden the save() method to trap failed picklings and fallback to saving the object (or one of its parents) by reference. It's tricky to revert everything up to where the fallback strategy succeeds (must take care of framing, memo, trace state, etc.), but it seems to be working. I didn't test it extensively with the Stdlib modules yet.

[Edit: changed this in last commit] Note: the framing trick probably has unhandled corner cases. It may be necessary to disable framing completely (by downgrading the protocol to a maximum of 3 or something else) when using this option.

@mmckerns
Copy link
Member

It may be necessary to disable framing completely (by downgrading the protocol to a maximum of 3 or something else) when using this option.

I could see this definitely being the case, as the framing used in higher protocols is a good bit more complex. I think the primary reason people will be using this functionality is to "save a session" -- hence the emphasis here should be on being able to save the session without failing... with pickle size a secondary concern (and pickling speed a distant third).

@mmckerns
Copy link
Member

Document the function is_module_pickle(filename) (what about its name and signature?)

I'd probably prefer: is_pickled_module(object) or is_pickled_module(file), where the former can take an object (including a file handle, or string file path) and the latter would expect a file handle or string file path.

@leogama
Copy link
Contributor Author

leogama commented Jul 21, 2022

It may be necessary to disable framing completely (by downgrading the protocol to a maximum of 3 or something else) when using this option.

I could see this definitely being the case, as the framing used in higher protocols is a good bit more complex. I think the primary reason people will be using this functionality is to "save a session" -- hence the emphasis here should be on being able to save the session without failing... with pickle size a secondary concern (and pickling speed a distant third).

This isn't really about complexity. The problem is that the current frame of an object being pickled may be committed and flushed to file before an unpickleable child object raises an exception and can't even be saved by reference, forcing its siblings that were pickled previously with success to be discarded and one of its parents to be saved by reference.

@leogama
Copy link
Contributor Author

leogama commented Jul 21, 2022

I'd probably prefer: is_pickled_module(object) or is_pickled_module(file), where the former can take an object (including a file handle, or string file path) and the latter would expect a file handle or string file path.

I've changed the name, but didn't understand the distinction you made. The parameter signature is identical to the those of the other *_module() functions.

@leogama leogama changed the title Session: improvements to documentation, code and auxiliary functions Session: improvements to documentation + handle unpickleable objects Jul 22, 2022
@leogama leogama mentioned this pull request Aug 5, 2022
@mmckerns
Copy link
Member

@leogama: looks like the latest merge of master left two save methods in Pickler. Can you merge the save methods?

@mmckerns
Copy link
Member

I'm closing this, as it's been merged with #475.

Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will complete review in #475.

@leogama leogama deleted the document-session branch August 22, 2022 15:30
@leogama leogama restored the document-session branch October 4, 2022 14:43
Other relevant changes in this commit:

- Complement and improve the documentation of _dill.py and session.py
- Rename the submodule 'logger' back to 'logging'.
- Now, the submodule 'logging' is meant to be used also as a proxy to the StdLib module of the same name.
- The level for pickling tracing is now 'logging.TRACE', a custom level between INFO and DEBUG
- New: if logging level is set to INFO or lower, the variables saved by reference, either by
  'refimported' or by 'refonfail', are listed.
- New internal functions: _is_imported_module(), _is_stdlib_module() and _module_package()
- New private submodule '_utils': new helper _open() context manager for opening streams
- More tests added to test_session.py
- Etc.
@leogama
Copy link
Contributor Author

leogama commented Oct 4, 2022

@mmckerns Let's reopen this one. I've updated the PR from the current state of #475

@mmckerns mmckerns reopened this Oct 4, 2022
@mmckerns mmckerns self-requested a review October 4, 2022 15:42
@mmckerns mmckerns modified the milestones: dill-0.3.6, dill-0.3.7 Oct 23, 2022
@mmckerns mmckerns modified the milestones: dill-0.3.7, dill-0.3.8 Feb 5, 2023
@mmckerns mmckerns modified the milestones: dill-0.3.8, dill-0.3.9 Nov 14, 2023
@mmckerns mmckerns modified the milestones: dill-0.3.9, dill-0.4.0 Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

some errors with dump_module and load_module
2 participants