Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dump_session: add split_imports #101

Merged
merged 1 commit into from
May 31, 2015
Merged

dump_session: add split_imports #101

merged 1 commit into from
May 31, 2015

Conversation

abrasive
Copy link
Contributor

Setting split_imports=True will preprocess the module by groveling
through its top-level objects for things that are identical to those in
the other available modules. If it finds matches, it records the source
module and object names and then discards the objects during pickling.
On load, it then recovers the objects by "import x as y" using the
stored x, y.
This fixes issues with nasty unpickleable things in the depths of
libraries, as well as making the sessions smaller.

Fixes #79 and most of #78 - my pylab sessions start up with some dynamic GObject horribles in __main__.

@abrasive
Copy link
Contributor Author

Example:

ipython --pylab
...
In [2]: import dill
In [3]: foo = arange(10)
In [4]: del GPollableInputStream
In [5]: del GPollableOutputStream
In [6]: del GFileDescriptorBased
In [7]: del GInitiallyUnowned
In [8]: dill.dump_session('pylab.pkl', split_imports=True)

Loading - no surprises here:

In [1]: import dill
In [2]: dill.load_session('pylab.pkl')
In [3]: foo
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

@abrasive
Copy link
Contributor Author

This needs a better name than 'split_imports', I think, given that it ends up discarding most or all of the state of non-main modules. Works very well for the way I use IPython, but it's a substantial change in behaviour.
It could also be changed to split only modules from a whitelist and their children.

@mmckerns
Copy link
Member

@abrasive: Hmm.. thanks. This looks nice.

@matsjoyce: We broached this question in #66, but did not implement anything as a follow-up.
Also, how does this compare to #47 and related? (I know, it's been a year.)

@matsjoyce
Copy link
Contributor

Well, the use_diff bit should be able to handle this, but it may need tweaking and more testing (I may have time in a few weeks). I did a few quick tests, and it seems to mostly work, but something in the way ipython changes objects ends up with a few unpickleable objects landing. If I understand this patch correctly, its an extension of byref=True, so may fit in there quite well.

@mmckerns
Copy link
Member

I'm targeting a new release for end of June, and would like to get this and #86 and #100 sorted for it. I'll go about tagging issues soon, but if you don't think this would interfere with use_diff, I'm likely to pull it in after some interface manipulation. I'm starting to think kwds on dump (like byref) should be replaced with state variables on the pickler.

@mmckerns
Copy link
Member

@abrasive: I've reviewed this, and will accept it once any issues in this ticket are sorted out.

Could you make these mods?

  1. rename the 'split_importsflag tobyref`, and set the default to True
  2. make it 3.x compatible (e.g. iteritems doesn't exist in 3.x, etc)

I will likely rename the four helper functions, but they are fine as is for the moment.

@mmckerns
Copy link
Member

@matsjoyce: I might consider enabling this in general, not just for dump_session. The only real issue I see here is that this code might lose edits to objects (e.g. numpy.x = 1, which appear to be neglected as is).

@mmckerns
Copy link
Member

@matsjoyce: I think this can work semi-independely of #43 (and thus #47). It doesn't seem to remove any functionality as far as I can tell. Nothing I have tried fails with it and not without it, but if you can think of a failure case that does, let me know. It would have to be something like edits to imported objects being lost on dump_session.

@mmckerns
Copy link
Member

I take that back... It does appear it would remove some functionality, in roughly how I expected. If you import a pure python module and change it's __dict__, then it will lose those changes. The current version of dump_sessionpreserves changes (except for modules as noted in #43).

@abrasive
Copy link
Contributor Author

Amended as requested. I think having the parameter named 'byref' makes sense logically, but dump_session also deals with '_byref' which is confusing. I'll leave that up to you.

@mmckerns
Copy link
Member

@abrasive: Thanks, and true… but byref and _byref will be much more similar if if in the normal dump, we give this same type of behavior to modules. Currently, the contents of pure python modules are imported upon pickling, and there should be an option to just take the import string (i.e. byref). The only downside would be that you couldn't pickle a module with byref=True and class with byref=False inside of a dump_session.

@mmckerns
Copy link
Member

@abrasive: I was thinking something more like this, as it would allow iterators to be used for 2.x and 3.x:

def _module_map():
    items = 'items' if PY3 else 'iteritems'
    from collections import defaultdict
    modmap = defaultdict(list)
    for name, module in getattr(sys.modules, items)():
        if module is None:
            continue
        for objname, obj in getattr(module.__dict__, items)():
            modmap[objname].append((obj, name))
    return mod map

Would you mind? Then I can just pull and not have to edit it.

@abrasive
Copy link
Contributor Author

OK, done.

Setting byref=True will preprocess the module by groveling
through its top-level objects for things that are identical to those in
the other available modules. If it finds matches, it records the names
and discards the objects during pickling.
On load, it then recovers the objects by "import x as y" using the
stored x, y.
This fixes issues with nasty unpickleable things in the depths of
libraries, as well as making the sessions smaller.

Fixes #79 and most of #78.
mmckerns added a commit that referenced this pull request May 31, 2015
dump_session: add split_imports
@mmckerns mmckerns merged commit 4894e8f into uqfoundation:master May 31, 2015
@mmckerns
Copy link
Member

brilliant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dump_session fails with from numpy import *
3 participants