Add ability to pickle dynamically created modules #52
What do you think @mrocklin? Seems like a good contribution.
        _, path, _ = imp.find_module(part, path)
    is_dynamic = False
except ImportError:
    is_dynamic = True
Would it make sense to pull this logic out into a separate function?
I agree
LGTM but I think the commits could be all squashed together.
Great contribution, thanks!
Sounds like you are already fixing a case that I am interested in. :) Namely handling
5ff409c to 9226866
I've extracted out the
Would it be worth testing the
3b8b90c to 340d175
I had forgotten to pass
One possible way to address that issue is to peek at For my use case, the BigQuery API in Python notebooks produces dynamically generated modules which I want to save for later.
@jakirkham I can add a test for
    self.modules.add(obj)
    self.save_reduce(subimport, (obj.__name__,), obj=obj)
    if is_dynamic:
        self.save_reduce(dynamic_subimport, (obj.__name__, vars(obj)), obj=obj)
`obj=obj` here is actually important to enable memoization, but it doesn't ensure that the produced dynamic modules will be singletons. Example:
dynmod.x = 1
m1, m2 = pickle.depickle([dynmod, dynmod])
m1.x = 2 # m1 and m2 are identical, m2.x is also 2
m3 = pickle.depickle(dynmod)
# m3 is a different module altogether
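The caveat above can be reproduced with a minimal standard-library sketch. This is not cloudpickle's code: `ModulePickler`, `reduce_dynamic_module`, `pickle_depickle` and this version of `dynamic_subimport` are hypothetical stand-ins, and a `dispatch_table` entry plays the role that `save_reduce(..., obj=obj)` plays in the diff. It also assumes the snippet runs as a script, so that `dynamic_subimport` can be pickled by reference.

```python
import copyreg
import io
import pickle
import types


def dynamic_subimport(name, attrs):
    """Rebuild a module from its name and a dict of attributes (stand-in)."""
    mod = types.ModuleType(name)
    mod.__dict__.update(attrs)
    return mod


def reduce_dynamic_module(mod):
    # Keep only user-defined attributes; dunder entries such as __spec__
    # are recreated by types.ModuleType and need not be pickled.
    attrs = {k: v for k, v in vars(mod).items() if not k.startswith("__")}
    return dynamic_subimport, (mod.__name__, attrs)


class ModulePickler(pickle.Pickler):
    # Hypothetical pickler: every module is treated as dynamic here.
    dispatch_table = copyreg.dispatch_table.copy()
    dispatch_table[types.ModuleType] = reduce_dynamic_module


def pickle_depickle(obj):
    """Round-trip obj through one dump and one load."""
    buf = io.BytesIO()
    ModulePickler(buf).dump(obj)
    return pickle.loads(buf.getvalue())


dynmod = types.ModuleType("dynmod")
dynmod.x = 1

# One dump: the pickler memoizes dynmod, so both entries are the same object.
m1, m2 = pickle_depickle([dynmod, dynmod])
m1.x = 2
print(m1 is m2, m2.x)  # True 2

# A separate round trip builds a brand new module object.
m3 = pickle_depickle(dynmod)
print(m3 is m1, m3.x)  # False 1
```

Identity is preserved only within a single pickle stream. Making reconstructed modules singletons across separate loads would need some registry on the unpickling side, which this sketch deliberately leaves out.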
My feeling at least is that having good test coverage of the handling of dynamic modules will help keep things sane. Again they will probably be copy-and-paste versions of your existing tests that are trimmed down a bit. Still, if something goes wrong here, it is nice to be able to rule out some simpler cases and having these already written out will make it easier to construct variants. Again just a thought.
340d175 to 77f5b03
@jakirkham, I added tests for
Add test cases, special case Ellipsis and NotImplemented. Use custom logic in lieu of imp.find_module to properly follow subimports. For example sklearn.tree was spuriously treated as a dynamic module.
77f5b03 to e7341b6
we can always change that again later, and it would just be a test/dev dependency
@rgbkrk the test would have to be of the following form then:
To ensure that
What about doing something simple like testing Alternatively, I suppose you could create a dummy submodule in the
Interesting fact, apparently Python 3.3+ fixed the pickling of
Add ability to pickle dynamically created modules
Thank you @rodrigofarnhamsc! I finally noticed the squashing of commits, sorry that took a little while to come back to.
Thanks. This is awesome. Any chance we might see this in a release any time soon?
We were giving @mrocklin some time for #45 (see #46 (comment)), though we can push it out sooner.
I don't plan to have a solution to that in the near future. I think we should release soon.
No need to rush. I built a development copy of
@mrocklin sounds good!
Somehow it didn't show @mrocklin's comment while I was writing that. Either way. I'll let you guys decide.
This brings in fixes and upgrades from the [cloudpickle](https://github.com/cloudpipe/cloudpickle) module, notably:

* Import submodules accessed by pickled functions (cloudpipe/cloudpickle#80)
* Support recursive functions inside closures (cloudpipe/cloudpickle#89, cloudpipe/cloudpickle#90)
* Fix ResourceWarnings and DeprecationWarnings (cloudpipe/cloudpickle#88)
* Assume modules with __file__ attribute are not dynamic (cloudpipe/cloudpickle#85)
* Make cloudpickle Python 3.6 compatible (cloudpipe/cloudpickle#72)
* Allow pickling of builtin methods (cloudpipe/cloudpickle#57)
* Add ability to pickle dynamically created modules (cloudpipe/cloudpickle#52)
* Support method descriptor (cloudpipe/cloudpickle#46)
* No more pickling of closed files, was broken on Python 3 (cloudpipe/cloudpickle#32)
## What changes were proposed in this pull request?

Based on apache#18282 by rgbkrk this PR attempts to update to the current released cloudpickle and minimize the difference between Spark cloudpickle and "stock" cloud pickle with the goal of eventually using the stock cloud pickle.

Some notable changes:

* Import submodules accessed by pickled functions (cloudpipe/cloudpickle#80)
* Support recursive functions inside closures (cloudpipe/cloudpickle#89, cloudpipe/cloudpickle#90)
* Fix ResourceWarnings and DeprecationWarnings (cloudpipe/cloudpickle#88)
* Assume modules with __file__ attribute are not dynamic (cloudpipe/cloudpickle#85)
* Make cloudpickle Python 3.6 compatible (cloudpipe/cloudpickle#72)
* Allow pickling of builtin methods (cloudpipe/cloudpickle#57)
* Add ability to pickle dynamically created modules (cloudpipe/cloudpickle#52)
* Support method descriptor (cloudpipe/cloudpickle#46)
* No more pickling of closed files, was broken on Python 3 (cloudpipe/cloudpickle#32)
* **Remove non-standard __transient__ check (cloudpipe/cloudpickle#110)** -- while we don't use this internally, and have no tests or documentation for its use, downstream code may use __transient__, although it has never been part of the API, if we merge this we should include a note about this in the release notes.
* Support for pickling loggers (yay!) (cloudpipe/cloudpickle#96)
* BUG: Fix crash when pickling dynamic class cycles. (cloudpipe/cloudpickle#102)

## How was this patch tested?

Existing PySpark unit tests + the unit tests from the cloudpickle project on their own.

Author: Holden Karau <holden@us.ibm.com>
Author: Kyle Kelley <rgbkrk@gmail.com>

Closes apache#18734 from holdenk/holden-rgbkrk-cloudpickle-upgrades.
The old logic treated all modules the same, which would fail when unpickling a dynamically created module.

In `save_module`, detect whether the module has been dynamically created by following the chain of imports. Note that `imp.find_module` doesn't work with submodules (for example `sklearn.tree`), so we actually have to split the module name and iterate over each piece.

Dynamic modules are saved as dictionaries and reconstituted by the `dynamic_subimport` function. While working on the test cases I discovered that `NotImplemented` and `Ellipsis` also don't work properly (they are introduced into the test dynamic module by `exec`). I've also addressed that.
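
The detection step described above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: the PR uses `imp.find_module`, but `imp` was removed in Python 3.12, so the sketch swaps in `importlib.util.find_spec`. The helper name `is_dynamic` and the module name `hypothetical_dynmod` are made up for the example.

```python
import importlib.util
import json.decoder
import types


def is_dynamic(module):
    """Return True if no importer can locate the module by its name."""
    parts = module.__name__.split(".")
    # Walk the dotted name one piece at a time: "a", then "a.b", and so on.
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        try:
            spec = importlib.util.find_spec(prefix)
        except (ImportError, ValueError):
            # ImportError: a parent is missing or is not a package.
            # ValueError: a same-named entry in sys.modules has no __spec__.
            return True
        if spec is None:
            return True
    return False


# A real submodule is found by following the chain of imports.
print(is_dynamic(json.decoder))  # False

# A module built at runtime has nothing backing it on the import path.
dynmod = types.ModuleType("hypothetical_dynmod")
print(is_dynamic(dynmod))  # True
```

Note that `find_spec` on a dotted name imports the parent package as a side effect, so this check is not free for packages that have not been imported yet.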