Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow pickling of builtin methods #57

Merged
merged 2 commits into from
Mar 9, 2016
Merged

Conversation

Futrell
Copy link
Contributor

@Futrell Futrell commented Mar 9, 2016

This resolves #56, which found that things like cloudpickle.dumps(itertools.chain.from_iterable) and cloudpickle.dumps(object.__new__) failed because these builtin methods do not have a __code__ attribute and pickle.whichmodule can't determine their module. The solution here is to intercept these things and send them to save_reduce.

I had to use slightly different solutions for Python 2.7 and >=3.4. I couldn't resolve it at all for 3.3: looks like doing so would require getting deep into the details of pickling. In 3.2 and 3.3, trying to unpickle the pickled method results in "NEWOBJ class argument has NULL tp_new error". So to avoid people pickling data and then being unable to unpickle it, the current version refuses to pickle these methods for those python versions.

@rgbkrk
Copy link
Member

rgbkrk commented Mar 9, 2016

to avoid people pickling data and then being unable to unpickle it, the current version refuses to pickle these methods for those python versions.

That seems like a sensible approach, fixing it for some. If people want this fixed for their versions, we'll be welcome to the next fix.

👍 thank you @Futrell

@rgbkrk
Copy link
Member

rgbkrk commented Mar 9, 2016

@mrocklin want to take a gander and if happy, make a patch release with this?

@mrocklin
Copy link
Contributor

mrocklin commented Mar 9, 2016

Seems fine to me.

rgbkrk added a commit that referenced this pull request Mar 9, 2016
Allow pickling of builtin methods
@rgbkrk rgbkrk merged commit 39044b7 into cloudpipe:master Mar 9, 2016
@rgbkrk
Copy link
Member

rgbkrk commented Mar 9, 2016

Cool, care to make the release this time too @mrocklin?

@Futrell - Thank you! I just added you to the org to help us maintain the project (feel free to decline). Either way, you're welcome here.

@mrocklin
Copy link
Contributor

mrocklin commented Mar 9, 2016

I'm happy to put it on my todo list. This particular change is fairly low priority for me though, so this might not happen by me in the immediate future.

rgbkrk added a commit to rgbkrk/spark that referenced this pull request Jun 12, 2017
This brings in fixes and upgrades from the [cloudpickle](https://github.com/cloudpipe/cloudpickle) module, notably:

* Import submodules accessed by pickled functions (cloudpipe/cloudpickle#80)
* Support recursive functions inside closures (cloudpipe/cloudpickle#89, cloudpipe/cloudpickle#90)
* Fix ResourceWarnings and DeprecationWarnings (cloudpipe/cloudpickle#88)
* Assume modules with __file__ attribute are not dynamic (cloudpipe/cloudpickle#85)
* Make cloudpickle Python 3.6 compatible (cloudpipe/cloudpickle#72)
* Allow pickling of builtin methods (cloudpipe/cloudpickle#57)
* Add ability to pickle dynamically created modules (cloudpipe/cloudpickle#52)
* Support method descriptor (cloudpipe/cloudpickle#46)
* No more pickling of closed files, was broken on Python 3 (cloudpipe/cloudpickle#32)
ghost pushed a commit to dbtsai/spark that referenced this pull request Aug 22, 2017
## What changes were proposed in this pull request?

Based on apache#18282 by rgbkrk this PR attempts to update to the current released cloudpickle and minimize the difference between Spark cloudpickle and "stock" cloud pickle with the goal of eventually using the stock cloud pickle.

Some notable changes:
* Import submodules accessed by pickled functions (cloudpipe/cloudpickle#80)
* Support recursive functions inside closures (cloudpipe/cloudpickle#89, cloudpipe/cloudpickle#90)
* Fix ResourceWarnings and DeprecationWarnings (cloudpipe/cloudpickle#88)
* Assume modules with __file__ attribute are not dynamic (cloudpipe/cloudpickle#85)
* Make cloudpickle Python 3.6 compatible (cloudpipe/cloudpickle#72)
* Allow pickling of builtin methods (cloudpipe/cloudpickle#57)
* Add ability to pickle dynamically created modules (cloudpipe/cloudpickle#52)
* Support method descriptor (cloudpipe/cloudpickle#46)
* No more pickling of closed files, was broken on Python 3 (cloudpipe/cloudpickle#32)
* ** Remove non-standard __transient__check (cloudpipe/cloudpickle#110)** -- while we don't use this internally, and have no tests or documentation for its use, downstream code may use __transient__, although it has never been part of the API, if we merge this we should include a note about this in the release notes.
* Support for pickling loggers (yay!) (cloudpipe/cloudpickle#96)
* BUG: Fix crash when pickling dynamic class cycles. (cloudpipe/cloudpickle#102)

## How was this patch tested?

Existing PySpark unit tests + the unit tests from the cloudpickle project on their own.

Author: Holden Karau <holden@us.ibm.com>
Author: Kyle Kelley <rgbkrk@gmail.com>

Closes apache#18734 from holdenk/holden-rgbkrk-cloudpickle-upgrades.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to pickle itertools.chain.from_iterable
3 participants