Skip to content

Conversation

@nikhaldi
Copy link
Contributor

Release 0.4.1 introduced a change to arguments of _fill_function
which meant that functions pickled before that couldn't be
unpickled anymore.

Addresses #126

Nik Haldimann added 2 commits October 31, 2017 14:47
Release 0.4.1 introduced a change to arguments of _fill_function
which meant that functions pickled before that couldn't be
unpickled anymore.

Addresses cloudpipe#126
I'm getting this error when unpickling with Python 3 (no stacktrace):

```
TypeError: an integer is required (got type str)
```
@codecov-io
Copy link

codecov-io commented Oct 31, 2017

Codecov Report

Merging #128 into master will increase coverage by 0.15%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #128      +/-   ##
==========================================
+ Coverage   83.94%   84.09%   +0.15%     
==========================================
  Files           2        2              
  Lines         517      522       +5     
  Branches       90       91       +1     
==========================================
+ Hits          434      439       +5     
  Misses         62       62              
  Partials       21       21
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 84% <100%> (+0.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c7ea597...2b8809c. Read the comment docs.

@nikhaldi
Copy link
Contributor Author

I'm not very happy I had to skip the tests for Python 3. My pickle/Python 3 knowledge is not deep, gladly taking suggestions for handling this better. Worst case, I could add separate set of tests for Python 3.

@ogrisel
Copy link
Contributor

ogrisel commented Oct 31, 2017

LGTM as it is. In the long run we should probably setup a folder of sample pickle files to check version compat for the latest 2 consecutive versions but I would rather not delay 0.4.2 for this.

@ogrisel
Copy link
Contributor

ogrisel commented Nov 2, 2017

Actually, this does not address @pitrou's comment in #126 (comment): something pickled in 0.4.2 won't be loadable in 0.4.0.

But I don't see how it can be fixed.

@nikhaldi
Copy link
Contributor Author

nikhaldi commented Nov 2, 2017

Agreed, I don't think there's any way to restore forward compatibility now. Like I said elsewhere, ultimately I think you need to carry around a protocol version marker somewhere. Then you can at least give an explicit error if you get a pickle of a version that you can't support.

@nikhaldi
Copy link
Contributor Author

nikhaldi commented Nov 2, 2017

Actually, I take that back - there are ways to restore forward compatibility but they may be quite hacky. Basically, the module argument could be attached to one of the existing arguments to _fill_function in some way. I'll propose something on a different branch because I'm not sure it's really a good idea.

@pitrou
Copy link
Member

pitrou commented Nov 2, 2017

@nikhaldi or we bite the bullet and transform fill_function to take most of its arguments as a state dictionary, which makes it trivial to handle missing or unexpected data. I suspect more data may need to be added in the future... For example __qualname__ isn't currently replicated.

@nikhaldi
Copy link
Contributor Author

nikhaldi commented Nov 2, 2017

@pitrou yeah, obviously at some point you have to redesign fill_function.

Ok, here's a fix which actually makes pickles compatible with 0.4.0 again: https://github.com/nikhaldi/cloudpickle/compare/feat/function_pickle_compat...nikhaldi:feat/function_pickle_forward_compat?expand=1

It works but it's a hack. I could be convinced to include this or not, I don't have strong feelings either way.

@pitrou
Copy link
Member

pitrou commented Nov 2, 2017

@pitrou yeah, obviously at some point you have to redesign fill_function.

Then let's redesign it now. Since we just broke compatibility, we might as well break it soon again than wait for next year.

Ok, here's a fix which actually makes pickles compatible with 0.4.0 again

I'd rather avoid this :-)

updated_args = args[:-1] + (None, args[-1],)
return _fill_function_internal(*updated_args)
else:
return _fill_function_internal(*args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't even need the else here:

if len(args) == 5:
    # [...]
    return _fill_function_internal(*updated_args)
return _fill_function_internal(*args)

😊

@rgbkrk
Copy link
Member

rgbkrk commented Nov 2, 2017

I currently agree with merging this as is, with the caveat that we should remove the workaround at some point in the future.

@rgbkrk rgbkrk closed this Nov 2, 2017
@rgbkrk rgbkrk reopened this Nov 2, 2017
@rgbkrk
Copy link
Member

rgbkrk commented Nov 2, 2017

Whoops, accidentally closed and reopened, retriggering Travis.

@ogrisel
Copy link
Contributor

ogrisel commented Nov 8, 2017

Ok merging this as it to release 0.4.2.

@ogrisel ogrisel merged commit 7d8c670 into cloudpipe:master Nov 8, 2017
@ogrisel
Copy link
Contributor

ogrisel commented Nov 8, 2017

Hum I had not read @pitrou's comment in #128 (comment). I will try to address it now.

@Peque
Copy link
Member

Peque commented Nov 8, 2017

@ogrisel Maybe an issue should be opened to revert this fix in future versions?

@HyukjinKwon
Copy link
Member

Thank you guys for fixing this ..

HyukjinKwon pushed a commit to HyukjinKwon/spark that referenced this pull request Mar 8, 2018
## What changes were proposed in this pull request?

The version of cloudpickle in PySpark was close to version 0.4.0 with some additional backported fixes and some minor additions for Spark related things.  This update removes Spark related changes and matches cloudpickle [v0.4.3](https://github.com/cloudpipe/cloudpickle/releases/tag/v0.4.3):

Changes by updating to 0.4.3 include:
* Fix pickling of named tuples cloudpipe/cloudpickle#113
* Built in type constructors for PyPy compatibility [here](cloudpipe/cloudpickle@d84980c)
* Fix memoryview support cloudpipe/cloudpickle#122
* Improved compatibility with other cloudpickle versions cloudpipe/cloudpickle#128
* Several cleanups cloudpipe/cloudpickle#121 and [here](cloudpipe/cloudpickle@c91aaf1)
* [MRG] Regression on pickling classes from the __main__ module cloudpipe/cloudpickle#149
* BUG: Handle instance methods of builtin types cloudpipe/cloudpickle#154
* Fix <span>#</span>129 : do not silence RuntimeError in dump() cloudpipe/cloudpickle#153

## How was this patch tested?

Existing pyspark.tests using python 2.7.14, 3.5.2, 3.6.3

Author: Bryan Cutler <cutlerb@gmail.com>

Closes apache#20373 from BryanCutler/pyspark-update-cloudpickle-42-SPARK-23159.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants