Skip to content

Conversation

@ogrisel
Copy link
Contributor

@ogrisel ogrisel commented Oct 31, 2017

This is a fix for #123.

@ogrisel ogrisel added this to the 0.5 milestone Oct 31, 2017
@ogrisel ogrisel requested a review from pitrou October 31, 2017 09:09
@codecov-io
Copy link

codecov-io commented Oct 31, 2017

Codecov Report

Merging #127 into master will increase coverage by 0.08%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
+ Coverage   83.99%   84.08%   +0.08%     
==========================================
  Files           2        2              
  Lines         531      534       +3     
  Branches       96       97       +1     
==========================================
+ Hits          446      449       +3     
  Misses         63       63              
  Partials       22       22
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 83.99% <100%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6d65ed3...6bf7406. Read the comment docs.

tox.ini Outdated
@@ -1,5 +1,5 @@
[tox]
envlist = py26, py27, py33, py34, py35, pypy, pypy3
envlist = py26, py27, py33, py34, py35,i py36, pypy, pypy3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the i do here? Thought that would make Travis fail, but it has not. 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, will fix

@Peque
Copy link
Member

Peque commented Oct 31, 2017

Would it make sense to define DEFAULT_PROTOCOL = pickle.HIGHEST_PROTOCOL on top and use that? To allow the user to change the default value more easily.

Not that I am interested, but maybe someone is? I can think of someone that "accidentally" stores serialized objects for a long time, then one day tries to run everything again and it does not work. At least they can import cloudpickle; cloudpickle.DEFAULT_SERIALIZER = X and everything should work as expected right?

@ogrisel
Copy link
Contributor Author

ogrisel commented Oct 31, 2017

It sounds reasonable. @pitrou @rgbkrk @mrocklin any opinion?

@pitrou
Copy link
Member

pitrou commented Oct 31, 2017

and everything should work as expected right?

Certainly not by early-binding the value as in def dumps(..., protocol=DEFAULT_PROTOCOL).

@Peque
Copy link
Member

Peque commented Oct 31, 2017

@pitrou Maybe def dumps(..., protocol=None) and then:

Pickler.__init__(self, file, protocol=None):
    if protocol is None:
        protocol = DEFAULT_PROTOCOL

?

Maybe there is a better way.

Anyway if that is too much just drop it, it was just an idea (and could simply be added to an issue for the future). 😂

@ogrisel
Copy link
Contributor Author

ogrisel commented Oct 31, 2017

This is what I am trying to do but that does not work under Python 2 for some reason.

@pitrou
Copy link
Member

pitrou commented Oct 31, 2017

I don't think there is much point in making DEFAULT_PROTOCOL settable. People should be able to pass an additional argument to Pickler or to dumps.

@ogrisel
Copy link
Contributor Author

ogrisel commented Oct 31, 2017

Alright. I still changed the code to support protocol=None and use it by default for compatibility with the API of the pickle module in the standard library.

tox.ini Outdated
@@ -1,5 +1,5 @@
[tox]
envlist = py26, py27, py33, py34, py35, pypy, pypy3
envlist = py26, py27, py33, py34, py35, py36, pypy, pypy3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably remove py26 and py33, since we don't support those versions anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ogrisel
Copy link
Contributor Author

ogrisel commented Oct 31, 2017

If we merge this and if we still plan to do a 0.4.2 release we should be careful to branch 0.4.X from the 0.4.1 tag and only put bugfixes in the 0.4.X branch and not the content of this PR.

@ogrisel
Copy link
Contributor Author

ogrisel commented Nov 2, 2017

I think I would like to merge this quickly and release 0.5.0 as the old protocol is actually an issue for PySpark and large messages.

@rgbkrk is it fine if we release 0.5.0 without having found a proper fix for #126 first?

@rgbkrk
Copy link
Member

rgbkrk commented Nov 2, 2017

@rgbkrk is it fine if we release 0.5.0 without having found a proper fix for #126 first?

I think that's fine.

@rgbkrk
Copy link
Member

rgbkrk commented Nov 2, 2017

Actually, we should have @nikhaldi weigh in since he took the effort of reporting the issue and attempting a fix + tests (this last part is super important to me).

@nikhaldi
Copy link
Contributor

nikhaldi commented Nov 2, 2017

If you're planning to put out 0.5.0 soon, I'm in favor of including my fix #128 as is.

Here's why: I think documentation & design work is needed to address the general issue of how compatibility across versions should work going forward. I think this is not trivial and I don't know who would do this work and when. I don't know the details of how cloudpickle is organized as a project, but extrapolating from others I assume it could be months out? During that time it would be kinder to users to have at least some backwards compatibility. You'll very likely have to fully break compatibility again when more fundamental versioning changes are made.

@Peque
Copy link
Member

Peque commented Nov 2, 2017

I would say, if I may jump in, include it for 0.4.2 (a fix release) and do not include it for 0.5.0. Don't need to leave a branch hanging for that release, you may simply add the fix, release and revert the fix.

As long as the version is 0.x.y I think:

  • The "0" should be considered as unstable (i.e.: anything can happen between versions, so if you want to be sure just fix the version you are using).
  • The "x" would be the major version, so between major versions backwards compatibility is not enforced.
  • The "y" is the minor version, which should at least try to not break backwards compatibility (but users should always remember that the "0" means it is still unstable).

Anyway there should be a "formal" policy about the finally chosen rules or release strategy.

@rgbkrk
Copy link
Member

rgbkrk commented Nov 2, 2017

I think this is not trivial and I don't know who would do this work and when. I don't know the details of how cloudpickle is organized as a project, but extrapolating from others I assume it could be months out?

Basically, if dask needs something then @mrocklin or @pitrou jump in to fix something. If pyspark needs it then @HyukjinKwon, @holdenk, or I will jump in. If IPython / ipyparallel need fixes, that'll be me or @minrk. If @quantopian needs it, @ssanderson or @llllllllll chime in. There are others too. Basically, the people who want something implement it.

There is no organized practice other than that there are several distributed systems libraries that rely on cloudpickle underneath. The primary maintainers run systems where we control the deployments of workers and servers such that versions are consistent.

Speaking a little more for my largely undefined role -- I tend to act as a steward of the project, helping people collaborate and come to conclusions, even if I'm merely a moderator.

During that time it would be kinder to users to have at least some backwards compatibility. You'll very likely have to fully break compatibility again when more fundamental versioning changes are made.

I definitely agree that we should be kinder in terms of backwards compatibility. This is the first time it has occurred.

@rgbkrk
Copy link
Member

rgbkrk commented Nov 2, 2017

I'm going to lean towards making the quick patch release as outlined by @Peque, so even with pip's very lean/awkward semantics around semver, people can pin right now within their deployments.

Would that be alright @ogrisel?

@pitrou
Copy link
Member

pitrou commented Nov 2, 2017

If it's possible, I'd rather have someone take the time (probably not much) to implement the suggestion in #128 (comment), so at least the breakage makes us more future-proof.

@ogrisel
Copy link
Contributor Author

ogrisel commented Nov 8, 2017

@pitrou's suggestion was implemented and merged in master and then released as 0.4.2. I rebased this PR on top of master. If CI is still green and people agree I can merge and release 0.5.0.

@rgbkrk rgbkrk merged commit 33f1062 into cloudpipe:master Nov 8, 2017
@ogrisel ogrisel deleted the highest-protocol branch November 8, 2017 21:40
@ogrisel
Copy link
Contributor Author

ogrisel commented Nov 8, 2017

Thanks all! 0.5.0 is online on PyPI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants