Description
OK, so Dask itself is now relatively robust to different versions of Python and compression.
However, as @jrbourbeau predicted, cloudpickle is not. This stops users from being able to do things like send along lambdas
future = client.submit(lambda x: x + 1, 10)
This turns out to be somewhat debilitating, For example, our basic example fails because top-level Pandas functions themselves are not reliably pickle-serializable (see pandas-dev/pandas#35611). I suspect that this happens in other cases in our examples as well.
So what to do?
Short term we could duplicate every software environment by Python version. Short term, if we're only supporting coiled/default
for quickstart purposes then this probably isn't horrible. We would set the default value for the configuration=
keyword to coiled/default
or coiled/default-37
depending on the Python version at import time, and then change the quickstart to coiled.Cluster()
and use that default.
We could stick with coiled install
and be explicit about requiring users to install things. I'm somewhat against this as a startup process. It's unpleasant.
Long term I'd like to see us increase hygiene in Dask and upstream about using pickle-serializable functions. This is a good driver for that. If we were very ambitious we could try to modify cloudpickle to be Python-version agnostic, but putting on my cloudpickle-maintainer hat I'll probably vote against that.
Other thoughts? @jrbourbeau ?