
Cloudpickle error after re-building pyspark-notebook with custom spark version #1200

Closed
sramirez opened this issue Dec 15, 2020 · 4 comments
Labels
tag:Upstream A problem with one of the upstream packages installed in the docker images

Comments

@sramirez

sramirez commented Dec 15, 2020

If you are reporting an issue with one of the existing images, please answer the questions below to help us troubleshoot the problem. Please be as thorough as possible.

What docker image are you using?

jupyter/pyspark-notebook:latest

What complete docker command do you run to launch the container (omitting sensitive values)?

docker build --rm --force-rm \
    -t jupyter/pyspark-notebook:spark-2.4.7 ./pyspark-notebook \
    --build-arg spark_version=2.4.7 \
    --build-arg hadoop_version=2.7 \
    --build-arg spark_checksum=0F5455672045F6110B030CE343C049855B7BA86C0ECB5E39A075FF9D093C7F648DA55DED12E72FFE65D84C32DCD5418A6D764F2D6295A3F894A4286CC80EF478 \
    --build-arg openjdk_version=8

docker run -it --rm -p 8889:8888 jupyter/pyspark-notebook:spark-2.4.7

What steps do you take once the container is running to reproduce the issue?

  1. Visit http://localhost:8889 (the host port mapped by the run command above)
  2. Start a kernel and create the SparkContext.

What do you expect to happen?

SparkContext starts correctly.

What actually happens?

/usr/local/spark/python/pyspark/cloudpickle.py in _make_cell_set_template_code()
    124         )
    125     else:
--> 126         return types.CodeType(
    127             co.co_argcount,
    128             co.co_kwonlyargcount,

TypeError: an integer is required (got type bytes)

It seems Python 3.8 is not compatible with Spark < 3.0: apache/spark#26194
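The mismatch can be confirmed without Spark at all. A minimal check (assuming `python3` is on the PATH): Python 3.8 added the `co_posonlyargcount` field to code objects, which shifted the positional arguments of `types.CodeType`, so the old cloudpickle bundled with Spark 2.4 ends up passing `co_lnotab` (bytes) where an int is now expected:

```shell
# Prints two booleans that should always match: "is this Python >= 3.8"
# and "does the co_posonlyargcount field exist" -- the field whose
# introduction shifted the argument order old cloudpickle relies on.
python3 - <<'EOF'
import sys

def f():
    pass

print(sys.version_info >= (3, 8))
print(hasattr(f.__code__, "co_posonlyargcount"))
EOF
```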

Is there any chance to configure the Python version in the Dockerfile?

@romainx
Collaborator

romainx commented Dec 15, 2020

Hello @sramirez,

In fact you're right, there is an incompatibility between Spark 2.4.x and Python 3.8. However it's unclear -- at least to me -- whether Python 3.8 will be supported in one of the next 2.4.x releases.
It's already possible to rebuild the base-notebook with a different Python version.

ARG PYTHON_VERSION=default

Here is how to do it.

docker build --rm --force-rm \
    -t jupyter/base-notebook:python-3.7 ./base-notebook \
    --build-arg PYTHON_VERSION=3.7

However it will not be easy, since you will have to rebuild the whole lineage of images:

  • base-notebook
  • minimal-notebook
  • scipy-notebook
  • pyspark-notebook

And you may encounter dependency problems ...
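A sketch of that rebuild, assuming a checkout of jupyter/docker-stacks as the working directory and that each downstream Dockerfile exposes the `BASE_CONTAINER` build arg (as these images do); the commands are printed rather than executed so they can be reviewed first, and the `my/` image prefix is a placeholder:

```shell
# Print the build commands for the whole lineage pinned to Python 3.7;
# each image is built FROM the previous one via BASE_CONTAINER.
base="my/base-notebook:python-3.7"
echo "docker build --rm --force-rm -t ${base} ./base-notebook --build-arg PYTHON_VERSION=3.7"
for img in minimal-notebook scipy-notebook pyspark-notebook; do
  tag="my/${img}:python-3.7"
  echo "docker build --rm --force-rm -t ${tag} ./${img} --build-arg BASE_CONTAINER=${base}"
  base="${tag}"
done
```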
Another option is to use the last image published before the switch to Spark 3.0, see its manifest here. However you will not be able to configure the Spark version (I'm pretty sure that was not configurable previously).

The last solution is to build your own image, writing a Dockerfile that downgrades Python. Not tested.
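A minimal sketch of that last option (untested; assumes conda is the image's package manager, as in these stacks, and the `my/` tag is a placeholder):

```shell
# Write a small Dockerfile that downgrades Python inside the stock
# pyspark-notebook image; conda resolves the 3.7 pin and its dependents.
cat > Dockerfile.py37 <<'EOF'
FROM jupyter/pyspark-notebook:latest
USER root
RUN conda install --quiet --yes python=3.7 && conda clean --all -f -y
USER $NB_UID
EOF
# Then build it with:
#   docker build -t my/pyspark-notebook:python-3.7 -f Dockerfile.py37 .
```

As noted above, downgrading Python in-place may hit the same dependency conflicts as rebuilding the lineage.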

Sorry, I am aware that none of the proposed solutions is satisfactory.

@romainx added the tag:Upstream label Dec 15, 2020
@sramirez
Author

Yes, I figured that was the best solution. I'm going to test the latest image before the switch.

Thanks

@romainx
Collaborator

romainx commented Dec 27, 2020

@sramirez Since this problem is not related to the image but to upstream compatibility between Spark and Python, I'm closing this issue.
Feel free to leave a comment if you want to give more information or need more support from our side.
Best.

@WilliamWhispell

Adding --build-arg PYTHON_VERSION=3.7 to the example in https://jupyter-docker-stacks.readthedocs.io/en/latest/using/specifics.html would help people avoid this issue.
