Cloudpickle error after re-building pyspark-notebook with custom spark version #1200
Comments
Hello @sramirez, in fact you're right: there is an incompatibility between Spark < 3.0 and the Python version set in docker-stacks/base-notebook/Dockerfile (line 96 in 399cbb9). The Python version is exposed as a build argument, so here is how to do it:

docker build --rm --force-rm \
  -t jupyter/base-notebook:python-3.7 ./base-notebook \
  --build-arg PYTHON_VERSION=3.7

However, it will not be easy, since you will have to rebuild the whole lineage of images (see the sketch after this comment), and you may encounter dependency problems. A last solution is to build your own image, writing a Dockerfile that downgrades Python; not tested. Sorry, I am aware that none of the proposed solutions is satisfactory.
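To make "rebuild the whole lineage" concrete, here is a minimal sketch, assuming a local checkout of the docker-stacks repository and its BASE_CONTAINER build argument (the image tags used here are illustrative, not from the original thread):

# Rebuild each image in the chain on top of the previous one, pinning Python 3.7 at the base.
docker build --rm --force-rm -t jupyter/base-notebook:python-3.7 ./base-notebook \
  --build-arg PYTHON_VERSION=3.7
docker build --rm --force-rm -t jupyter/minimal-notebook:python-3.7 ./minimal-notebook \
  --build-arg BASE_CONTAINER=jupyter/base-notebook:python-3.7
docker build --rm --force-rm -t jupyter/scipy-notebook:python-3.7 ./scipy-notebook \
  --build-arg BASE_CONTAINER=jupyter/minimal-notebook:python-3.7
# Finally, rebuild pyspark-notebook with the custom Spark version on top of the Python 3.7 lineage.
docker build --rm --force-rm -t jupyter/pyspark-notebook:spark-2.4.7 ./pyspark-notebook \
  --build-arg BASE_CONTAINER=jupyter/scipy-notebook:python-3.7 \
  --build-arg spark_version=2.4.7 \
  --build-arg hadoop_version=2.7 \
  --build-arg spark_checksum=0F5455672045F6110B030CE343C049855B7BA86C0ECB5E39A075FF9D093C7F648DA55DED12E72FFE65D84C32DCD5418A6D764F2D6295A3F894A4286CC80EF478 \
  --build-arg openjdk_version=8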
Yes, I supposed that was the best solution. I'm going to test the latest image before the switch. Thanks.
@sramirez Since this problem is not related to the image but to upstream compatibility between Spark and Python, I'm closing this issue.
Adding --build-arg PYTHON_VERSION=3.7 to the example in https://jupyter-docker-stacks.readthedocs.io/en/latest/using/specifics.html would help people avoid this issue.
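For reference, a sketch of what the documented command might look like with that flag appended (an assumption about the docs example, not a tested recipe; PYTHON_VERSION is consumed by the base-notebook Dockerfile, so it only takes effect if the build lineage forwards it, as in the sketch above):

docker build --rm --force-rm \
  -t jupyter/pyspark-notebook:spark-2.4.7 ./pyspark-notebook \
  --build-arg spark_version=2.4.7 \
  --build-arg hadoop_version=2.7 \
  --build-arg spark_checksum=0F5455672045F6110B030CE343C049855B7BA86C0ECB5E39A075FF9D093C7F648DA55DED12E72FFE65D84C32DCD5418A6D764F2D6295A3F894A4286CC80EF478 \
  --build-arg openjdk_version=8 \
  --build-arg PYTHON_VERSION=3.7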
What Docker image are you using?
jupyter/pyspark-notebook:latest
What complete docker command do you run to launch the container (omitting sensitive values)?
docker build --rm --force-rm \
  -t jupyter/pyspark-notebook:spark-2.4.7 ./pyspark-notebook \
  --build-arg spark_version=2.4.7 \
  --build-arg hadoop_version=2.7 \
  --build-arg spark_checksum=0F5455672045F6110B030CE343C049855B7BA86C0ECB5E39A075FF9D093C7F648DA55DED12E72FFE65D84C32DCD5418A6D764F2D6295A3F894A4286CC80EF478 \
  --build-arg openjdk_version=8

docker run -it --rm -p 8889:8888 jupyter/pyspark-notebook:spark-2.4.7
What steps do you take once the container is running to reproduce the issue?
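The report leaves this blank; judging from the traceback below, it is enough to import pyspark under the image's Python 3.8. A hypothetical one-line check (not from the original report):

# Importing pyspark is enough to hit the failure: Spark 2.4's bundled cloudpickle
# runs _make_cell_set_template_code at import time, which breaks on Python 3.8.
docker run --rm jupyter/pyspark-notebook:spark-2.4.7 python -c "import pyspark"
# Confirm the interpreter version shipped in the image:
docker run --rm jupyter/pyspark-notebook:spark-2.4.7 python --version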
What do you expect to happen?
SparkContext starts correctly.
What actually happens?
/usr/local/spark/python/pyspark/cloudpickle.py in _make_cell_set_template_code()
124 )
125 else:
--> 126 return types.CodeType(
127 co.co_argcount,
128 co.co_kwonlyargcount,
TypeError: an integer is required (got type bytes)
It seems Python 3.8 is not compatible with Spark < 3.0: apache/spark#26194. Python 3.8 added a posonlyargcount parameter to the types.CodeType constructor, so the positional call in the cloudpickle version bundled with older Spark passes its arguments one slot off, hence the TypeError above.
Is there any way to configure the Python version in the Dockerfile?