Conversation

@wangyum (Member) commented Dec 11, 2019

What changes were proposed in this pull request?

This PR switches python to python3 in make-distribution.sh.

Why are the changes needed?

SPARK-29672 made this change necessary.

Does this PR introduce any user-facing change?

No

How was this patch tested?

N/A

@dongjoon-hyun (Member)

Thank you for preparing preview2, @wangyum.

 # Delete the egg info file if it exists, this can cache older setup files.
 rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
-python setup.py sdist
+python3 setup.py sdist
Member review comment:
This looks good by itself.

Member review comment:
Does this mean that preview2 is the first PySpark distribution packaged by Python3?

@dongjoon-hyun (Member)

cc @HyukjinKwon

@dongjoon-hyun (Member)

Hi, @wangyum. This may cause a corrupted Python package.

+ python3 setup.py sdist
Could not import pypandoc - required to package PySpark
/var/folders/p5/tzyffbvj0md4djphs4fym9zw0000gn/T/easy_install-0i5l5_jn/pypandoc-1.4/setup.py:16: DeprecationWarning: Due to possible ambiguity, 'convert()' is deprecated. Use 'convert_file()'  or 'convert_text()'.
  long_description = pypandoc.convert('README.md', 'rst')
Maybe try:

    brew install pandoc
See http://johnmacfarlane.net/pandoc/installing.html
for installation options
---------------------------------------------------------------

!!! pandoc not found, long_description is bad, don't upload this to PyPI !!!

Could you check whether python3 on spark-rm has pypandoc? For me, it seems not to.

spark-rm@3e4e645813bb:/opt/spark-rm/output$ pip3 freeze
click==6.7
colorama==0.3.7
Jinja2==2.10
livereload==2.5.1
Markdown==2.6.9
MarkupSafe==1.0
mkdocs==0.16.3
pygobject==3.26.1
python-apt==1.6.4
PyYAML==3.12
six==1.11.0
tornado==4.5.3
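The pypandoc failure above comes from the common setup.py pattern of converting README.md to reStructuredText for the long_description. A minimal sketch of that pattern (a hypothetical helper for illustration, not Spark's actual setup.py code):

```python
# Sketch of the long_description fallback pattern behind the warning above:
# try to convert Markdown to RST via pypandoc, and fall back to the raw
# Markdown text when pypandoc (or the pandoc binary) is unavailable.
def read_long_description(readme_text):
    """Return an RST long_description when pypandoc works,
    otherwise fall back to the raw Markdown text."""
    try:
        import pypandoc
        return pypandoc.convert_text(readme_text, "rst", format="md")
    except Exception:
        # ImportError when pypandoc is missing; OSError when pandoc is.
        print("Could not import pypandoc - falling back to raw Markdown")
        return readme_text

if __name__ == "__main__":
    print(read_long_description("# PySpark\n\nA sample README."))
```

Because the fallback still succeeds, `python3 setup.py sdist` completes, but the resulting package description is unrendered Markdown on PyPI, which matches the "looks ugly" symptom mentioned in the next comment.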

@dongjoon-hyun (Member) commented Dec 11, 2019

I didn't run the release script fully, so please double-check your generated Python package with this patch.

During the 2.2.3 release I made this kind of mistake myself. As you can see, it works, but it looks ugly when there is no package description. I want to help you avoid that.

If it works on your side, there is no problem. I just suspect that python3 might not be the one we installed with the following.

virtualenv -p python3 /opt/p35 && \
  . /opt/p35/bin/activate && \
  pip install $BASE_PIP_PKGS && \
  pip install $PIP_PKGS && \

@dongjoon-hyun (Member)

cc @gatorsmile

@SparkQA

SparkQA commented Dec 11, 2019

Test build #115140 has finished for PR 26844 at commit d395904.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

Retest this please

@wangyum (Member, Author) commented Dec 11, 2019

@dongjoon-hyun It seems we also need to update our Docker image. Maybe it should be:

diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index cc7da152c7b..eabf3a3b6c8 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -62,14 +62,14 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
   curl -sL https://deb.nodesource.com/setup_11.x | bash && \
   $APT_INSTALL nodejs && \
   # Install needed python packages. Use pip for installing packages (for consistency).
-  $APT_INSTALL libpython2.7-dev libpython3-dev python-pip python3-pip && \
-  pip install $BASE_PIP_PKGS && \
-  pip install $PIP_PKGS && \
+  $APT_INSTALL libpython3-dev python3-pip && \
+  pip3 install $BASE_PIP_PKGS && \
+  pip3 install $PIP_PKGS && \
   cd && \
   virtualenv -p python3 /opt/p35 && \
   . /opt/p35/bin/activate && \
-  pip install $BASE_PIP_PKGS && \
-  pip install $PIP_PKGS && \
+  pip3 install $BASE_PIP_PKGS && \
+  pip3 install $PIP_PKGS && \
   # Install R packages and dependencies used when building.
   # R depends on pandoc*, libssl (which are installed above).
   $APT_INSTALL r-base r-base-dev && \

or

diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index cc7da152c7b..a5af73cc25c 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -61,8 +61,16 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg ca-certificates && \
     pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev && \
   curl -sL https://deb.nodesource.com/setup_11.x | bash && \
   $APT_INSTALL nodejs && \
+  # Change default python to python3.6
+  update-alternatives --install /usr/bin/python python /usr/bin/python2.7 1 && \
+  update-alternatives --install /usr/bin/python python /usr/bin/python3.6 2 && \
+  update-alternatives --set python /usr/bin/python3.6 && \
+  # Change default pip to pip3
+  update-alternatives --install /usr/bin/pip pip /usr/bin/pip 1 && \
+  update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 2 && \
+  update-alternatives --set pip /usr/bin/pip3 && \
   # Install needed python packages. Use pip for installing packages (for consistency).
-  $APT_INSTALL libpython2.7-dev libpython3-dev python-pip python3-pip && \
+  $APT_INSTALL libpython3-dev python3-pip && \
   pip install $BASE_PIP_PKGS && \
   pip install $PIP_PKGS && \
   cd && \

I will verify it later.
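Either way, the intended effect can be sanity-checked inside the rebuilt image by asking the unversioned `python` on PATH which major version it resolves to. A small sketch of such a check (a hypothetical verification script, not part of this PR):

```python
# Sanity check: report which major version the unversioned `python`
# command on PATH resolves to, so an update-alternatives (or apt)
# change like the one above can be verified inside the spark-rm image.
import shutil
import subprocess

def unversioned_python_major():
    """Return the major version of the `python` found on PATH,
    or None if no unversioned `python` command exists."""
    exe = shutil.which("python")
    if exe is None:
        return None
    out = subprocess.run(
        [exe, "-c", "import sys; print(sys.version_info.major)"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

if __name__ == "__main__":
    major = unversioned_python_major()
    print("unversioned python:", "missing" if major is None else major)
```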

@dongjoon-hyun (Member)

Thank you, @wangyum. +1 for that!

@dongjoon-hyun (Member)

Do you want to do that separately? Then, I'll merge this first.

@dongjoon-hyun (Member)

BTW, I have a question. This was used for the last preview. Has something changed?

@dongjoon-hyun (Member)

cc @jiangxb1987 since he was the release manager of 3.0-preview.

@wangyum (Member, Author) commented Dec 11, 2019

> Do you want to do that separately? Then, I'll merge this first.

Yes. I'd like to do that separately.

 ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes

-This will build Spark distribution along with Python pip and R packages. For more information on usage, run `./dev/make-distribution.sh --help`
+This will build Spark distribution along with Python pip and R packages. (Note that build with Python pip package requires Python 3.6). For more information on usage, run `./dev/make-distribution.sh --help`
@dongjoon-hyun (Member) commented Dec 11, 2019

Ur, shall we revert this line since there is Python 3.7 and 3.8?
Oops, never mind.
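For context, the Python 3.6 requirement quoted in this message could be enforced with a simple interpreter guard. A sketch of such a check (illustrative only, not the actual make-distribution.sh logic):

```python
# Sketch of an interpreter guard for a packaging step that requires
# Python 3.6+: compare the (major, minor) version tuple to the minimum.
import sys

def python_ok_for_pip_packaging(version_info=None, minimum=(3, 6)):
    """Return True if the interpreter meets the minimum version
    required to build the pip package."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    if not python_ok_for_pip_packaging():
        sys.exit("Building with --pip requires Python 3.6+")
    print("Python version OK for pip packaging")
```

Since the tuple comparison is on (major, minor), later versions such as 3.7 and 3.8 pass the check as well, which is why the message didn't need reverting.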

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-30211][INFRA] Switch python to python3 in make-distribution.sh [SPARK-30211][INFRA] Use python3 in make-distribution.sh Dec 11, 2019
@dongjoon-hyun (Member) left a comment

+1, LGTM. Merged to master.
I updated the PR description because this doesn't fix the Docker issue; instead, it fixes dev/make-distribution.sh first.

@dongjoon-hyun (Member)

It's merged now. Please proceed with the Docker issue and ping me if you need my help.

@SparkQA

SparkQA commented Dec 11, 2019

Test build #115145 has finished for PR 26844 at commit d395904.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member)

dongjoon-hyun commented Dec 11, 2019

FYI, make-distribution.sh is also used in the following Jenkins job. I triggered it to make sure that our Jenkins is also ready for this change. If it fails, we will need to file a separate JIRA issue. Let's see.

@SparkQA

SparkQA commented Dec 11, 2019

Test build #115154 has finished for PR 26844 at commit a3a7695.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum deleted the SPARK-30211 branch December 11, 2019 10:09
@HyukjinKwon (Member) left a comment

LGTM too

 ./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes

-This will build Spark distribution along with Python pip and R packages. For more information on usage, run `./dev/make-distribution.sh --help`
+This will build Spark distribution along with Python pip and R packages. (Note that build with Python pip package requires Python 3.6). For more information on usage, run `./dev/make-distribution.sh --help`
@HyukjinKwon (Member) commented Dec 12, 2019

nit: I think we can just say "Note that build with Python pip package requires Python 3." without parentheses.
