Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25079][python] update python3 executable to 3.6.x #24266

Closed

Conversation

shaneknapp
Copy link
Contributor

@shaneknapp shaneknapp commented Apr 1, 2019

What changes were proposed in this pull request?

have jenkins test against python3.6 (instead of 3.4).

How was this patch tested?

extensive testing on both the centos and ubuntu jenkins workers.

NOTE: this will need to be backported to all active branches.

@HyukjinKwon
Copy link
Member

Thanks for taking a look for this @shaneknapp!

@SparkQA
Copy link

SparkQA commented Apr 2, 2019

Test build #104173 has finished for PR 24266 at commit 1d14800.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Copy link
Contributor Author

huh. looks like pandas got downgraded for python2.7... i'll check the versions installed on the jenkins workers tomorrow.

we've never supported pandas for pypy, so those skipped tests are expected.

however, all of the 3.6 tests passed! woo!

@HyukjinKwon
Copy link
Member

Yea, that sounds all good. If I haven't missed something, Pandas of Python 2.7 was already low version too. Nice to see Python 3.6 got all passed. We have picked one version of Python 3.x so far so I don't think it matters the test coverage by this change - should be fine.

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shaneknapp
Copy link
Contributor Author

ok, fixed the centos workers' python2.7/pandas versioning:

$ pssh -h centos_workers.txt -t 0 -i "export PATH=/home/anaconda/bin:$PATH; python2.7 -c 'import pandas; print pandas.__version__'"
[1] 11:10:37 [SUCCESS] amp-jenkins-worker-03
0.19.2
[2] 11:10:37 [SUCCESS] amp-jenkins-worker-02
0.19.2
[3] 11:10:37 [SUCCESS] amp-jenkins-worker-05
0.19.2
[4] 11:10:37 [SUCCESS] amp-jenkins-worker-04
0.19.2
[5] 11:10:37 [SUCCESS] amp-jenkins-worker-06
0.19.2
[6] 11:10:37 [SUCCESS] amp-jenkins-worker-01
0.19.2

will retrigger a test now.

@shaneknapp
Copy link
Contributor Author

test this please

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented Apr 2, 2019

Test build #104220 has finished for PR 24266 at commit 1d14800.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Copy link
Contributor Author

argh... spurious non-python failure.

@shaneknapp
Copy link
Contributor Author

test this please

dev/run-tests.py Outdated
@@ -546,7 +546,8 @@ def main():
hadoop_version = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "hadoop2.7")
test_env = "amplab_jenkins"
# add path for Python3 in Jenkins if we're calling from a Jenkins machine
os.environ["PATH"] = "/home/anaconda/envs/py3k/bin:" + os.environ.get("PATH")
# TODO(sknapp): s/py36/py3k before merging!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK by me; does this need to change before you merge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah... this is so i can test against the py36 conda environment on the jenkins workers. right before i merge, i'll pause jenkins, update the conda env name, push my revert and then let jenkins build again.

folks will need to update their PRs post-merge as well.

i'm waiting on @BryanCutler and the arrow 0.12.1 release before doing any of this, however. probably early to mid next week.

there will be a detailed email going out to dev@ about this on monday.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also reminds me that one day, i would love to go through everything and remove all mention of amplab. that project has been over since 2016! :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick update: py36 will actually remain the path for this python env on master...

@SparkQA
Copy link

SparkQA commented Apr 3, 2019

Test build #104223 has finished for PR 24266 at commit 1d14800.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 3, 2019

Test build #104222 has finished for PR 24266 at commit 1d14800.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Hi, @shaneknapp . Looks nice~

Finished test(python3.6): pyspark.sql.dataframe (50s)
Finished test(python3.6): pyspark.sql.functions (47s)
Tests passed in 1088 seconds

BTW, from the log, I'm wondering if we can change pip installation test together. It's using python 3.5. If it's beyond of the scope of this PR, it's okay for me.

...
Testing pip installation with python 3.5
Using /tmp/tmp.yP79vT0xxQ for virtualenv
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /tmp/tmp.yP79vT0xxQ/3.5:

@shaneknapp
Copy link
Contributor Author

Hi, @shaneknapp . Looks nice~

Finished test(python3.6): pyspark.sql.dataframe (50s)
Finished test(python3.6): pyspark.sql.functions (47s)
Tests passed in 1088 seconds

BTW, from the log, I'm wondering if we can change pip installation test together. It's using python 3.5. If it's beyond of the scope of this PR, it's okay for me.

yep, definitely beyond the scope of this PR.

@shaneknapp
Copy link
Contributor Author

Test build #104223 has finished for PR 24266 at commit 1d14800.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

i'm really curious why this is suddenly popping up (from the console output of this build):

UnknownTimeZoneError: 'US/Pacific-New'

i noticed this yesterday, and went through and made sure that pytz was installed and at the same version across all workers... which it is. this failed on amp-jenkins-worker-05, but the next build passed on amp-jenkins-worker-02.

i'll poke around a bit more and see if i can't get to the bottom of things today... if not today, then next week as i'm out thursday-sunday.

@shaneknapp
Copy link
Contributor Author

test this please

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 3, 2019

Oh, the test failures are on the master branch. I agree that it's pytz issue.

  File "/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py", line 178, in timezone
    raise UnknownTimeZoneError(zone)
UnknownTimeZoneError: 'US/Pacific-New'

Locally, it works in my Mac (Python 2.7.10).

>>> from pytz import timezone
>>> import pytz
>>> timezone('US/Pacific-New')
<DstTzInfo 'US/Pacific-New' PST-1 day, 16:00:00 STD>

@HyukjinKwon , @BryanCutler . Could you take a look on the master branch Python UT failures (5692 ~ 5696) to unblock this PR?

@dongjoon-hyun
Copy link
Member

@SparkQA
Copy link

SparkQA commented Apr 3, 2019

Test build #104256 has finished for PR 24266 at commit 1d14800.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Apr 4, 2019

So, hm, are all the tests failing already due to a Python update? I don't necessarily want to roll that back but is it a matter of updating pytz too everywhere? does it need to be updated separately for Python 2 vs 3?

@squito
Copy link
Contributor

squito commented Apr 5, 2019

btw I filed a separate jira on the US/Pacific-New thing: https://issues.apache.org/jira/browse/SPARK-27389. I couldn't figure out what was going on either ... It does seem like US/Pacific-New is some weird TZ that should never have really existed and isn't supported everywhere, but I can't figure out what has changed that is causing it to suddenly fail this.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Apr 7, 2019

Test build #104349 has finished for PR 24266 at commit 1d14800.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Copy link
Contributor Author

test this please

@SparkQA
Copy link

SparkQA commented Apr 10, 2019

Test build #104490 has finished for PR 24266 at commit 1d14800.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Copy link
Contributor Author

shaneknapp commented Apr 10, 2019

omfg lol:

  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pytz/__init__.py", line 178, in timezone
    raise UnknownTimeZoneError(zone)
pytz.exceptions.UnknownTimeZoneError: 'US/Pacific-New'

(˚Õ˚)ر ~~~~╚╩╩╝

@BryanCutler
Copy link
Member

did this worker revolt against the fix?! I guess lets try this again

@BryanCutler
Copy link
Member

retest this please

@shaneknapp
Copy link
Contributor Author

python3.6 revolted. this is a "new" failure... did something change in the tests?

@BryanCutler
Copy link
Member

did something change in the tests?

Only #24306 but it has nothing to do with timestamps or timezones..

@shaneknapp
Copy link
Contributor Author

did something change in the tests?

Only #24306 but it has nothing to do with timestamps or timezones..

the tests pass when i run them manually on the ubuntu workers... i can't wait to reimage the centos boxes.

anyways, i'll patch up the centos workers later today and this should go away.

@SparkQA
Copy link

SparkQA commented Apr 11, 2019

Test build #104521 has finished for PR 24266 at commit 1d14800.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Copy link
Contributor Author

shaneknapp commented Apr 12, 2019

Test build #104521 has finished for PR 24266 at commit 1d14800.

so, this build passed on amp-jenkins-worker-02, and failed on -05 (which was one of the workers that initially had this issue).

i just finished manually running these tests on both of these workers, as well as my ubuntu testing worker, and both the ubuntu box and -02 passed, while -05 failed (as expected).

the python 3.6 envs on all three machines are identical, which i confirmed by checking conda list --explicit and pip list.

i'm now resorting to strace to see if i can spot any system-level differences. :(

@dongjoon-hyun
Copy link
Member

Thank you so much for spending much time on this, @shaneknapp !

@shaneknapp
Copy link
Contributor Author

shaneknapp commented Apr 12, 2019

well, i got the tests to pass on worker-05:

sknapp@amp-jenkins-worker-05 US]$ sudo mv /usr/share/zoneinfo/US/Pacific-New /tmp

however, that tzdata file is present and in the same location on the other workers that AREN'T failing tests.

¯\_(ツ)_/¯

@shaneknapp
Copy link
Contributor Author

shaneknapp commented Apr 12, 2019

well, i got the tests to pass on worker-05:

sknapp@amp-jenkins-worker-05 US]$ sudo mv /usr/share/zoneinfo/US/Pacific-New /tmp

however, that tzdata file is present and in the same location on the other workers that AREN'T failing tests.

¯\_(ツ)_/¯

so, to test whether or not (re)moving the Pacific-New timezone data file to /tmp actually works on all machines, i did the same thing on -03:

sknapp@amp-jenkins-worker-03 US]$ sudo mv /usr/share/zoneinfo/US/Pacific-New /tmp

i then ran the python tests, and they failed.

seriously, i think my brain is about to (im|ex)plode.

edit:

to get this to run successfully on -03, i had to:

  1. manually edit anaconda/envs/py36/lib/python3.6/site-packages/pytz/__init__.py and add US/Pacific-New to both all_timezones and common_timezones
  2. recompile the package (python3.6 -m compileall .)
  3. created a symlink for Pacific-New to point at Pacific in anaconda/envs/py36/lib/python3.6/site-packages/pytz/zoneinfo/US.

for now i remain, and most likely for a very long time, completely confused.

@shaneknapp
Copy link
Contributor Author

test this please

@SparkQA
Copy link

SparkQA commented Apr 13, 2019

Test build #104561 has finished for PR 24266 at commit 1d14800.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Copy link
Contributor Author

shaneknapp commented Apr 13, 2019 via email

@SparkQA
Copy link

SparkQA commented Apr 17, 2019

Test build #104671 has finished for PR 24266 at commit ba77b41.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Copy link
Contributor Author

test this please

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #104680 has finished for PR 24266 at commit ba77b41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@HyukjinKwon
Copy link
Member

I think it's good to go

@shaneknapp
Copy link
Contributor Author

I think it's good to go

yep, i think so too! let's wait for the build to finish...

@SparkQA
Copy link

SparkQA commented Apr 18, 2019

Test build #104700 has finished for PR 24266 at commit ba77b41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

Thank you so much, @shaneknapp

@dongjoon-hyun
Copy link
Member

Thank you, @shaneknapp and @HyukjinKwon .

Since this is an official Python 3.6 community support, can we have this on branch-2.4/2.3 too?
I monitored #24379 and #24380 , but I'm not sure which one will land on branch-2.4/2.3.

cc @dbtsai and @mattf-apache

@shaneknapp
Copy link
Contributor Author

Thank you, @shaneknapp and @HyukjinKwon .

Since this is an official Python 3.6 community support, can we have this on branch-2.4/2.3 too?
I monitored #24379 and #24380 , but I'm not sure which one will land on branch-2.4/2.3.

cc @dbtsai and @mattf-apache

since @HyukjinKwon jumped the gun a little bit and merged this to master, i'm going to try and accelerate the schedule and deploy the new python envs for these two branches today. if not today, then monday.

@shaneknapp shaneknapp deleted the updating-python3-executable branch April 19, 2019 16:47
@dongjoon-hyun
Copy link
Member

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants