
[SPARK-25079][python][branch-2.4] update python3 executable to 3.6.x #24379

Closed

Conversation

shaneknapp
Contributor

@shaneknapp shaneknapp commented Apr 15, 2019

What changes were proposed in this pull request?

have jenkins test against python3.6 (instead of 3.4).

How was this patch tested?

extensive testing on both the centos and ubuntu jenkins workers revealed that 2.4 doesn't like python 3.6... :(

NOTE: this is just for branch-2.4

PLEASE DO NOT MERGE

@shaneknapp
Contributor Author

@BryanCutler the test failures should be epic. ;)

@BryanCutler
Member

If it's gonna fail, might as well be epic \m/

@shaneknapp
Contributor Author

> If it's gonna fail, might as well be epic \m/

hopefully it's not as epic as this:
https://www.usenix.org/conference/lisa16/conference-program/presentation/kuroda

@SparkQA

SparkQA commented Apr 16, 2019

Test build #104601 has finished for PR 24379 at commit a35a2cf.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member

Well, it's not too bad.. at least it didn't explode. Was the Pandas version upgraded also?

@shaneknapp
Contributor Author

> Well, it's not too bad.. at least it didn't explode. Was the Pandas version upgraded also?

in this particular env, pandas is @ 0.24.2

@BryanCutler
Member

Is it possible to keep pandas and pyarrow versions the same as before (0.19.2 and 0.8.0) for envs of branches 2.3/2.4 or do they need to share the same env as master?

The failures here have been fixed in master from various PRs, but not backported. It's possible to apply them, but it would take some time and could be a bit risky..

@shaneknapp
Contributor Author

shaneknapp commented Apr 16, 2019

> Is it possible to keep pandas and pyarrow versions the same as before (0.19.2 and 0.8.0) for envs of branches 2.3/2.4 or do they need to share the same env as master?

they need to share the same env as master (or we change all of the testing framework for all branches to create temporary python envs for each branch.... which isn't actually a horrible idea but a much bigger project).
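A minimal sketch of the "temporary python envs for each branch" idea, not what the Jenkins workers actually do. The helper name `temporary_conda_env`, the directory prefix, and the `runner` parameter are all hypothetical; only the `conda create --yes --prefix ... --channel ...` invocation is standard conda CLI. The usage line passes `print` as the runner so nothing is installed; the default runner would call conda for real.

```python
import contextlib
import os
import shutil
import subprocess
import tempfile


@contextlib.contextmanager
def temporary_conda_env(specs, runner=subprocess.check_call):
    """Create a throwaway conda env from `specs`, yield its prefix, then delete it.

    `runner` receives the command as a list of arguments. It is a parameter
    only so the sketch can be dry-run without conda installed.
    """
    parent = tempfile.mkdtemp(prefix="pyspark-env-")
    # conda creates the prefix directory itself, so point it at a child path.
    prefix = os.path.join(parent, "env")
    try:
        runner(["conda", "create", "--yes", "--prefix", prefix,
                "--channel", "conda-forge"] + list(specs))
        yield prefix
    finally:
        # Remove the env even if env creation or the test run raised.
        shutil.rmtree(parent, ignore_errors=True)


# Dry run: print the command instead of executing it.
with temporary_conda_env(["python=3.6", "pandas=0.19.2"], runner=print) as env:
    print("tests for this branch would run against", env)
```

The cost is an env build per test run, which is part of why this would be a much bigger project than sharing one env.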

regarding pandas 0.19.2, it seems that pandas 0.24.2 is the minimum that conda-forge will install alongside pyarrow 0.12.1?

(output below trimmed for readability)

$ conda install -c conda-forge pyarrow=0.12.1
<snip>
  added / updated specs:
    - pyarrow=0.12.1

<snip>

The following NEW packages will be INSTALLED:

  arrow-cpp          conda-forge/linux-64::arrow-cpp-0.12.1-py36h0e61e49_0
  mkl_fft            conda-forge/linux-64::mkl_fft-1.0.11-py36h14c3975_0
  mkl_random         conda-forge/linux-64::mkl_random-1.0.2-py36h637b7d7_2
  parquet-cpp        conda-forge/noarch::parquet-cpp-1.5.1-4
  pyarrow            conda-forge/linux-64::pyarrow-0.12.1-py36hbbcf98d_0

The following packages will be UPDATED:

  numpy                              1.11.3-py36h7e9f1db_12 --> 1.16.2-py36h7e9f1db_0
  numpy-base                         1.11.3-py36hde5b4d6_12 --> 1.16.2-py36hde5b4d6_0
  pandas                                 0.19.2-np111py36_1 --> 0.24.2-py36hf484d3e_0   <-----  NOOOOO!

i'll try and see if i can get pandas to 0.19.2, but it's looking to be kinda difficult. i hacked a conda spec file and manually set pandas to 0.19.2 and will run the tests against it and see what happens.
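As a rough illustration of manually setting pandas in a spec, here is a sketch that rewrites one package line. It assumes a spec made of `name=version=build` lines, and the helper `pin_package` and the sample spec are hypothetical, not the file that was actually edited on the workers.

```python
def pin_package(spec_lines, name, version):
    """Return a copy of spec_lines with `name` pinned to `version`.

    Each non-comment line is assumed to look like `name=version=build`.
    The build string is dropped for the pinned package, because the old
    build does not exist for the new version.
    """
    pinned = []
    found = False
    for line in spec_lines:
        stripped = line.strip()
        is_package = stripped and not stripped.startswith("#")
        # Compare the whole name so that e.g. numpy-base never matches numpy.
        if is_package and stripped.split("=")[0] == name:
            pinned.append("{}={}".format(name, version))
            found = True
        else:
            pinned.append(line)
    if not found:
        pinned.append("{}={}".format(name, version))
    return pinned


# Illustrative spec only; versions and builds are taken from the output above.
spec = [
    "# illustrative spec, not the real Jenkins worker env",
    "pandas=0.24.2=py36hf484d3e_0",
    "pyarrow=0.12.1=py36hbbcf98d_0",
]
print("\n".join(pin_package(spec, "pandas", "0.19.2")))
```

Whether the solver accepts the pinned combination still has to be checked by actually creating the env.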

> The failures here have been fixed in master from various PRs, but not backported. It's possible to apply them, but it would take some time and could be a bit risky..

i'll take time/risk vs not having any python test coverage for 2.3 and 2.4... but we'll need a commitment from dev@ to help make this stuff work.

@shaneknapp
Contributor Author

shaneknapp commented Apr 16, 2019

obligatory xkcd comic:

[image: xkcd comic]

(i actually have this printed out and hanging in my exterior-facing office window)

@shaneknapp
Contributor Author

ok... i got py36 + pandas 0.19.2 + pyarrow 0.12.1 to happily install in a test conda env, then i built a spark 2.4.2 dist and ran python/run-tests against the tarball.

pyspark.sql.tests + py36 + pandas 0.19.2 + pyarrow 0.12.1: 346 tests, errors=21
pyspark.sql.tests + py36 + pandas 0.24.2 + pyarrow 0.12.1: 346 tests, errors=24

the tests that passed w/pandas 0.19.2 (and failed w/0.24.2) are:

test_column_order (pyspark.sql.tests.GroupedMapPandasUDFTests)
test_complex_groupby (pyspark.sql.tests.GroupedMapPandasUDFTests)
test_udf_with_key (pyspark.sql.tests.GroupedMapPandasUDFTests)

the tests that failed for both versions of pandas:

test_createDataFrame_column_name_encoding (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_respect_session_timezone (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_toggle (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_array_type (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_incorrect_schema (pyspark.sql.tests.ArrowTests)
test_createDataFrame_with_incorrect_schema (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_int_col_names (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_names (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_schema (pyspark.sql.tests.ArrowTests)
test_createDataFrame_with_schema (pyspark.sql.tests.EncryptionArrowTests)
test_null_conversion (pyspark.sql.tests.ArrowTests)
test_null_conversion (pyspark.sql.tests.EncryptionArrowTests)
test_pandas_round_trip (pyspark.sql.tests.ArrowTests)
test_pandas_round_trip (pyspark.sql.tests.EncryptionArrowTests)
test_timestamp_dst (pyspark.sql.tests.EncryptionArrowTests)
test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
test_toPandas_arrow_toggle (pyspark.sql.tests.EncryptionArrowTests)
test_toPandas_respect_session_timezone (pyspark.sql.tests.ArrowTests)
test_toPandas_respect_session_timezone (pyspark.sql.tests.EncryptionArrowTests)
test_vectorized_udf_dates (pyspark.sql.tests.ScalarPandasUDFTests)
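For anyone comparing runs like these, a small sketch that parses the two lists above and groups the failures by test class. The names `parse`, `FAILED_BOTH`, and `FAILED_ONLY_0_24_2` are made up for this example; the test names are copied from the lists above.

```python
import re
from collections import Counter

FAILED_BOTH = """\
test_createDataFrame_column_name_encoding (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_does_not_modify_input (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_respect_session_timezone (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_toggle (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_array_type (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_incorrect_schema (pyspark.sql.tests.ArrowTests)
test_createDataFrame_with_incorrect_schema (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_int_col_names (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_names (pyspark.sql.tests.EncryptionArrowTests)
test_createDataFrame_with_schema (pyspark.sql.tests.ArrowTests)
test_createDataFrame_with_schema (pyspark.sql.tests.EncryptionArrowTests)
test_null_conversion (pyspark.sql.tests.ArrowTests)
test_null_conversion (pyspark.sql.tests.EncryptionArrowTests)
test_pandas_round_trip (pyspark.sql.tests.ArrowTests)
test_pandas_round_trip (pyspark.sql.tests.EncryptionArrowTests)
test_timestamp_dst (pyspark.sql.tests.EncryptionArrowTests)
test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
test_toPandas_arrow_toggle (pyspark.sql.tests.EncryptionArrowTests)
test_toPandas_respect_session_timezone (pyspark.sql.tests.ArrowTests)
test_toPandas_respect_session_timezone (pyspark.sql.tests.EncryptionArrowTests)
test_vectorized_udf_dates (pyspark.sql.tests.ScalarPandasUDFTests)
"""

FAILED_ONLY_0_24_2 = """\
test_column_order (pyspark.sql.tests.GroupedMapPandasUDFTests)
test_complex_groupby (pyspark.sql.tests.GroupedMapPandasUDFTests)
test_udf_with_key (pyspark.sql.tests.GroupedMapPandasUDFTests)
"""

# Matches unittest-style lines: "test_name (dotted.module.ClassName)".
LINE = re.compile(r"^(\w+) \(([\w.]+)\)$")


def parse(report):
    """Return a set of (class_name, test_name) pairs found in `report`."""
    results = set()
    for line in report.splitlines():
        match = LINE.match(line.strip())
        if match:
            test_name, dotted = match.groups()
            results.add((dotted.rsplit(".", 1)[-1], test_name))
    return results


failed_0_19_2 = parse(FAILED_BOTH)
failed_0_24_2 = failed_0_19_2 | parse(FAILED_ONLY_0_24_2)

# How many failures each test class contributes with pandas 0.19.2.
per_class = Counter(cls for cls, _ in failed_0_19_2)
for cls, count in per_class.most_common():
    print(cls, count)

# Failures that appear only with the newer pandas.
for cls, test in sorted(failed_0_24_2 - failed_0_19_2):
    print("only with pandas 0.24.2:", cls, test)
```

Grouping by class makes it easier to see that the shared failures are concentrated in the Arrow test classes, while the pandas-only differences are all in the grouped map UDF tests.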

@BryanCutler
Member

I think the 2.3/2.4 branches will have to stay with pyarrow 0.8.0 and pandas 0.19.2 for the tests to pass. There were a few fixes that were done in master to make later versions of pyarrow work. These could be applied to the branches, but would have to be done manually. The risk is that we could end up breaking support for pyarrow 0.8.0, which would be pretty bad.

@shaneknapp do you think you can try pyspark.sql.tests + py36 + pandas 0.19.2 + pyarrow 0.8.0 for the 2.4 branch? I think that should work. Sorry, I should have explained that better before..
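One way to confirm that an env really matches the versions the 2.3/2.4 branches expect is to compare installed versions against the pins before running anything. This is only a sketch: `describe_drift`, `version_tuple`, and the `EXPECTED` table are hypothetical names, and the installed versions are passed in as a plain dict rather than read from a real env.

```python
# Versions named in this thread for the 2.3/2.4 branches.
EXPECTED = {"pandas": "0.19.2", "pyarrow": "0.8.0"}


def version_tuple(text):
    """Turn '0.12.1' into (0, 12, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def describe_drift(installed, expected=EXPECTED):
    """Return {package: 'missing' | 'older' | 'newer'} for packages off the pins.

    Versions are compared as integer tuples, because comparing the raw
    strings would sort '0.12.1' before '0.8.0'.
    """
    drift = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None:
            drift[name] = "missing"
        elif version_tuple(have) < version_tuple(want):
            drift[name] = "older"
        elif version_tuple(have) > version_tuple(want):
            drift[name] = "newer"
    return drift


print(describe_drift({"pandas": "0.24.2", "pyarrow": "0.12.1"}))
print(describe_drift({"pandas": "0.19.2", "pyarrow": "0.8.0"}))
```

An empty result means the env matches the pins exactly; anything else is worth resolving before trusting a test run on these branches.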

@shaneknapp
Contributor Author

shaneknapp commented Apr 16, 2019

> I think the 2.3/2.4 branches will have to stay with pyarrow 0.8.0 and pandas 0.19.2 for the tests to pass. There were a few fixes that were done in master to make later versions of pyarrow work. These could be applied to the branches, but would have to be done manually. The risk is that we could end up breaking support for pyarrow 0.8.0, which would be pretty bad.

edit: i guess py36 does support pyarrow 0.8.0!

https://conda.anaconda.org/conda-forge/linux-64/pyarrow-0.8.0-py36_0.tar.bz2

> @shaneknapp do you think you can try pyspark.sql.tests + py36 + pandas 0.19.2 + pyarrow 0.8.0 for the 2.4 branch? I think that should work. Sorry, I should have explained that better before..

sure. give me a couple of hours to make the env/run tests.

@shaneknapp
Contributor Author

shaneknapp commented Apr 16, 2019

> I think the 2.3/2.4 branches will have to stay with pyarrow 0.8.0 and pandas 0.19.2 for the tests to pass. There were a few fixes that were done in master to make later versions of pyarrow work. These could be applied to the branches, but would have to be done manually. The risk is that we could end up breaking support for pyarrow 0.8.0, which would be pretty bad.
>
> @shaneknapp do you think you can try pyspark.sql.tests + py36 + pandas 0.19.2 + pyarrow 0.8.0 for the 2.4 branch? I think that should work. Sorry, I should have explained that better before..

also, this means we'll need 3 python envs to test everything.

py27: master, 2.3, 2.4
py36 + pandas 0.19.2 + pyarrow 0.8.0: 2.3, 2.4
py36 + pandas 0.24.2 (or 0.19.2) + pyarrow 0.12.1: master

i'm not completely opposed, as this will most likely be easier than backporting and fixing the bugs in 2.3 and 2.4...

...but i would really like to stop at 3 python envs. it's already annoying enough as-is managing this stuff. :)
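The three-env layout above can be written down as a small lookup table, which makes it easy to ask which envs a given branch has to pass in. This is just a sketch of the mapping listed in this comment; `ENVS` and `envs_for` are hypothetical names, and the master entry uses pandas 0.24.2 as one of the two options mentioned.

```python
# env description -> branches tested in that env, as listed above.
ENVS = {
    "py27": ["master", "2.3", "2.4"],
    "py36 + pandas 0.19.2 + pyarrow 0.8.0": ["2.3", "2.4"],
    "py36 + pandas 0.24.2 + pyarrow 0.12.1": ["master"],
}


def envs_for(branch):
    """Return the env descriptions that `branch` is tested against."""
    return [env for env, branches in ENVS.items() if branch in branches]


for branch in ("master", "2.3", "2.4"):
    print(branch, "->", envs_for(branch))
```

Keeping the table in one place also makes the "stop at 3 python envs" constraint easy to check: it is simply `len(ENVS)`.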

@BryanCutler
Member

> sure. give me a couple of hours to make the env/run tests.

Cool, thanks!

> py36 + pandas 0.19.2 + pyarrow 0.8.0: 2.3, 2.4

This will be best, since it avoids making patches and should work as is (fingers crossed)

> py36 + pandas 0.24.2 (or 0.19.2) + pyarrow 0.12.1: master

I'm not sure how nicely pandas 0.19.2 will play with others, so it might be a good idea to use 0.24.2, but maybe we should post to the dev list about this

@shaneknapp
Contributor Author

> I'm not sure how nicely pandas 0.19.2 will play with others, so it might be a good idea to use 0.24.2, but maybe we should post to the dev list about this

tbh, i'm totally fine to go w/0.24.2 on master as all the tests are passing.

@shaneknapp
Contributor Author

> > sure. give me a couple of hours to make the env/run tests.
>
> Cool, thanks!

tests running now... once 2.4 is done, i'll re-build and test this against 2.3.

@shaneknapp
Contributor Author

shaneknapp commented Apr 16, 2019

woot!

Tests passed in 1275 seconds

now on to 2.3

@shaneknapp
Contributor Author

2.3 passed!

Tests passed in 1148 seconds

i'll get these new envs created on the workers and update both this PR and #24380 by EOW.

@BryanCutler
Member

BryanCutler commented Apr 17, 2019 via email

@SparkQA

SparkQA commented Apr 17, 2019

Test build #104673 has started for PR 24379 at commit 6a3b3c6.

@shaneknapp
Contributor Author

test this please

@SparkQA

SparkQA commented Apr 17, 2019

Test build #104676 has finished for PR 24379 at commit d17e8fc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Contributor Author

test this please

@SparkQA

SparkQA commented Apr 18, 2019

Test build #104683 has finished for PR 24379 at commit d17e8fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Contributor Author

test this please

@SparkQA

SparkQA commented Apr 18, 2019

Test build #104709 has finished for PR 24379 at commit d17e8fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-25079][python][branch-2.4] update python3 executable to 3.6.x [DO-NOT-MERGE][SPARK-25079][python][branch-2.4] update python3 executable to 3.6.x Apr 19, 2019
@HyukjinKwon
Member

I added a prefix in the title cus I was about to merge .. :D

@shaneknapp
Contributor Author

i was panicking as i was opening this PR, hoping that you didn't merge it (and the other one). ;)

@shaneknapp shaneknapp changed the title [DO-NOT-MERGE][SPARK-25079][python][branch-2.4] update python3 executable to 3.6.x [SPARK-25079][python][branch-2.4] update python3 executable to 3.6.x Apr 19, 2019
asfgit pushed a commit that referenced this pull request Apr 19, 2019
## What changes were proposed in this pull request?

have jenkins test against python3.6 (instead of 3.4).

## How was this patch tested?

extensive testing on both the centos and ubuntu jenkins workers revealed that 2.4 doesn't like python 3.6...  :(

NOTE: this is just for branch-2.4

PLEASE DO NOT MERGE

Closes #24379 from shaneknapp/update-python-executable.

Authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: shane knapp <incomplete@gmail.com>
@shaneknapp shaneknapp closed this Apr 19, 2019
@shaneknapp shaneknapp deleted the update-python-executable branch April 19, 2019 16:47
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10065/

@shaneknapp
Contributor Author

ignore the failed build -- to be expected

kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
5 participants