-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23292][TEST] always run python tests #20465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| self.assertEquals(types[5], 'datetime64[ns]') | ||
|
|
||
| @unittest.skipIf(not _have_old_pandas, "Old Pandas not installed") | ||
| def test_to_pandas_old(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea how to test it in jenkins, as jenkins should always have pandas and pyarrow installed with required versions. I think we can only test it manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jenkins does indeed have pandas and pyarrow installed, btw.
| self.assertTrue(isinstance(df.schema['d'].dataType, DateType)) | ||
|
|
||
| @unittest.skipIf(not _have_old_pandas, "Old Pandas not installed") | ||
| def test_create_dataframe_from_old_pandas(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
Test build #86910 has finished for PR 20465 at commit
|
|
hmm, I think there are some values in having a way to run python tests without Arrow? I mean the test.py is not just for Jenkins but for everyone consuming the Spark release... unless we are saying Arrow is required now? And in Jenkins we shouldn't be skipping any of these tests anyway? Is there a reason we need to change that if Jenkins isn't affected (and if I recall there is a way to check if running under Jenkins - we could always make Arrow tests not skipped in Jenkins) |
|
@felixcheung jenkins is actually skipping those tests (see the failure of this pr). It makes sense to provide a way to allow developers to not run those tests. But, I'd prefer that we run those tests by default. So, we can make sure that jenkins is doing the right thing. |
I agree, but the more important thing is to make sure jenkins runs everything, so that we can be confident about our release. |
|
sure, that's ok, I think we can revisit later (ie. next release) if we want to add an env switch or something to make them optional |
I guess the RISELab boxes will need some updates... |
|
Looking back at when pyarrow was last upgraded in #19884, pandas and pyarrow were upgraded on all workers for python 3, but there were maybe some concerns or difficulties with upgrading for python 2 and pypy environments at that time. That is why the above failure is from python 2 with an older version of pandas. |
|
Yup, there was a related discussion already. See this #19884 (comment) and #19884 (comment). We shouldn't run this for now. Also these are technically not hard dependencies. |
|
So, jenkins jobs run those tests with python3? If so, I feel better because those tests are not completely skipped in Jenkins. If it is hard to make them run with python 2. Let’s have a log to explicitly show if we are going to run tests using pandas/pyarrow, which will help us confirm if they get exercised with python 3 in Jenkins or not.
|
|
Yes, the tests are being run with python3. I do prefer to have these conditional skips removed because sometimes it is hard to tell if everything passed or was just skipped. But since pandas and pyarrow are optional dependencies, there should be some way for the user to skip with an environment variable or something. At the very least, being able to verify they were run in a log would be good. |
|
Yup, explicitly logging sounds fine for now so that we can easily check.
To be clear, I think it's more because our own testing script doesn't show the skipped tests output from unittests in the console. Also, I think it's more because we couldn't make sure Pandas and Arrow were installed properly in testing env, Jenkins but not because we skip tests related with extra dependencies when they are not installed. Making them as required dependencies is a big deal IMHO. FYI, I tried to install PyArrow with PyPy last time and I failed. I wonder if we can easily install it. |
|
Also, if we should go in this way, I think we should enable some tests with PyPy too if I understood correctly and there isn't another problem I maybe missed: spark/dev/sparktestsupport/modules.py Line 457 in 9623a98
At least, PyPy in my local has |
|
I agree that pandas and pyarrow should not be a hard requirement for users, and this is what it is today: PySpark only throws exception when users try to use pandas related functions without pandas/pyarrow installed. My proposal is, pandas and pyarrow should be a hard requirement for our jenkins, to make sure the features are well tested. If there is a way to prove that py3 tests run well, and the environment issue is hard to fix, then maybe we can deal with it later, after 2.3. |
|
Thank you for bearing with me @cloud-fan. I agree with it. BTW, are you working on the logging thing BTW? I was thinking the simplest way to check is just print out once if PyArrow / Pandas are installed or not. |
If this is a goal, I think another simple way is just to use an env set in Jenkins and throw an exception if both PyArrow or Pandas are not installed in the future. To be more specific, like .. Lines 28 to 32 in 9623a98
|
|
I've not worked on the logging stuff yet, feel free to take it, thanks! |
|
@cloud-fan, will try it. Thank you sincerely. |
|
The logging approach has been merged and I'm closing this one. |
What changes were proposed in this pull request?
We should not skip any tests, otherwise we can't trust the jenkins report.
This PR make the pandas related tests always be run, so that we can be confident about pandas related features in Spark.
How was this patch tested?
N/A