Disable dynamicAllocation and set maxFailures to 1 in integration tests #2743
… cluster shape to change Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
build
@@ -129,6 +129,9 @@ else
export PYSP_TEST_spark_ui_showConsoleProgress='false'
export PYSP_TEST_spark_sql_session_timeZone='UTC'
export PYSP_TEST_spark_sql_shuffle_partitions='12'
# prevent cluster shape to change - and fail quicker rather than retry
export PYSP_TEST_spark_task_maxFailures='1'
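The `PYSP_TEST_` variables in this hunk appear to follow a naming convention in which the text after the prefix is a Spark configuration key with dots written as underscores, so `PYSP_TEST_spark_task_maxFailures` would correspond to `spark.task.maxFailures`. The sketch below illustrates that assumed mapping. `pysp_test_to_conf` is a hypothetical helper, not part of the repository; the real translation happens inside the test harness and may differ.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list every exported PYSP_TEST_* variable as a
# "spark.key=value" line, assuming underscores after the prefix stand for dots.
pysp_test_to_conf() {
  env | grep '^PYSP_TEST_' | sort | while IFS='=' read -r name value; do
    key="${name#PYSP_TEST_}"                  # drop the prefix
    printf '%s=%s\n' "${key//_/.}" "$value"   # spark_task_maxFailures -> spark.task.maxFailures
  done
}

# Usage: export the setting from the diff, then show the derived key.
export PYSP_TEST_spark_task_maxFailures='1'
pysp_test_to_conf
```

Under this assumption, a configuration key that itself contains an underscore could not be expressed this way, which is worth keeping in mind when adding new settings.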
These aren't getting applied when the tests are run the spark-submit way, correct? Seems like we should be consistent.
Nice catch. Are jenkins/spark-tests.sh and then run_pyspark_from_build.sh the only two places that trigger the tests? If so I'd need to change spark-tests.sh.
I think those are the only 2 in this repo
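For the spark-submit path discussed in this thread, the same two settings could be passed as `--conf` options. This is a minimal sketch, not the actual change to jenkins/spark-tests.sh: `spark.dynamicAllocation.enabled` and `spark.task.maxFailures` are standard Spark configuration keys, while the `FAIL_FAST_CONFS` array name and the `path/to/runtests.py` placeholder are made up for illustration.

```shell
#!/usr/bin/env bash

# Settings named in the PR title, written as spark-submit options.
# FAIL_FAST_CONFS is a hypothetical variable name.
FAIL_FAST_CONFS=(
  --conf spark.dynamicAllocation.enabled=false   # keep the cluster shape fixed
  --conf spark.task.maxFailures=1                # fail on the first task failure, no retry
)

# Print the command instead of running it; the script path is a placeholder.
echo spark-submit "${FAIL_FAST_CONFS[@]}" path/to/runtests.py
```

Keeping the options in one array makes it easier to apply the same values in both entry points.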
So I was randomly looking at some integration builds and noticed they had task failures. I wonder which of our builds will fail when we enable this. I guess we can merge and see, but I wouldn't merge on a Friday.
Yeah, if they have task failures but the job succeeded (i.e. not XFAIL), then that should be a failed test as far as I can tell. But absolutely agree.
Moved to draft to prevent accidental merges until Monday.
build
Will merge this in now and keep an eye out for new failures.
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Closes: #2698
This should make tests fail quicker instead of sometimes succeeding on a re-attempt. We have a test failure that is triggered when the shape of the cluster changes, either by gaining a new executor or by having an executor removed between CPU and GPU sessions (#2477).