
Disable dynamicAllocation and set maxFailures to 1 in integration tests #2743

Merged

Conversation

@abellina (Collaborator) commented Jun 18, 2021

Signed-off-by: Alessandro Bellina abellina@nvidia.com

Closes: #2698

This should make tests fail more quickly instead of sometimes succeeding on a re-attempt. We have a test failure that is triggered when the shape of the cluster changes, either by gaining a new executor or by having an executor removed between the CPU and GPU sessions (#2477).
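
For concreteness, a minimal sketch of the two settings this PR is about, expressed as plain Spark configuration keys (the exact lines added to the scripts may differ):

spark.task.maxFailures=1                # abort on the first task failure instead of retrying
spark.dynamicAllocation.enabled=false   # keep the executor count fixed so the cluster shape cannot change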

… cluster shape to change

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@jlowe jlowe added the test Only impacts tests label Jun 18, 2021
@abellina abellina changed the title Make spark.task.maxFailures=1 to prevent hidden successes causing the… Disable dynamicAllocation and set maxFailures to 1 in integration tests Jun 18, 2021
jlowe previously approved these changes Jun 18, 2021

@jlowe (Member) commented Jun 18, 2021

build

@@ -129,6 +129,9 @@ else
export PYSP_TEST_spark_ui_showConsoleProgress='false'
export PYSP_TEST_spark_sql_session_timeZone='UTC'
export PYSP_TEST_spark_sql_shuffle_partitions='12'
# prevent the cluster shape from changing - and fail quickly rather than retry
export PYSP_TEST_spark_task_maxFailures='1'
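
A hedged note on how these exports take effect (the mapping rule and the variable name below are assumptions based on the PYSP_TEST_ naming convention; they are not shown in this hunk): run_pyspark_from_build.sh is expected to turn each PYSP_TEST_* variable into a Spark configuration key by dropping the prefix and replacing underscores with dots, so PYSP_TEST_spark_task_maxFailures='1' becomes spark.task.maxFailures=1. Under the same convention, the dynamicAllocation half of the title would presumably look like:

# assumed counterpart for disabling dynamic allocation (not visible in this hunk)
export PYSP_TEST_spark_dynamicAllocation_enabled='false'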
Collaborator

these aren't getting applied when using the spark-submit way to run, correct? Seems like we should be consistent.

Collaborator Author

Nice catch. Are jenkins/spark-tests.sh and run_pyspark_from_build.sh the only two places that trigger the tests? If so, I'd need to change spark-tests.sh.
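
A minimal sketch of what the spark-submit path could pass to stay consistent (the keys are standard Spark configuration properties; where exactly they would be added in jenkins/spark-tests.sh is an assumption):

spark-submit \
  --conf spark.task.maxFailures=1 \
  --conf spark.dynamicAllocation.enabled=false \
  ...   # existing jars, files, and test arguments unchanged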

Collaborator

I think those are the only two in this repo.

@tgravescs (Collaborator)

So I was randomly looking at some integration builds and noticed they had task failures. I wonder whether all of our builds will fail when we enable this. I guess we can merge and see, but I wouldn't merge on a Friday.

@abellina (Collaborator, Author)

Yeah, if they have task failures but the job succeeded (i.e. not XFAIL), then that should be a failed test as far as I can tell. But I absolutely agree:

but I wouldn't merge on a Friday.

@abellina abellina marked this pull request as draft June 18, 2021 16:18
@abellina (Collaborator, Author)

Moved to draft to prevent accidental merges until Monday.

@abellina (Collaborator, Author)

build

@abellina abellina marked this pull request as ready for review June 21, 2021 14:47
@abellina (Collaborator, Author)

Will merge this in now and keep an eye out for new failures.

@abellina abellina merged commit 3a4b55c into NVIDIA:branch-21.08 Jun 21, 2021
@abellina abellina deleted the debug/set_max_task_failures_to_1 branch June 21, 2021 17:58
Labels: test (Only impacts tests)
Projects: None yet
Development: Successfully merging this pull request may close these issues: [FEA] integration test framework should detect unintentionally failed executors
4 participants