Code of Conduct
- I agree to follow this project's Code of Conduct
Search before asking
- I have searched in the issues and found no similar issues.
Describe the bug
We found a Spark application hung at the final stage.

Rerunning the application produced the same result.
Affects Version(s)
master
Kyuubi Server Log Output
No response
Kyuubi Engine Log Output
2023-08-02 19:54:59 CST DAGScheduler INFO - ShuffleMapStage 4 (sql at SparkSQLExecute.java:17) finished in 279.363 s
2023-08-02 19:54:59 CST YarnClusterScheduler INFO - Removed TaskSet 4.0, whose tasks have all completed, from pool default
2023-08-02 19:54:59 CST DAGScheduler INFO - looking for newly runnable stages
2023-08-02 19:54:59 CST DAGScheduler INFO - running: Set()
2023-08-02 19:54:59 CST DAGScheduler INFO - waiting: Set()
2023-08-02 19:54:59 CST DAGScheduler INFO - failed: Set()
2023-08-02 19:54:59 CST YarnAllocator INFO - Resource profile 0 doesn't exist, adding it
2023-08-02 19:54:59 CST YarnAllocator INFO - Driver requested a total number of 1 executor(s) for resource profile id: 0.
2023-08-02 19:54:59 CST YarnClusterSchedulerBackend INFO - Requesting to kill executor(s) 99, 90, 84, 57, 63, 39, 30, 45, 66, 2, 72, 5, 48, 33, 69, 27, 54, 60, 15, 42, 21, 71, 92, 86, 24, 74, 89, 95, 53, 41, 83, 56, 17, 1, 44, 50, 23, 38, 4, 26, 11, 32, 82, 97, 29, 20, 85, 79, 70, 64, 91, 46, 94, 73, 67, 88, 34, 28, 6, 40, 55, 76, 49, 61, 43, 9, 22, 58, 3, 10, 25, 93, 81, 75, 13
2023-08-02 19:54:59 CST YarnClusterSchedulerBackend INFO - Actual list of executor(s) to be killed is 99, 90, 84, 57, 63, 39, 30, 45, 66, 2, 72, 5, 48, 33, 69, 27, 54, 60, 15, 42, 21, 71, 92, 86, 24, 74, 89, 95, 53, 41, 83, 56, 17, 1, 44, 50, 23, 38, 4, 26, 11, 32, 82, 97, 29, 20, 85, 79, 70, 64, 91, 46, 94, 73, 67, 88, 34, 28, 6, 40, 55, 76, 49, 61, 43, 9, 22, 58, 3, 10, 25, 93, 81, 75, 13
2023-08-02 19:54:59 CST ApplicationMaster$AMEndpoint INFO - Driver requested to kill executor(s) 99, 90, 84, 57, 63, 39, 30, 45, 66, 2, 72, 5, 48, 33, 69, 27, 54, 60, 15, 42, 21, 71, 92, 86, 24, 74, 89, 95, 53, 41, 83, 56, 17, 1, 44, 50, 23, 38, 4, 26, 11, 32, 82, 97, 29, 20, 85, 79, 70, 64, 91, 46, 94, 73, 67, 88, 34, 28, 6, 40, 55, 76, 49, 61, 43, 9, 22, 58, 3, 10, 25, 93, 81, 75, 13.
2023-08-02 19:54:59 CST YarnAllocator INFO - Resource profile 0 doesn't exist, adding it
2023-08-02 19:54:59 CST ExecutorAllocationManager INFO - Executors 99,90,84,57,63,39,30,45,66,2,72,5,48,33,69,27,54,60,15,42,21,71,92,86,24,74,89,95,53,41,83,56,17,1,44,50,23,38,4,26,11,32,82,97,29,20,85,79,70,64,91,46,94,73,67,88,34,28,6,40,55,76,49,61,43,9,22,58,3,10,25,93,81,75,13 removed due to idle timeout.
2023-08-02 19:55:00 CST YarnClusterSchedulerBackend INFO - Requesting to kill executor(s) 65
2023-08-02 19:55:00 CST YarnClusterSchedulerBackend INFO - Actual list of executor(s) to be killed is 65
2023-08-02 19:55:00 CST ApplicationMaster$AMEndpoint INFO - Driver requested to kill executor(s) 65.
2023-08-02 19:55:00 CST YarnAllocator INFO - Resource profile 0 doesn't exist, adding it
2023-08-02 19:55:00 CST ExecutorAllocationManager INFO - Executors 65 removed due to idle timeout.
2023-08-02 19:55:00 CST FinalStageConfigIsolation INFO - Store config: spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes to previousStage, original value: 128M
2023-08-02 19:55:00 CST FinalStageConfigIsolation INFO - For final stage: set spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes = 256M.
2023-08-02 19:55:00 CST FinalStageConfigIsolation INFO - Store config: spark.sql.adaptive.skewJoin.skewedPartitionFactor to previousStage, original value: 4
2023-08-02 19:55:00 CST FinalStageConfigIsolation INFO - For final stage: set spark.sql.adaptive.skewJoin.skewedPartitionFactor = 5.
2023-08-02 19:55:00 CST FinalStageConfigIsolation INFO - Store config: spark.sql.adaptive.advisoryPartitionSizeInBytes to previousStage, original value: 8MB
2023-08-02 19:55:00 CST FinalStageConfigIsolation INFO - For final stage: set spark.sql.adaptive.advisoryPartitionSizeInBytes = 384MB.
2023-08-02 19:55:00 CST FinalStageConfigIsolation INFO - Store config: spark.sql.adaptive.coalescePartitions.minPartitionNum to previousStage, original value: __INTERNAL_UNSET_CONFIG_TAG__
2023-08-02 19:55:00 CST FinalStageConfigIsolation INFO - For final stage: set spark.sql.adaptive.coalescePartitions.minPartitionNum = 1.
2023-08-02 19:55:00 CST ShufflePartitionsUtil INFO - For shuffle(2), advisory target size: 402653184, actual target size 21915537.
2023-08-02 19:55:00 CST FinalStageResourceManager INFO - The snapshot of current executors view, active executors: 100, min executor: 1, target executors: 1, has benefits: true
2023-08-02 19:55:01 CST YarnClusterSchedulerBackend INFO - Requesting to kill executor(s) 51, 19
2023-08-02 19:55:01 CST YarnClusterSchedulerBackend INFO - Actual list of executor(s) to be killed is 51, 19
2023-08-02 19:55:01 CST ApplicationMaster$AMEndpoint INFO - Driver requested to kill executor(s) 51, 19.
2023-08-02 19:55:01 CST YarnAllocator INFO - Resource profile 0 doesn't exist, adding it
2023-08-02 19:55:01 CST ExecutorAllocationManager INFO - Executors 51,19 removed due to idle timeout.
2023-08-02 19:55:02 CST FinalStageResourceManager INFO - Request to kill executors, total count 99, [88, 42, 77, 79, 2, 75, 81, 6, 15, 90, 28, 43, 63, 64, 14, 93, 70, 21, 56, 34, 10, 33, 11, 65, 61, 57, 35, 18, 3, 7, 20, 17, 32, 30, 68, 29, 86, 24, 47, 52, 38, 54, 41, 8, 9, 60, 40, 74, 4, 82, 100, 72, 45, 69, 36, 12, 46, 58, 95, 80, 44, 87, 55, 53, 5, 23, 26, 22, 97, 85, 96, 66, 59, 16, 84, 37, 48, 50, 51, 67, 39, 78, 62, 49, 71, 25, 13, 83, 89, 73, 31, 91, 19, 1, 99, 92, 94, 98, 27].
2023-08-02 19:55:02 CST YarnClusterSchedulerBackend INFO - Requesting to kill executor(s) 88, 42, 77, 79, 2, 75, 81, 6, 15, 90, 28, 43, 63, 64, 14, 93, 70, 21, 56, 34, 10, 33, 11, 65, 61, 57, 35, 18, 3, 7, 20, 17, 32, 30, 68, 29, 86, 24, 47, 52, 38, 54, 41, 8, 9, 60, 40, 74, 4, 82, 100, 72, 45, 69, 36, 12, 46, 58, 95, 80, 44, 87, 55, 53, 5, 23, 26, 22, 97, 85, 96, 66, 59, 16, 84, 37, 48, 50, 51, 67, 39, 78, 62, 49, 71, 25, 13, 83, 89, 73, 31, 91, 19, 1, 99, 92, 94, 98, 27
2023-08-02 19:55:02 CST YarnClusterSchedulerBackend INFO - Actual list of executor(s) to be killed is 77, 14, 35, 18, 7, 68, 47, 52, 8, 100, 36, 12, 80, 87, 96, 59, 16, 37, 78, 62, 31, 98
2023-08-02 19:55:02 CST YarnAllocator INFO - Resource profile 0 doesn't exist, adding it
2023-08-02 19:55:02 CST YarnAllocator INFO - Driver requested a total number of 0 executor(s) for resource profile id: 0.
2023-08-02 19:55:02 CST ApplicationMaster$AMEndpoint INFO - Driver requested to kill executor(s) 77, 14, 35, 18, 7, 68, 47, 52, 8, 100, 36, 12, 80, 87, 96, 59, 16, 37, 78, 62, 31, 98.
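For context: the FinalStageResourceManager lines above show the engine taking a snapshot of 100 active executors, computing a final-stage target of 1, and then requesting that the surplus 99 be killed. Below is a minimal sketch of that downsizing decision, assuming a simple "keep the target count, kill the surplus" policy; the object and method names are illustrative, and this is not Kyuubi's actual implementation.

```scala
// Illustrative sketch of the final-stage downsizing decision logged above
// ("active executors: 100, min executor: 1, target executors: 1, has benefits: true").
// NOT Kyuubi's actual code: names and the selection order are assumptions.
object FinalStageDownsizeSketch {
  def selectExecutorsToKill(activeExecutorIds: Seq[String], targetExecutors: Int): Seq[String] = {
    val surplus = activeExecutorIds.size - targetExecutors
    if (surplus <= 0) Seq.empty            // already at or below target: nothing to kill
    else activeExecutorIds.take(surplus)   // keep `targetExecutors` executors, kill the rest
  }

  def main(args: Array[String]): Unit = {
    val active = (1 to 100).map(_.toString) // matches the 100 active executors in the log
    val toKill = selectExecutorsToKill(active, targetExecutors = 1)
    println(s"Request to kill executors, total count ${toKill.size}")
  }
}
```

Note that such a kill request can race with Spark's own ExecutorAllocationManager idle-timeout removals (also visible above): by 19:55:02, 78 executors had already been removed as idle, which would explain why the "Actual list of executor(s) to be killed" contains only 22 of the 99 requested.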
Kyuubi Server Configurations
No response
Kyuubi Engine Configurations
No response
Additional context
Spark DRA (Dynamic Resource Allocation) was enabled, and spark.dynamicAllocation.minExecutors was set to 1.
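A minimal sketch of this allocation setup, expressed as SparkSession settings: only spark.dynamicAllocation.enabled=true and minExecutors=1 come from this report; the remaining values are illustrative assumptions, not the actual job configuration.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the dynamic-allocation setup from "Additional context".
// Only `enabled` and `minExecutors` are taken from the report; the rest
// are assumed values for illustration.
val spark = SparkSession.builder()
  .appName("final-stage-hang-repro") // hypothetical application name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "100") // assumption: the log shows ~100 executors
  .getOrCreate()
```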
Are you willing to submit PR?
- Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- No. I cannot submit a PR at this time.