[SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle. #31648
  def testConcat(inputs: String*): Unit = {
    val expected = if (inputs.contains(null)) null else inputs.mkString
-   checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected, EmptyRow)
+   checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected)
For reviewers: this removes a redundant argument to simplify the code, since EmptyRow is already the default value of inputRow.
The method signature of checkEvaluation is as follows:
checkEvaluation(expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow)
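As a minimal sketch of why dropping the third argument does not change behavior, the snippet below mimics the parameter shape of the quoted signature (a by-name first parameter and a default for inputRow). The `Row` trait, the `EmptyRow` object, and `rowUsed` are hypothetical stand-ins for illustration only, not Spark's actual classes.

```scala
object DefaultArgSketch {
  // Hypothetical stand-ins for Catalyst's InternalRow and EmptyRow.
  sealed trait Row
  case object EmptyRow extends Row

  // Same parameter shape as the quoted signature: a by-name expression
  // and an inputRow that defaults to EmptyRow. Returns the row it received.
  def rowUsed(expression: => Any, expected: Any, inputRow: Row = EmptyRow): Row =
    inputRow

  def main(args: Array[String]): Unit = {
    // Omitting inputRow and passing EmptyRow explicitly select the same row.
    println(rowUsed("ab", "ab") == rowUsed("ab", "ab", EmptyRow))
  }
}
```

Under these assumptions, both call forms receive the same row, so the shorter call is equivalent.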
Is this test change related to the code change?
      metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V] = {
    val mapTaskIds = taskIdMapsForShuffle.computeIfAbsent(
      handle.shuffleId, _ => new OpenHashSet[Long](16))
    mapTaskIds.synchronized { mapTaskIds.add(context.taskAttemptId()) }
For easier review, the relevant code paths are:
ShuffleMapTask#runTask
  val mapId = if (SparkEnv.get.conf.get(config.SHUFFLE_USE_OLD_FETCH_PROTOCOL)) {
    partitionId
  } else context.taskAttemptId()
SortShuffleManager#getWriter
  val mapTaskIds = taskIdMapsForShuffle.computeIfAbsent(
    handle.shuffleId, _ => new OpenHashSet[Long](16))
  mapTaskIds.synchronized { mapTaskIds.add(context.taskAttemptId()) }
SortShuffleManager#unregisterShuffle
  Option(taskIdMapsForShuffle.remove(shuffleId)).foreach { mapTaskIds =>
    mapTaskIds.iterator.foreach { mapTaskId =>
      shuffleBlockResolver.removeDataByMap(shuffleId, mapTaskId)
    }
  }
@xuanyuanking do you have an opinion here? I think you wrote this piece of the code.
Thanks for pinging me, I'm reviewing #31664
Gentle ping @srowen, thanks for taking a look.
Can one of the admins verify this patch?
srowen left a comment
This PR is opened against branch-3.0; it needs to be opened against master.
The existing tests are not catching this issue. Can you add something to ensure we test for this problem?
+CC @otterc
The change in SortShuffleManager looks good to me. However, we should add a UT to cover this. Also, I don't quite get how the change in StringExpressionsSuite.scala is related to this bug fix.
I will add a UT for this problem.
@yikf, can you open a PR against the master branch?
I opened PR #31664 against the master branch and am closing this PR for branch-3.0.
What changes were proposed in this pull request?
Fixed an issue where shuffle data could not be cleaned up when unregisterShuffle is called.
Why are the changes needed?
When the old shuffle fetch protocol is used, partitionId is used as the mapId when constructing the ShuffleBlockId, but getWriter[K, V] caches context.taskAttemptId() as the mapId in taskIdMapsForShuffle. As a result, the data cannot be cleaned up on unregisterShuffle: the shuffle's data is removed using the mapIds recorded in taskIdMapsForShuffle, and those are context.taskAttemptId() values rather than partitionId values.
Does this PR introduce any user-facing change?
yes
How was this patch tested?
Existing tests.
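The mismatch described under "Why are the changes needed?" can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `Task`, `mapIdFor`, `leftoverAfterUnregister`, and the `trackWrittenMapId` flag are hypothetical names, plain mutable sets stand in for the files on disk and for the taskIdMapsForShuffle entry, and tracking the written mapId is shown as one possible remedy rather than as the change made in this PR.

```scala
import scala.collection.mutable

object ShuffleCleanupSketch {
  // Hypothetical stand-in for a map task; partitionId and taskAttemptId usually differ.
  final case class Task(partitionId: Int, taskAttemptId: Long)

  // Mirrors the mapId choice quoted from ShuffleMapTask#runTask.
  def mapIdFor(task: Task, useOldFetchProtocol: Boolean): Long =
    if (useOldFetchProtocol) task.partitionId.toLong else task.taskAttemptId

  // Simulates writing shuffle data and then unregistering the shuffle.
  // Returns the map IDs whose data is still present afterwards.
  // trackWrittenMapId = false models the behavior described in the PR
  // (taskAttemptId is always tracked); true models one possible remedy (assumption).
  def leftoverAfterUnregister(
      tasks: Seq[Task],
      useOldFetchProtocol: Boolean,
      trackWrittenMapId: Boolean): Set[Long] = {
    val filesOnDisk = mutable.Set.empty[Long] // keyed by the mapId used when writing
    val trackedIds = mutable.Set.empty[Long]  // plays the role of the taskIdMapsForShuffle entry
    tasks.foreach { t =>
      val mapId = mapIdFor(t, useOldFetchProtocol)
      filesOnDisk += mapId
      trackedIds += (if (trackWrittenMapId) mapId else t.taskAttemptId)
    }
    // unregisterShuffle: remove data for every tracked ID; unknown IDs are no-ops.
    trackedIds.foreach(id => filesOnDisk.remove(id))
    filesOnDisk.toSet
  }

  def main(args: Array[String]): Unit = {
    val tasks = Seq(Task(0, 100L), Task(1, 101L))
    println(leftoverAfterUnregister(tasks, useOldFetchProtocol = true, trackWrittenMapId = false))
    println(leftoverAfterUnregister(tasks, useOldFetchProtocol = false, trackWrittenMapId = false))
    println(leftoverAfterUnregister(tasks, useOldFetchProtocol = true, trackWrittenMapId = true))
  }
}
```

In this sketch, the first call leaves the data for both partitions behind because the tracked task attempt IDs never match the partition-based map IDs. The other two calls leave nothing behind. A unit test along these lines is the kind of coverage the reviewers asked for.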