
Conversation

@yikf
Contributor

@yikf yikf commented Feb 25, 2021

What changes were proposed in this pull request?

Fixed an issue where shuffle data could not be cleaned up in unregisterShuffle.

Why are the changes needed?

When the old shuffle fetch protocol is used, partitionId is used as the mapId when constructing ShuffleBlockId, but context.taskAttemptId() is the mapId cached in taskIdMapsForShuffle when getWriter[K, V] is called.

As a result, data cannot be cleaned up in unregisterShuffle: when a shuffle's metadata is removed, we iterate over the mapIds in taskIdMapsForShuffle, and those mapIds are context.taskAttemptId() values rather than partitionIds, so the data files named by partitionId are never removed.
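The id mismatch can be illustrated with a small, self-contained simulation (a sketch only: plain mutable.Set/Map stand in for Spark's OpenHashSet and ConcurrentHashMap, and MapIdMismatchDemo is a hypothetical name, not a Spark class):

```scala
import scala.collection.mutable

object MapIdMismatchDemo {
  // Data files written by the shuffle writer, keyed by (shuffleId, mapId).
  val dataFiles = mutable.Set.empty[(Int, Long)]
  // Ids cached in getWriter, keyed by shuffleId (stand-in for taskIdMapsForShuffle).
  val taskIdMapsForShuffle = mutable.Map.empty[Int, mutable.Set[Long]]

  def main(args: Array[String]): Unit = {
    val shuffleId = 0
    val partitionId = 1L     // mapId used in ShuffleBlockId under the old fetch protocol
    val taskAttemptId = 100L // id cached in taskIdMapsForShuffle by getWriter

    // The writer creates the data file under partitionId...
    dataFiles += ((shuffleId, partitionId))
    // ...but getWriter caches taskAttemptId.
    taskIdMapsForShuffle.getOrElseUpdate(shuffleId, mutable.Set.empty) += taskAttemptId

    // unregisterShuffle removes data only by the cached ids.
    taskIdMapsForShuffle.remove(shuffleId).foreach { mapIds =>
      mapIds.foreach(id => dataFiles -= ((shuffleId, id)))
    }

    // The partitionId-keyed file survives cleanup: the leak described above.
    assert(dataFiles.contains((shuffleId, partitionId)))
    println(s"leaked files: $dataFiles")
  }
}
```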

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Existing tests.

def testConcat(inputs: String*): Unit = {
val expected = if (inputs.contains(null)) null else inputs.mkString
checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected, EmptyRow)
checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected)
Contributor Author


For reviewers: this cleans up unused code for simplification.
The method signature for checkEvaluation is as follows:
checkEvaluation(expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow)

Contributor


Is this test change related to the code change?

metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V] = {
val mapTaskIds = taskIdMapsForShuffle.computeIfAbsent(
handle.shuffleId, _ => new OpenHashSet[Long](16))
mapTaskIds.synchronized { mapTaskIds.add(context.taskAttemptId()) }
Contributor Author


For more convenient review:

ShuffleMapTask#runTask
val mapId = if (SparkEnv.get.conf.get(config.SHUFFLE_USE_OLD_FETCH_PROTOCOL)) {
  partitionId
} else {
  context.taskAttemptId()
}

SortShuffleManager#getWriter
val mapTaskIds = taskIdMapsForShuffle.computeIfAbsent(
  handle.shuffleId, _ => new OpenHashSet[Long](16))
mapTaskIds.synchronized { mapTaskIds.add(context.taskAttemptId()) }

SortShuffleManager#unregisterShuffle
Option(taskIdMapsForShuffle.remove(shuffleId)).foreach { mapTaskIds =>
  mapTaskIds.iterator.foreach { mapTaskId =>
    shuffleBlockResolver.removeDataByMap(shuffleId, mapTaskId)
  }
}
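One way to keep the two code paths consistent (a hedged sketch of the idea, not necessarily the exact patch: ConsistentMapId and the useOldFetchProtocol flag are stand-ins for reading config.SHUFFLE_USE_OLD_FETCH_PROTOCOL from the conf) is to have getWriter cache the same mapId that ShuffleMapTask#runTask computes:

```scala
// Sketch: mirror ShuffleMapTask#runTask's choice of mapId so that
// taskIdMapsForShuffle caches the same id that names the block files.
// `useOldFetchProtocol` is an assumed stand-in for
// SparkEnv.get.conf.get(config.SHUFFLE_USE_OLD_FETCH_PROTOCOL).
object ConsistentMapId {
  def mapIdFor(useOldFetchProtocol: Boolean, partitionId: Int, taskAttemptId: Long): Long =
    if (useOldFetchProtocol) partitionId.toLong else taskAttemptId
}
```

With this helper, getWriter would add mapIdFor(...) to mapTaskIds instead of context.taskAttemptId(), so unregisterShuffle removes exactly the ids the writer used.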

Member


@xuanyuanking do you have an opinion here? I think you wrote this piece of the code.

Member


Thanks for pinging me, I'm reviewing #31664

@yikf
Contributor Author

yikf commented Feb 25, 2021

Gentle ping @srowen, thanks for taking a look.

@AmplabJenkins

Can one of the admins verify this patch?

Member

@srowen srowen left a comment


This is opened against branch-3.0; it needs to target master.


@mridulm
Contributor

mridulm commented Feb 25, 2021

The existing tests are not catching this issue. Can you add something to ensure we test for this problem?

@mridulm
Contributor

mridulm commented Feb 25, 2021

+CC @otterc

Contributor

@otterc otterc left a comment


The change in SortShuffleManager looks good to me. However, we should add a UT to cover this. Also, I don't quite get how the change in StringExpressionsSuite.scala is related to this bug fix.


@yikf
Contributor Author

yikf commented Feb 26, 2021

The change in SortShuffleManager looks good to me. However, we should add a UT to cover this. Also, I don't quite get how the change in StringExpressionsSuite.scala is related to this bug fix.

The change in StringExpressionsSuite.scala is unrelated to this bug fix; it is for code simplification only.

@yikf
Contributor Author

yikf commented Feb 26, 2021

The existing tests are not catching this issue. Can you add something to ensure we test for this problem?

I will add a UT for this problem.

@HyukjinKwon HyukjinKwon changed the title [SPARK-34541][core]Fixed an issue where data could not be cleaned up when unregisterShuffle. [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle. Feb 26, 2021
@HyukjinKwon
Member

@yikf, can you open a PR against master branch?

@yikf
Contributor Author

yikf commented Feb 26, 2021

Opened PR #31664 against the master branch; closing this PR for branch-3.0.

@yikf yikf closed this Feb 26, 2021

7 participants