[SPARK-36242][CORE] Ensure spill file closed before set success = true in ExternalSorter.spillMemoryIteratorToDisk method #33460
Test build #141402 has finished for PR 33460 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #141409 has finished for PR 33460 at commit
|
`ExternalAppendOnlyMap` does the right thing.
Looks like we missed it in `ExternalSorter`.
+CC @Ngone51
I would prefer if we had a test for this btw @LuciferYang.
How can I create a successful
Do you have any suggestions for this? @mridulm @HyukjinKwon @Ngone51 Or do we need more refactoring work to make this method easy to test? |
Mock |
Thx ~ |
ioe.getMessage.equals(errorMessage)
// The `TempShuffleBlock` create by diskBlockManager
// will remain before SPARK-36242
assert(!spillFilesCreated(0).exists())
@mridulm @Ngone51 @HyukjinKwon before this PR, assert(!spillFilesCreated(0).exists()) would fail. I tested it manually:
ExternalSorterSpillSuite.this.spillFilesCreated.apply(0).exists() was true
ScalaTestFailureLocation: org.apache.spark.util.collection.ExternalSorterSpillSuite at (ExternalSorterSpillSuite.scala:137)
org.scalatest.exceptions.TestFailedException: ExternalSorterSpillSuite.this.spillFilesCreated.apply(0).exists() was true
at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
at org.apache.spark.util.collection.ExternalSorterSpillSuite.$anonfun$new$1(ExternalSorterSpillSuite.scala:137)
Let me check this case with Scala 2.13
Manual test passed
`ExternalSorterSuite` relies on `LocalSparkContext`, so I added a new test file.
Thanks for the great work!
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141483 has finished for PR 33460 at commit
|
private var taskContext: TaskContext = _

override protected def beforeEach(): Unit = {
  tempDir = util.Utils.createTempDir(null, "test")
import `Utils` directly?
There is another `Utils` in the `org.apache.spark.util.collection` package, so `util.Utils` is used here.
a57e76a changes this to import `Utils` directly and rename it to `UUtils`.
Do you have any other suggestions for the naming of `UUtils`?
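For readers unfamiliar with the rename being discussed, here is a minimal sketch of Scala's import-renaming syntax. The objects `outerpkg` and `innerpkg` are hypothetical stand-ins for the two packages that each contain a `Utils`; they are not Spark code.

```scala
// Hypothetical stand-ins for two packages that both define `Utils`.
object outerpkg {
  object Utils { def name: String = "outerpkg.Utils" }
}

object innerpkg {
  object Utils { def name: String = "innerpkg.Utils" }
}

object RenameDemo {
  // The local `Utils` keeps its plain name.
  import innerpkg.Utils
  // The other `Utils` is imported under a different name to avoid the clash.
  import outerpkg.{Utils => UUtils}

  def names: (String, String) = (Utils.name, UUtils.name)
}
```

With the rename in place, both objects can be referenced in the same scope without a package prefix.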
I see. Looks fine either way.
ok~
import org.apache.spark.storage.TempShuffleBlockId
import java.util.UUID
Move to the imports group?
ok ~
done
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141530 has finished for PR 33460 at commit
|
…ue` in `ExternalSorter.spillMemoryIteratorToDisk` method

### What changes were proposed in this pull request?
The main change of this pr is move `writer.close()` before `success = true` to ensure spill file closed before set `success = true` in `ExternalSorter.spillMemoryIteratorToDisk` method.

### Why are the changes needed?
Avoid setting `success = true` first and then failure of close spill file

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass the Jenkins or GitHub Action
- Add a new Test case to check `The spill file should not exists if writer close fails`

Closes #33460 from LuciferYang/external-sorter-spill-close.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yi.wu <yi.wu@databricks.com>
(cherry picked from commit f61d599)
Signed-off-by: yi.wu <yi.wu@databricks.com>
Thanks, merged to master/3.2. |
+1, LGTM. Thank you, @LuciferYang and all! |
BTW, this looks like a bug fix. Could you make a backport to branch-3.1/3.0 please? |
@dongjoon-hyun OK, I'll do it as soon as possible |
@dongjoon-hyun |
thx ~ @Ngone51 @mridulm @dongjoon-hyun |
What changes were proposed in this pull request?
The main change of this PR is to move `writer.close()` before `success = true`, so that the spill file is closed before `success = true` is set in the `ExternalSorter.spillMemoryIteratorToDisk` method.

Why are the changes needed?
To avoid setting `success = true` first and then failing to close the spill file.

Does this PR introduce any user-facing change?
No

How was this patch tested?
A new test case was added: `The spill file should not exists if writer close fails`
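
The ordering issue described above can be illustrated with a small, self-contained sketch. It is not Spark's code: `SpillWriter`, `FailingCloseWriter`, and `SpillSketch` are hypothetical names, and the sketch assumes that cleanup of the spill file is guarded by the `success` flag in a `finally` block. It compares the two orderings when `close()` throws.

```scala
import java.io.{File, IOException}
import java.nio.file.Files

// Hypothetical stand-in for a disk writer; not Spark's writer class.
trait SpillWriter {
  def write(record: String): Unit
  def close(): Unit
}

// A writer whose close() always fails, simulating an I/O error at close time.
class FailingCloseWriter(file: File) extends SpillWriter {
  def write(record: String): Unit = {
    Files.write(file.toPath, (record + "\n").getBytes("UTF-8"))
    ()
  }
  def close(): Unit = throw new IOException("simulated close failure")
}

object SpillSketch {
  // closeBeforeSuccess = true mirrors the ordering this PR proposes;
  // false mirrors the ordering it replaces.
  def spill(file: File, writer: SpillWriter, closeBeforeSuccess: Boolean): Unit = {
    var success = false
    try {
      writer.write("record")
      if (closeBeforeSuccess) {
        writer.close()
        success = true
      } else {
        success = true
        writer.close()
      }
    } finally {
      // Assumed cleanup: only runs when the spill was not marked successful.
      if (!success && file.exists() && !file.delete()) {
        System.err.println(s"Error deleting $file")
      }
    }
  }

  // Returns true if the spill file is left on disk after close() fails.
  def fileSurvives(closeBeforeSuccess: Boolean): Boolean = {
    val file = File.createTempFile("spill", ".tmp")
    try {
      spill(file, new FailingCloseWriter(file), closeBeforeSuccess)
    } catch {
      case _: IOException => // expected: close() fails by construction
    }
    val exists = file.exists()
    file.delete()
    exists
  }
}
```

In this sketch, `SpillSketch.fileSurvives(closeBeforeSuccess = false)` leaves the file behind because the flag was already set when `close()` threw, while `closeBeforeSuccess = true` lets the cleanup branch delete it. That is the same property the new test case checks.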