Commit f24bb43
[SPARK-40459][K8S]
### What changes were proposed in this pull request?
This PR aims to ignore `FileExistsException` during `recoverDiskStore` processing.
### Why are the changes needed?
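Although `recoverDiskStore` is already wrapped by `tryLogNonFatalError`, that wrapper sits outside the per-file loop. An exception while recovering one file therefore aborts recovery of all remaining files. A single file's recovery failure, such as a `FileExistsException` for a block file that already exists because it was recomputed, should not block the whole `recoverDiskStore`.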
https://github.com/apache/spark/blob/5938e84e72b81663ccacf0b36c2f8271455de292/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleExecutorComponents.scala#L45-L47
```
org.apache.commons.io.FileExistsException: ...
at org.apache.commons.io.FileUtils.requireAbsent(FileUtils.java:2587)
at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:2305)
at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:2283)
at org.apache.spark.storage.DiskStore.moveFileToBlock(DiskStore.scala:150)
at org.apache.spark.storage.BlockManager$TempFileBasedBlockStoreUpdater.saveToDiskStore(BlockManager.scala:487)
at org.apache.spark.storage.BlockManager$BlockStoreUpdater.$anonfun$save$1(BlockManager.scala:407)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1445)
at org.apache.spark.storage.BlockManager$BlockStoreUpdater.save(BlockManager.scala:380)
at org.apache.spark.storage.BlockManager$TempFileBasedBlockStoreUpdater.save(BlockManager.scala:490)
at org.apache.spark.shuffle.KubernetesLocalDiskShuffleExecutorComponents$.$anonfun$recoverDiskStore$14(KubernetesLocalDiskShuffleExecutorComponents.scala:95)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at org.apache.spark.shuffle.KubernetesLocalDiskShuffleExecutorComponents$.recoverDiskStore(KubernetesLocalDiskShuffleExecutorComponents.scala:91)
```
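To make the difference concrete, here is a minimal Scala sketch. It is not Spark's actual code. `RecoverySketch`, `recoverOne`, `recoverAllFragile`, and `recoverAllResilient` are hypothetical names. `java.nio.file.Files.move` and `FileAlreadyExistsException` stand in for commons-io's `FileUtils.moveFile` and `FileExistsException` from the trace above. The fragile variant mimics a `tryLogNonFatalError`-style wrapper around the whole loop. The resilient variant catches the exception per file, which is the idea behind this PR.

```scala
import java.nio.file.{FileAlreadyExistsException, Files, Path}
import scala.util.control.NonFatal

// Hypothetical sketch, NOT Spark's implementation. java.nio's
// FileAlreadyExistsException stands in for commons-io's FileExistsException.
object RecoverySketch {

  // Moves `src` into `destDir`. Without REPLACE_EXISTING, Files.move
  // throws FileAlreadyExistsException when the target already exists
  // (on the default file system provider).
  def recoverOne(src: Path, destDir: Path): Path =
    Files.move(src, destDir.resolve(src.getFileName))

  // Before: the handler wraps the whole loop, so the first failure
  // stops recovery of every file after it. Only the error is logged.
  def recoverAllFragile(files: Seq[Path], destDir: Path): Int = {
    var recovered = 0
    try {
      files.foreach { f =>
        recoverOne(f, destDir)
        recovered += 1
      }
    } catch {
      case NonFatal(e) => println(s"Recovery aborted: $e")
    }
    recovered
  }

  // After: an already-existing target is handled per file,
  // so the loop moves on to the next file.
  def recoverAllResilient(files: Seq[Path], destDir: Path): Int = {
    var recovered = 0
    files.foreach { f =>
      try {
        recoverOne(f, destDir)
        recovered += 1
      } catch {
        case _: FileAlreadyExistsException =>
          // The block already exists (e.g. recomputed); skip it.
      }
    }
    recovered
  }
}
```

Suppose the source files are `a`, `b`, and `c`, and `b` already exists in the destination. The fragile variant recovers only `a` and then aborts. The resilient variant recovers `a` and `c` and skips `b`.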
### Does this PR introduce _any_ user-facing change?
No. This improves the recovery rate.
### How was this patch tested?
Pass the CIs.
Closes #37903 from dongjoon-hyun/SPARK-40459.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

Commit title: [SPARK-40459][K8S] recoverDiskStore should not stop by existing recomputed files
Parent: 5938e84
1 file changed in `resource-managers/kubernetes/core/src/main/scala/org/apache/spark/shuffle`: 4 additions, 0 deletions.
Two hunks: new lines 25-26 (inserted after original line 24) and new lines 100-101 (inserted after original line 97).