[SPARK-17465] [Spark Core] Inappropriate memory management in org.apache.spark.storage.MemoryStore may lead to memory leak
#15022
Conversation
…lMemoryForThisTask and releasePendingUnrollMemoryForThisTask method.
Please ignore the AppVeyor failure for this. That check is supposed to run only for R changes on the master branch, but it seems something went wrong. This should fail for branch-1.6 as
Cc @shivaram: it seems something went wrong. I will try to figure this out as soon as I can use my laptop.
That seems correct; I'm not sure if @andrewor14 is still available to comment?
This change looks good to me as well, although I wonder whether there's an assertion or unit test that we can add to detect this bug or keep things from regressing in case this is ever refactored. I wouldn't necessarily block merging this on the addition of new tests / asserts right now, but it would be great to add them if they're low-cost and easy to do.
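One way such an assertion could look (a hypothetical sketch, not Spark's actual test code; `unrollMemoryMap` and `releaseAllForTask` are illustrative stand-ins for the `MemoryStore` internals): after a task releases its unroll memory, the tracking map should hold no entry for that task, not even a zero-valued one.

```scala
import scala.collection.mutable

// Hypothetical tracking map, standing in for MemoryStore's unroll memory map.
val unrollMemoryMap = mutable.HashMap[Long, Long]().withDefaultValue(0L)

def releaseAllForTask(taskAttemptId: Long): Unit = {
  val memoryToRelease = unrollMemoryMap(taskAttemptId)
  if (memoryToRelease > 0) {
    unrollMemoryMap(taskAttemptId) -= memoryToRelease
  }
  // Cleanup runs unconditionally, as in the fix under review.
  if (unrollMemoryMap(taskAttemptId) == 0) unrollMemoryMap.remove(taskAttemptId)
}

// Invariant check in the style of a leak-detection test: once a task is done,
// no entry for it may survive, even a zero-valued one.
unrollMemoryMap(7L) += 0L // the case that used to leak
releaseAllForTask(7L)
assert(!unrollMemoryMap.contains(7L), "unroll memory map leaked an entry")
```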
Jenkins, this is ok to test.
It looks like we might want to pre-emptively make a similar change in master / branch-2.0, since it looks like there's still a potential leak here in case the unroll memory for the task is 0:
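The zero-memory path can be illustrated with a minimal, hypothetical model of the bookkeeping (not Spark's actual `MemoryStore` code): when the zero-entry cleanup sits inside the `memoryToRelease > 0` guard, a task whose tracked unroll memory is already 0 keeps its key forever.

```scala
import scala.collection.mutable

// Simplified model: taskAttemptId -> reserved unroll bytes.
val memoryMap = mutable.HashMap[Long, Long]().withDefaultValue(0L)

// A task can end up tracked with 0 bytes, e.g. after reserving 0.
memoryMap(42L) += 0L

// Buggy release shape: the cleanup is guarded by memoryToRelease > 0.
val memoryToRelease = memoryMap(42L) // == 0
if (memoryToRelease > 0) {
  memoryMap(42L) -= memoryToRelease
  if (memoryMap(42L) == 0) memoryMap.remove(42L)
}

// The guard was never entered, so the stale key remains: a leak.
println(memoryMap.contains(42L)) // true
```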
Test build #65154 has finished for PR 15022 at commit
Rather, it looks like the 2.x patch is going to be just a subset of this one, so I can merge this into both branches.
Actually, I think I spot a simple way to add a test, similar to the memory-leak-detection tests that we have for BlockInfoManager in Spark 2.x. Let me test out my idea and I'll post the diff here or submit a PR against your PR.
Actually, I spot one more step to make this really robust: I think we also need to call … Can you add a call to …
Thanks @JoshRosen for your reply!
That's right. I will make the change later.
…for more robustness.
Test build #65200 has finished for PR 15022 at commit
Jenkins retest this please
Test build #65307 has finished for PR 15022 at commit
LGTM, so I'm going to merge this into branch-1.6. @saturday-shi, do you mind closing this pull request now that I've merged it? GitHub won't auto-close merged PRs which aren't opened against the master branch, so you'll have to do it. I'm going to port a subset of this change into the master branch and may add additional tests there. I don't think there's much risk of this regressing in branch-1.6, given that we're really conservative about making changes in that branch and it's unlikely any of the memory management code there will be modified again soon.
…che.spark.storage.MemoryStore` may lead to memory leak ## What changes were proposed in this pull request? The expression `if (memoryMap(taskAttemptId) == 0) memoryMap.remove(taskAttemptId)` in the methods `releaseUnrollMemoryForThisTask` and `releasePendingUnrollMemoryForThisTask` should be evaluated after the memory release operation, whether `memoryToRelease` is > 0 or not. If a task's tracked memory has already been set to 0 when `releaseUnrollMemoryForThisTask` or `releasePendingUnrollMemoryForThisTask` is called, the key in the memory map corresponding to that task will never be removed from the hash map. See the details in [SPARK-17465](https://issues.apache.org/jira/browse/SPARK-17465). Author: Xing SHI <shi-kou@indetail.co.jp> Closes #15022 from saturday-shi/SPARK-17465.
```scala
Utils.tryLogNonFatalError {
  // Release memory used by this thread for unrolling blocks
  SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask()
  SparkEnv.get.blockManager.memoryStore.releasePendingUnrollMemoryForThisTask()
}
```
We ended up removing the concept of "pending unroll memory" in Spark 2.x so this line will be omitted from the PR that I'm going to open against master.
Actually, I managed to cherry-pick the compatible subset of the changes into master, so the PR will be auto-closed by that.
…che.spark.storage.MemoryStore` may lead to memory leak (cherry picked from commit a447cd8)
What changes were proposed in this pull request?
The expression `if (memoryMap(taskAttemptId) == 0) memoryMap.remove(taskAttemptId)` in the methods `releaseUnrollMemoryForThisTask` and `releasePendingUnrollMemoryForThisTask` should be evaluated after the memory release operation, whether `memoryToRelease` is > 0 or not.

If a task's tracked memory has already been set to 0 when `releaseUnrollMemoryForThisTask` or `releasePendingUnrollMemoryForThisTask` is called, the key in the memory map corresponding to that task will never be removed from the hash map.

See the details in SPARK-17465.
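The fix described above can be sketched with a simplified stand-in for the `MemoryStore` bookkeeping (class and method names here are hypothetical; the real methods also handle locking and memory-manager accounting): moving the zero-entry cleanup out of the `memoryToRelease > 0` branch guarantees the key is dropped even when there is nothing to release.

```scala
import scala.collection.mutable

class UnrollMemoryTracker {
  // taskAttemptId -> bytes of unroll memory reserved by that task
  private val memoryMap = mutable.HashMap[Long, Long]().withDefaultValue(0L)

  def reserve(taskAttemptId: Long, bytes: Long): Unit = synchronized {
    memoryMap(taskAttemptId) += bytes
  }

  // Fixed shape: the release happens first, and the zero-entry cleanup
  // always runs afterwards, whether or not memoryToRelease was > 0.
  def release(taskAttemptId: Long, bytes: Long = Long.MaxValue): Unit = synchronized {
    val memoryToRelease = math.min(bytes, memoryMap(taskAttemptId))
    if (memoryToRelease > 0) {
      memoryMap(taskAttemptId) -= memoryToRelease
    }
    if (memoryMap(taskAttemptId) == 0) memoryMap.remove(taskAttemptId)
  }

  def isTracked(taskAttemptId: Long): Boolean = synchronized {
    memoryMap.contains(taskAttemptId)
  }
}

val tracker = new UnrollMemoryTracker
tracker.reserve(42L, 0L) // task tracked with 0 bytes
tracker.release(42L)     // previously leaked; now the key is removed
println(tracker.isTracked(42L)) // false
```

A partial release still keeps the entry alive, since the remaining balance is nonzero; only when the count reaches 0 is the key dropped.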