
Conversation

@saturday-shi

What changes were proposed in this pull request?

The cleanup expression `if (memoryMap(taskAttemptId) == 0) memoryMap.remove(taskAttemptId)` in the methods `releaseUnrollMemoryForThisTask` and `releasePendingUnrollMemoryForThisTask` should run after the release-memory operation, regardless of whether `memoryToRelease` is greater than 0.

If a task's memory has already been set to 0 when `releaseUnrollMemoryForThisTask` or `releasePendingUnrollMemoryForThisTask` is called, the key corresponding to that task will never be removed from the hash map.

See the details in [SPARK-17465](https://issues.apache.org/jira/browse/SPARK-17465).
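The shape of the bug and of the fix can be sketched with a simplified model (a hypothetical standalone `memoryMap`, not the actual `MemoryStore` internals):

```scala
import scala.collection.mutable

// Simplified model of the MemoryStore bookkeeping (hypothetical, not the
// real Spark code): task attempt id -> unroll memory held, in bytes.
val memoryMap = mutable.HashMap[Long, Long]()

// Buggy shape: the zero-entry cleanup is guarded by memoryToRelease > 0,
// so an entry that is already 0 is never removed and leaks.
def releaseBuggy(taskAttemptId: Long): Unit = {
  val memoryToRelease = memoryMap.getOrElse(taskAttemptId, 0L)
  if (memoryToRelease > 0) {
    memoryMap(taskAttemptId) = 0L
    if (memoryMap(taskAttemptId) == 0) memoryMap.remove(taskAttemptId)
  }
}

// Fixed shape: the cleanup runs after the release, unconditionally.
def releaseFixed(taskAttemptId: Long): Unit = {
  val memoryToRelease = memoryMap.getOrElse(taskAttemptId, 0L)
  if (memoryToRelease > 0) {
    memoryMap(taskAttemptId) = 0L
  }
  if (memoryMap.get(taskAttemptId).contains(0L)) {
    memoryMap.remove(taskAttemptId)
  }
}
```

With an entry already at 0, `releaseBuggy` leaves it in the map forever, while `releaseFixed` drops it.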

…lMemoryForThisTask and releasePendingUnrollMemoryForThisTask method.
@HyukjinKwon
Member

HyukjinKwon commented Sep 9, 2016

Please ignore the AppVeyor failure for this. It is supposed to run only for R changes in the master branch, but it seems something went wrong. It should fail for branch-1.6 since appveyor.yml and the related script are currently only merged into master.

@HyukjinKwon
Member

Cc @shivaram, it seems something went wrong. I will try to figure this out as soon as I can use my laptop.

@srowen
Member

srowen commented Sep 9, 2016

That seems correct; I'm not sure if @andrewor14 is still available to comment?

@JoshRosen
Contributor

This change looks good to me as well, although I wonder whether there's an assertion or unit test that we can add to detect this bug or keep things from regressing in case this is ever refactored.

I suppose that one invariant is that unrollMemoryMap.keys.size and pendingUnrollMemoryMap.keys.size should both be upper-bounded by the number of executor cores (since that's the maximum number of concurrent tasks).

I wouldn't necessarily block merging this on the addition of new tests / asserts right now but it would be great to add them if they're low-cost and it's easy to do so.
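A sketch of what such an invariant check might look like (hypothetical helper and map names; the real `MemoryStore` fields are private):

```scala
import scala.collection.mutable

// Hypothetical leak check: after tasks finish, the number of live entries in
// either bookkeeping map should never exceed the number of concurrent tasks.
def assertNoLeakedEntries(
    unrollMemoryMap: mutable.HashMap[Long, Long],
    pendingUnrollMemoryMap: mutable.HashMap[Long, Long],
    maxConcurrentTasks: Int): Unit = {
  assert(unrollMemoryMap.size <= maxConcurrentTasks,
    s"unrollMemoryMap holds ${unrollMemoryMap.size} entries; " +
      s"expected at most $maxConcurrentTasks")
  assert(pendingUnrollMemoryMap.size <= maxConcurrentTasks,
    s"pendingUnrollMemoryMap holds ${pendingUnrollMemoryMap.size} entries; " +
      s"expected at most $maxConcurrentTasks")
}
```

A leaked zero-byte entry per finished task would eventually push the map size past this bound and trip the assertion.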

@JoshRosen
Contributor

Jenkins, this is ok to test.

@JoshRosen
Contributor

It looks like we might want to preemptively make a similar change in master / branch-2.0, since there still appears to be a potential leak there in case the unroll memory for the task is 0:

if (unrollMemoryMap(taskAttemptId) == 0) {

@SparkQA

SparkQA commented Sep 9, 2016

Test build #65154 has finished for PR 15022 at commit 720f2ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Rather, it looks like the 2.x patch is going to just be a subset of this one, so I can merge this into both branches.

@JoshRosen
Contributor

Actually, I think I spot a simple way to add a test, similar to the memory-leak-detection tests that we have for BlockInfoManager in Spark 2.x. Let me test out my idea and I'll post the diff here or submit a PR against your PR.

@JoshRosen
Contributor

JoshRosen commented Sep 9, 2016

Actually, I spot one more step to make this really robust: I think we also need to call releasePendingUnrollMemoryForThisTask at the end of Task in order to be absolutely sure this memory will be released during error cases.

Can you add a call to `releasePendingUnrollMemoryForThisTask` alongside `SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask()`?
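The cleanup placement being suggested can be sketched as a try/finally around the task body (`runTaskBody` and `releaseAll` are hypothetical stand-ins; the real branch-1.6 code wraps the calls in `Utils.tryLogNonFatalError` inside `Task.run`):

```scala
// Sketch: run the task body and always release unroll memory afterwards,
// even when the body throws, so error cases cannot leak map entries.
def runWithCleanup[T](runTaskBody: () => T)(releaseAll: () => Unit): T = {
  try {
    runTaskBody()
  } finally {
    releaseAll()
  }
}
```

In the real code, `releaseAll` would correspond to calling both `releaseUnrollMemoryForThisTask()` and `releasePendingUnrollMemoryForThisTask()` on the task's `MemoryStore`.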

@saturday-shi
Author

Thanks @JoshRosen for your reply!

Actually, I spot one more step to make this really robust: I think we also need to call releasePendingUnrollMemoryForThisTask at the end of Task in order to be absolutely sure this memory will be released during error cases.

That's right. I will make the change later.
And it would be really helpful if you have a good idea for adding a test to prevent regressions. I was just worried that I don't have an easy way to check whether the problem has recurred.

@SparkQA

SparkQA commented Sep 10, 2016

Test build #65200 has finished for PR 15022 at commit 6b11fe8.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Sep 13, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Sep 13, 2016

Test build #65307 has finished for PR 15022 at commit 6b11fe8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

LGTM so I'm going to merge this into branch-1.6.

@saturday-shi, do you mind closing this pull request now that I've merged it? GitHub won't auto-close merged PRs which aren't opened against the master branch, so you'll have to do it.

I'm going to port a subset of this change into the master branch and may add additional tests there. I don't think there's much risk of this regressing in branch-1.6 given that we're really conservative about making changes in that branch and it's unlikely any of the memory management code there will be modified again soon.

asfgit pushed a commit that referenced this pull request Sep 14, 2016
…che.spark.storage.MemoryStore` may lead to memory leak


Author: Xing SHI <shi-kou@indetail.co.jp>

Closes #15022 from saturday-shi/SPARK-17465.
Utils.tryLogNonFatalError {
  // Release memory used by this thread for unrolling blocks
  SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask()
  SparkEnv.get.blockManager.memoryStore.releasePendingUnrollMemoryForThisTask()
}
Contributor


We ended up removing the concept of "pending unroll memory" in Spark 2.x so this line will be omitted from the PR that I'm going to open against master.

asfgit pushed a commit that referenced this pull request Sep 14, 2016
…che.spark.storage.MemoryStore` may lead to memory leak

@JoshRosen
Contributor

Actually I managed to cherry-pick the compatible subset of the changes into master, so the PR will be auto-closed by that.

@asfgit asfgit closed this in bb32294 Sep 14, 2016
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Sep 19, 2016
…che.spark.storage.MemoryStore` may lead to memory leak


(cherry picked from commit a447cd8)
wgtmac pushed a commit to wgtmac/spark that referenced this pull request Sep 19, 2016
…che.spark.storage.MemoryStore` may lead to memory leak
