Skip to content

Conversation

@HeartSaVioR
Copy link
Contributor

@HeartSaVioR HeartSaVioR commented Sep 30, 2019

What changes were proposed in this pull request?

This patch proposes to fix the issue that storage memory is not decreasing even block is removed in BlockManager. Originally the issue is found while removed broadcast doesn't reflect the storage memory on driver/executors.

AppStatusListener expects the value of memory in events on block update as "delta" so that it adjusts driver/executors' storage memory based on delta, but when removing block BlockManager reports the delta as 0, so the storage memory is not decreased. BlockManager.dropFromMemory deals with this correctly, so some of path of freeing memory has been updated correctly.

Why are the changes needed?

The storage memory in metrics in AppStatusListener is now out of sync which lets end users easy to confuse as memory leak is happening.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Modified UTs. Also manually tested via running simple query repeatedly and observe executor page of Spark UI to see the value of storage memory is decreasing as well.

Please refer the description of SPARK-29055 to get simple reproducer.

@SparkQA
Copy link

SparkQA commented Sep 30, 2019

Test build #111601 has finished for PR 25973 at commit 703b33b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Sep 30, 2019

AppStatusListener expects the value of memory in events on block update as "delta"

That won't be true when #25779 is pushed. Does that fix this issue too?

@vanzin
Copy link
Contributor

vanzin commented Sep 30, 2019

Actually my fix is for cached blocks, broadcast blocks would also need something similar for everything to be fixed.

Anyway, I think fixing AppStatusListener is the right thing, instead of changing the meaning of things in the event.

@HeartSaVioR
Copy link
Contributor Author

That won't be true when #25779 is pushed.

Sorry I forgot #25779 while working. Investigation started from seeking memory leak on driver and I was too focused about fixing the issue without trying to look back. I'll take a look at the change being proposed in #25779 (Looks like it's about to be finished and expected to be merged soon though.)

Actually my fix is for cached blocks, broadcast blocks would also need something similar for everything to be fixed.

Got it. Would you like to deal with it by yourself, or let me deal with this?

@vanzin
Copy link
Contributor

vanzin commented Oct 1, 2019

After investigating more things around my original change, this LGTM. It's good to have all block updates contain consistent information (delta vs. actual values).

There's still SPARK-29319 but that's a separate issue.

Merging to master / 2.4.

@vanzin vanzin closed this in a4601cb Oct 1, 2019
vanzin pushed a commit that referenced this pull request Oct 1, 2019
…k is removed from BlockManager

This patch proposes to fix the issue that storage memory is not decreasing even block is removed in BlockManager. Originally the issue is found while removed broadcast doesn't reflect the storage memory on driver/executors.

AppStatusListener expects the value of memory in events on block update as "delta" so that it adjusts driver/executors' storage memory based on delta, but when removing block BlockManager reports the delta as 0, so the storage memory is not decreased. `BlockManager.dropFromMemory` deals with this correctly, so some of path of freeing memory has been updated correctly.

The storage memory in metrics in AppStatusListener is now out of sync which lets end users easy to confuse as memory leak is happening.

No.

Modified UTs. Also manually tested via running simple query repeatedly and observe executor page of Spark UI to see the value of storage memory is decreasing as well.

Please refer the description of [SPARK-29055](https://issues.apache.org/jira/browse/SPARK-29055) to get simple reproducer.

Closes #25973 from HeartSaVioR/SPARK-29055.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(cherry picked from commit a4601cb)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@HeartSaVioR
Copy link
Contributor Author

Thanks for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the SPARK-29055 branch October 1, 2019 19:27
@puneetloya
Copy link

Is there gonna be a 2.4.5 release soon?

@HeartSaVioR
Copy link
Contributor Author

@puneetloya

  • Spark 2.4.4 released (Sep 01, 2019)
  • Spark 2.4.3 released (May 08, 2019)

so maybe not. You may want to ask about this in the mailing list if you are affected by the origin bug and it is so critical for you to fix it.

@puneetloya
Copy link

Will post in the mailing list. thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants