[BUG] Fix potential memory leak when encountering disk unhealthy #370
@xianjingfeng @jerqi PTAL. Is this a critical bug? I found that the shuffle server always OOMs when a disk becomes unhealthy (for example, when it reaches the high watermark).
Could you explain why this causes a memory leak?
Codecov Report

| Coverage Diff | master | #370 | +/- |
|---|---|---|---|
| Coverage | 58.01% | 58.73% | +0.72% |
| Complexity | 1361 | 1581 | +220 |
| Files | 171 | 193 | +22 |
| Lines | 9006 | 10843 | +1837 |
| Branches | 787 | 947 | +160 |
| Hits | 5225 | 6369 | +1144 |
| Misses | 3449 | 4099 | +650 |
| Partials | 332 | 375 | +43 |
So it releases the memory but does not release the data reference, right?
It seems so. I think this is the cause of #164.
xianjingfeng
left a comment
LGTM
roryqi
left a comment
Got it. LGTM, thanks @zuston @xianjingfeng
Maybe this PR should be cherry-picked to release-0.6.0-RC2.
You can cherry-pick this PR to branch-0.6 and reply to the release manager's mail on dev@uniffle.apache.org.
Thanks @zuston for the fix. Yes, I think this is an important fix. I'll backport it to branch-0.6 and start a v0.6.0-rc3 release.
### What changes were proposed in this pull request?
Fix a potential memory leak when a disk becomes unhealthy.
### Why are the changes needed?
When a disk is unhealthy and the timeout of pendingDataShuffleFlushEvent is exceeded, the shuffle server releases the memory. However, the current codebase does not release the data reference, which causes a memory leak.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No need.
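To illustrate the kind of leak described in this PR, here is a minimal, self-contained sketch. It is not the Uniffle code: `PendingFlushLeakSketch`, `BufferManager`, `addPendingEvent`, `onPendingTimeout`, and the `dropReference` flag are all hypothetical names chosen for the example. It only shows how a memory counter can be decremented on timeout while a map still holds the buffered blocks, so the heap is never actually freed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PendingFlushLeakSketch {

  /** Hypothetical buffer manager: tracks accounted bytes and the blocks awaiting flush. */
  static final class BufferManager {
    // Strong references to the data of events that are still waiting to be flushed.
    private final Map<Long, List<byte[]>> inFlushBlocks = new HashMap<>();
    // Bytes the server believes are in use (the accounting counter).
    private long usedBytes;

    void addPendingEvent(long eventId, List<byte[]> blocks) {
      inFlushBlocks.put(eventId, blocks);
      usedBytes += sizeOf(blocks);
    }

    /**
     * Called when a pending event times out (for example, because the disk is unhealthy).
     * With dropReference == false only the counter is released, so the blocks stay
     * reachable through the map. With dropReference == true the map entry is removed
     * as well, so the garbage collector can reclaim the data.
     */
    void onPendingTimeout(long eventId, boolean dropReference) {
      List<byte[]> blocks = inFlushBlocks.get(eventId);
      if (blocks == null) {
        return;
      }
      usedBytes -= sizeOf(blocks);
      if (dropReference) {
        inFlushBlocks.remove(eventId);
      }
    }

    long usedBytes() {
      return usedBytes;
    }

    /** Bytes still reachable through the map, regardless of what the counter says. */
    long retainedBytes() {
      long total = 0;
      for (List<byte[]> blocks : inFlushBlocks.values()) {
        total += sizeOf(blocks);
      }
      return total;
    }

    private static long sizeOf(List<byte[]> blocks) {
      long total = 0;
      for (byte[] block : blocks) {
        total += block.length;
      }
      return total;
    }
  }

  /** Builds count blocks of size bytes each, standing in for buffered shuffle data. */
  static List<byte[]> blocks(int count, int size) {
    List<byte[]> result = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      result.add(new byte[size]);
    }
    return result;
  }

  public static void main(String[] args) {
    for (boolean dropReference : new boolean[] {false, true}) {
      BufferManager manager = new BufferManager();
      manager.addPendingEvent(1L, blocks(2, 1024));
      manager.onPendingTimeout(1L, dropReference);
      System.out.println("dropReference=" + dropReference
          + " used=" + manager.usedBytes()
          + " retained=" + manager.retainedBytes());
    }
  }
}
```

In the leaky variant the counter drops to zero, so the server keeps accepting new data even though the old blocks are still on the heap, which is consistent with the OOM reported in the conversation. Dropping the reference at the same point as the counter release keeps the two views of memory in agreement.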