Skip to content

Conversation

@xianjingfeng
Copy link
Member

What changes were proposed in this pull request?

1.In server side, if requireBufferId not found when send data, thrown an exception.
2.In client side, if fail to send data, require buffer again.

Why are the changes needed?

We found shuffle server which under high load is easy encounter java.lang.OutOfMemoryError: Java heap space even we allocate more jvm heap memory and less rss.server.buffer.capacity #76

Does this PR introduce any user-facing change?

No

How was this patch tested?

Have already verify in our production enviroment.

@jerqi
Copy link
Contributor

jerqi commented Aug 17, 2022

Could we add a ut for this pr?

@codecov-commenter
Copy link

codecov-commenter commented Aug 17, 2022

Codecov Report

Merging #159 (a6ed5ba) into master (fd8ccdd) will increase coverage by 0.76%.
The diff coverage is 13.33%.

@@             Coverage Diff              @@
##             master     #159      +/-   ##
============================================
+ Coverage     57.17%   57.93%   +0.76%     
- Complexity     1201     1233      +32     
============================================
  Files           150      153       +3     
  Lines          8178     8250      +72     
  Branches        773      771       -2     
============================================
+ Hits           4676     4780     +104     
+ Misses         3256     3226      -30     
+ Partials        246      244       -2     
Impacted Files Coverage Δ
...ava/org/apache/uniffle/common/util/RetryUtils.java 71.42% <0.00%> (ø)
...pache/uniffle/server/ShuffleServerGrpcService.java 0.98% <0.00%> (-0.03%) ⬇️
...he/uniffle/common/exception/NotRetryException.java 50.00% <50.00%> (ø)
...storage/handler/impl/DataSkippableReadHandler.java 81.25% <0.00%> (-3.13%) ⬇️
...he/uniffle/server/storage/LocalStorageManager.java 61.53% <0.00%> (-0.37%) ⬇️
.../org/apache/uniffle/server/ShuffleTaskManager.java 63.96% <0.00%> (-0.14%) ⬇️
...le/common/rpc/MonitoringServerTransportFilter.java 0.00% <0.00%> (ø)
...ava/org/apache/uniffle/server/ShuffleTaskInfo.java 91.66% <0.00%> (ø)
.../org/apache/uniffle/common/config/RssBaseConf.java 91.86% <0.00%> (+0.06%) ⬆️
...org/apache/uniffle/server/ShuffleFlushManager.java 78.53% <0.00%> (+0.12%) ⬆️
... and 12 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

int retryNum = 0;
while (retryNum < maxRetryAttempts) {
try {
long requireId = requirePreAllocation(size, request.getRetryMax(), request.getRetryIntervalMax());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we reuse RetryUtils?

}

private SendShuffleDataResponse doSendData(SendShuffleDataRequest rpcRequest) {
private SendShuffleDataResponse doSendData(String appId, int size, RssSendShuffleDataRequest request,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method doXXXXX usually have one rpc operation. Could we encapsulate the behavior in the other places?

@jerqi jerqi changed the title Disallow sendShuffleData if requireBufferId expired [ISSUE-76] Disallow sendShuffleData if requireBufferId expired Aug 18, 2022
Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, This is an important pr, so cc @colinmjj

@jerqi jerqi requested a review from colinmjj August 18, 2022 09:42
@colinmjj
Copy link

To avoid possible OOM in server side, LGTM, +1

@jerqi jerqi merged commit 06022c1 into apache:master Aug 19, 2022
@jerqi
Copy link
Contributor

jerqi commented Aug 19, 2022

Merged. Thanks @xianjingfeng

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants