Skip to content

Conversation

@xianjingfeng
Copy link
Member

What changes were proposed in this pull request?

Assign partition again if registerShuffleServers failed

Why are the changes needed?

If registerShuffleServers failed, task will fail and then application will failed

Does this PR introduce any user-facing change?

No

How was this patch tested?

Already added

@jerqi
Copy link
Contributor

jerqi commented Aug 1, 2022

  1. We also support mr client, could we also support the feature in the mr client?
  2. Could you add a integration-test case?

@codecov-commenter
Copy link

codecov-commenter commented Aug 1, 2022

Codecov Report

Merging #115 (ac9c161) into master (deb7fe4) will increase coverage by 0.03%.
The diff coverage is 27.27%.

@@             Coverage Diff              @@
##             master     #115      +/-   ##
============================================
+ Coverage     56.52%   56.56%   +0.03%     
- Complexity     1183     1191       +8     
============================================
  Files           149      150       +1     
  Lines          8019     8048      +29     
  Branches        767      771       +4     
============================================
+ Hits           4533     4552      +19     
- Misses         3242     3249       +7     
- Partials        244      247       +3     
Impacted Files Coverage Δ
...apache/hadoop/mapreduce/v2/app/RssMRAppMaster.java 0.00% <0.00%> (ø)
...ava/org/apache/uniffle/common/util/RetryUtils.java 71.42% <71.42%> (ø)
.../apache/uniffle/coordinator/ClientConfManager.java 91.54% <0.00%> (-1.41%) ⬇️
...org/apache/uniffle/server/ShuffleFlushManager.java 78.40% <0.00%> (+1.70%) ⬆️
.../apache/uniffle/common/exception/RssException.java 50.00% <0.00%> (+50.00%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us.

@xianjingfeng
Copy link
Member Author

  1. Could you add a integration-test case?

I don't know how to write this ut, it is difficult to stop shuffle server when registerShuffleServers.Do you have a good idea?

@jerqi
Copy link
Contributor

jerqi commented Aug 1, 2022

You can follow the steps:

  1. assign shuffle server for clients
  2. stop the shuffle server
  3. register shuffle servers
  4. retry and assign other servers.
    You can see https://github.com/apache/incubator-uniffle/blob/351275d532bdf61e12bd4efd3f8f8a68b091c67d/integration-test/common/src/test/java/org/apache/uniffle/test/ShuffleWithRssClientTest.java#L144

# Conflicts:
#	client-mr/src/main/java/org/apache/hadoop/mapreduce/v2/app/RssMRAppMaster.java
#	client-spark/common/src/main/java/org/apache/spark/shuffle/RssSparkConfig.java
#	client-spark/spark2/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java
#	client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java
@xianjingfeng xianjingfeng reopened this Aug 2, 2022
Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work! Thanks for your contribution! @xianjingfeng

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants