-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-27637][Shuffle][FOLLOW-UP]For nettyBlockTransferService, if IOException occurred while create client, check whether relative executor is alive before retry #24533 #25469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
gentle ping @cloud-fan @felixcheung |
|
ok to test |
|
Test build #109174 has finished for PR 25469 at commit
|
|
@cloud-fan strange test result of Unit test. The shown failed Unit test can be run correctly in my computer |
|
retest this please |
|
@AngersZhuuuu ok, the other prs hit the same errors, so they are not related to this pr. |
Got it, thanks. |
|
Test build #109183 has finished for PR 25469 at commit
|
|
@maropu passed test but seem failed due to some strange reason |
|
retest this please |
|
Test build #109199 has finished for PR 25469 at commit
|
|
thanks, merging to master! |
What changes were proposed in this pull request?
In pr #24533 , it prevent retry to a removed Executor.
In my test, I can't catch exceptions from
new OneForOneBlockFetcher(client, appId, execId, blockIds, listener, transportConf, tempFileManager).start()And I check the code carefully, method start() will handle exception of IOException in it's retry logical, won't throw it out. until it meet maxRetry times or meet exception that is not IOException.
And if we meet the situation that when we fetch block , the executor is dead, when we rerun
RetryingBlockFetcher.BlockFetchStarter.createAndStart()we may failed when we create a transport client to dead executor. it will throw a IOException.
We should catch this IOException.
Why are the changes needed?
Old solution not comprehensive. Didn't cover more case.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existed Unit Test