Chopinxb commented Aug 6, 2018

What changes were proposed in this pull request?

In this PR, I propose using Alluxio to store shuffle data in order to improve the stability of complex OLAP jobs.
Motivation
Currently, when a shuffle fetch fails (for example, because the NodeManager running the shuffle service has crashed), Spark reruns the previous stage to regenerate the shuffle data. This works, but in some cases the cost of recomputation is unacceptable.
With this PR, when a shuffle fetch fails, the reduce task retries fetching the shuffle data from Alluxio instead of triggering recomputation.
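The fallback described above can be sketched in a few lines of Java. This is a minimal illustration, not the code from this PR: `BlockSource`, `fetchWithFallback`, and `FetchFailedSketchException` are hypothetical names, and both the shuffle service and the Alluxio copy are simulated in memory rather than backed by a real shuffle client or Alluxio client. The block ID only mimics Spark's `shuffle_<shuffleId>_<mapId>_<reduceId>` naming.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class ShuffleFallbackSketch {

    /** Hypothetical abstraction over anything that can serve a shuffle block. */
    interface BlockSource {
        byte[] fetch(String blockId) throws IOException;
    }

    /** Hypothetical stand-in for Spark's fetch-failure signal, which would trigger stage recomputation. */
    static class FetchFailedSketchException extends Exception {
        FetchFailedSketchException(String blockId, Throwable cause) {
            super("fetch failed for " + blockId, cause);
        }
    }

    /**
     * Try the shuffle service first; if it fails, retry against the copy kept in Alluxio.
     * Only when both fail is the failure surfaced (in Spark, the map stage would then be rerun).
     */
    static byte[] fetchWithFallback(String blockId, BlockSource shuffleService, BlockSource alluxioCopy)
            throws FetchFailedSketchException {
        try {
            return shuffleService.fetch(blockId);
        } catch (IOException primaryFailure) {
            try {
                return alluxioCopy.fetch(blockId);
            } catch (IOException backupFailure) {
                // Keep the original error visible for debugging.
                backupFailure.addSuppressed(primaryFailure);
                throw new FetchFailedSketchException(blockId, backupFailure);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a crashed NodeManager: every fetch from the shuffle service fails.
        BlockSource deadService = id -> { throw new IOException("connection refused"); };

        // Simulate the copy of the map output written to Alluxio.
        Map<String, byte[]> alluxioStore = new HashMap<>();
        alluxioStore.put("shuffle_0_1_2", new byte[] {1, 2, 3});
        BlockSource alluxio = id -> {
            byte[] data = alluxioStore.get(id);
            if (data == null) {
                throw new IOException("not in Alluxio: " + id);
            }
            return data;
        };

        byte[] recovered = fetchWithFallback("shuffle_0_1_2", deadService, alluxio);
        System.out.println("recovered " + recovered.length + " bytes"); // recovered 3 bytes

        try {
            fetchWithFallback("shuffle_0_9_9", deadService, alluxio);
        } catch (FetchFailedSketchException e) {
            System.out.println(e.getMessage()); // fetch failed for shuffle_0_9_9
        }
    }
}
```

Keeping the failure path as the last resort means Spark's existing recovery (rerunning the map stage) still applies when the Alluxio copy is also missing; the Alluxio copy only avoids recomputation in the common case where it is available.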
Usage

  1. Enable this feature in spark-defaults.conf:
    spark.alluxio.shuffle.enabled true

How was this patch tested?

Manual tests.

@AmplabJenkins

Can one of the admins verify this patch?

@jerryshao
Contributor

I believe such kind of PR requires SPIP and community discussion first.

srowen mentioned this pull request Nov 10, 2018
asfgit closed this in a3ba3a8 Nov 11, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#21766
Closes apache#21679
Closes apache#21161
Closes apache#20846
Closes apache#19434
Closes apache#18080
Closes apache#17648
Closes apache#17169

Add:
Closes apache#22813
Closes apache#21994
Closes apache#22005
Closes apache#22463

Add:
Closes apache#15899

Add:
Closes apache#22539
Closes apache#21868
Closes apache#21514
Closes apache#21402
Closes apache#21322
Closes apache#21257
Closes apache#20163
Closes apache#19691
Closes apache#18697
Closes apache#18636
Closes apache#17176

Closes apache#23001 from wangyum/CloseStalePRs.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>