[SPARK-16817][CORE][WIP] Use Alluxio to improve stability of shuffle by replication of shuffle data #22005

Chopinxb · 2018-08-06T08:56:17Z

What changes were proposed in this pull request?

In this PR, I propose using Alluxio to store a replica of shuffle data in order to improve the stability of complex OLAP tasks.
Motivation
Currently, when a shuffle fetch fails (for example, because the NodeManager hosting the shuffle service has crashed), Spark reruns the previous stage to regenerate the shuffle data. This works well, but in some cases the cost of recomputation is unacceptable.
With this PR, when a shuffle fetch fails, the reducer retries fetching the shuffle data from Alluxio to avoid recomputation.
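
The fallback described above can be illustrated with a minimal sketch. This is not the code in this patch: `ShuffleFetchError`, `fetch_with_replica_fallback`, and the two fetcher callables are hypothetical names chosen for illustration, and the sketch assumes that a replica of each shuffle block was written to a second store (Alluxio in this proposal) when the map output was produced.

```python
from typing import Callable


class ShuffleFetchError(Exception):
    """Hypothetical error: a shuffle block could not be read from a source."""


def fetch_with_replica_fallback(
    block_id: str,
    fetch_primary: Callable[[str], bytes],
    fetch_replica: Callable[[str], bytes],
) -> bytes:
    """Read a shuffle block, falling back to the replica store on failure.

    fetch_primary stands in for the normal shuffle-service read and
    fetch_replica for a read from the replicated copy (both are assumptions).
    """
    try:
        return fetch_primary(block_id)
    except ShuffleFetchError:
        # Primary source is unavailable, e.g. the shuffle service is down.
        pass
    try:
        return fetch_replica(block_id)
    except ShuffleFetchError as err:
        # Only when both sources fail does the caller need to fall back to
        # recomputing the previous stage.
        raise ShuffleFetchError(
            f"block {block_id} unavailable from primary and replica"
        ) from err
```

In this sketch the previous stage would only need to be rerun if the final `ShuffleFetchError` propagates, which is the recomputation cost the proposal aims to avoid.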
Usage

Enable this feature in spark-defaults.conf:
spark.alluxio.shuffle.enabled true

How was this patch tested?

Manual tests.

AmplabJenkins · 2018-08-06T08:58:03Z

Can one of the admins verify this patch?

jerryshao · 2018-08-13T02:45:41Z

I believe this kind of PR requires an SPIP and community discussion first.

Closes apache#21766 Closes apache#21679 Closes apache#21161 Closes apache#20846 Closes apache#19434 Closes apache#18080 Closes apache#17648 Closes apache#17169 Add: Closes apache#22813 Closes apache#21994 Closes apache#22005 Closes apache#22463 Add: Closes apache#15899 Add: Closes apache#22539 Closes apache#21868 Closes apache#21514 Closes apache#21402 Closes apache#21322 Closes apache#21257 Closes apache#20163 Closes apache#19691 Closes apache#18697 Closes apache#18636 Closes apache#17176 Closes apache#23001 from wangyum/CloseStalePRs. Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: hyukjinkwon <gurwls223@apache.org>

XiaoBang added 3 commits August 6, 2018 13:10

use alluxio to improve stability of shuffle

a4371cf

update style

6565988

update style

20cabe1

Merge branch 'master' into spark-shuffle-alluxio

b6fde21

srowen mentioned this pull request Nov 10, 2018

[INFRA] Close stale PRs #23001

Closed

asfgit closed this in a3ba3a8 Nov 11, 2018
