SPARK-3358: [EC2] Switch back to HVM instances for m3.X. #2244

pwendell · 2014-09-03T03:35:17Z

During regression tests of Spark 1.1 we discovered perf issues with
PVM instances when running PySpark. This reverts a change added in #1156
which changed the default type for m3 instances to PVM.
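
For readers unfamiliar with the setting being reverted, here is a rough sketch of an instance-type-to-virtualization lookup. It is not the actual diff or the actual spark_ec2.py code. The names `M3_VIRTUALIZATION` and `virtualization_for`, and the `fallback` value, are hypothetical and chosen only for illustration.

```python
# Hypothetical lookup table: after this change, every m3.* size is
# assumed to default to an HVM AMI rather than a PVM one.
M3_VIRTUALIZATION = {
    "m3.medium": "hvm",
    "m3.large": "hvm",
    "m3.xlarge": "hvm",
    "m3.2xlarge": "hvm",
}


def virtualization_for(instance_type, fallback="pvm"):
    """Return the virtualization type to request for an instance type.

    The fallback for types missing from the table is an assumption made
    for this sketch, not documented behaviour of any real tool.
    """
    return M3_VIRTUALIZATION.get(instance_type, fallback)


if __name__ == "__main__":
    print(virtualization_for("m3.xlarge"))
```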

JoshRosen · 2014-09-03T03:42:23Z

This looks good to me, especially since the m3.* instances used HVM AMIs in 1.0.2.

shivaram · 2014-09-03T03:45:12Z

Ah, interesting. One more thing: m3 doesn't mount the SSDs by default (there was a recent spark_ec2.py change to fix this). Could the regression have been due to using EBS instead of SSDs for shuffle?

JoshRosen · 2014-09-03T04:01:55Z

I observed a large performance difference on a microbenchmark that only called os.fork() in Python. The script in SPARK-3333 also didn't move much data during the shuffle (the RDD contained only 3 items in total), so I think the performance difference is more likely due to the virtualization technique than to the disks. Also, the cross-version comparisons were run on the same m3 nodes, so both should have been using the same disk setup.
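
As an illustration only, a fork-only microbenchmark of the kind described above might look roughly like the sketch below. This is not the script used in this thread. The function name `time_forks` and the iteration count are assumptions, and it uses `time.perf_counter`, which requires Python 3. It measures the full cycle of forking, the child exiting immediately, and the parent reaping it.

```python
import os
import time


def time_forks(n=200):
    """Return mean wall-clock seconds per fork + child exit + wait cycle."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping cleanup handlers.
            os._exit(0)
        # Parent: reap the child so zombies do not accumulate.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print("mean seconds per fork cycle: %.6f" % time_forks())
```

Running the same script on a PVM and an HVM instance of the same size would give a rough comparison of fork cost that does not involve disk or shuffle activity.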

pwendell · 2014-09-03T04:28:05Z

@shivaram yeah, we tested this with the SSD fix included. We were able to narrow it down fairly closely to os.fork(), which others have documented as having issues on certain instance types.

pwendell · 2014-09-03T04:29:53Z

Okay guys, I'm pulling this in for a new RC. Hopefully everyone is okay with it.

During regression tests of Spark 1.1 we discovered perf issues with PVM instances when running PySpark. This reverts a change added in #1156 which changed the default type for m3 instances to PVM.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #2244 from pwendell/ec2-hvm and squashes the following commits:

1342d7e [Patrick Wendell] SPARK-3358: [EC2] Switch back to HVM instances for m3.X.

shivaram · 2014-09-03T04:38:51Z

Sounds good. Nice find on os.fork!

asfgit closed this in c64cc43 Sep 3, 2014
