[SPARK-11583][Scheduler][Core] Make MapStatus use less memory #9559
Conversation
Test build #2020 has finished for PR 9559 at commit
retest plz

retest this please

Can you use the OpenHashSet in Spark?

retest this please
Test build #45450 has finished for PR 9559 at commit
indent these by 2 more spaces
@rxin OpenHashSet replaces HashSet
@andrewor14 Thanks for your advice
isSparse is the wrong name, I think -- both cases are sparse; it's a question of whether or not you are storing the empty blocks.
Sean has raised some important higher-level questions on the JIRA -- I'd like us to resolve the discussion there before moving forward on this.
@yaooqinn Can you close this now that you have the new patch?
OK, closing this PR; see #9661
Discussion of this PR is at https://issues.apache.org/jira/browse/SPARK-11583
In the resolved issue https://issues.apache.org/jira/browse/SPARK-11271, as I said, using a BitSet can save ≈20% memory compared to RoaringBitmap.
But for a Spark job containing quite a lot of tasks, that 20% is a drop in the ocean.
Essentially, a BitSet is backed by a long[]; for example, a BitSet of 200,000 bits occupies a long[3125].
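The arithmetic behind that figure can be sketched as follows (the helper name and object are mine for illustration, not part of the patch):

```scala
// Sketch: how many 64-bit words a BitSet needs for a given number of bits.
// BitSetSizing / wordsFor are hypothetical names, not from the patch.
object BitSetSizing {
  // Round the bit count up to whole 64-bit words, as a long[]-backed BitSet does.
  def wordsFor(numBits: Int): Int = (numBits + 63) / 64

  def main(args: Array[String]): Unit = {
    println(wordsFor(200000))     // 3125 longs
    println(wordsFor(200000) * 8) // 25000 bytes
  }
}
```

So tracking 200k blocks costs a flat 25 KB per MapStatus, regardless of how many blocks are actually empty.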
So instead we can use a HashSet[Int] to store reduceIds (when the non-empty blocks are dense, store the reduceIds of the empty blocks; when they are sparse, store the non-empty ones).
For the dense case: if the HashSet[Int] is smaller than a BitSet over totalBlockNum bits, I use MapStatusTrackingNoEmptyBlocks.
For the sparse case: if the HashSet[Int] is smaller than a BitSet over totalBlockNum bits, I use MapStatusTrackingEmptyBlocks.
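A minimal sketch of that selection logic, assuming a rough 8 bytes per Int entry in an open hash set (the cost constant and object name are my assumptions, not the patch's actual code):

```scala
// Sketch of choosing between a hash set of reduceIds and a full bitmap.
// The case-object names mirror the description above; the per-entry cost
// of a hash set entry (assumed 8 bytes) is a rough illustration only.
object MapStatusFormatChooser {
  sealed trait Format
  case object TrackingNoEmptyBlocks extends Format // dense: store empty reduceIds
  case object TrackingEmptyBlocks extends Format   // sparse: store non-empty reduceIds
  case object PlainBitSet extends Format           // fall back: one bit per block

  def choose(totalBlocks: Int, emptyBlocks: Int): Format = {
    val bitSetBytes   = ((totalBlocks + 63) / 64) * 8L
    val bytesPerEntry = 8L // assumed cost of one Int in an open hash set
    val denseBytes    = emptyBlocks * bytesPerEntry                 // store empty ids
    val sparseBytes   = (totalBlocks - emptyBlocks) * bytesPerEntry // store non-empty ids
    if (denseBytes < bitSetBytes && denseBytes <= sparseBytes) TrackingNoEmptyBlocks
    else if (sparseBytes < bitSetBytes) TrackingEmptyBlocks
    else PlainBitSet
  }
}
```

For example, 200k blocks with only 10 empty picks the dense variant (80 bytes of ids vs a 25 KB bitmap), while 300 blocks with 299 empty picks the sparse one.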
sparse case, 299/300 blocks are empty:

```scala
sc.makeRDD(1 to 30000, 3000).groupBy(x => x).top(5)
```

dense case, no block is empty:

```scala
sc.makeRDD(1 to 9000000, 3000).groupBy(x => x).top(5)
```
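Why the first example is sparse: each of the 3000 map tasks holds only 30000 / 3000 = 10 records, so at most 10 of its 3000 shuffle blocks can be non-empty. A rough upper bound on the non-empty fraction (my own estimate helper, assuming the number of reducers equals the number of map partitions as in both examples above):

```scala
// Hypothetical helper: upper bound on the fraction of non-empty shuffle
// blocks per map task, given total records and the partition count.
object EmptyBlockEstimate {
  def maxNonEmptyFraction(records: Long, numTasks: Int): Double = {
    val recordsPerTask = records.toDouble / numTasks
    // A task cannot fill more blocks than it has records or reducers.
    math.min(recordsPerTask, numTasks.toDouble) / numTasks
  }

  def main(args: Array[String]): Unit = {
    println(maxNonEmptyFraction(30000L, 3000))   // ~1/300 non-empty -> sparse
    println(maxNonEmptyFraction(9000000L, 3000)) // 1.0 -> potentially dense
  }
}
```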