@yaooqinn
Member

@yaooqinn yaooqinn commented Nov 9, 2015

Discussion of this PR is at https://issues.apache.org/jira/browse/SPARK-11583

In the resolved issue https://issues.apache.org/jira/browse/SPARK-11271, as I said, using BitSet saves ≈20% memory compared to RoaringBitmap.
For a Spark job that contains a very large number of tasks, 20% is a drop in the ocean.
Essentially, BitSet is backed by a long[]. For example, a BitSet of 200k bits is a long[3125].
So instead we can use a HashSet[Int] to store reduceIds: when non-empty blocks are dense, store the reduceIds of the empty blocks; when they are sparse, store the reduceIds of the non-empty ones.
For dense cases: if HashSetInt.size < BitSet[totalBlockNum], I use MapStatusTrackingNoEmptyBlocks.
For sparse cases: if HashSetInt.size < BitSet[totalBlockNum], I use MapStatusTrackingEmptyBlocks.
Sparse case (299/300 blocks are empty):
sc.makeRDD(1 to 30000, 3000).groupBy(x=>x).top(5)
Dense case (no block is empty):
sc.makeRDD(1 to 9000000, 3000).groupBy(x=>x).top(5)
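
The size comparison described above can be sketched roughly as follows. This is an illustration only, not the code in the patch: the object name MapStatusSizing, the helper names, the byte-based comparison (4 bytes per stored id, 8 bytes per bitset word, ignoring object and hash-table overhead), and the fallback to a bitset are all assumptions made for the example.

```scala
// Hypothetical sketch; none of these names come from the PR itself.
object MapStatusSizing {
  /** Number of 64-bit words needed to cover `numBits` bits. */
  def bitSetWords(numBits: Int): Int = (numBits + 63) / 64

  /** Approximate bitset payload in bytes (object headers ignored). */
  def bitSetBytes(numBits: Int): Long = bitSetWords(numBits).toLong * 8L

  /** Approximate payload of `count` block ids stored as primitive ints
    * (hash-table load factor and object overhead ignored: an assumption). */
  def idSetBytes(count: Int): Long = count.toLong * 4L

  sealed trait Tracking
  case object TrackEmptyBlocks extends Tracking    // store ids of empty blocks
  case object TrackNonEmptyBlocks extends Tracking // store ids of non-empty blocks
  case object UseBitSet extends Tracking           // assumed fallback

  /** Pick whichever representation has the smaller estimated payload. */
  def choose(totalBlocks: Int, emptyBlocks: Int): Tracking = {
    require(emptyBlocks >= 0 && emptyBlocks <= totalBlocks)
    val nonEmptyBlocks = totalBlocks - emptyBlocks
    val smallerSide = math.min(emptyBlocks, nonEmptyBlocks)
    if (idSetBytes(smallerSide) >= bitSetBytes(totalBlocks)) UseBitSet
    else if (emptyBlocks <= nonEmptyBlocks) TrackEmptyBlocks
    else TrackNonEmptyBlocks
  }
}
```

Under these assumptions, MapStatusSizing.bitSetWords(200000) gives the long[3125] figure quoted above. With 3000 reduce partitions, a map output with 2990 empty blocks would track the 10 non-empty ids, and a map output with no empty blocks would track an empty set of empty ids. A roughly half-empty output would fall back to the bitset.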

@SparkQA

SparkQA commented Nov 9, 2015

Test build #2020 has finished for PR 9559 at commit cb4bce5.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

yaooqinn commented Nov 9, 2015

retest plz

@watermen
Contributor

watermen commented Nov 9, 2015

retest this please

@yaooqinn yaooqinn changed the title from "[SPARK-11583] Make MapStatus use less memory uage" to "[SPARK-11583][Scheduler][Core] Make MapStatus use less memory uage" Nov 9, 2015
@rxin
Contributor

rxin commented Nov 9, 2015

Can you use the OpenHashSet in Spark?

@andrewor14
Contributor

retest this please

@SparkQA

SparkQA commented Nov 10, 2015

Test build #45450 has finished for PR 9559 at commit e1d1106.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

indent these by 2 more spaces

@yaooqinn
Member Author

@rxin OpenHashSet replaces HashSet

@yaooqinn
Member Author

@andrewor14 Thanks for your advice

Contributor

isSparse is the wrong name, I think -- both cases are sparse; it's a question of whether or not you are storing the empty blocks.

@squito
Contributor

squito commented Nov 10, 2015

Sean has raised some important higher-level questions on the JIRA -- I'd like us to resolve the discussion there before moving forward on this.

@andrewor14
Contributor

@yaooqinn Can you close this now that you have the new patch?

@yaooqinn
Member Author

OK, closing this PR; see #9661

@yaooqinn yaooqinn closed this Nov 13, 2015