
Conversation

@vincent-grosbois commented Jul 31, 2018

Add an option to the Spark configuration to change the chunk size (which defaults to 4 MB).

This makes it possible to work around the issue mentioned in SPARK-24917 by allowing users to define larger chunks.
Explanation:
Currently, with netty < 4.1.28 (i.e. before the patch netty/netty@9b95b8e), sending a ChunkedByteBuffer with more than 16 chunks over the network triggers a "merge" of all the chunks into one big transient array that is then sent over the network. This is problematic because the total memory across all chunks can be large (up to 2 GB), so the merge triggers an allocation of up to 2 GB, which causes OOM errors.
One way to bypass this netty behavior is to make sure the data is never split into more than 16 chunks, which can be done by creating bigger chunks; the chunk size is currently fixed at 4 MB. In this commit I'm allowing users to define bigger chunk sizes for their job, which allowed us to avoid this OOM error.
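For illustration only, a minimal sketch of how the proposed setting could be used; the key name follows the rename discussed later in the review (spark.memory.serializer.chunkSize), and the 256m value is just an example, not something prescribed by the PR:

import org.apache.spark.SparkConf

// With the default 4 MB chunks, a ~1.9 GB buffer is split into roughly 480 chunks.
// Raising the chunk size to 256 MB keeps it at roughly 8 chunks, well under the
// 16-chunk threshold at which pre-4.1.28 netty merges everything into one large
// transient array.
val conf = new SparkConf()
  .set("spark.memory.serializer.chunkSize", "256m")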

What changes were proposed in this pull request?

I'm introducing a configuration parameter to define the chunk size used by ChunkedByteBuffer.

How was this patch tested?

Tested on several Spark jobs; the changes work as expected and generate chunks of the indicated size.

Add an option to the Spark configuration to change the chunk size (which defaults to 4 MB).

This makes it possible to work around the issue mentioned in SPARK-24917 when fetching large partitions (a bit less than 2 GB).
@holdensmagicalunicorn

@vincent-grosbois, thanks! I am a bot who has found some folks who might be able to help with the review: @JoshRosen, @rxin and @vanzin

@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Aug 1, 2018

Test build #93867 has finished for PR 21933 at commit 0251bd5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Aug 2, 2018

Can you add [SPARK-24917][CORE] to the title? Also, you need to describe this issue in more detail in the description; what does this PR solve?

@vincent-grosbois changed the title from [SPARK-24917] make chunk size configurable to [SPARK-24917][CORE] make chunk size configurable Aug 2, 2018
@vincent-grosbois
Author

Hello, I updated the description and title

@maropu
Member

maropu commented Aug 3, 2018

retest this please

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94080 has finished for PR 21933 at commit 0251bd5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Aug 3, 2018

retest this please

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94100 has finished for PR 21933 at commit 0251bd5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94098 has finished for PR 21933 at commit 0251bd5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Aug 3, 2018

retest this please

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94114 has finished for PR 21933 at commit 0251bd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Whether to compress shuffle output temporarily spilled to disk
private[this] val compressShuffleSpill = conf.getBoolean("spark.shuffle.spill.compress", true)
// Size of the chunks to be used in the ChunkedByteBuffer
private[this] val chunkSizeMb = conf.getSizeAsMb("spark.memory.chunkSize", "4m").toInt
Member

The name spark.memory.chunkSize looks too generic.
How about spark.memory.serializer.chunkSize or others?

Author

I renamed it to spark.memory.serializer.chunkSize

rename spark.memory.chunkSize to spark.memory.serializer.chunkSize
@SparkQA

SparkQA commented Aug 3, 2018

Test build #94138 has finished for PR 21933 at commit e2961eb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Aug 7, 2018

retest this please

@SparkQA

SparkQA commented Aug 7, 2018

Test build #94346 has finished for PR 21933 at commit e2961eb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Aug 7, 2018

Test build #94358 has finished for PR 21933 at commit e2961eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Aug 8, 2018

LGTM
cc @JoshRosen @vanzin

// Whether to compress shuffle output temporarily spilled to disk
private[this] val compressShuffleSpill = conf.getBoolean("spark.shuffle.spill.compress", true)
// Size of the chunks to be used in the ChunkedByteBuffer
private[this] val chunkSizeMb = conf.getSizeAsMb("spark.memory.serializer.chunkSize", "4m").toInt
Member

Why don't we do byteStringAsBytes and remove 1024 * 1024 below?
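As a hedged sketch of that suggestion (assuming the patch currently converts chunkSizeMb to bytes with a 1024 * 1024 multiplication elsewhere), the setting could be read directly in bytes so no manual conversion is needed; chunkSizeBytes is a hypothetical name:

// Hypothetical alternative to getSizeAsMb: read the size in bytes up front
// and drop the 1024 * 1024 conversion at the use site.
private[this] val chunkSizeBytes =
  conf.getSizeAsBytes("spark.memory.serializer.chunkSize", "4m").toInt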

Member

@vincent-grosbois WDYT about this?

@HyukjinKwon
Member

cc @squito too.

@HyukjinKwon left a comment

The issue and fix look coherent and good to me.

@squito
Contributor

squito commented Aug 10, 2018

Thanks for the detailed analysis, @vincent-grosbois. I agree with everything, but as you noted you won't hit this particular issue anymore with ChunkedByteBufferFileRegion. Is there another use case?

@vincent-grosbois
Author

Not really! This would be a useful feature to backport to all Spark branches that will never benefit from an upgrade to netty 4.1.28. However, I think that for Spark 2.4 and the master branch the netty dependency will eventually be bumped to 4.1.28 or later, which would make this commit useless...
Feel free to drop this PR.

@srowen
Member

srowen commented Oct 25, 2018

Yeah this can be closed; we updated to 4.1.30

@asfgit closed this in 65c653f Oct 25, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#22567
Closes apache#18457
Closes apache#21517
Closes apache#21858
Closes apache#22383
Closes apache#19219
Closes apache#22401
Closes apache#22811
Closes apache#20405
Closes apache#21933

Closes apache#22819 from srowen/ClosePRs.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
