
Conversation

@vincent-grosbois commented Jul 31, 2018

Add an option to the Spark configuration to change the chunk size (which defaults to 4 MB).

This makes it possible to work around the issue mentioned in SPARK-24917 by allowing users to define larger chunks.
Explanation:
Currently, with netty < 4.1.28 (i.e. before the patch netty/netty@9b95b8e), sending a ChunkedByteBuffer with more than 16 chunks over the network triggers a "merge" of all the chunks into one big transient array that is then sent over the network. This is problematic because the total memory across all chunks can be large (up to 2 GB), so the merge triggers an allocation of up to 2 GB, which causes OOM errors.
One way to bypass this netty behavior is to make sure the data is never split into more than 16 chunks, which can be done by creating bigger chunks; the chunk size is currently fixed at 4 MB. In this commit I'm allowing users to define bigger chunk sizes for their job, which allowed us to avoid this OOM error.
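For illustration only, a minimal sketch of how the proposed setting could be used; the key name follows the rename discussed later in the review (spark.memory.serializer.chunkSize), and the 256m value is just an example, not something prescribed by the PR:

import org.apache.spark.SparkConf

// With the default 4 MB chunks, a ~1.9 GB buffer is split into roughly 480 chunks.
// Raising the chunk size to 256 MB keeps it at roughly 8 chunks, well under the
// 16-chunk threshold at which pre-4.1.28 netty merges everything into one large
// transient array.
val conf = new SparkConf()
  .set("spark.memory.serializer.chunkSize", "256m")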

What changes were proposed in this pull request?

I'm introducing a configuration parameter to define the chunk size used by ChunkedByteBuffer.

How was this patch tested?

Tested on several Spark jobs; the changes work as expected and generate chunks of the indicated size.

Add an option to the Spark configuration to change the chunk size (which defaults to 4 MB).

This makes it possible to work around the issue mentioned in SPARK-24917 when fetching large partitions (a bit less than 2 GB).
@holdensmagicalunicorn

@vincent-grosbois, thanks! I am a bot who has found some folks who might be able to help with the review: @JoshRosen, @rxin and @vanzin

@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Aug 1, 2018

Test build #93867 has finished for PR 21933 at commit 0251bd5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Aug 2, 2018

Can you add [SPARK-24917][CORE] to the title? Also, you need to describe this issue in more detail in the description; what does this PR solve?

@vincent-grosbois changed the title from [SPARK-24917] make chunk size configurable to [SPARK-24917][CORE] make chunk size configurable Aug 2, 2018
@vincent-grosbois
Author

Hello, I updated the description and title

@maropu
Member

maropu commented Aug 3, 2018

retest this please

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94080 has finished for PR 21933 at commit 0251bd5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Aug 3, 2018

retest this please

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94100 has finished for PR 21933 at commit 0251bd5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94098 has finished for PR 21933 at commit 0251bd5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Aug 3, 2018

retest this please

@SparkQA

SparkQA commented Aug 3, 2018

Test build #94114 has finished for PR 21933 at commit 0251bd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Whether to compress shuffle output temporarily spilled to disk
private[this] val compressShuffleSpill = conf.getBoolean("spark.shuffle.spill.compress", true)
// Size of the chunks to be used in the ChunkedByteBuffer
private[this] val chunkSizeMb = conf.getSizeAsMb("spark.memory.chunkSize", "4m").toInt
Member

The name spark.memory.chunkSize looks too generic.
How about spark.memory.serializer.chunkSize or others?

Author

I renamed it to spark.memory.serializer.chunkSize

rename spark.memory.chunkSize to spark.memory.serializer.chunkSize
@SparkQA

SparkQA commented Aug 3, 2018

Test build #94138 has finished for PR 21933 at commit e2961eb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Aug 7, 2018

retest this please

@SparkQA

SparkQA commented Aug 7, 2018

Test build #94346 has finished for PR 21933 at commit e2961eb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Aug 7, 2018

Test build #94358 has finished for PR 21933 at commit e2961eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Member

kiszk commented Aug 8, 2018

LGTM
cc @JoshRosen @vanzin

// Whether to compress shuffle output temporarily spilled to disk
private[this] val compressShuffleSpill = conf.getBoolean("spark.shuffle.spill.compress", true)
// Size of the chunks to be used in the ChunkedByteBuffer
private[this] val chunkSizeMb = conf.getSizeAsMb("spark.memory.serializer.chunkSize", "4m").toInt
Member

Why don't we do byteStringAsBytes and remove 1024 * 1024 below?
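As a hedged sketch of that suggestion (assuming the patch currently converts chunkSizeMb to bytes with a 1024 * 1024 multiplication elsewhere), the setting could be read directly in bytes so no manual conversion is needed; chunkSizeBytes is a hypothetical name:

// Hypothetical alternative to getSizeAsMb: read the size in bytes up front
// and drop the 1024 * 1024 conversion at the use site.
private[this] val chunkSizeBytes =
  conf.getSizeAsBytes("spark.memory.serializer.chunkSize", "4m").toInt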

Member

@vincent-grosbois WDYT about this?

@HyukjinKwon
Member

cc @squito too.

@HyukjinKwon left a comment

The issue and fix look coherent and good to me.

@squito
Contributor

squito commented Aug 10, 2018

Thanks for the detailed analysis, @vincent-grosbois. I agree with everything, but as you noted you won't hit this particular issue anymore with ChunkedByteBufferFileRegion. Is there another use case?

@vincent-grosbois
Author

Not really! This would be a useful feature to backport to all Spark branches that will never benefit from an upgrade to netty 4.1.28. However, I think that for Spark 2.4 and the master branch the netty dependency will eventually be bumped to 4.1.28 or later, which would make this commit useless...
Feel free to drop this PR.

@srowen
Member

srowen commented Oct 25, 2018

Yeah this can be closed; we updated to 4.1.30

@asfgit closed this in 65c653f Oct 25, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#22567
Closes apache#18457
Closes apache#21517
Closes apache#21858
Closes apache#22383
Closes apache#19219
Closes apache#22401
Closes apache#22811
Closes apache#20405
Closes apache#21933

Closes apache#22819 from srowen/ClosePRs.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
