[SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file #32401
Closed
47 commits
261c9fc
update
Ngone51 7a70312
update test
Ngone51 248cf2b
add comment
Ngone51 a432694
update commitAllPartitions to include checksums
Ngone51 dab8c4d
use final for checksumEnabled
Ngone51 cd25f8a
remove dead code
Ngone51 ebd1c80
move sorter null set to finally
Ngone51 9257f2b
combine if conditions in removeDataByMap
Ngone51 52610ca
simplify the getChecksums in ExternalSorter
Ngone51 3942aa1
add brackets to setChecksum
Ngone51 7aa5c5c
fix mima
Ngone51 3788e77
add enabled suffix
Ngone51 627c597
change version to 3.3.0
Ngone51 dae2dca
rename to SHUFFLE_CHECKSUM_ENABLED
Ngone51 c82710e
add ShuffleChecksumHelper to support different checksum algo
Ngone51 d6998c2
add algo to file extension
Ngone51 9745479
update comment
Ngone51 48de152
remove unused import
Ngone51 c28e7e7
update ShuffleChecksumHelper
Ngone51 69250ff
only set checksum for partitionPairsWriter when enabled
Ngone51 7102e7f
set checksum in ShufflePartitionPairsWriter's constructor
Ngone51 02f074a
remove unused mapId&shuffleId
Ngone51 dcc8dde
insensitive cheksum algo
Ngone51 ac8025f
refactor existingChecksums
Ngone51 1e9876d
remove unsued shuffleId/mapId
Ngone51 1573fb6
use uppercase
Ngone51 5446bb1
rename to testhelper
Ngone51 0908076
pick the checksum by index file ext
Ngone51 09f9015
remove unused import
Ngone51 62b479e
pull empty checksum as a static final value
Ngone51 d71283c
pull empty checksum value into a static final value
Ngone51 302ca76
fix unsafe write due to shuffleId/map
Ngone51 b81e556
update comment
Ngone51 eeb3ef7
add doc for ShuffleChecksumHelper
Ngone51 bf3ca61
remove unncessary partitionChecksums
Ngone51 d8c70dc
add since for ShuffleChecksumBlockId
Ngone51 71a2ef6
handle error of checksum file delete
Ngone51 63df5cd
update comment
Ngone51 11adda4
update writeMetadataFileAndCommit
Ngone51 4da61d9
fix
Ngone51 8d54e38
fix compile
Ngone51 dfbe4b6
add license
Ngone51 230648c
fix mimia
Ngone51 68af8c4
fix java lint
Ngone51 caaf76d
fix doc
Ngone51 8fc7193
fix test
Ngone51 4bdde58
fix java lint
Ngone51
100 changes: 100 additions & 0 deletions
core/src/main/java/org/apache/spark/shuffle/checksum/ShuffleChecksumHelper.java
@@ -0,0 +1,100 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.shuffle.checksum;

import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkException;
import org.apache.spark.annotation.Private;
import org.apache.spark.internal.config.package$;
import org.apache.spark.storage.ShuffleChecksumBlockId;

/**
 * A set of utility functions for the shuffle checksum.
 */
@Private
public class ShuffleChecksumHelper {

  /** Used when the checksum is disabled for shuffle. */
  private static final Checksum[] EMPTY_CHECKSUM = new Checksum[0];
  public static final long[] EMPTY_CHECKSUM_VALUE = new long[0];

  public static boolean isShuffleChecksumEnabled(SparkConf conf) {
    return (boolean) conf.get(package$.MODULE$.SHUFFLE_CHECKSUM_ENABLED());
  }

  public static Checksum[] createPartitionChecksumsIfEnabled(int numPartitions, SparkConf conf)
    throws SparkException {
    if (!isShuffleChecksumEnabled(conf)) {
      return EMPTY_CHECKSUM;
    }

    String checksumAlgo = shuffleChecksumAlgorithm(conf);
    return getChecksumByAlgorithm(numPartitions, checksumAlgo);
  }

  private static Checksum[] getChecksumByAlgorithm(int num, String algorithm)
    throws SparkException {
    Checksum[] checksums;
    switch (algorithm) {
      case "ADLER32":
        checksums = new Adler32[num];
        for (int i = 0; i < num; i ++) {
          checksums[i] = new Adler32();
        }
        return checksums;

      case "CRC32":
        checksums = new CRC32[num];
        for (int i = 0; i < num; i ++) {
          checksums[i] = new CRC32();
        }
        return checksums;

      default:
        throw new SparkException("Unsupported shuffle checksum algorithm: " + algorithm);
    }
  }

  public static long[] getChecksumValues(Checksum[] partitionChecksums) {
    int numPartitions = partitionChecksums.length;
    long[] checksumValues = new long[numPartitions];
    for (int i = 0; i < numPartitions; i ++) {
      checksumValues[i] = partitionChecksums[i].getValue();
    }
    return checksumValues;
  }

  public static String shuffleChecksumAlgorithm(SparkConf conf) {
    return conf.get(package$.MODULE$.SHUFFLE_CHECKSUM_ALGORITHM());
  }

  public static Checksum getChecksumByFileExtension(String fileName) throws SparkException {
    int index = fileName.lastIndexOf(".");
    String algorithm = fileName.substring(index + 1);
    return getChecksumByAlgorithm(1, algorithm)[0];
  }

  public static String getChecksumFileName(ShuffleChecksumBlockId blockId, SparkConf conf) {
    // append the shuffle checksum algorithm as the file extension
    return String.format("%s.%s", blockId.name(), shuffleChecksumAlgorithm(conf));
  }
}
```
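For readers who want to try the extension-based lookup without a Spark build, here is a minimal standalone sketch of the same idea using only `java.util.zip`. The class `ChecksumByExtensionSketch`, its method names, and the file name used in the usage note are hypothetical and not part of this PR; unlike the helper, the sketch throws `IllegalArgumentException` rather than `SparkException` so that it has no Spark dependency.

```java
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical standalone sketch; names here are illustrative, not from the PR.
final class ChecksumByExtensionSketch {

  /** Returns a fresh Checksum for an exact, upper-case algorithm name. */
  static Checksum newChecksum(String algorithm) {
    switch (algorithm) {
      case "ADLER32":
        return new Adler32();
      case "CRC32":
        return new CRC32();
      default:
        throw new IllegalArgumentException(
          "Unsupported shuffle checksum algorithm: " + algorithm);
    }
  }

  /** Treats everything after the last '.' in the file name as the algorithm. */
  static Checksum fromFileName(String fileName) {
    int index = fileName.lastIndexOf('.');
    return newChecksum(fileName.substring(index + 1));
  }

  /** Computes the checksum of `data` with the algorithm encoded in `fileName`. */
  static long checksumOf(String fileName, byte[] data) {
    Checksum checksum = fromFileName(fileName);
    checksum.update(data, 0, data.length);
    return checksum.getValue();
  }
}
```

With an illustrative file name such as `shuffle_0_0_0.checksum.CRC32`, `checksumOf` selects `CRC32` purely from the extension, so a reader of the checksum file does not need access to the writer's configuration.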
TBH I don't think the current shuffle API provides enough abstraction to do checksums. I'm OK with this change since the shuffle API is still private, but we should revisit the shuffle API later so that the checksum can be computed on the shuffle implementation side.
The issue I see right now is that Spark writes local spill files and then asks the shuffle implementation to "transfer" them. So Spark has to compute the checksum itself while writing the spill files, to reduce the perf overhead.
We can discuss it later.
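The point about computing the checksum while the spill data is being written, rather than in a second pass over the file, can be illustrated with `java.util.zip.CheckedOutputStream`. This is only a sketch under stated assumptions: the class `SpillChecksumSketch` and its method are hypothetical, each partition is modelled as a plain byte array, and the PR's actual writer classes are not shown here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.Adler32;
import java.util.zip.CheckedOutputStream;
import java.util.zip.Checksum;

// Hypothetical sketch; this is not the PR's writer code.
final class SpillChecksumSketch {

  /**
   * Writes each partition to `sink` and returns one checksum value per
   * partition. The checksum is updated as bytes pass through the stream,
   * so no second read of the written data is needed.
   */
  static long[] writePartitions(byte[][] partitions, OutputStream sink) {
    long[] values = new long[partitions.length];
    try {
      for (int i = 0; i < partitions.length; i++) {
        Checksum checksum = new Adler32();  // one checksum per partition
        // Deliberately not closed: closing the wrapper would close `sink`.
        CheckedOutputStream out = new CheckedOutputStream(sink, checksum);
        out.write(partitions[i]);
        out.flush();
        values[i] = checksum.getValue();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return values;
  }
}
```

The resulting `long[]` has the same shape as the array returned by `getChecksumValues` in the helper above: one value per partition, in partition order.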