HDFS-17293. First packet data + checksum size will be set to 516 bytes when writing to a new block. #6368
Conversation
@Hexiaoqiao @tomscut @ayushtkn @zhangshuyan0 Sir, could you please help review this code when you have free time? Thanks ahead.
💔 -1 overall
Review thread on ...op-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java (resolved).
💔 -1 overall
💔 -1 overall
💔 -1 overall
🎊 +1 overall
💔 -1 overall
@Hexiaoqiao @zhangshuyan0 @slfan1989 Sir, I have updated this PR and added a unit test. Could you please help review it when you are free? Thanks~
zhangshuyan0 left a comment:
Great catch! Left some small suggestions.
```java
fos.write(buf);
fos.hflush();
loop++;
Assert.assertNotEquals(crc32c.getBytesPerChecksum() + crc32c.getChecksumSize(), …
```
It is more appropriate to precisely specify the expected packetSize here.
Outside the while loop:
```java
int chunkSize = crc32c.getBytesPerChecksum() + crc32c.getChecksumSize();
int packetContentSize = (dfsConf.getInt(DFS_CLIENT_WRITE_PACKET_SIZE_KEY,
    DFS_CLIENT_WRITE_PACKET_SIZE_DEFAULT) - PacketHeader.PKT_MAX_HEADER_LEN)
    / chunkSize * chunkSize;
```
And here:
```java
Assert.assertEquals(((DFSOutputStream) fos.getWrappedStream()).packetSize, packetContentSize);
```
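As an aside (not part of the review thread), plugging default values into this formula shows what the assertion expects. The 64 KB packet size is the default of DFS_CLIENT_WRITE_PACKET_SIZE_KEY; the 33-byte PKT_MAX_HEADER_LEN is an assumed value, and the class name is hypothetical:

```java
public class ExpectedPacketSizeSketch {
  public static void main(String[] args) {
    int chunkSize = 512 + 4;            // bytesPerChecksum + CRC32C checksum size = 516
    int writePacketSize = 64 * 1024;    // DFS_CLIENT_WRITE_PACKET_SIZE_DEFAULT
    int pktMaxHeaderLen = 33;           // PacketHeader.PKT_MAX_HEADER_LEN (assumed value)

    // Largest whole number of chunks that fits after reserving header space.
    int packetContentSize =
        (writePacketSize - pktMaxHeaderLen) / chunkSize * chunkSize;
    System.out.println(packetContentSize);  // (65536 - 33) / 516 = 126 chunks -> 65016
  }
}
```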
Sir, thanks for this valuable suggestion. Will fix it soon.
Review thread on hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSOutputStream.java (outdated, resolved).
This PR has corrected the size of the first packet in a new block, which is great. However, there is a remaining logic problem in Lines 531 to 543 (commit 27ecc23): at Line 540, when we pass blockSize - getStreamer().getBytesCurBlock() to computePacketChunkSize as the first parameter, computePacketChunkSize is likely to split data that could have been sent in one packet into two packets.
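For illustration (numbers assumed for clarity, not taken from the thread): suppose two 516-byte chunks (1,032 bytes) remain in the block. Passing the bare remainder as psize leaves no allowance for the packet header, so one chunk no longer fits. The class name is hypothetical and the arithmetic is a simplification:

```java
public class PacketSplitSketch {
  public static void main(String[] args) {
    int chunkSize = 516;                // 512 data bytes + 4 checksum bytes
    int remaining = 2 * chunkSize;      // 1032 bytes left in the block (assumed)
    int pktMaxHeaderLen = 33;           // PacketHeader.PKT_MAX_HEADER_LEN (assumed value)

    // Simplified form of the computePacketChunkSize arithmetic:
    int bodySize = remaining - pktMaxHeaderLen;               // 1032 - 33 = 999
    int chunksPerPacket = Math.max(bodySize / chunkSize, 1);  // 999 / 516 = 1
    System.out.println(chunksPerPacket);
    // Two chunks remain but only one fits per packet, so the data is sent
    // in two packets instead of one.
  }
}
```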
Force-pushed 0a43db0 to af8a23b.
@zhangshuyan0 Sir, I have made the changes according to your review comments. Please take a look when you have free time. Thanks a lot~
💔 -1 overall
Force-pushed af8a23b to da1ea49.
Force-pushed da1ea49 to 68292c7.
💔 -1 overall
💔 -1 overall
Sir, very nice catch. I think the code below may resolve the problem you found. Please take a look when you are free.
```java
if (!getStreamer().getAppendChunk()) {
  int psize = 0;
  if (blockSize == getStreamer().getBytesCurBlock()) {
    // The current block is full; the next packet starts a new block,
    // so use the configured write packet size.
    psize = writePacketSize;
  } else if (blockSize - getStreamer().getBytesCurBlock() + PacketHeader.PKT_MAX_HEADER_LEN
      < writePacketSize) {
    // The remaining data plus a maximum-size header still fits in one packet;
    // reserve room for the header so the data is not split across two packets.
    psize = (int) (blockSize - getStreamer().getBytesCurBlock())
        + PacketHeader.PKT_MAX_HEADER_LEN;
  } else {
    psize = (int) Math.min(blockSize - getStreamer().getBytesCurBlock(), writePacketSize);
  }
  computePacketChunkSize(psize, bytesPerChecksum);
}
```
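As a sanity check with assumed numbers (not from the thread): if 1,032 bytes remain in the block and writePacketSize is 65,536, the middle branch fires and passes psize = 1,065 (the remaining data plus the 33-byte header allowance), so computePacketChunkSize computes a body of 1,032 bytes and keeps both 516-byte chunks in a single packet.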
Force-pushed 68292c7 to 96da2c8.
Thank you very much for investing your time in fixing these bugs. The above fixes did not take …
@zhangshuyan0 Sir, agreed. Let's discuss this issue in the new PR. The failed tests all passed in my local environment.
zhangshuyan0 left a comment:
LGTM. +1.
Committed to trunk. Thanks for your contribution @hfutatzhanghb.
@zhangshuyan0 Sir, thanks a lot for your review.
🎊 +1 overall
…s when writing to a new block. (apache#6368). Contributed by farmmamba.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
Description of PR
The first packet size (data length + checksum length) was set to 516 bytes when writing to a new block. This is unnecessary and incorrect behavior.
In the method computePacketChunkSize, the parameters psize and csize would be (0, 512) when writing to a new block.
It is better to use writePacketSize so that full-size packets are sent, improving the speed of sending data.
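To see where the 516 bytes come from, here is a minimal sketch of the packet-size computation, a simplification of DFSOutputStream.computePacketChunkSize; the 33-byte PKT_MAX_HEADER_LEN is an assumed value and the class name is hypothetical:

```java
public class FirstPacketSizeSketch {
  public static void main(String[] args) {
    // Parameters as passed when writing to a new block: psize = 0, csize = 512.
    int psize = 0;                      // packet size hint
    int csize = 512;                    // data bytes per checksum chunk
    int checksumSize = 4;               // CRC32C checksum bytes per chunk
    int pktMaxHeaderLen = 33;           // PacketHeader.PKT_MAX_HEADER_LEN (assumed value)

    int bodySize = psize - pktMaxHeaderLen;                   // -33
    int chunkSize = csize + checksumSize;                     // 516
    int chunksPerPacket = Math.max(bodySize / chunkSize, 1);  // -33/516 = 0, clamped to 1
    int packetSize = chunkSize * chunksPerPacket;             // one chunk only

    System.out.println("packetSize = " + packetSize);         // prints 516
  }
}
```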