[FLINK-27805][Connectors/ORC] bump orc version to 1.7.5 #19844
@JingsongLi Could you help review this PR?
I have submitted two PRs to the ORC community. I will refactor this part of the code after they are merged.
@lirui-apache Please take a look.
For the record, to reviewers: ORC-1198 has already shipped in Apache ORC 1.7.5 and is included in this PR.
Sure, we will take a look :)
 * This class is designed to not close the underlying flink stream to avoid exceptions when
 * checkpointing.
 */
public class HadoopNoCloseStream extends FSDataOutputStream {
This looks a bit scary, @gyfora?
@liujiawinds could you please clarify why we need the HadoopNoCloseStream here?
@morhidi Because BulkWriter needs to rely on a Flink FSDataOutputStream, while the ORC writer uses a Hadoop FSDataOutputStream, I wrapped the Flink FSDataOutputStream.
Additionally, BulkWriter closes the underlying ORC writer stream at checkpoint, which causes Flink to throw a ClosedChannelException if the close call is passed through to the Flink FSDataOutputStream.
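The wrapping idea described above can be sketched with plain `java.io` streams. This is only an illustration under stated assumptions, not the PR's `HadoopNoCloseStream`: `NoCloseOutputStream`, `TrackingStream`, and `NoCloseDemo` are hypothetical names, and `TrackingStream` merely stands in for the Flink-side stream whose lifecycle belongs to someone else.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in: forwards writes but turns close() into flush(). */
final class NoCloseOutputStream extends FilterOutputStream {

    NoCloseOutputStream(OutputStream delegate) {
        super(delegate);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // FilterOutputStream writes byte by byte by default; forward in bulk instead.
        out.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // Deliberately do not close the delegate: its owner still needs it.
        out.flush();
    }
}

/** Hypothetical sink that records whether close() ever reached it. */
final class TrackingStream extends ByteArrayOutputStream {
    boolean closed = false;

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

final class NoCloseDemo {
    public static void main(String[] args) throws IOException {
        TrackingStream flinkSide = new TrackingStream();
        // try-with-resources calls close() on the wrapper, as a writer would.
        try (OutputStream writerSide = new NoCloseOutputStream(flinkSide)) {
            writerSide.write(new byte[] {1, 2, 3});
        }
        System.out.println("closed=" + flinkSide.closed + ", bytes=" + flinkSide.size());
    }
}
```

The design choice is that the wrapper owns no resources of its own, so swallowing `close()` is safe only as long as whoever owns the underlying stream closes it later.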
qq: Is this covered in a test case somewhere?
The PR contains reasonable changes; I added some minor comments.
3c23605 to f03fcbe
To the Apache Flink community, @mbalassi, @gyfora, @morhidi, and the author of this PR, @liujiawinds:
- As you can see, the encryption part of this patch is not part of official Apache ORC.
- ORC-1200 has not yet been accepted by the Apache ORC community and has not been properly reviewed. So the Apache ORC community doesn't provide any backward compatibility for this encryption part and still reserves all rights to change it in the future. Please consider this patch a third-party approach that hacks those parts.
Personally, I'd recommend removing the encryption part from this PR completely.
Since the encryption is not part of official Apache ORC, +1 for removing it from the PR.
Also, cc @williamhyun since he works as the release manager of Apache ORC 1.8.0.
@dongjoon-hyun @MartijnVisser The encryption part has been removed from this PR.
@@ -48,14 +49,30 @@ public class OrcBulkWriterTest {

    @Test
    public void testOrcBulkWriter() throws Exception {
Using @ParameterizedTest with @EnumSource(CompressionKind.class) would be more elegant.
done
Hi, could you review this once more, @mbalassi, @gyfora, @morhidi, @MartijnVisser?
@liujiawinds are you still working on this one? Happy to take over if it's up for grabs.
@pgaref Feel free to take over this.
Thank you, @pgaref and @liujiawinds.
Superseded by #22481
What is the purpose of the change
To use new features (zstd compression, column encryption, etc.) introduced in ORC 1.6.x and 1.7.x.
Release Notes
Brief change log
PhysicalFsWriter
for files to create a PhysicalWriterImpl
for streams PhysicalWriterImpl
.
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation