[SPARK-31327][SQL] Write Spark version into Avro file metadata #28102
Test build #120725 has finished for PR 28102 at commit
external/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOutputWriter.scala
Thank you for pinging me, @cloud-fan.
      this.mAvroFileWriter.create(writerSchema, outputStream);
    }

    public void write(AvroKey<T> record, NullWritable ignore) throws IOException {
@cloud-fan, do we need to check the performance impact, since this is technically an additional wrapper? Oh, never mind.
Test build #120749 has finished for PR 28102 at commit
    class SparkAvroKeyRecordWriter<T> extends RecordWriter<AvroKey<T>, NullWritable>
        implements Syncable {
      private final DataFileWriter<T> mAvroFileWriter;
      public SparkAvroKeyRecordWriter(
Could you fix this?
/home/jenkins/workspace/SparkPullRequestBuilder@3/external/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:65: Redundant 'public' modifier.
          Map<String, String> metadata) throws IOException {
        this.mAvroFileWriter = new DataFileWriter(dataModel.createDatumWriter(writerSchema));
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
          this.mAvroFileWriter.setMeta(entry.getKey(), entry.getValue());
It seems better to check for conflicts with reserved metadata keys using isReservedMeta?
It's already checked in the method setMeta.
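To illustrate the point in the reply above, here is a minimal, standard-library-only sketch of a metadata store whose setter rejects reserved keys itself. It is a simplified model, not Avro's `DataFileWriter` source: the class name `MetaHeaderSketch`, the `fromMap` helper, and the use of `IllegalArgumentException` are hypothetical, and the rule that reserved keys are those starting with `avro.` is stated here as an assumption about Avro's behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical model of a file-header metadata store.
// It is NOT Avro's DataFileWriter; it only mimics the reserved-key guard.
class MetaHeaderSketch {
    private final Map<String, String> meta = new LinkedHashMap<>();

    // Assumption: keys under the "avro." prefix are reserved for the format itself.
    static boolean isReservedMeta(String key) {
        return key.startsWith("avro.");
    }

    // Rejects reserved keys before storing, so callers need no separate check.
    MetaHeaderSketch setMeta(String key, String value) {
        if (isReservedMeta(key)) {
            throw new IllegalArgumentException("Cannot set reserved meta key: " + key);
        }
        meta.put(key, value);
        return this;
    }

    // Returns the stored value, or null if the key was never set.
    String getMeta(String key) {
        return meta.get(key);
    }

    // Copies every entry through setMeta, mirroring the loop in the diff above.
    static MetaHeaderSketch fromMap(Map<String, String> metadata) {
        MetaHeaderSketch header = new MetaHeaderSketch();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            header.setMeta(entry.getKey(), entry.getValue());
        }
        return header;
    }
}
```

Because the guard lives inside `setMeta`, a caller that loops over a user-supplied map gets the reserved-key check for free, which is why an extra `isReservedMeta` call in the loop would be redundant under this model.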
      override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
        new Path(path)
      }

Shall we leave this unchanged? Keeping a blank line between methods is a valid style (https://github.com/databricks/scala-style-guide#blank-lines-vertical-whitespace).
Test build #120758 has finished for PR 28102 at commit
The last commit just adds blank lines. Thanks for the review, merging to master/3.0!
### What changes were proposed in this pull request?
Write Spark version into Avro file metadata

### Why are the changes needed?
The version info is very useful for backward compatibility. This is also done in parquet/orc.

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
new test

Closes #28102 from cloud-fan/avro.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6b1ca88)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Test build #120769 has finished for PR 28102 at commit
What changes were proposed in this pull request?
Write Spark version into Avro file metadata
Why are the changes needed?
The version info is very useful for backward compatibility. This is also done for Parquet and ORC.
Does this PR introduce any user-facing change?
no
How was this patch tested?
new test