
@yaooqinn
Member

What changes were proposed in this pull request?

This PR appends the compression codec name as an extension to the files written by the Avro data source, for example:

 part-00000-2d4a2c78-a62a-4f7d-a286-5572dcdefade-c000.zstandard.avro
 part-00000-74c04de5-c991-4a40-8740-8d472f4ce2ec-c000.avro
 part-00000-965d0e93-9f86-40f9-8544-d71d14cc9787-c000.xz.avro
 part-00002-965d0e93-9f86-40f9-8544-d71d14cc9787-c000.snappy.avro
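As a rough illustration of the naming scheme above, the helper below pulls the codec segment out of a part-file name. `avro_codec_from_name` is hypothetical and not part of Spark. It assumes the `-c<digits>.<codec>.avro` suffix seen in the listing, and it returns `None` when a name has no codec segment, like the second file.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical pattern, inferred from the file names listed above:
# "...-c000.<codec>.avro". The "-c<digits>" part must be followed by a dot,
# so UUID fragments such as "-c991-" do not match.
_CODEC_RE = re.compile(r"-c\d+\.(?P<codec>[A-Za-z0-9]+)\.avro$")


def avro_codec_from_name(path: str) -> Optional[str]:
    """Return the codec extension of an Avro part file, or None if it has none."""
    name = PurePosixPath(path).name  # drop any directory prefix
    match = _CODEC_RE.search(name)
    return match.group("codec") if match else None


if __name__ == "__main__":
    files = [
        "part-00000-2d4a2c78-a62a-4f7d-a286-5572dcdefade-c000.zstandard.avro",
        "part-00000-74c04de5-c991-4a40-8740-8d472f4ce2ec-c000.avro",
        "part-00000-965d0e93-9f86-40f9-8544-d71d14cc9787-c000.xz.avro",
        "part-00002-965d0e93-9f86-40f9-8544-d71d14cc9787-c000.snappy.avro",
    ]
    for f in files:
        print(avro_codec_from_name(f), f)
```

With the codec in the file name, a quick listing of an output directory is enough to see which codec each file was written with, without opening the Avro header.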

Why are the changes needed?

Feature parity with the Parquet and ORC file sources. The codec extension makes it easy to tell which compression codec an Avro file was written with.

Does this PR introduce any user-facing change?

No, this belongs more to the underlying data storage layer.

How was this patch tested?

new unit tests

Was this patch authored or co-authored using generative AI tooling?

no

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Ya, this looks consistent with other formats. Thank you, @yaooqinn .

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.

@yaooqinn yaooqinn deleted the SPARK-46746 branch January 18, 2024 01:39
@yaooqinn
Member Author

Thank you very much @dongjoon-hyun
