Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect invalid (unsupported) compression types when parsing #4637

Merged
merged 5 commits into from
Dec 17, 2022

Conversation

HaoYang670
Copy link
Contributor

@HaoYang670 HaoYang670 commented Dec 15, 2022

Which issue does this PR close?

Closes #4636.

Rationale for this change

Before the change:

❯ create external table t1 stored as csv with header row compression type zzz location "../datafusion/core/tests/example.csv";
Execution("Unknown FileCompressionType ZZZ")

After the change:

❯ create external table t1 stored as csv with header row compression type zzz location "../datafusion/core/tests/example.csv";  🤔 Invalid statement: sql parser error: Unsupported file compression type ZZZ

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 marked this pull request as draft December 15, 2022 07:58
@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions sql SQL Planner labels Dec 15, 2022
Comment on lines +89 to +101
/// Gzip-ed file
pub const GZIP: Self = Self { variant: GZIP };

/// Bzip2-ed file
pub const BZIP2: Self = Self { variant: BZIP2 };

/// Xz-ed file (liblzma)
pub const XZ: Self = Self { variant: XZ };

/// Uncompressed file
pub const UNCOMPRESSED: Self = Self {
variant: UNCOMPRESSED,
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are a little tricky, but can minimize the code change.

@@ -66,7 +63,7 @@ pub struct CreateExternalTable {
/// Option to not error if table already exists
pub if_not_exists: bool,
/// File compression type (GZIP, BZIP2, XZ)
pub file_compression_type: String,
pub file_compression_type: CompressionTypeVariant,
Copy link
Contributor Author

@HaoYang670 HaoYang670 Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strengthening the type can let us find errors earlier.

Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 marked this pull request as ready for review December 16, 2022 03:44
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @HaoYang670

The clippy CI check has failed on this PR, but I think that is because a new version of Rust was released -- if you merge from main it should be solved I think

@@ -1487,7 +1488,7 @@ pub struct CreateExternalTable {
/// SQL used to create the table, if available
pub definition: Option<String>,
/// File compression type (GZIP, BZIP2, XZ)
pub file_compression_type: String,
pub file_compression_type: CompressionTypeVariant,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

let sql = "CREATE EXTERNAL TABLE t STORED AS CSV COMPRESSION TYPE ZZZ LOCATION 'blahblah'";
expect_parse_error(
sql,
"sql parser error: Unsupported file compression type ZZZ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Dec 17, 2022

I took the liberty of merging up from master to resolve another conflict and plan to merge this PR when CI passes

@HaoYang670
Copy link
Contributor Author

Thanks @alamb !

@alamb alamb merged commit c2f199a into apache:master Dec 17, 2022
@HaoYang670 HaoYang670 deleted the 4636_file_compression_type branch December 17, 2022 11:58
@ursabot
Copy link

ursabot commented Dec 17, 2022

Benchmark runs are scheduled for baseline = 067d044 and contender = c2f199a. c2f199a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Returns error too late when parsing invalid file compression type.
3 participants