[Feature] support describe file #4995
Thank you @xiaoyong-z -- this looks like a good start. I would like to see the implementation for describe file be the same as describe table if possible, and I have left some comments to that effect.
CREATE external table aggregate_simple(c1 real, c2 double, c3 boolean) STORED as CSV WITH HEADER ROW LOCATION 'tests/data/aggregate_simple.csv';

query C1
DESCRIBE aggregate_simple;
This test works on master -- I don't think it needs the changes in this PR
(arrow_dev) alamb@MacBook-Pro-8:~/Software/arrow-datafusion/datafusion/core$ datafusion-cli
DataFusion CLI v16.0.0
❯ CREATE external table aggregate_simple(c1 real, c2 double, c3 boolean) STORED as CSV WITH HEADER ROW LOCATION 'tests/data/aggregate_simple.csv';
0 rows in set. Query took 0.005 seconds.
❯ DESCRIBE aggregate_simple;
+-------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-------------+-----------+-------------+
| c1          | Float32   | NO          |
| c2          | Float64   | NO          |
| c3          | Boolean   | NO          |
+-------------+-----------+-------------+
3 rows in set. Query took 0.002 seconds.
❯
There are no sqllogictests that cover DESCRIBE, so I added some here. Now that describe table and describe file are unified, I think it is necessary to add tests for this.
3d39dfe to 97aa48e
97aa48e to 2344422
This makes sense to me -- thank you @xiaoyong-z ❤️
Benchmark runs are scheduled for baseline = 9f498bb and contender = ab00bc1. ab00bc1 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
To learn the columns and types in a file, users currently have to create an external table and then describe it. Sometimes the schema inferred when creating the table is wrong; to correct it, the user has to drop the table and recreate it with an explicitly specified schema. To solve this problem, this PR adds a describe file interface to datafusion-cli. With DESCRIBE file, users can see whether the inferred schema is wrong before creating the table.
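For comparison, the workflow that DESCRIBE file is meant to shorten can be sketched as follows. This is only an illustration: it reuses the table name, file, and column types from the review example above, and it assumes that a CREATE EXTERNAL TABLE statement without a column list falls back to schema inference. It does not claim that inference actually gets this particular file wrong.

```sql
-- 1. Create the table without a column list, so the schema is inferred
--    (assumption: omitting the column list triggers inference).
CREATE EXTERNAL TABLE aggregate_simple
STORED AS CSV WITH HEADER ROW
LOCATION 'tests/data/aggregate_simple.csv';

-- 2. Inspect the inferred schema.
DESCRIBE aggregate_simple;

-- 3. If an inferred type is not the one you want, drop the table...
DROP TABLE aggregate_simple;

-- 4. ...and recreate it with an explicit schema.
CREATE EXTERNAL TABLE aggregate_simple(c1 real, c2 double, c3 boolean)
STORED AS CSV WITH HEADER ROW
LOCATION 'tests/data/aggregate_simple.csv';
```

With the interface added here, steps 1 to 3 reduce to a single DESCRIBE on the file path, and the table only needs to be created once.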
Syntax:
DESCRIBE file_path;
Example:
DESCRIBE 'tests/data/aggregate_simple_pipe.csv';
Return:
column_name  data_type  is_nullable
c1           Float32    NO
c2           Float64    NO
c3           Boolean    NO
Signed-off-by: xyz a997647204@gmail.com
Which issue does this PR close?
Closes #4913