Support Describe FILE in datafusion-cli #4913

Closed
xiaoyong-z opened this issue Jan 15, 2023 · 3 comments · Fixed by #4995
Labels
enhancement New feature or request

Comments

@xiaoyong-z
Contributor

xiaoyong-z commented Jan 15, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, DataFusion supports 'create external table xxx stored as xxx location FILE' and 'select * from FILE'.
I plan to add support for describing a file, because:

  1. To know the columns/types in a file, users have to create an external table and then describe that table.
  2. Sometimes the inferred schema is wrong when creating the table. To fix it, the user needs to drop the table and recreate it with an explicitly specified schema. With Describe File, the user can find out that the inferred schema is wrong before creating the table.

Describe the solution you'd like
Add support for DESCRIBE 'test.parquet' in DataFusion.

Describe alternatives you've considered

Additional context

DuckDB currently supports the following SQL to get the inferred types from a Parquet file:
DESCRIBE SELECT * FROM 'test.parquet'

@xiaoyong-z xiaoyong-z added the enhancement New feature or request label Jan 15, 2023
@xiaoyong-z
Contributor Author

Is this feature planned? @alamb
If no one is working on this, I will contribute to this issue.

@xiaoyong-z xiaoyong-z changed the title Support describe FILE in datafusion-cli Support Describe FILE in datafusion-cli Jan 15, 2023
@alamb
Contributor

alamb commented Jan 15, 2023

Hi @xiaoyong-z -- I don't know of anyone working on this

Note that DataFusion already supports describe and selecting from Parquet files (in datafusion-cli; #4850 tracks supporting it more generally, and I think @matthewwillian said he was looking into it)

I actually would have expected describe <foo.parquet> to work, as it should simply use the existing table provider machinery. However, when I tried it:

cargo run datafusion-cli

It did not:

❯ describe '/Users/alamb/Software/duckdb-polr/tools/juliapkg/data/invoice.parquet';
0 rows in set. Query took 0.002 seconds.
❯ select * from  '/Users/alamb/Software/duckdb-polr/tools/juliapkg/data/invoice.parquet' limit 10;
+-----------+------------+-----------------------------------------------+-------------------------+-------------+--------------+----------------+-------------------+-------+
| InvoiceId | CustomerId | InvoiceDate                                   | BillingAddress          | BillingCity | BillingState | BillingCountry | BillingPostalCode | Total |
+-----------+------------+-----------------------------------------------+-------------------------+-------------+--------------+----------------+-------------------+-------+
| 1         | 2          | 2007-01-01T00:00:00 (Unknown Time Zone 'UTC') | Theodor-Heuss-Straße 34 | Stuttgart   |              | Germany        | 70174             | 1.98  |
| 2         | 4          | 2007-01-02T00:00:00 (Unknown Time Zone 'UTC') | Ullevålsveien 14        | Oslo        |              | Norway         | 0171              | 3.95  |
| 3         | 8          | 2007-01-03T00:00:00 (Unknown Time Zone 'UTC') | Grétrystraat 63         | Brussels    |              | Belgium        | 1000              | 5.94  |
| 4         | 14         | 2007-01-06T00:00:00 (Unknown Time Zone 'UTC') | 8210 111 ST NW          | Edmonton    | AB           | Canada         | T6G 2C7           | 8.91  |
| 5         | 23         | 2007-01-11T00:00:00 (Unknown Time Zone 'UTC') | 69 Salem Street         | Boston      | MA           | USA            | 2113              | 13.85 |
| 6         | 37         | 2007-01-19T00:00:00 (Unknown Time Zone 'UTC') | Berger Straße 10        | Frankfurt   |              | Germany        | 60316             | 0.99  |
| 7         | 38         | 2007-02-01T00:00:00 (Unknown Time Zone 'UTC') | Barbarossastraße 19     | Berlin      |              | Germany        | 10779             | 1.98  |
| 8         | 40         | 2007-02-01T00:00:00 (Unknown Time Zone 'UTC') | 8, Rue Hanovre          | Paris       |              | France         | 75002             | 1.98  |
| 9         | 42         | 2007-02-02T00:00:00 (Unknown Time Zone 'UTC') | 9, Place Louis Barthou  | Bordeaux    |              | France         | 33000             | 3.95  |
| 10        | 46         | 2007-02-03T00:00:00 (Unknown Time Zone 'UTC') | 3 Chatham Street        | Dublin      | Dublin       | Ireland        |                   | 5.94  |
+-----------+------------+-----------------------------------------------+-------------------------+-------------+--------------+----------------+-------------------+-------+
10 rows in set. Query took 0.003 seconds.

@alamb
Contributor

alamb commented Jan 15, 2023

I bet you could implement this fairly simply by special-casing such a describe predicate (I think now it runs a query against the schema information table)
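
As a rough illustration of the special-casing idea, here is a minimal Rust sketch that only detects whether a statement has the form DESCRIBE '<path>'. The helper name `describe_file_target` is hypothetical and uses only the standard library; it is not DataFusion's API and not necessarily how the eventual fix works. A real implementation would presumably hook into the SQL parser rather than match on the raw string.

```rust
/// Hypothetical helper (not part of DataFusion): if `sql` has the form
/// DESCRIBE '<path>' (optionally followed by semicolons), return the quoted
/// path; otherwise return None so the caller can use the normal describe path.
fn describe_file_target(sql: &str) -> Option<&str> {
    // Drop surrounding whitespace and any trailing semicolons.
    let stmt = sql.trim().trim_end_matches(';').trim_end();
    // Split off the first word and compare it case-insensitively.
    let (keyword, rest) = stmt.split_once(char::is_whitespace)?;
    if !keyword.eq_ignore_ascii_case("describe") {
        return None;
    }
    // Assumption: only a single-quoted string literal counts as a file path.
    let path = rest.trim().strip_prefix('\'')?.strip_suffix('\'')?;
    // Reject empty literals and anything with extra quotes inside.
    if path.is_empty() || path.contains('\'') {
        return None;
    }
    Some(path)
}
```

With this sketch, `describe_file_target("describe 'test.parquet';")` yields `Some("test.parquet")`, while `describe_file_target("DESCRIBE my_table")` yields `None`, so describing a registered table would still go through the existing code path.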
