Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement select * from 'foo.parquet' support #125

Open
Tracked by #150
alamb opened this issue Sep 9, 2024 · 11 comments · May be fixed by #152
Open
Tracked by #150

Implement select * from 'foo.parquet' support #125

alamb opened this issue Sep 9, 2024 · 11 comments · May be fixed by #152

Comments

@alamb
Copy link
Contributor

alamb commented Sep 9, 2024

I want queries like this to work:

select * from '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits.parquet

This works great in datafusion-cli:

datafusion-cli -c "select * from '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits.parquet' limit 10"
DataFusion CLI v41.0.0
+---------------------+------------+---------------------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+--------------------+--------------+----+-----------+------------------------------------------------+--------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+-------------+------------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+----------------------+------+
| WatchID             | JavaEnable | Title                                                                                       | GoodEvent | EventTime  | EventDate | CounterID | ClientIP   | RegionID | UserID             | CounterClass | OS | UserAgent | URL                                            | Referer                              | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID | OriginalURL | HID        | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP   | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash          | URLHash              | CLID |
+---------------------+------------+---------------------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+--------------------+--------------+----+-----------+------------------------------------------------+--------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+-------------+------------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+----------------------+------+
| 8993561131065960301 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373344372 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373331008      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 551752182  | 0            | 0       | 0           | 0             | 0        | 5        | 1373318120     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 6635318252456702850 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373344834 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | 4�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373331459      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 490305020  | 0            | 0       | 0           | 0             | 0        | 5        | 1373318410     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 6072156798303264550 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373345083 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373331642      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 261581801  | 0            | 0       | 0           | 0             | 0        | 5        | 1373318651     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 6676230879155791167 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373345868 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373332322      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 124030311  | 0            | 0       | 0           | 0             | 0        | 5        | 1373319379     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 5817482884949783786 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373346325 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373332643      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 70501440   | 0            | 0       | 0           | 0             | 0        | 5        | 1373319704     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | h1             |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | -4713643203787022327 | 0    |
| 9086578174639119203 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373346328 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznodolskiy.irr.ru/r/lyrics/snastree | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373332646      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 103556130  | 0            | 0       | 0           | 0             | 0        | 5        | 1373319706     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | 5420047383373475333  | 0    |
| 8538774668558945189 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373346894 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-temno-bolshaybachkala | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | 4�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373333201      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 489021314  | 0            | 0       | 0           | 0             | 0        | 5        | 1373320366     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | 5530112846577889149  | -4713643203787022327 | 0    |
| 6450996617047587792 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373346895 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznodolskiy.irr.ru/r/lyrics/snastree | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373333202      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 1062047245 | 0            | 0       | 0           | 0             | 0        | 5        | 1373320367     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | 5420047383373475333  | 0    |
| 8447568139699209362 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373347543 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznoe/glovodsk                       | http://platjie-temno-bolshaybachkala | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373334118      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 916502768  | 0            | 0       | 0           | 0             | 0        | 5        | 1373321060     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | �
                                                                                                                                                   |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | 5530112846577889149  | -4713643203787022327 | 0    |
| 8718490359481193433 | 1          | Djunglish.RU > Приментиролиндромантинги, восстить Встрации... Керам The Long ATIV Variayaki | 1         | 1373347546 | 15895     | 178321    | 1896125518 | 169      | 134227586768255270 | 1            | 44 | 5         | http://poleznodolskiy.irr.ru/r/lyrics/snastree | http://platjie-showCamp&cid=5        | 0         | 13845             | 20              | 10291         | 22          | 1368            | 554              | 37              | 15         | 7          | 700         | 0        | 0        | 22             | D�             | 1            | 1                | 0        | 0           |                  |        | 1073011     | -1             | 0              |              | 0           | 0           | 1750              | 580                | 243            | 1373334120      | 4                   | 1                   | 16561               | 0                   | windows     | 1           | 0      | 0          | 0           | 0       |             | 506841884  | 0            | 0       | 0           | 0             | 0        | 5        | 1373321062     | 0   | 0   | 0      | 0         | 0         | 2073984325 | -1         | -1         | -1            | S0              | h1             |               |              | 0         | 0          | 0         | 0             | 0                   | 0                 | 0           | 0                     |                  | 0          |              | NH           | 0               |                     |                    |              |                  |           |           |             |            |         |         | 0        | -4283410856210349284 | 5420047383373475333  | 0    |
+---------------------+------------+---------------------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+--------------------+--------------+----+-----------+------------------------------------------------+--------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+-------------+-------------+--------+------------+-------------+---------+-------------+------------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+----------------------+------+
10 row(s) fetched.
Elapsed 0.239 seconds.

It currently doesn't in dft

$ dft -c "select * from '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits.parquet'"
Error: Error during planning: table 'datafusion.public./Users/andrewlamb/Software/datafusion/benchmarks/data/hits.parquet' not found

Location:
    /Users/andrewlamb/Software/datafusion-tui/src/app/execution.rs:127:20

Once datafusion 42.0.0 is released, we can likely use the DynamicFileProvider that @goldmedal added in apache/datafusion#11035

@matthewmturner
Copy link
Collaborator

It would be really cool if you could inspect parquet metadata with this

@devanbenz
Copy link

This seems like a really fun thing to work on. If no one else is working on it I'd love to take it on. I really like the metadata idea.

@matthewmturner
Copy link
Collaborator

Contributions very welcome :)

IMO it would be best to first integrate the DynamicFileProvider in one PR and then do a follow on for metadata.

I also just realized that their may be some existing metadata functionality for datafusion although its not clear to me from the docs whether that is only in datafusion-cli or a built-in function to datafusion that we could also use. If it was specific to datafusion cli then it would be great if we could add that.

@matthewmturner
Copy link
Collaborator

Also you could potentially get some inspiration for additional metadata capabilities from duckdb

@alamb
Copy link
Contributor Author

alamb commented Sep 16, 2024

BTW we can take inspiration from / copy outright the parquet_metadata table function for parquet from datafusion-cli (also modeled on duckdb):
https://datafusion.apache.org/user-guide/cli/usage.html#parquet-metadata

I would like to suggest creating those functions in their own crate (perhaps datafusion-functions-parquet?) -- it could be in the datafusion-dft repo initially for convenience, but I think eventually the goal should be that dft just be focused on integration rather than actually implementing such features.

In fact maybe once dft gets good enough we could remove the parquet_metadata function from datafusion-cli entirely 🤔

@alamb
Copy link
Contributor Author

alamb commented Sep 16, 2024

I also just realized that their may be some existing metadata functionality for datafusion although its not clear to me from the docs whether that is only in datafusion-cli or a built-in function to datafusion that we could also use. If it was specific to datafusion cli then it would be great if we could add that.

Sorry I missed this -- it is only in datafusion-cli

Implementation is here: https://github.com/apache/datafusion/blob/257e1409eca81cfff024ecc5e2567e9f67e6b5a3/datafusion-cli/src/functions.rs#L317-L459

@alamb
Copy link
Contributor Author

alamb commented Sep 16, 2024

I suggest we file a second ticket for implementing parquet_metadata and other duckdb metadata functions

@matthewmturner
Copy link
Collaborator

i agree with putting it in its own crate. like @alamb said i also think that dft could be used as an incubator of sorts. For example I have taken that approach in my WASM function factory PR. I have no intention of keeping that in this repo but its quite convenient for the time while it matures.

@devanbenz
Copy link

i agree with putting it in its own crate. like @alamb said i also think that dft could be used as an incubator of sorts. For example I have taken that approach in my WASM function factory PR. I have no intention of keeping that in this repo but its quite convenient for the time while it matures.

Sounds good -- I'll go ahead and get these both assigned to myself and then start cracking on it in the next few days :)

@devanbenz
Copy link

devanbenz commented Sep 16, 2024

take

edit: looks like github actions is not set up to auto-assign like datafusion 😅

@alamb
Copy link
Contributor Author

alamb commented Sep 17, 2024

take

edit: looks like github actions is not set up to auto-assign like datafusion 😅

How about you take #148 and I'll try this one? DataFusion 42 was just released and I very much want this particular feature in dft

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants