Read over http(s)? #3134
Comments
This should work in many cases today! As you say, there are a bunch of edge cases around authentication, but it definitely works. I just ran: select count(*) from read_parquet('https://huggingface.co/api/datasets/RLHFlow/Orca-distibalel-standard/parquet/default/train/0.parquet'); and it worked just fine. Let me know if you have specific edge cases that you'd like to address or other concerns!
Thanks for your response. Your example ( So I tried to reproduce what I did yesterday. After trying with a local file, I wanted to check whether I could load data over http, using simple http file servers. I tried with miniserve, and it didn't work. Today I also tried another very basic web server (jwebserver), with the same result, and finally hfs, which worked. The difference is that hfs uses port 80 by default. Here are my tests:
then simple test with miniserve (default port is 8080)
then miniserve running on port 80:
Using
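For anyone who wants to repeat this kind of test without miniserve, jwebserver or hfs, here is a minimal sketch. It assumes Python's standard `http.server` as a stand-in file server and lets the OS pick a free, non-default port. The file name `sample.csv` and its contents are hypothetical placeholders, not the data used in the tests above. The printed URL is the kind of address (explicit port included) that could then be handed to the engine under test.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def serve_directory(directory):
    """Serve `directory` over HTTP on a free, non-default port; return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    # Port 0 asks the OS for any free port, so the URL always carries an explicit port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url):
    """Fetch `url` directly, bypassing any proxy configured in the environment."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.read()


def run_demo():
    """Serve one placeholder file and read it back through its explicit-port URL."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample file standing in for the csv/json/parquet under test.
        Path(tmp, "sample.csv").write_bytes(b"id,name\n1,a\n")
        server = serve_directory(tmp)
        try:
            port = server.server_address[1]
            url = f"http://127.0.0.1:{port}/sample.csv"
            return url, fetch(url)
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    url, body = run_demo()
    print(url, len(body), "bytes")
```

If a plain HTTP client can read the file at that URL but the query engine cannot, the server itself is unlikely to be the cause.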
That seems like quite a reasonable deduction. It's probably the case that the port is getting dropped somewhere. I've done some poking around and nothing is jumping out at me in either the DataFusion code or the GlareDB code; I will continue to look.
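To make the suspected failure mode concrete, here is a small sketch of how an explicit port can get lost when a URL is taken apart and rebuilt. It is purely illustrative: the function names are hypothetical, it uses Python's `urllib.parse`, and it is not taken from the DataFusion or GlareDB code, where the actual cause has not been located.

```python
from urllib.parse import urlsplit


def effective_port(url):
    """Return the explicit port if present, else the scheme default (80 or 443)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return {"http": 80, "https": 443}.get(parts.scheme)


def rebuild_keeping_port(url):
    """Rebuild scheme://authority/path from netloc, which includes any port."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


def rebuild_dropping_port(url):
    """Hypothetical buggy variant: hostname alone silently loses the port."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.hostname}{parts.path}"


if __name__ == "__main__":
    url = "http://localhost:8080/data.parquet"
    print(effective_port(url))
    print(rebuild_keeping_port(url))
    print(rebuild_dropping_port(url))
```

A bug of this shape would match the observation in the thread: a server on port 80 still works because the rebuilt URL falls back to the scheme default, while a server on 8080 becomes unreachable.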
GlareDB will unwind top-level json arrays in json source data automatically, on the theory that a top-level array of json objects can mean nothing else (e.g. these would be an error any other way). Coming very shortly, GlareDB will allow you to filter JSON data through
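As a rough illustration of the unwinding behavior described above, here is a minimal sketch in Python. It is an assumption about the general idea (one row per element of a top-level array), not GlareDB's implementation, and `unwind_top_level` is a hypothetical name.

```python
import json


def unwind_top_level(text):
    """Return a list of rows from a JSON document.

    A top-level array yields one row per element; any other top-level
    value is treated as a single row.
    """
    value = json.loads(text)
    if isinstance(value, list):
        # Each element of the top-level array becomes its own row.
        return list(value)
    return [value]


if __name__ == "__main__":
    print(unwind_top_level('[{"a": 1}, {"a": 2}]'))
    print(unwind_top_level('{"a": 1}'))
```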
Description
With duckdb, one can read csv, json or parquet files which are accessed through a web server:
ex:
See duckdb parquet.
Will you implement such a feature in glaredb? Authentication might be tricky in some cases (I thought it was present in datafusion, but I must be wrong, as I couldn't find any reference).