-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding logic to detect whether range requests are supported via ContentRange #1717
adding logic to detect whether range requests are supported via ContentRange #1717
Conversation
Thanks for looking this over! I just did some testing, and this new logic does not work with my data. Here are the details: Here is the query I am testing with (pre-signed url is valid for ~11h):
|
Thanks for the feedback, and providing the means for testing in a easier way. I am trying to check whether removing the line: - if (((contentLength !== null) && (+contentLength > 1)) || file.dataProtocol == DuckDBDataProtocol.S3) {
+ { I will do some testing on your testcase. |
Yes I think removing that conditional logic could work, although I really don't know much about the S3 protocol and how it is applies here. I think this issue could be generalized as 'detect support for range requests when HEAD isn't allowed' |
…EAD requests are not permitted
138c187
to
21d69b9
Compare
@carstonhernke: I changed the code so that now actually compiles, and I think now it should do as intended, can you possibly double check? The main problem here is that we might be regressing in some cases (say due to punitive performance of downloading the whole file, that might happen by mistake here), but for those you will have to turn off allowFullHttpReads. |
Yes, just tested and it behaves as expected! (uses range requests) |
@carstonhernke: actually there is a problem: Content-Range is an unsafe header, and as such it will not be issued in Web embeddings of duckdb-wasm. Possibly the logic might still make sense for node, but at the very least there should be added a setting, defaulting to false, on whether to use Content-Range or not. |
To test this, try to visit https://shell.duckdb.org/?bundle=eh/#queries=v0,LOAD-spatial~%0A%0ACREATE-TABLE-nyc-AS%0A----SELECT%0A--------borough%2C%0A--------st_union_agg(geom)-AS-full_geom%2C%0A--------st_area(full_geom)-AS-area%2C%0A--------st_centroid(full_geom)-AS-centroid%2C%0A--------count(*)-AS-count-FROM%0Ast_read('https%3A%2F%2Fraw.githubusercontent.com%2Fduckdb%2Fduckdb_spatial%2Fmain%2Ftest%2Fdata%2Fnyc_taxi%2Ftaxi_zones%2Ftaxi_zones.shp')%0AGROUP-BY-borough~%0ASELECT-borough%2C-area%2C-centroid%3A%3AVARCHAR%2C-count%0AFROM-nyc~ and check the errors in the developer console. Argh, this seems to be true on Chrome, I guess since it's not CORS safelisted and there the CORS headers are enforced. I might need to do more checks |
I think this depends on the CORS configuration/ headers of the remote file which is being requested. Because If the header is not available, but the server only returns a single byte, then I suppose we need another request to get the whole file. So in a case where this header is not available 90% of the time, I think you're right that it should be an optional setting to avoid adding unnecessary overhead. |
Mostly by @carstonhernke, minor refactor to unify with follow-up logic.
Fixes #1367, and closes #1702 since it implements the same functionality.