Error while reading again a parquet file after browser reload #1658
Comments
Hi @ericemc3, can you share the result of …? I am aware of a problem with threads, but on the browser I tested they were NOT enabled by default, and your setup is somewhat different. If the result contains …, … If the result is wasm_eh, this is a new problem, and I will look at it later.
(and thanks @szarnyasg, this looks like a duckdb/duckdb-wasm-specific problem)
Hi @carlopi, … and the same issue with https://shell.duckdb.org/?bundle=eh as well. Please note that it works fine with Firefox (with the same user_agent displayed), so it looks like a Chrome/Edge-specific issue as well.
The same problem can be reproduced in DuckDB-WASM v1.28.0 (DuckDB v0.9.1). I thought this was an issue in my app and was still troubleshooting it as a low priority, so I hadn't reported it yet. I just saw this pop up, so I thought I'd share that we see this in v1.28.0 as well. For the moment I have worked around it by fetching the files outside of DuckDB, doing …
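The exact workaround in this comment was not preserved. One way to fetch a file outside of DuckDB is sketched here, assuming the app holds a duckdb-wasm `AsyncDuckDB` instance: download the bytes with the browser's own `fetch`, then hand them to DuckDB with `registerFileBuffer`, so queries read from memory instead of going through DuckDB's HTTP reader. `registerRemoteFile`, `FileBufferRegistry` and the `cache: "no-store"` option are illustrative choices, not the commenter's code.

```typescript
// Structural type for the one AsyncDuckDB method this sketch relies on.
interface FileBufferRegistry {
  registerFileBuffer(name: string, buffer: Uint8Array): Promise<void>;
}

// Same shape as the global fetch, injectable for testing.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

/**
 * Hypothetical helper: download `url` with the browser's fetch and register
 * the bytes under the virtual file name `name`. Returns the byte count.
 */
async function registerRemoteFile(
  db: FileBufferRegistry,
  url: string,
  name: string,
  fetchFn: FetchLike = fetch,
): Promise<number> {
  // "no-store" asks the browser to bypass its HTTP cache for this request.
  const response = await fetchFn(url, { cache: "no-store" });
  if (!response.ok) {
    throw new Error(`GET ${url} failed with status ${response.status}`);
  }
  const bytes = new Uint8Array(await response.arrayBuffer());
  await db.registerFileBuffer(name, bytes);
  return bytes.byteLength;
}
```

After `await registerRemoteFile(db, url, "communes.parquet")`, a query can read `FROM 'communes.parquet'`. The trade-off is that the whole file is downloaded up front, so range requests no longer limit the transfer to the parts of the file a query needs.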
I've been experiencing this same issue while using Evidence (@evidence-dev/evidence), which uses @duckdb/duckdb-wasm as a dependency. It seems to affect Windows, and only Edge/Chrome. The page may load initially or error out, displaying …. It affects 1.28.0 (the Evidence dependency version) and the latest build, 1.28.1-dev190.0.
I'm running into this quite often with Evidence. Is there a workaround on the duckdb-wasm side?
What is the value of the 'Content-Type' header for the Parquet file being fetched? To get it, open the browser console, select the Network tab, have duckdb-wasm fetch the relevant resource, and click the corresponding row; under its Headers tab, look for 'Content-Type'. If it is 'text/plain', this might be connected to #1580, which stems from a problem in the spec and in web browsers' implementations of it. Independently of the first question, a couple of other ones: …
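The same check can be scripted rather than read off the Network tab. A minimal sketch, assuming a `fetch`-capable environment and a server that answers HEAD requests (cross-origin requests additionally need CORS permission); `inspectHeaders` and `HeaderReport` are hypothetical names, not part of duckdb-wasm. It also reports Content-Length, since the thread later looks at whether Content-Length from HEAD requests can be trusted.

```typescript
// Same shape as the global fetch, injectable for testing.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface HeaderReport {
  status: number;
  contentType: string | null;
  contentLength: number | null;
  isTextPlain: boolean;
}

/**
 * Hypothetical helper: send a HEAD request and report the headers discussed
 * in this thread, i.e. what the Network tab's Headers panel would show.
 */
async function inspectHeaders(url: string, fetchFn: FetchLike = fetch): Promise<HeaderReport> {
  const res = await fetchFn(url, { method: "HEAD" });
  const contentType = res.headers.get("content-type");
  const rawLength = res.headers.get("content-length");
  return {
    status: res.status,
    contentType,
    contentLength: rawLength === null ? null : Number(rawLength),
    // text/plain (possibly with a charset suffix) is the case linked to #1580.
    isTextPlain: contentType?.toLowerCase().startsWith("text/plain") ?? false,
  };
}
```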
Hi @carlopi, the Content-Type is application/octet-stream when I start Evidence and it starts pulling in Parquet files. I do not think it's related to that issue, given that that bug is Firefox-specific; this one is the other way around, being solely Chrome- and Edge-specific. I don't think I ever managed to trigger it in Firefox. Yes, I control the server: you can go to reports.coreflowbased.eu and register a free account to test it out. That's my website, which from time to time throws this error. If you get a database timeout, simply refresh the site; that's a different bug.
That bug is two bugs in one, one in Chrome and one in Firefox, but that's not so relevant. I have no idea how to reproduce it within the setting of your website; if you can give some instructions, I might be able to give it a try tomorrow.
@carlopi The Content-Type for the Parquet files is …. Here's where the exception occurs: … I tried looking up the status code …
Can also confirm: the application works perfectly in Firefox on the same system (Windows 10), so it's confined to Edge and Chrome on Windows. It seems to be the exact same issue as the one @ericemc3 described in the DuckDB Shell. Edit: I've mitigated the issue in my application by disabling caching through the 'Cache-Control' header for Parquet files. The error does not appear in Chrome/Edge with caching of Parquet files turned off.
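The mitigation amounts to a server-side header rule. A minimal sketch, assuming the server decides headers per request path; `cacheControlFor` is a hypothetical helper, and the `no-store` value is one reasonable choice rather than the commenter's exact configuration.

```typescript
/**
 * Hypothetical helper: pick a Cache-Control value for a static file path.
 * Parquet files get "no-store", so the browser never answers a later request
 * for them from its HTTP cache; everything else keeps an ordinary policy.
 */
function cacheControlFor(path: string): string {
  // Ignore any query string and compare the extension case-insensitively.
  const pathname = path.split("?")[0].toLowerCase();
  if (pathname.endsWith(".parquet")) {
    return "no-store";
  }
  // Illustrative default for non-Parquet assets.
  return "public, max-age=3600";
}
```

With Node's `http` module this could be applied as `res.setHeader("Cache-Control", cacheControlFor(req.url ?? ""))`. `no-store` is the strictest option; `no-cache`, which allows storing but forces revalidation before reuse, may also be enough, but the comment does not say which value was used.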
This emerged while investigating duckdb#1658, but reproduction is not deterministic, so it is hard to say.
I managed to reproduce it on a Windows machine. I am very puzzled by this bug, given that it's non-deterministic.
At the moment I am unable to reproduce this (on shell.duckdb.org, currently at @duckdb/duckdb-wasm@1.28.1-dev194.0), but it is hard to say whether this is properly fixed in all cases.
Hi @carlopi,
@carlopi thanks for looking into this. I am also able to reproduce this on shell.duckdb.org. It does not occur on Chrome on Mac.
Hi all, we have developed a new duckdb-wasm version that allows you to explicitly set whether to trust Content-Length information from HEAD requests. I have a hard time reproducing this on my setup, so it would be amazing if anyone could run the original issue in two additional modes: … and …
Changing the setting for … should move away from the behaviour that was problematic here. I am experimenting with what the default behaviour should be, so input on whether this helps with this particular problem would be handy.
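From application code, the `reliable_head_requests` setting used in the shell link further down the thread can be issued as an ordinary SQL statement before the query. A minimal sketch, assuming a duckdb-wasm connection whose `query` method accepts a SQL string (as `AsyncDuckDBConnection.query` does); `SqlConnection` and `queryWithoutTrustingHead` are hypothetical names, and the setting is only known to exist in the duckdb-wasm builds discussed in this thread.

```typescript
// Structural type for the part of a duckdb-wasm connection used here.
interface SqlConnection {
  query(sql: string): Promise<unknown>;
}

/**
 * Hypothetical helper: tell DuckDB not to trust Content-Length values from
 * HEAD requests, then run the actual query on the same connection.
 */
async function queryWithoutTrustingHead(conn: SqlConnection, sql: string): Promise<unknown> {
  // Setting name taken from the shell link in this thread.
  await conn.query("SET reliable_head_requests = false;");
  return conn.query(sql);
}
```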
Hi,
executed twice: OK. Then I reload the page, paste the same request again, and get: … Is there another wasm version to test?
@ericemc3:
Link that could possibly work: https://shell.duckdb.org/#queries=v0,SET-reliable_head_requests-%3D-false~,FROM-'https%3A%2F%2Fstatic.data.gouv.fr%2Fresources%2Fcommunes%202023%20format%20parquet%2F20240122%20085355%2Fcommunes2023.parquet'-%0ASELECT-codgeo-WHERE-epci-%3D-'200039865'-~
Yes, I tried both, with the same outcome: …
I've been investigating an issue with ObservableHQ (which relies on DuckDB) that seems to show the same behaviour as described in this issue. It does appear to be caching-related. Is there any further information as to what might be causing this issue?
Hi, same issue, +1!
I've already posted a workaround on the related Observable Framework issue, but maybe it will help someone here as well until the issue is fixed. There are obvious downsides, but I have been able to work around the issue by cache busting every Parquet link when a Windows device is detected. See the example here.
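The linked example is not preserved here. A minimal sketch of the idea, assuming absolute Parquet URLs and access to `navigator.userAgent`: on Windows, append a throwaway query parameter so each load requests a URL the HTTP cache has not seen. `cacheBustParquetUrl` and the `_cb` parameter are hypothetical, and the server must ignore unknown query parameters.

```typescript
/**
 * Hypothetical helper: on Windows user agents, append a throwaway query
 * parameter to Parquet URLs so every page load requests a URL the HTTP
 * cache has not seen. Other platforms and file types are left untouched.
 */
function cacheBustParquetUrl(url: string, userAgent: string, token: string): string {
  const isWindows = /Windows/i.test(userAgent);
  const parsed = new URL(url); // throws on relative URLs
  if (!isWindows || !parsed.pathname.toLowerCase().endsWith(".parquet")) {
    return url;
  }
  // "_cb" is an arbitrary parameter name; the server must ignore it.
  parsed.searchParams.set("_cb", token);
  return parsed.toString();
}
```

In the browser this would be called as `cacheBustParquetUrl(url, navigator.userAgent, String(Date.now()))`. The cost is that Windows users re-download every Parquet file on each load, the kind of bandwidth overhead raised later in the thread.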
@bjyberg many thanks, it works very well! Hopefully a future version will fix this...
@carlopi Have there been any updates on this issue recently? I've been continuing to investigate where I can, even though we have a temporary fix through the caching flags. Thank you!
@timothyhoward I'm experiencing this too, and it's become a late-in-the-game blocker for a dashboard I'm preparing for public release (all the early testers were Mac users and they had no problems). Since there doesn't seem to be a fix yet: is your caching-flags approach working smoothly? I can't tell from the previous posts whether it eliminates the problem entirely or whether users have to reload once to get it to work. Any info and details you can provide about this would be great to hear, since it seems to be the only option at the moment.
Upgrading to the latest version of Evidence fixes this, as Evidence sends the correct header.
The more I contemplate this issue, the harder it is for me to understand why it isn't a problem with pretty much every use of DuckDB-Wasm that queries a file more than once. Does this just never work for Windows users on Chrome or Edge? The statement on the DuckDB-Wasm main page, "Duckdb-Wasm speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js.", certainly implies that this would have been tested. What am I missing? Does it impact a narrower context than I realize, and if so, what is that?
@fboerman Unfortunately, the Evidence patch only reimplements the original caching fix I suggested earlier, rather than rectifying the core of the issue, which resides in duckdb-wasm or earlier in the chain. The new fix has also caused some unresolved issues with excessive bandwidth use, so I'm hoping to get to the core of the issue so the patch can be removed.
What happens?
Executing the same request a second time, after reloading the shell page, yields an error.
To Reproduce
In https://shell.duckdb.org/, execute:
Then reload the browser and execute the same query again.
On Windows, with Chrome or Edge, I get:
Invalid Error: TProtocolException: Invalid data
The codgeo column, which is also the first column of the dataset, seems to be responsible. With Firefox, there is no issue.
OS:
Win11
DuckDB Version:
0.10.0
DuckDB Client:
shell wasm1.28.1-dev159.0
Full Name:
eric mauviere
Affiliation:
icem7
Have you tried this on the latest nightly build?
I have tested with a nightly build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?