Long JSON string causes "memory access out of bounds" error #1477
Comments
Of note: this does not repro in the Python library or the DuckDB binary.
Thanks for the reproduction. I do reproduce, and at the very least this should throw an error visible to the user. I am not sure whether properly supporting queries of any length is in scope, and note that the DuckDB macOS CLI, at least on my machine, also hangs here (they are likely two different problems, but I am not sure whether SQL of arbitrary length is properly supported).
Interesting; on my machine, both the CLI and the Python lib for 0.8 and 0.9 complete instantly.
It also does not seem to repro for me on duckdb-wasm using DuckDB 0.8.
I also benchmarked it using 0.9.1 and 0.8.1; 0.9.1 uses about 5% more memory.
I am seeing the same error when reading query results from a specific Parquet file using DuckDB WASM 1.28.0. The bug does not show up with v1.27.0 or v1.25.0, and it does not occur with the Node library or the standalone version of DuckDB. Because of that (and because structs are involved in both cases), I suspect it might have the same root cause as the problem reported in this issue. I was trying to pinpoint what exactly triggers it in my case but had a hard time figuring it out. The file contains ~150 columns, of which ~5 are structs. Removing one of the columns was usually enough to make it work. The Parquet file contains sensitive data, so unfortunately I cannot share it. I tried to create a minimal reproducible example, but the error goes away even with small changes to the contents.
I have the same problem with an object array of 280 entries, where each object has about 30 entries (columns). The query string generated for this data triggers the error. This is the error message:
I will try to make the query smaller to see whether there is a threshold at which the error kicks in.
Shortening the query string to around 25 KB makes it work as expected.
Loading the CSV file by
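The comment above is cut off, but for very long generated query strings a common workaround is to avoid inlining the data in the SQL text at all: register the raw data as a virtual file and let DuckDB read it. A minimal sketch, assuming the @duckdb/duckdb-wasm async API; the file name, table name, and the db handle are illustrative:

import * as duckdb from '@duckdb/duckdb-wasm';

// Sketch: register CSV text as a virtual file instead of building one giant SQL string.
// Assumes `db` is an already-instantiated duckdb.AsyncDuckDB.
async function loadCsvWithoutGiantQuery(db: duckdb.AsyncDuckDB, csvText: string): Promise<void> {
  await db.registerFileText('data.csv', csvText);   // virtual file in the wasm filesystem
  const conn = await db.connect();
  await conn.query(`CREATE TABLE rows AS SELECT * FROM read_csv_auto('data.csv')`);
  await conn.close();
}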
Does anyone know how to work around this problem, or how to track it down and fix it? It looks like something fundamental that requires knowing the source code very well.
I'm unfortunately still on DuckDB v0.8 due to this :(
It doesn't seem to have anything to do with JSON, btw.
Appears to kick in at length 66720, repro'ing with:
const q = `SELECT list_extract([${Array.from({ length: 66_688 }, () => 1).join("")}], 0) as q`;
Edit: actually the number seems to change with the query somehow; it kicks in at 67424 with this one:
const q = `SELECT list_distinct([${Array.from({ length: 33_697 }, () => 1).join(",")}]) as q `;
Also, it only seems to happen at certain lengths? Example:
const works = `SELECT 1` + " ".repeat(68_600);
const doesnt = `SELECT 1` + " ".repeat(66_600);
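As a quick sanity check on those thresholds, here is a sketch that simply measures the two generated strings quoted above:

// Sketch: measure the length of the two generated queries from the previous comment.
const q1 = `SELECT list_extract([${Array.from({ length: 66_688 }, () => 1).join("")}], 0) as q`;
const q2 = `SELECT list_distinct([${Array.from({ length: 33_697 }, () => 1).join(",")}]) as q `;
console.log(q1.length, q2.length); // close to the 66720 and 67424 thresholds mentioned above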
I'm running into the same issue: getting a memory access out of bounds error. Could this be caused by exceeding the default total stack size of 64k set by Emscripten? If I understand correctly, the query string is passed as a stack-allocated string. If that's the case, is there any reason not to allocate the query string on the heap and pass in a pointer?
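To illustrate the two patterns being contrasted, here is a minimal sketch using generic Emscripten runtime helpers. This is not duckdb-wasm's actual binding code: _run_query is a hypothetical exported entry point, and the helpers are assumed to be exported on the Module object.

declare const Module: any; // Emscripten module object (assumed to export the helpers below)

// Stack-allocated: the string lives on the small wasm stack, so a very long
// query can overflow it and corrupt memory.
function runQueryOnStack(sql: string): void {
  const sp = Module.stackSave();
  const ptr = Module.stringToUTF8OnStack(sql); // allocateUTF8OnStack in older Emscripten
  Module._run_query(ptr);                      // hypothetical exported C entry point
  Module.stackRestore(sp);
}

// Heap-allocated: bounded only by available wasm memory, not by the stack size.
function runQueryOnHeap(sql: string): void {
  const bytes = Module.lengthBytesUTF8(sql) + 1;
  const ptr = Module._malloc(bytes);
  Module.stringToUTF8(sql, ptr, bytes);
  try {
    Module._run_query(ptr);
  } finally {
    Module._free(ptr);
  }
}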
I believe this to be solved via #1768, where instead of (implicitly) passing a JS string as stack-allocated, it is now passed as heap-allocated within the Uint8Array memory space.
I also added a test with a query of length 1 million, and it's handled fine; I think the limit here will be whatever limits allocating a contiguous block of memory.
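As a rough illustration of that kind of test, here is a sketch against the @duckdb/duckdb-wasm async API; it assumes `db` is an already-instantiated AsyncDuckDB:

import * as duckdb from '@duckdb/duckdb-wasm';

// Sketch: run a query whose text is roughly one million characters long.
async function stressQueryLength(db: duckdb.AsyncDuckDB): Promise<void> {
  const list = Array.from({ length: 500_000 }, () => 1).join(','); // ~1,000,000 characters
  const sql = `SELECT len([${list}]) AS n`;
  const conn = await db.connect();
  const result = await conn.query(sql);
  console.log(result.toArray()); // expect n = 500000
  await conn.close();
}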
Can you ping me once #1768 is live on the shell? I can sanity-check the original repro.
I think it's already on the shell; you might need to reload due to caching.
This repro just hangs now; I let it run for about an hour. Note that with the old version of duckdb-wasm I was using when I filed the task (I believe pinned to DuckDB 0.8 main), it completed after a few seconds.
@carlopi following up on this: given that the fix does not actually fix the repro, I would request that we reopen this task. I'm actually a bit worried that the new changes may hang or otherwise break cases that used to work (or that formerly returned errors).
Also ran into this with the following query (v1.28). Query string:
It's not a problem per se, but it feels like it should throw an error that the user can understand, e.g. "Your query contains too many characters".
Hello, do you have any visibility on this error? Do you know if a fix will be released soon? Thank you.
@carlopi Hey! Do you think this will be fixed any time soon?
Confirmed. I'll hold off on closing given the other related concerns, but my repro has been solved 🙏
@archiewood does your query now work with the latest duckdb-wasm?
We haven't upgraded yet, and I don't have a minimal repro. We will be upgrading next week, I think.
What happens?
Running the SELECT query in this repro in the DuckDB WASM library causes a memory access out of bounds error.
To Reproduce
Visit https://shell.duckdb.org/
Run this query
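The original query text is not included above. As a hypothetical stand-in with the same shape (a very long SQL string containing a large JSON literal), an equivalent can be generated and run through the library like this; the data and names here are illustrative, and the JSON cast assumes the JSON extension is available:

// Sketch only: the issue's actual query is not shown above, so this builds a stand-in
// query containing a long JSON literal (an assumption about its shape).
const payload = JSON.stringify(Array.from({ length: 5_000 }, (_, i) => ({ id: i, name: `row-${i}` })));
const sql = `SELECT '${payload}'::JSON AS doc`; // well over 64 KiB of query text
// With @duckdb/duckdb-wasm on an open connection:
//   const result = await conn.query(sql);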
Browser/Environment:
DuckDB shell
Device:
macbook pro
DuckDB-Wasm Version:
@duckdb/duckdb-wasm@1.27.1-dev134.0
DuckDB-Wasm Deployment:
shell.duckdb.org
Full Name:
Ankur Goyal
Affiliation:
Braintrust Data