String and JSON size limits #2570
I don't remember if this was Bun or not, but I swear I hit a string length limit at 6 GB. Could be wrong. But good catch on JSON.parse.
The problem is here:
The current way this works is very inefficient. Instead, we should implement a direct UTF-8 -> JS version of JSON.parse which skips allocating the temporary string. This would also enable larger JSON sizes. @lemire's simdjson would be the perfect tool for this, and if we did this on the …
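As a rough sketch of what this could look like from JS — buffer.json() here is the experimental binding exercised in the benchmark further down this thread, not a shipped API:

import { readFile } from "node:fs/promises";

const buf = await readFile("big.json"); // raw UTF-8 bytes

// Today's path: the UTF-8 buffer is decoded into a temporary JS string,
// which JSON.parse then walks a second time.
const viaString = JSON.parse(buf.toString("utf8"));

// Proposed path: parse the UTF-8 bytes directly (e.g. via simdjson),
// skipping the temporary string entirely. buffer.json() is the
// experimental binding used in the benchmark below.
const direct = buf.json();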
Yeah.
Looks like using SIMDJSON is faster when the input is all primitives, but the cost of creating identifiers and converting strings means it's slower than native JSON.parse for objects with keys or strings longer than 1 character. I tried both the on-demand and the non-on-demand implementations. This uses the same atom string cache optimization that JSON.parse uses.

benchmark time (avg) (min … max) p75 p99 p995
------------------------------------------------------------------------------ -----------------------------
• small object
------------------------------------------------------------------------------ -----------------------------
JSON.parse 656.38 ns/iter (621.56 ns … 788.83 ns) 665.85 ns 788.83 ns 788.83 ns
JSON.parse (SIMDJSON on-demand buffer) 743.61 ns/iter (720.6 ns … 833.83 ns) 745.12 ns 833.83 ns 833.83 ns
summary for small object
JSON.parse
1.13x faster than JSON.parse (SIMDJSON on-demand buffer)
• Array(4096) of true
------------------------------------------------------------------------------ -----------------------------
JSON.parse 45.42 µs/iter (42.79 µs … 1.91 ms) 44.79 µs 52.71 µs 56.54 µs
JSON.parse (SIMDJSON on-demand buffer) 38.65 µs/iter (35.33 µs … 1.44 ms) 38.58 µs 45.38 µs 50.17 µs
summary for Array(4096) of true
JSON.parse (SIMDJSON on-demand buffer)
1.18x faster than JSON.parse
• Array(4096) of 1234.567
------------------------------------------------------------------------------ -----------------------------
JSON.parse 100.79 µs/iter (96.42 µs … 962.79 µs) 100.08 µs 111.5 µs 115.38 µs
JSON.parse (SIMDJSON on-demand buffer) 62.12 µs/iter (58.13 µs … 751.96 µs) 62.75 µs 71.21 µs 75.96 µs
summary for Array(4096) of 1234.567
JSON.parse (SIMDJSON on-demand buffer)
1.62x faster than JSON.parse
• Array(4096) of 'hello'
------------------------------------------------------------------------------ -----------------------------
JSON.parse 142.44 µs/iter (132.75 µs … 1.38 ms) 141.33 µs 159.42 µs 169.54 µs
JSON.parse (SIMDJSON on-demand buffer) 196.67 µs/iter (130.54 µs … 1.9 ms) 203.5 µs 234.5 µs 407.46 µs
summary for Array(4096) of 'hello'
JSON.parse
1.38x faster than JSON.parse (SIMDJSON on-demand buffer)
• Array(4096) of 'hello'.repeat(1024)
------------------------------------------------------------------------------ -----------------------------
JSON.parse 9.8 ms/iter (9.07 ms … 11.26 ms) 10.19 ms 11.26 ms 11.26 ms
JSON.parse (SIMDJSON on-demand buffer) 6.39 ms/iter (5.9 ms … 9 ms) 6.74 ms 9 ms 9 ms
summary for Array(4096) of 'hello'.repeat(1024)
JSON.parse (SIMDJSON on-demand buffer)
1.53x faster than JSON.parse
• Array(4096) of {a: 123, b: 456}
------------------------------------------------------------------------------ -----------------------------
JSON.parse 310.68 µs/iter (297.96 µs … 1.14 ms) 308.25 µs 386.33 µs 752.25 µs
JSON.parse (SIMDJSON on-demand buffer) 413.16 µs/iter (398.67 µs … 1.13 ms) 411.88 µs 474.38 µs 717.29 µs
summary for Array(4096) of {a: 123, b: 456}
JSON.parse
1.33x faster than JSON.parse (SIMDJSON on-demand buffer)

Benchmark:

import { bench, group, run } from "mitata";
function load(obj) {
const asStr = JSON.stringify(obj);
const buffer = Buffer.from(asStr);
bench("JSON.parse", () => {
return JSON.parse(asStr);
});
bench("JSON.parse (SIMDJSON on-demand buffer)", () => {
return buffer.json();
});
}
group("small object", () => {
var obj = {
a: 1,
b: 2,
c: null,
false: false,
true: true,
null: null,
foo: "bar",
arr: [1, 2, 3],
h: {
a: 1,
},
i: {
a: 1,
},
j: {},
// 100 more keys
k: {},
};
load(obj);
});
group("Array(4096) of true", () => {
var obj = Array(4096);
obj.length = 4096;
obj.fill(true);
load(obj);
});
group("Array(4096) of 1234.567", () => {
var obj = Array(4096);
obj.length = 4096;
obj.fill(1234.567);
load(obj);
});
group("Array(4096) of 'hello'", () => {
var obj = Array(4096);
obj.length = 4096;
obj.fill("hello");
load(obj);
});
group("Array(4096) of 'hello'.repeat(1024)", () => {
var obj = Array(4096);
obj.length = 4096;
obj.fill("hello".repeat(1024));
load(obj);
});
group("Array(4096) of {a: 123, b: 456}", () => {
var obj = Array(4096);
obj.length = 4096;
obj.fill({ a: 123, b: 456 });
load(obj);
});
run();

Code: 84a9fac
Ping me if you think I can help.
This would be a great feature; manipulating big files (especially JSON) with Node.js can get quite tricky.
What is the problem this feature would solve?
Both Node.js and Deno have an explicit string length limit of 512 MiB. Trying to load a bigger string leads to an error:
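For reference, a minimal way to hit the ceiling in Node.js (the exact limit is an engine detail; on 64-bit V8 it is 2**29 - 24 UTF-16 code units). fs.readFileSync on a larger file fails similarly with ERR_STRING_TOO_LONG ("Cannot create a string longer than 0x1fffffe8 characters"):

// Double a string until V8 refuses to allocate it.
let s = "x";
try {
  for (;;) s += s;
} catch (e) {
  console.log(e.name);    // "RangeError"
  console.log(e.message); // "Invalid string length"
  console.log(s.length);  // 2**28 — the last doubling that still fit
}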
This limitation comes from V8 internals. It makes it annoying to use JS for data analysis, where you often want to load a big JSON file very briefly and reduce it to a sanely-sized value. I then reach for Python, as it doesn't have a similar limit, or use libs like big-json, which sadly are slow and inconvenient to use.
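For completeness, the streaming workaround looks roughly like this (following big-json's README; it parses incrementally so no single huge string is ever materialized, but it is much slower than native JSON.parse):

import fs from "node:fs";
import json from "big-json";

// Parse a multi-GB JSON file without holding it as one JS string: the
// read stream feeds chunks into the parse stream, which emits the fully
// reconstructed object once parsing finishes.
const parseStream = json.createParseStream();

parseStream.on("data", (pojo) => {
  // Reduce the big structure to something sanely-sized right away.
  console.log(Object.keys(pojo).length);
});

fs.createReadStream("big.json").pipe(parseStream);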
From what I noticed, bun doesn't have any limit for readFile{,Sync} (it can load multi-GB strings just fine), but its JSON.parse silently truncates input around 2 GiB.
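A sketch of how one might reproduce the reported truncation in bun (file name and sizes are illustrative; this needs a few GB of disk and RAM):

import { writeFileSync, appendFileSync, readFileSync } from "node:fs";

// Build a ~2.2 GiB JSON array on disk in ~10 MB batches, so no single
// step needs a string anywhere near the limit.
const item = JSON.stringify("x".repeat(1024));
const batch = Array(10_000).fill(item).join(",");
const batches = 220; // 220 * 10_000 items ≈ 2.2 GiB of JSON text

writeFileSync("huge.json", "[" + batch);
for (let i = 1; i < batches; i++) appendFileSync("huge.json", "," + batch);
appendFileSync("huge.json", "]");

// bun reads the multi-GB string fine per the report...
const text = readFileSync("huge.json", "utf8");
console.log("text length:", text.length);

// ...but JSON.parse reportedly misbehaves around 2 GiB: depending on
// where the cut falls it may throw or yield fewer items than expected.
try {
  const parsed = JSON.parse(text);
  console.log("parsed items:", parsed.length, "expected:", batches * 10_000);
} catch (e) {
  console.error("JSON.parse failed:", e.message);
}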
What is the feature you are proposing to solve the problem?
What alternatives have you considered?
No response