lib: use slice to compact Readable buffer in streams #59676
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff            @@
##             main   #59676     +/-   ##
=========================================
  Coverage   89.89%   89.89%
=========================================
  Files         667      667
  Lines      196675   196852    +177
  Branches    38612    38652     +40
=========================================
+ Hits       176796   176966    +170
+ Misses     12332    12312      -20
- Partials    7547     7574      +27
```
Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1726/
This will cause a copy and an allocation. I don't see how this is an improvement. Slicing in JavaScript is not like in Go: it copies the elements rather than creating a view.
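For instance (a minimal sketch with illustrative values, not code from the PR):

```js
// slice() copies elements into a new backing store; it is not a view
// over the original array the way a Go slice would be.
const original = [1, 2, 3, 4];
const copy = original.slice(1); // [2, 3, 4], freshly allocated

original[1] = 99;               // mutate the source after slicing
console.log(copy[0]);           // 2 — the copy is unaffected
console.log(copy === original); // false — a distinct allocation
```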
@ronag Thanks for the review! 😀 My primary intention was to avoid the O(N) element-shifting cost incurred by `splice`. This approach is also a proven optimization pattern, used in PR #59406 to solve the same issue. I see the Jenkins CI benchmark results as data that empirically validates this hypothesis in a real-world scenario.
Why is it more efficient? |
Here is an illustration of what needs to happen in both cases:
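Roughly (a sketch with an illustrative buffer and `idx`; not the exact code from the PR):

```js
const idx = 2; // number of consumed chunks to drop

// splice: removes elements 0..idx-1 in place; every surviving element
// is shifted down, and the removed ones come back in a small new array.
let buffer = ['a', 'b', 'c', 'd', 'e'];
buffer.splice(0, idx);      // buffer is now ['c', 'd', 'e'], same array object

// slice: copies elements idx..end into a brand-new array; the original
// array is left untouched and becomes garbage once reassigned.
buffer = ['a', 'b', 'c', 'd', 'e'];
buffer = buffer.slice(idx); // buffer is a new array ['c', 'd', 'e']
```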
splice does the exact same thing as slice but without the extra allocation, unless there is some V8 magic involved that I am unaware of. I think the benchmark results of both this and the other PR are within the margin of error.
Thank you for the detailed illustration with the code examples. It was very helpful for clarifying the logical operations. You are correct, and I agree that splice uses a smaller temporary allocation, since slice must allocate for the larger, remaining portion of the array. I believe the "V8 magic" we're discussing is the most likely explanation for the consistent benchmark improvements we see for slice in both the CI for this PR and the previous one (#59406). To isolate this performance difference more clearly, I ran a more targeted local microbenchmark with a larger array:

```js
"use strict";

function bench(label, fn) {
  const t0 = performance.now();
  fn();
  const t1 = performance.now();
  console.log(`${label}:`.padEnd(25), `${(t1 - t0).toFixed(2)} ms`);
}

function makeArray(size) {
  const arr = new Array(size);
  for (let i = 0; i < size; i++) {
    arr[i] = i;
  }
  return arr;
}

const ARRAY_SIZE = 20_000_000;
const REMOVE_COUNT = 100;
const RUNS = 100;

bench("Array.prototype.splice()", () => {
  for (let i = 0; i < RUNS; i++) {
    const arr = makeArray(ARRAY_SIZE);
    arr.splice(0, REMOVE_COUNT);
  }
});

bench("Array.prototype.slice()", () => {
  const arr = makeArray(ARRAY_SIZE);
  for (let i = 0; i < RUNS; i++) {
    const newArr = arr.slice(REMOVE_COUNT);
  }
});
```

The results are consistently and significantly in favor of `slice`.
Test Environment:
This data strongly supports the idea that the high CPU cost of shifting millions of elements with `splice` outweighs the one-time allocation cost of `slice`. This is the core rationale for my proposal: I believe trading the one-time memory allocation for a significant gain in CPU performance is a worthwhile improvement for this hot path in the streams implementation.
I believe your benchmark is flawed. V8 is probably entirely optimizing away `newArr` and the `slice` call, since they have no side effects.

```
clk: ~3.43 GHz
cpu: Apple M3 Pro
runtime: node 24.5.0 (arm64-darwin)

benchmark                   avg (min … max) p75 / p99    (min … top 1%)
------------------------------------------- -------------------------------
Array.prototype.splice()      42.12 µs/iter   36.21 µs  █
                      (24.92 µs … 1.39 ms)  192.87 µs   █
            ( 14.66 kb … 285.54 kb) 157.71 kb ▂█▃▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Array.prototype.slice()       56.49 µs/iter   56.07 µs   █ ██
                     (48.54 µs … 73.75 µs)   71.57 µs  █▅▅██ ▅ ▅  ▅
            ( 77.85 b … 6.09 kb)    701.80 b  █████▁▁█▁▁▁▁▁▁▁▁▁█▁▁█
```

```js
import { bench, run } from 'mitata';

function makeArray(size) {
  const arr = new Array(size);
  for (let i = 0; i < size; i++) {
    arr[i] = i;
  }
  return arr;
}

const ARRAY_SIZE = 20_000;
const REMOVE_COUNT = 100;

// Accumulate the results into a counter so V8 cannot dead-code-eliminate
// the splice/slice calls as side-effect-free work.
let counter = 0;

bench("Array.prototype.splice()", () => {
  const arr = makeArray(ARRAY_SIZE);
  counter += arr.splice(0, REMOVE_COUNT).length;
});

bench("Array.prototype.slice()", () => {
  const arr = makeArray(ARRAY_SIZE);
  counter += arr.slice(REMOVE_COUNT).length;
});

await run();
console.log(counter);
```
Thank you for all the helpful feedback. You've helped me clear up my misunderstanding of the performance characteristics. Based on our discussion, I'll go ahead and close this PR. Thanks again!
Replace `splice(0, idx)` with `state.buffer = ArrayPrototypeSlice(state.buffer, idx)` when compacting the Readable buffer.

Rationale: `splice` removes the consumed chunks in place and shifts every remaining element down, whereas `slice` builds a new, compacted array and leaves the old one to be garbage-collected, as sketched below.
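A runnable sketch of the proposed shape (here `ArrayPrototypeSlice` is aliased locally, while inside Node core it is destructured from the `primordials` object; `state` and `idx` are stand-ins for the real stream internals):

```js
// Local stand-in for Node core's primordial; lib/internal code gets this
// from `primordials` rather than defining it like this.
const ArrayPrototypeSlice = (arr, ...args) =>
  Array.prototype.slice.apply(arr, args);

const state = { buffer: ['chunk0', 'chunk1', 'chunk2', 'chunk3'] };
const idx = 2; // number of fully consumed chunks to drop

// Before: state.buffer.splice(0, idx); // shifts the survivors down in place
state.buffer = ArrayPrototypeSlice(state.buffer, idx); // new compacted array

console.log(state.buffer); // ['chunk2', 'chunk3']
```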
Benchmark (local):