buffer: up to 2x times faster copy of buffers #24977

Closed

reklatsmasters
Contributor

%TypedArray%.prototype.set can copy data between ArrayBuffers much faster. This lets us speed up Buffer.prototype.copy and Buffer.concat by up to 2x in some cases.
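
To illustrate the idea, here is a minimal sketch of a copy built on %TypedArray%.prototype.set. The helper name copyWithSet and its clamping behaviour are assumptions made for this example; it is not the code from this PR.

```javascript
// Sketch only: `copyWithSet` is a hypothetical helper, not the PR's implementation.
function copyWithSet(source, target, targetStart = 0) {
  // Space left in the target after targetStart.
  const room = target.length - targetStart;
  if (room <= 0) return 0;
  // Trim the source so that set() never throws a RangeError on a short target.
  const chunk = source.length > room ? source.subarray(0, room) : source;
  // %TypedArray%.prototype.set copies the bytes of `chunk` into `target`.
  target.set(chunk, targetStart);
  return chunk.length; // number of bytes copied, mirroring buf.copy()
}

const source = Buffer.from('hello world');
const target = Buffer.alloc(8);
const copied = copyWithSet(source, target, 2);
console.log(copied, JSON.stringify(target.subarray(2).toString())); // 6 "hello "
```

The clamp exists because set() throws when the source does not fit, whereas buf.copy() silently copies as much as fits.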

benchmarks

node@master vs node@pr
                                         confidence improvement accuracy (*)   (**)  (***)
 buffers/buffer-copy.js n=1024 size=10          ***    117.07 %       ±2.82% ±3.76% ±4.90%
 buffers/buffer-copy.js n=1024 size=1024        ***     94.86 %       ±3.36% ±4.47% ±5.83%
 buffers/buffer-copy.js n=1024 size=2048        ***     53.33 %       ±4.03% ±5.38% ±7.06%
 buffers/buffer-copy.js n=1024 size=4096        ***     30.17 %       ±2.49% ±3.33% ±4.36%
 buffers/buffer-copy.js n=1024 size=8192        ***     21.52 %       ±2.23% ±2.97% ±3.86%
                                                                           confidence improvement accuracy (*)    (**)   (***)
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=16 pieces=1          ***      4.52 %       ±1.73%  ±2.31%  ±3.03%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=16 pieces=16         ***    144.18 %       ±1.97%  ±2.63%  ±3.46%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=16 pieces=4          ***    104.35 %       ±6.67%  ±8.97% ±11.88%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=1 pieces=1           ***      5.51 %       ±2.37%  ±3.18%  ±4.18%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=1 pieces=16          ***    143.85 %       ±2.21%  ±2.95%  ±3.86%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=1 pieces=4           ***    101.56 %       ±6.18%  ±8.31% ±10.99%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=256 pieces=1         ***      5.50 %       ±2.97%  ±3.99%  ±5.27%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=256 pieces=16        ***     45.37 %       ±2.16%  ±2.89%  ±3.81%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=256 pieces=4         ***     66.76 %       ±1.69%  ±2.25%  ±2.94%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=16 pieces=1          ***      5.70 %       ±2.81%  ±3.78%  ±4.99%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=16 pieces=16         ***    160.34 %       ±5.80%  ±7.81% ±10.34%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=16 pieces=4          ***    116.18 %       ±5.38%  ±7.23%  ±9.55%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=1 pieces=1           ***      5.19 %       ±1.82%  ±2.44%  ±3.20%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=1 pieces=16          ***    160.41 %       ±2.37%  ±3.17%  ±4.15%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=1 pieces=4           ***    108.86 %       ±7.55% ±10.16% ±13.47%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=256 pieces=1         ***      5.46 %       ±3.00%  ±4.02%  ±5.30%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=256 pieces=16        ***     47.44 %       ±1.57%  ±2.10%  ±2.75%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=256 pieces=4         ***     70.00 %       ±1.66%  ±2.22%  ±2.89%
node@v10 vs node@v10-pr
                                         confidence improvement accuracy (*)   (**)  (***)
 buffers/buffer-copy.js n=1024 size=10          ***    187.01 %       ±5.37% ±7.21% ±9.54%
 buffers/buffer-copy.js n=1024 size=1024        ***    129.87 %       ±3.41% ±4.57% ±6.01%
 buffers/buffer-copy.js n=1024 size=2048        ***     78.64 %       ±2.57% ±3.42% ±4.45%
 buffers/buffer-copy.js n=1024 size=4096        ***     59.96 %       ±4.00% ±5.34% ±7.00%
 buffers/buffer-copy.js n=1024 size=8192        ***     75.37 %       ±1.23% ±1.64% ±2.14%
                                                                          confidence improvement accuracy (*)   (**)  (***)
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=16 pieces=1          ***     22.93 %       ±1.38% ±1.83% ±2.39%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=16 pieces=16         ***    203.86 %       ±3.10% ±4.15% ±5.45%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=16 pieces=4          ***    154.01 %       ±2.50% ±3.34% ±4.39%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=1 pieces=1           ***     26.54 %       ±1.39% ±1.86% ±2.42%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=1 pieces=16          ***    226.57 %       ±2.42% ±3.24% ±4.26%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=1 pieces=4           ***    153.34 %       ±2.78% ±3.74% ±4.94%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=256 pieces=1         ***     19.00 %       ±1.83% ±2.44% ±3.18%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=256 pieces=16        ***     72.48 %       ±1.56% ±2.08% ±2.72%
 buffers/buffer-concat.js n=1024 withTotalLength=0 pieceSize=256 pieces=4         ***     89.29 %       ±2.61% ±3.50% ±4.62%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=16 pieces=1          ***     22.45 %       ±1.68% ±2.24% ±2.92%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=16 pieces=16         ***    214.20 %       ±3.17% ±4.25% ±5.58%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=16 pieces=4          ***    155.20 %       ±3.64% ±4.88% ±6.43%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=1 pieces=1           ***     23.09 %       ±1.88% ±2.51% ±3.27%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=1 pieces=16          ***    232.81 %       ±2.64% ±3.53% ±4.63%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=1 pieces=4           ***    165.25 %       ±2.23% ±2.99% ±3.92%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=256 pieces=1         ***     21.09 %       ±1.66% ±2.22% ±2.88%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=256 pieces=16        ***     71.34 %       ±1.11% ±1.47% ±1.92%
 buffers/buffer-concat.js n=1024 withTotalLength=1 pieceSize=256 pieces=4         ***     91.33 %       ±2.32% ±3.10% ±4.07%

This PR can be backported to the v10 branch. The main question is what to do with the error messages: some of the previous error messages are less informative.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the buffer Issues and PRs related to the buffer subsystem. label Dec 12, 2018
@BridgeAR BridgeAR added the performance Issues and PRs related to the performance of Node.js. label Dec 12, 2018
Member

@BridgeAR BridgeAR left a comment

This is great work! Improving the performance here always helps a lot of people.

function fastcopy(source, target, targetStart = 0, sourceStart = 0, sourceEnd) {
  if (!isUint8Array(source)) {
    throw new ERR_INVALID_ARG_TYPE('source', ['Buffer', 'Uint8Array'], source);
  }
Member

This is already checked in the concat case, so I recommend moving this check into Buffer.prototype.copy. The target is similar, but it is created using Buffer.allocUnsafe. That function could be manipulated by user code, so it would be best to keep a reference to the original function and call that, e.g.: const { allocUnsafe } = Buffer;. In that case the target check can also be moved into Buffer.prototype.copy.
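
A rough sketch of that suggestion follows. The helper concatSketch is hypothetical and much simpler than the real Buffer.concat; it only shows that a reference captured up front is unaffected by later patching of Buffer.allocUnsafe.

```javascript
// Capture the original function once, before user code can replace it.
const { allocUnsafe } = Buffer;

// Hypothetical helper for illustration; not the actual Buffer.concat.
function concatSketch(list) {
  let total = 0;
  for (const piece of list) total += piece.length;
  const result = allocUnsafe(total); // uses the captured reference
  let offset = 0;
  for (const piece of list) {
    result.set(piece, offset);
    offset += piece.length;
  }
  return result;
}

// Simulate user code replacing the public function afterwards.
const original = Buffer.allocUnsafe;
Buffer.allocUnsafe = () => { throw new Error('patched'); };
let joined;
try {
  joined = concatSketch([Buffer.from('ab'), Buffer.from('cd')]);
} finally {
  Buffer.allocUnsafe = original; // always restore the real function
}
console.log(joined.toString()); // abcd
```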

lib/buffer.js Outdated
}

if (sourceEnd - sourceStart > target.byteLength - targetStart)
  sourceEnd = sourceStart + target.byteLength - targetStart;
Member

Please add a comment explaining what this does (if I read it correctly, it's the overflow mechanism to search from the end instead of the beginning).
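
For reference, read literally, the two lines under review cap the number of bytes to copy at the space remaining in the target. The standalone function below uses hypothetical names and is only a sketch of that reading, not the PR's code.

```javascript
// Hypothetical standalone version of the clamp, for illustration only.
function clampSourceEnd(sourceStart, sourceEnd, targetLength, targetStart) {
  // Bytes that still fit into the target from targetStart onwards.
  const room = targetLength - targetStart;
  // If the requested range is longer than that, shorten it.
  if (sourceEnd - sourceStart > room)
    sourceEnd = sourceStart + room;
  return sourceEnd;
}

console.log(clampSourceEnd(2, 10, 8, 5)); // 5 (only 3 bytes fit)
console.log(clampSourceEnd(0, 4, 8, 0));  // 4 (range already fits)
```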

  const b = Buffer.alloc(1);
  a.copy(b, 0, 0x100000000, 0x100000001);
}, outOfRangeError);

Member

Just because it's not obvious why the test is removed: can you elaborate why it's now obsolete?

Contributor Author

ParseArrayIndex() isn't called without type casting to uint32 any more.

inline MUST_USE_RESULT bool ParseArrayIndex(Local<Value> arg,

benchmark/buffers/buffer-copy.js Outdated

bench.start();
for (var i = 0; i < n * 1024; i++) {
  source.copy(target);
Member

It would be great to benchmark some of the options as well, to see how sourceStart / sourceEnd etc. compare :)
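
A standalone sketch of such a comparison could look like the following. It deliberately does not use Node's own benchmark harness; the helper opsPerSecond and its parameters are assumptions made for this example, and the numbers it prints are only rough indications.

```javascript
// Hypothetical micro-benchmark helper; not part of Node's benchmark suite.
function opsPerSecond({ size, targetStart = 0, sourceStart = 0, iterations = 1e5 }) {
  const source = Buffer.alloc(size, 'x'); // initialized source data
  const target = Buffer.alloc(size);
  const begin = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++)
    source.copy(target, targetStart, sourceStart);
  const elapsedNs = Number(process.hrtime.bigint() - begin);
  return iterations / (elapsedNs / 1e9);
}

// Compare the default call with non-zero offsets.
for (const targetStart of [0, 1]) {
  for (const sourceStart of [0, 1]) {
    const rate = opsPerSecond({ size: 1024, targetStart, sourceStart });
    console.log(
      `targetStart=${targetStart} sourceStart=${sourceStart}: ${rate.toFixed(0)} ops/s`
    );
  }
}
```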

});

function main({ n, size }) {
  const source = Buffer.allocUnsafe(size);
Member

@ChALkeR ChALkeR Dec 12, 2018

This basically copies from an uninitialized memory chunk below, which may or may not be all zeroes, btw.

Perhaps use a single source, initialized with crypto (random data) or fill (repeated string)?
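
As a sketch of the two options mentioned above (the size and the fill string are arbitrary choices for this example):

```javascript
const crypto = require('crypto');

const size = 1024;

// Option 1: random data. randomFillSync overwrites every byte,
// so nothing uninitialized is left in the buffer.
const randomSource = crypto.randomFillSync(Buffer.allocUnsafe(size));

// Option 2: a repeated string. Buffer.alloc repeats the fill value
// until the buffer is full.
const filledSource = Buffer.alloc(size, 'node');

const target = Buffer.alloc(size);
filledSource.copy(target);
console.log(target.subarray(0, 8).toString()); // nodenode
```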

Member

@TimothyGu TimothyGu left a comment

Some possible improvements.

lib/buffer.js Outdated
);

if (sourceStart > 0 || to_copy < source.byteLength) {
  source = source.subarray(sourceStart, sourceStart + to_copy);
Member

You should check if .subarray() is slower than Buffer.prototype.slice() (not TypedArray.prototype.slice()). They do the same thing but according to https://crbug.com/v8/7161 (cf. #17431) .subarray is slower.

You might need to make sure that source is a Buffer object to get the correct slice function, by doing Buffer.from(source.buffer, source.byteOffset, source.byteLength).
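
The difference between a view and a copy can be sketched as follows. This example says nothing about relative speed, which would need to be measured; it only shows that Buffer.from(arrayBuffer, byteOffset, length) and .subarray() share memory with the source, while TypedArray.prototype.slice() copies.

```javascript
const source = new Uint8Array([1, 2, 3, 4, 5, 6]);

// Wrap the same memory in a Buffer; no bytes are copied.
const asBuffer = Buffer.from(source.buffer, source.byteOffset, source.byteLength);

// subarray() returns a view on the same memory.
const view = asBuffer.subarray(1, 4);

// TypedArray.prototype.slice() allocates new memory and copies into it.
const copy = source.slice(1, 4);

// A later write to the source is visible through the view only.
source[1] = 42;
console.log(view[0], copy[0]); // 42 2
```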

@@ -488,6 +488,59 @@ Buffer.concat = function concat(list, length) {
  return buffer;
};

function fastcopy(source, target, targetStart = 0, sourceStart = 0, sourceEnd) {
  if (!isUint8Array(source)) {
Member

Previously we allowed any TypedArray (e.g. Uint16Array) as well as DataView as the source and target. Please also make sure to add tests for these cases.

@reklatsmasters
Contributor Author

Sadly, I found a bottleneck: creating a %TypedArray% is a really slow operation. When I restore the _copy() call for the common case, it doesn't help (up to 20% slower).

                                                       confidence improvement accuracy (*)    (**)   (***)
 buffers/buffer-copy.js n=1024 targetStart=0 size=10           ***    127.08 %       ±4.84%  ±6.51%  ±8.61%
 buffers/buffer-copy.js n=1024 targetStart=0 size=1024         ***    100.60 %       ±3.48%  ±4.66%  ±6.11%
 buffers/buffer-copy.js n=1024 targetStart=0 size=2048         ***     60.33 %       ±2.43%  ±3.24%  ±4.23%
 buffers/buffer-copy.js n=1024 targetStart=0 size=65536        ***     12.37 %       ±4.31%  ±5.74%  ±7.48%
 buffers/buffer-copy.js n=1024 targetStart=0 size=8192         ***     77.96 %       ±8.61% ±11.46% ±14.93%
 buffers/buffer-copy.js n=1024 targetStart=1 size=10           ***    -22.75 %       ±1.42%  ±1.89%  ±2.47%
 buffers/buffer-copy.js n=1024 targetStart=1 size=1024         ***    -16.36 %       ±1.42%  ±1.88%  ±2.45%
 buffers/buffer-copy.js n=1024 targetStart=1 size=2048         ***    -16.44 %       ±1.42%  ±1.89%  ±2.47%
 buffers/buffer-copy.js n=1024 targetStart=1 size=65536                -0.20 %       ±4.35%  ±5.79%  ±7.54%
 buffers/buffer-copy.js n=1024 targetStart=1 size=8192                  5.47 %       ±5.64%  ±7.59% ±10.05%

I have only one idea: use fastcopy() only for Buffer.concat() and Buffer.from().

@BridgeAR
Member

BridgeAR commented Jan 9, 2019

I had another look at this PR and there's something very weird going on with the benchmark: depending on the number of iterations the numbers either go up (huge n, e.g. 2 ** 20) or down (smaller n e.g., 2 ** 15).

I guess we get a highly specialised function from V8 with those big numbers. But that's not really a likely thing a user would do.

@reklatsmasters instead of using .subarray or _clone you could use Buffer.prototype.slice, as @TimothyGu suggested. In this specific case you could also skip some further checks and directly use newSource = new Uint8Array(source.buffer, sourceStart, sourceEnd). Using that would improve the targetStart = 1 numbers. However, as @TimothyGu also pointed out, the Buffer.prototype.copy function currently accepts different input types, and that would limit it to Uint8Array (which is the documented state and we should change that).
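
A sketch of the direct-view idea, with one caveat: the Uint8Array constructor takes (buffer, byteOffset, length), so a working version presumably has to add source.byteOffset and pass a length rather than an end index. The helper name viewOfRange is hypothetical.

```javascript
// Hypothetical helper; creates a view on source[sourceStart..sourceEnd).
function viewOfRange(source, sourceStart, sourceEnd) {
  // Constructor signature: (buffer, byteOffset, length).
  return new Uint8Array(
    source.buffer,
    source.byteOffset + sourceStart, // account for the source's own offset
    sourceEnd - sourceStart          // a length, not an end index
  );
}

const pool = new ArrayBuffer(16);
const source = new Uint8Array(pool, 4, 8); // a view starting at byte 4
source.set([10, 11, 12, 13, 14, 15, 16, 17]);

const target = new Uint8Array(8);
target.set(viewOfRange(source, 2, 5), 1);
console.log(Array.from(target)); // [0, 12, 13, 14, 0, 0, 0, 0]
```

The byteOffset adjustment matters for pooled Buffers, which are often views into a larger shared ArrayBuffer.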

@lundibundi
Member

@reklatsmasters ping, will you have time to work on this?

@reklatsmasters
Contributor Author

@lundibundi This PR should be closed; it is already implemented in #29066.

@lundibundi lundibundi closed this Nov 26, 2019