-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffer: up to 2x times faster copy of buffers #24977
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work! Improving the performance here always helps a lot of people.
function fastcopy(source, target, targetStart = 0, sourceStart = 0, sourceEnd) { | ||
if (!isUint8Array(source)) { | ||
throw new ERR_INVALID_ARG_TYPE('source', ['Buffer', 'Uint8Array'], source); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already checked in the concat
case. Therefore I recommend to move this check into Buffer.prototype.copy
. The target
is similar but it's created using Buffer.allocUnsafe
. This function could be manipulated and therefore it would be best to have a reference to the original function and call that as e.g.: const { allocUnsafe } = Buffer;
. In that case the target
check can also be moved in Buffer.prototype.copy
.
lib/buffer.js
Outdated
} | ||
|
||
if (sourceEnd - sourceStart > target.byteLength - targetStart) | ||
sourceEnd = sourceStart + target.byteLength - targetStart; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment what this does (if I read it correct, it's the overflow mechanism to search from the end instead of the beginning).
const b = Buffer.alloc(1); | ||
a.copy(b, 0, 0x100000000, 0x100000001); | ||
}, outOfRangeError); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because it's not obvious why the test is removed: can you elaborate why it's now obsolete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParseArrayIndex()
isn't called without type casting to uint32 any more.
Line 164 in afeb56a
inline MUST_USE_RESULT bool ParseArrayIndex(Local<Value> arg, |
benchmark/buffers/buffer-copy.js
Outdated
|
||
bench.start(); | ||
for (var i = 0; i < n * 1024; i++) { | ||
source.copy(target); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to test some options as well to see how the sourceStart
/ sourceEnd
etc. compete :)
@reklatsmasters was there a specific reason that you kept https://github.com/nodejs/node/pull/24977/files#diff-196d056a936b6d2649721eb639e0442bR404? |
benchmark/buffers/buffer-copy.js
Outdated
}); | ||
|
||
function main({ n, size }) { | ||
const source = Buffer.allocUnsafe(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This basically copies from an uninitialized memory chunk below. Which could be or not be all zeroes, btw.
Perhaps use a single source, initialized with crypto (random data) or fill (repeated string)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some possible improvements.
lib/buffer.js
Outdated
); | ||
|
||
if (sourceStart > 0 || to_copy < source.byteLength) { | ||
source = source.subarray(sourceStart, sourceStart + to_copy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should check if .subarray()
is slower than Buffer.prototype.slice()
(not TypedArray.prototype.slice()
). They do the same thing but according to https://crbug.com/v8/7161 (cf. #17431) .subarray
is slower.
You might need to make sure that source
is a Buffer
object to get the correct slice
function, by doing Buffer.from(source.buffer, source.byteOffset, source.byteLength)
.
@@ -488,6 +488,59 @@ Buffer.concat = function concat(list, length) { | |||
return buffer; | |||
}; | |||
|
|||
function fastcopy(source, target, targetStart = 0, sourceStart = 0, sourceEnd) { | |||
if (!isUint8Array(source)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we allowed any TypedArray
(e.g. Uint16Array
) as well as DataView
as the source and target. Please also make sure to add tests for these cases.
Co-Authored-By: reklatsmasters <dmitrycvet@gmail.com>
Sadly, i found bottleneck. Creating of %TypedArray% is realy slow operation. When i return
I have only one idea: use |
I had another look at this PR and there's something very weird going on with the benchmark: depending on the number of iterations the numbers either go up (huge n, e.g. 2 ** 20) or down (smaller n e.g., 2 ** 15). I guess we get a highly specialised function from V8 with those big numbers. But that's not really a likely thing a user would do. @reklatsmasters instead of using |
@reklatsmasters ping, will you have time to work on this? |
@lundibundi This pr should be closed, already implemented in #29066. |
%TypedArray%.prototype.set
is able to copy data betweenArrayBuffer
a lot faster. This allows us to speed upBuffer.copy
andBuffer.concat
by 2x in some cases.benchmarks
node@master vs node@pr
node@v10 vs node@v10-pr
This PR is able to be backported to v10 branch. The main question is what to doing with error messages. Some of previous error messages are less informative.
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes