
Buffer.concat and Buffer.copy silently produce invalid results when the operation involves indices equal to or greater than 2^32 #55422

Open
rotemdan opened this issue Oct 17, 2024 · 17 comments · May be fixed by #55492
Labels: buffer, confirmed-bug, help wanted, regression

Comments

@rotemdan

Version

v22.9.0, v23.0.0

Platform

Windows 11 x64

Microsoft Windows NT 10.0.22631.0 x64

Subsystem

Buffer

What steps will reproduce the bug?

const largeBuffer = Buffer.alloc(2 ** 32 + 5)
largeBuffer.fill(111)

const result = Buffer.concat([largeBuffer])
console.log(result)

How often does it reproduce? Is there a required condition?

Reproduces consistently in v22.9.0 and v23.0.0.

What is the expected behavior? Why is that the expected behavior?

All bytes of the buffer returned by Buffer.concat([largeBuffer]) should be identical to the source.

In this example:

111, 111, 111, 111, 111, 111, 111, 111, 111, 111, 111, ....

What do you see instead?

In the returned buffer, the first 5 bytes are 111, and all following bytes are 0.

111, 111, 111, 111, 111, 0, 0, 0, 0, 0, 0, ....

The console.log(result) output looks like:

<Buffer 6f 6f 6f 6f 6f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 4294967251 more bytes>

Additional information

No response

@targos added the buffer label on Oct 17, 2024
@rotemdan changed the title from "Buffer.concat silently produces invalid output when its output size is greater than 4GB" to "Buffer.concat silently produces invalid output when its output size is greater than 4GiB" on Oct 17, 2024
@RedYetiDev added the confirmed-bug label on Oct 17, 2024
@rotemdan (Author)

My current workaround (tested to produce correct results with sizes greater than 4 GiB):

export function concatBuffers(buffers: Buffer[]) {
	let totalLength = 0

	for (const buffer of buffers) {
		totalLength += buffer.length
	}

	const resultBuffer = Buffer.alloc(totalLength)

	if (totalLength === 0) {
		return resultBuffer
	}

	let writeOffset = 0

	for (const buffer of buffers) {
		resultBuffer.set(buffer, writeOffset)

		writeOffset += buffer.length
	}

	return resultBuffer
}
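
A quick sanity check of the workaround against the original repro (this assumes enough free memory for two buffers of just over 4 GiB each):

const largeBuffer = Buffer.alloc(2 ** 32 + 5)
largeBuffer.fill(111)

const result = concatBuffers([largeBuffer])
console.log(result[2 ** 32]) // 111 with this workaround; 0 with the built-in Buffer.concat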

@RedYetiDev (Member) commented Oct 17, 2024

The issue started in v22.7.0. I'll start bisecting. Maybe #54087?

@RedYetiDev added the regression label on Oct 17, 2024
@RedYetiDev (Member) commented Oct 17, 2024

I've finished bisecting. This was indeed caused by #54087 cc @ronag.

9f8f26eb2ff36f9352dd85643073af876b9d6b46 is the first bad commit
commit 9f8f26eb2ff36f9352dd85643073af876b9d6b46 (HEAD)
Author: Robert Nagy <ronagy@icloud.com>
Date:   Fri Aug 2 11:19:41 2024 +0200

    buffer: use native copy impl
    
    PR-URL: https://github.com/nodejs/node/pull/54087
    Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
    Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
    Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com>
    Reviewed-By: Daniel Lemire <daniel@lemire.me>

 benchmark/buffers/buffer-copy.js |  6 ------
 lib/buffer.js                    | 11 ++++++-----
 src/node_buffer.cc               | 56 +++++++++++++++++++++++++++-----------------------------
 src/node_external_reference.h    |  9 +++++++++
 4 files changed, 42 insertions(+), 40 deletions(-)

@ronag (Member) commented Oct 21, 2024

Anyone care to open a PR? I think this could be a simple case of just switching to .set(srcBuffer) (instead of using native methods) in the case where the total length exceeds e.g. 2 GB.
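
A minimal sketch of that idea (the names and the threshold are illustrative, not an actual patch):

// Illustrative fallback: bypass the native copy binding for large results
// and rely on TypedArray.prototype.set, which doesn't truncate offsets.
const kNativeCopyMax = 2 ** 31 // illustrative cutoff, e.g. 2 GB

function concatWithFallback(list, totalLength) {
  const result = Buffer.allocUnsafe(totalLength)
  const useSet = totalLength > kNativeCopyMax
  let offset = 0
  for (const buf of list) {
    if (useSet) {
      result.set(buf, offset) // TypedArray path: safe for offsets >= 2^32
    } else {
      buf.copy(result, offset) // native path: fine below the cutoff
    }
    offset += buf.length
  }
  return result
}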

@ronag added the help wanted label on Oct 21, 2024
@duncpro commented Oct 21, 2024

I reproduced this on macOS.

@ronag I'd like to try and tackle this one.

@MrJithil (Member)

> I reproduced this on macOS.
>
> @ronag I'd like to try and tackle this one.

Good luck.

@rotemdan (Author)

This call to _copy is possibly the reason:

function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
  if (sourceEnd - sourceStart > target.byteLength - targetStart)
    sourceEnd = sourceStart + target.byteLength - targetStart;

  let nb = sourceEnd - sourceStart;
  const sourceLen = source.byteLength - sourceStart;
  if (nb > sourceLen)
    nb = sourceLen;

  if (nb <= 0)
    return 0;

  _copy(source, target, targetStart, sourceStart, nb); // <--

  return nb;
}

_copy is imported from the internal buffer binding:

const {
  byteLengthUtf8,
  compare: _compare,
  compareOffset,
  copy: _copy, // <--
  fill: bindingFill,
  isAscii: bindingIsAscii,
  isUtf8: bindingIsUtf8,
  indexOfBuffer,
  indexOfNumber,
  indexOfString,
  swap16: _swap16,
  swap32: _swap32,
  swap64: _swap64,
  kMaxLength,
  kStringMaxLength,
  atob: _atob,
  btoa: _btoa,
} = internalBinding('buffer');

A thorough solution would ensure this method correctly handles large array sizes, or fails explicitly.

Working around it by falling back to TypedArray.prototype.set would leave open the possibility of a future issue if some other code calls _copy.

@duncpro commented Oct 21, 2024

So the root cause of this problem is a 32-bit integer overflow in SlowCopy in node_buffer.cc:

const auto target_start = args[2]->Uint32Value(env->context()).ToChecked();
const auto source_start = args[3]->Uint32Value(env->context()).ToChecked();
const auto to_copy = args[4]->Uint32Value(env->context()).ToChecked();

Apparently Uint32Value performs a wrapping conversion, which is why, in the example below, the target buffer only gets filled with 5 bytes.

const largeBuffer = Buffer.alloc(2 ** 32 + 5)
largeBuffer.fill(111)

const result = Buffer.concat([largeBuffer])
console.log(result); // 6f 6f 6f 6f 6f 00 00 00 ...
                     // 1  2  3  4  5

Simply replacing Uint32Value with IntegerValue will fix this, barring edge cases I've yet to fully consider.
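
The truncation is easy to demonstrate from plain JavaScript, since the unsigned shift operator applies the same ToUint32 conversion that Uint32Value performs:

console.log((2 ** 32 + 5) >>> 0)     // 5: the value wraps modulo 2^32
console.log((2 ** 32 + 5) % 2 ** 32) // 5: the equivalent arithmetic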

@rotemdan (Author) commented Oct 21, 2024

I'm not sure exactly what the binding refers to, but I found a candidate method in the C++ code (in node/src/node_buffer.cc) that treats all of its arguments as uint32:

// Assume caller has properly validated args.
void SlowCopy(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  ArrayBufferViewContents<char> source(args[0]);
  SPREAD_BUFFER_ARG(args[1].As<Object>(), target);

  const auto target_start = args[2]->Uint32Value(env->context()).ToChecked();
  const auto source_start = args[3]->Uint32Value(env->context()).ToChecked();
  const auto to_copy = args[4]->Uint32Value(env->context()).ToChecked();

  memmove(target_data + target_start, source.data() + source_start, to_copy);
  args.GetReturnValue().Set(to_copy);
}

Regardless of whether it's the method used in the binding, using Uint32Value to extract the arguments doesn't seem right.

The FastCopy method follows it, also taking uint32_t arguments:

uint32_t FastCopy(Local<Value> receiver,
                  const v8::FastApiTypedArray<uint8_t>& source,
                  const v8::FastApiTypedArray<uint8_t>& target,
                  uint32_t target_start,
                  uint32_t source_start,
                  uint32_t to_copy) {
  uint8_t* source_data;
  CHECK(source.getStorageIfAligned(&source_data));

  uint8_t* target_data;
  CHECK(target.getStorageIfAligned(&target_data));

  memmove(target_data + target_start, source_data + source_start, to_copy);

  return to_copy;
}

@duncpro commented Oct 21, 2024

@rotemdan This is correct.

@rotemdan (Author) commented Oct 21, 2024

If you search for the string "uint32" in node/src/node_buffer.cc, you'll see that many other methods assume that indices are uint32 (a 4 GiB maximum). Examples I've found:

  • CopyArrayBuffer
  • Fill
  • StringWrite
  • FastByteLengthUtf8
  • SlowIndexOfNumber (assumes the needle is uint32, not the index)
  • FastIndexOfNumber (assumes the needle is uint32, not the index)
  • WriteOneByteString
  • FastWriteString
  • ...

@ronag (Member) commented Oct 21, 2024

I think the fast methods won't get called with anything that doesn't fit into a uint32.

@ronag (Member) commented Oct 21, 2024

It's the slow methods that need fixing, I guess. Should we even support 4 GiB+ Buffers? @jasnell

@rotemdan (Author) commented Oct 21, 2024

Node.js has already supported large typed arrays (new Uint8Array(>= 4 GiB)) and buffers (Buffer.alloc(>= 4 GiB)) since version 22 (or earlier? not sure). I think that's great, because it opened up many use cases that were limited before (in my case, audio processing of multi-hour recordings, loading large machine-learning models, etc.).

Fixing the methods in node/src/node_buffer.cc isn't, by itself, really that hard. It's more about ensuring the code works correctly on the various 32-bit and 64-bit platforms and processor architectures that Node.js currently supports.

As an intermediate solution, you could allow large ArrayBuffers but disallow large Buffer objects, but eventually you'd want Buffer to match the capabilities of ArrayBuffer (unless Buffer is entirely deprecated at some point, or something like that).
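
For illustration, that intermediate restriction could be as small as a length guard on the Buffer side (hypothetical code, not something that exists in Node.js today):

// Hypothetical guard: cap Buffer allocations below 2^32 bytes while
// leaving large raw ArrayBuffer allocations untouched.
const kMaxSafeBufferLength = 2 ** 32 - 1

function checkBufferLength(length) {
  if (length > kMaxSafeBufferLength) {
    throw new RangeError(
      `Buffer length ${length} exceeds the supported maximum of ${kMaxSafeBufferLength}`)
  }
}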

@rotemdan (Author) commented Oct 22, 2024

The fix should be really simple (I couldn't test it because I don't currently know how to compile Node.js):

In SlowCopy, change ->Uint32Value to ->IntegerValue, which causes target_start, source_start, and to_copy to receive int64_t values:

// Assume caller has properly validated args.
void SlowCopy(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);

  ArrayBufferViewContents<char> source(args[0]);
  SPREAD_BUFFER_ARG(args[1].As<Object>(), target);

  const auto target_start = args[2]->IntegerValue(env->context()).ToChecked();
  const auto source_start = args[3]->IntegerValue(env->context()).ToChecked();
  const auto to_copy = args[4]->IntegerValue(env->context()).ToChecked();

  memmove(target_data + target_start, source.data() + source_start, to_copy);
  args.GetReturnValue().Set(to_copy);
}

The signature of memmove (as declared in MSVC's headers) is:

_VCRTIMP void* __cdecl memmove(
    _Out_writes_bytes_all_opt_(_Size) void*       _Dst,
    _In_reads_bytes_opt_(_Size)       void const* _Src,
    _In_                              size_t      _Size
    );

This means there's an implicit conversion here from int64_t to size_t. If we pre-validate (in JavaScript) that these are all non-negative safe integers (between 0 and Number.MAX_SAFE_INTEGER), the conversion should be safe (a static_cast<size_t>(...) could be added for clarity, but it wouldn't change the behavior).

For extra safety on 32-bit platforms, we could also ensure they are all in the range 0 to SIZE_MAX (the pointer-sized maximum), but those checks can also be done in JavaScript.
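
A pre-validation step along those lines might look like this on the JavaScript side (a hypothetical helper, not part of any actual patch):

// Hypothetical guard: only hand the binding values that convert safely
// from int64_t to size_t.
function validateCopyArg(value, name) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError(`${name} must be a non-negative safe integer, got ${value}`)
  }
}

validateCopyArg(2 ** 32 + 5, 'sourceStart') // passes: a non-negative safe integer
// validateCopyArg(-1, 'sourceStart')       // would throw a RangeError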

It's also easy to fix FastCopy by changing uint32_t to size_t:

// Assume caller has properly validated args.
size_t FastCopy(Local<Value> receiver,
                const v8::FastApiTypedArray<uint8_t>& source,
                const v8::FastApiTypedArray<uint8_t>& target,
                size_t target_start,
                size_t source_start,
                size_t to_copy) {
  uint8_t* source_data;
  CHECK(source.getStorageIfAligned(&source_data));

  uint8_t* target_data;
  CHECK(target.getStorageIfAligned(&target_data));

  memmove(target_data + target_start, source_data + source_start, to_copy);

  return to_copy;
}

These kinds of changes are really simple to make, and I definitely think they're worth it.

Anyway, 4 GiB+ contiguous ArrayBuffers should be important (essential?) for WASM64, I believe, as well as for many other applications: memory mapping, databases, machine learning, large vectors/matrices, and so on. Based on my reading of the Node.js code, the effort required to artificially restrict Buffer to 32-bit indices could be significant by itself, and may turn out to be problematic due to subtle interactions with the underlying ArrayBuffers.

Deprecating Buffer entirely would take orders of magnitude more effort (it would require changing massive amounts of code: string processing, networking, streams, etc.). These fixes aren't much in comparison. I'm willing to participate once I figure out how to compile Node.js.

@rotemdan (Author) commented Oct 22, 2024

I've verified that changing:

  const auto target_start = args[2]->Uint32Value(env->context()).ToChecked();
  const auto source_start = args[3]->Uint32Value(env->context()).ToChecked();
  const auto to_copy = args[4]->Uint32Value(env->context()).ToChecked();

to:

  const auto target_start = args[2]->IntegerValue(env->context()).ToChecked();
  const auto source_start = args[3]->IntegerValue(env->context()).ToChecked();
  const auto to_copy = args[4]->IntegerValue(env->context()).ToChecked();

seems to fix the issue (tested on Windows 11 x64).

Before: [screenshot]

After: [screenshots]

I'll try to do some more testing before I open a pull request.


I also made a fix for CopyArrayBuffer, which is also an exported method.

I've had trouble fixing other methods that require changing the signature, like FastCopy and FastByteLengthUtf8, since the compiler was giving errors I didn't fully understand. Maybe their signatures are declared somewhere else that also needs modifying, but I haven't found where yet.

There are also two other minor fixes I looked at:

In StringWrite and SlowWriteString:

uint32_t written = 0;

may be changed to:

size_t written = 0;

since the values assigned there may be cast down from size_t anyway.

I'll try to work on each fix separately for now, not all at once.

@rotemdan changed the title from "Buffer.concat silently produces invalid output when its output size is greater than 4GiB" to "Buffer.concat and Buffer.copy silently produce invalid results when the operation involves indices equal to or greater than 2^32" on Oct 23, 2024
@rotemdan (Author) commented Oct 23, 2024

Based on observations of the code, I realized the same problem should also occur in Buffer.copy. It turns out it does:

Before fix: [screenshot]

After fix: [screenshot]

Buffer.copy is defined as:

Buffer.prototype.copy =
  function copy(target, targetStart, sourceStart, sourceEnd) {
    return copyImpl(this, target, targetStart, sourceStart, sourceEnd);
  };

copyImpl (also called from Buffer.concat) calls _copyActual, which calls the C++ function SlowCopy via the binding:

function _copyActual(source, target, targetStart, sourceStart, sourceEnd) {
  if (sourceEnd - sourceStart > target.byteLength - targetStart)
    sourceEnd = sourceStart + target.byteLength - targetStart;

  let nb = sourceEnd - sourceStart;
  const sourceLen = source.byteLength - sourceStart;
  if (nb > sourceLen)
    nb = sourceLen;

  if (nb <= 0)
    return 0;

  _copy(source, target, targetStart, sourceStart, nb); // <------- Binds to SlowCopy

  return nb;
}

Fixing the C++ method (SlowCopy) should also fix this.
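
For reference, a direct repro of the Buffer.copy path, following the pattern of the original example (it needs enough memory for two buffers larger than 4 GiB):

const source = Buffer.alloc(2 ** 32 + 5, 111)
const target = Buffer.alloc(2 ** 32 + 5)

source.copy(target)

// Before the fix, to_copy wraps to (2 ** 32 + 5) % 2 ** 32 === 5,
// so only the first 5 bytes arrive:
console.log(target[4]) // 111
console.log(target[5]) // expected 111, but 0 on affected versions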
