Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check failed: result.second #32463

Closed
richardlau opened this issue Mar 24, 2020 · 26 comments
Closed

Check failed: result.second #32463

richardlau opened this issue Mar 24, 2020 · 26 comments
Labels
addons Issues and PRs related to native addons. buffer Issues and PRs related to the buffer subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. node-api Issues and PRs related to the Node-API.

Comments

@richardlau
Copy link
Member

unrelated failure in multiple different PRs?

test.node-api/test_buffer/test

#
# Fatal error in , line 0
# Check failed: result.second.
#
#
#
#FailureMessage Object: 0x3fffcd8ab3d0
 1: 0x106f06f4  [out/Release/node]
 2: 0x11b83bec V8_Fatal(char const*, ...) [out/Release/node]
 3: 0x10c4685c v8::internal::GlobalBackingStoreRegistry::Register(std::shared_ptr<v8::internal::BackingStore>) [out/Release/node]
 4: 0x1089f2c0 v8::ArrayBuffer::GetBackingStore() [out/Release/node]
 5: 0x106385f8  [out/Release/node]
 6: 0x108fb184  [out/Release/node]
 7: 0x108fd324 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [out/Release/node]
 8: 0x11377680  [out/Release/node]

Not sure why you pinged build here as that's a code problem (failing check). From the stack trace and previously (#31061) that check is to with arraybuffers and backing stores. If it's occurring in multiple PR's that would suggest something has landed on master that has regressed.

Originally posted by @richardlau in #32414 (comment)

@richardlau
Copy link
Member Author

Pulling this out into a separate issue for visibility.

@ronag do you have links to the failing runs? If we can isolate when it started happening it might help work out what caused it.

@ronag
Copy link
Member

ronag commented Mar 24, 2020

https://ci.nodejs.org/job/node-test-commit-plinux/31722/

Might be flaky. I did see it in another PR as well but can't find it at the moment.

@davedoesdev
Copy link
Contributor

I'm seeing this in Node 14.1.0 if I create two Buffers (using Napi::Buffer<uint8_t>::New) pointing to the same memory (stack trace below). I'm guessing this trips the CHECK in backing-store.cc because the address is already in map_.
Allocating two external buffers pointing to the same address used to work. Is the change in behaviour expected?

#
# Fatal error in , line 0
# Check failed: result.second.
#
#
#
#FailureMessage Object: 0x7ffc13226770
 1: 0x55f12e563285  [/home/david/node/out/Release/node]
 2: 0x55f12f57d916 V8_Fatal(char const*, ...) [/home/david/node/out/Release/node]
 3: 0x55f12e99b485 v8::internal::GlobalBackingStoreRegistry::Register(std::shared_ptr<v8::internal::BackingStore>) [/home/david/node/out/Release/node]
 4: 0x55f12e6bd967 v8::ArrayBuffer::GetBackingStore() [/home/david/node/out/Release/node]
 5: 0x55f12e4c4cbb node::Buffer::New(node::Environment*, char*, unsigned long, void (*)(char*, void*), void*) [/home/david/node/out/Release/node]
 6: 0x55f12e4c4fa2 node::Buffer::New(v8::Isolate*, char*, unsigned long, void (*)(char*, void*), void*) [/home/david/node/out/Release/node]
 7: 0x55f12e4b743e napi_create_external_buffer [/home/david/node/out/Release/node]
 8: 0x7f61c8d4e062 Napi::Buffer<unsigned char>::New(napi_env__*, unsigned char*, unsigned long) [/home/david/shared-memory-disruptor/build/Debug/disruptor.node]
 9: 0x7f61c8d4d2be Napi::Buffer<unsigned char> Disruptor::ProduceClaimSync<Napi::Buffer>(Napi::Env const&, bool, unsigned long&, unsigned long&) [/home/david/shared-memory-disruptor/build/Debug/disruptor.node]
10: 0x7f61c8d4565e Disruptor::ProduceClaim(Napi::CallbackInfo const&) [/home/david/shared-memory-disruptor/build/Debug/disruptor.node]
11: 0x7f61c8d4f49d Napi::ObjectWrap<Disruptor>::InstanceMethodCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#1}::operator()() const [/home/david/shared-memory-disruptor/build/Debug/disruptor.node]
12: 0x7f61c8d50c02 napi_value__* Napi::details::WrapCallback<Napi::ObjectWrap<Disruptor>::InstanceMethodCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#1}>(Napi::ObjectWrap<Disruptor>::InstanceMethodCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#1}) [/home/david/shared-memory-disruptor/build/Debug/disruptor.node]
13: 0x7f61c8d4f535 Napi::ObjectWrap<Disruptor>::InstanceMethodCallbackWrapper(napi_env__*, napi_callback_info__*) [/home/david/shared-memory-disruptor/build/Debug/disruptor.node]
14: 0x55f12e49a9d8  [/home/david/node/out/Release/node]
15: 0x55f12e6f6d3f v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) [/home/david/node/out/Release/node]
16: 0x55f12e6f7100  [/home/david/node/out/Release/node]
17: 0x55f12e6f799a  [/home/david/node/out/Release/node]
18: 0x55f12e6f824a v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/home/david/node/out/Release/node]
19: 0x55f12ef766b9  [/home/david/node/out/Release/node]

@davedoesdev
Copy link
Contributor

I guess this change is intended: https://monorail-prod.appspot.com/p/v8/issues/detail?id=9908

I'll try to work around it in my addons.

@lovell
Copy link
Contributor

lovell commented May 6, 2020

Here are a couple of downstream issues from native modules that have been impacted by this:

(Anecdotally I've heard https://github.com/grpc/grpc-node is also affected, but the only issue I can find that mentions it is pulumi/pulumi#4258 (comment) .)

Suggestions for a straightforward fix or workaround for this breaking change would be greatly appreciated.

@strigona-worksight
Copy link

I have experienced issues with grpc-node as well in some early tests of node 14. I haven't circled back to gather enough info to submit an issue to the project.

@jasnell
Copy link
Member

jasnell commented May 6, 2020

/cc @addaleax

@davedoesdev
Copy link
Contributor

davedoesdev commented May 6, 2020

My addon has a fixed memory range so I'm able to work around this issue by making a single buffer and then calling Buffer#slice when I need to return a view onto it.
Wouldn't work for addons which dynamically allocate memory though - I guess you could keep track of what you've allocated previously and use finalizers to determine when and when not to return the same object.

@chjj
Copy link
Contributor

chjj commented May 8, 2020

I'm having serious issues with this. I spent a while trying to debug it and I can't seem to figure it out. I've audited my code thoroughly and concluded that it does not pass live duplicate pointers into napi_create_external_buffer. I've also valgrinded it to ensure nothing funky is happening memory-wise.

From what I can guess this is happening:

  1. Addon code allocates pointer P via malloc.
  2. P is passed into napi_create_external_buffer with a finalization callback which calls free(P). P is inserted into v8's global array buffer table for tracking.
  3. The finalization callback is executed on GC. P is freed and returned to the allocator. P is not yet removed from v8's global array buffer table.
  4. Addon code attempts to allocate memory once again. The allocator returns P, as it is now available.
  5. P is passed into napi_create_external_buffer. P still has not been removed from the v8 global array buffer table.
  6. The world ends with Check failed: result.second.

Step 3 seems to be the issue (again, this is just my guess at what is happening). It's been difficult trying to isolate this as it appears to be a "now you see it, now you don't" type of issue. I've come up with an offensively slow workaround. I don't consider it acceptable performance-wise.

@addaleax
Copy link
Member

addaleax commented May 8, 2020

@chjj Yeah, we’ve been having a lot of trouble with this, too. This isn’t quite trivial to fix, because the API contracts for napi_create_external_buffer and the V8 BackingStore aren’t the same – you can’t run JS from the BackingStore deleter callback or necessarily even be sure on which thread it runs, but you can do that from the Node.js/N-API Buffer free callback.

I’ll try to change the behavior here by making the Buffer free callback only fire once the BackingStore deleter runs, by posting a task back to the JS thread, but that’s not necessarily trivial either.

(It’s unfortunate that we have a Buffer API in N-API at all, tbh. But that’s a mistake we can’t undo now.)

@chjj
Copy link
Contributor

chjj commented May 8, 2020

@addaleax, thanks for the response.

I’ll try to change the behavior here by making the Buffer free callback only fire once the BackingStore deleter runs, by posting a task back to the JS thread, but that’s not necessarily trivial either.

I see. Now that I'm understanding the issue more, I suppose the alternative would be to allow the programmer to pass two finalize callbacks: one for JS stuff, one for non-JS stuff which would execute atomically with the BackingStore deletion. That seems pretty ugly though and probably not possible to implement at this point.

(It’s unfortunate that we have a Buffer API in N-API at all, tbh. But that’s a mistake we can’t undo now.)

Would napi_create_external_arraybuffer not have this issue? Based on what you said above, I assumed it would.

I'm going through my code now and considering how to rewrite all of this.

@addaleax
Copy link
Member

addaleax commented May 8, 2020

(It’s unfortunate that we have a Buffer API in N-API at all, tbh. But that’s a mistake we can’t undo now.)

Would napi_create_external_arraybuffer not have this issue? Based on what you said above, I assumed it would.

It does, unfortunately – for the same reason as the Buffer variant, its API contract precedes the V8 changes here. I’ll try to implement the solution from above, and see how far I’ll get with that.

@addaleax addaleax added buffer Issues and PRs related to the buffer subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. node-api Issues and PRs related to the Node-API. addons Issues and PRs related to native addons. labels May 9, 2020
addaleax added a commit to addaleax/node that referenced this issue May 9, 2020
Release `Buffer` and `ArrayBuffer` instances that were created through
our addon APIs and have finalizers attached to them only after V8 has
called the deleter callback passed to the `BackingStore`, instead of
relying on our own GC callback(s).

This fixes the following race condition:

1. Addon code allocates pointer P via `malloc`.
2. P is passed into `napi_create_external_buffer` with a finalization
   callback which calls `free(P)`. P is inserted into V8’s global array
   buffer table for tracking.
3. The finalization callback is executed on GC. P is freed and returned
   to the allocator. P is not yet removed from V8’s global array
   buffer table. (!)
4. Addon code attempts to allocate memory once again. The allocator
   returns P, as it is now available.
5. P is passed into `napi_create_external_buffer`. P still has not been
   removed from the v8 global array buffer table.
6. The world ends with `Check failed: result.second`.

Since our API contract is to call the finalizer on the JS thread on
which the `ArrayBuffer` was created, but V8 may call the `BackingStore`
deleter callback on another thread, fixing this requires posting
a task back to the JS thread.

Refs: nodejs#32463 (comment)
Fixes: nodejs#32463
@addaleax
Copy link
Member

addaleax commented May 9, 2020

@chjj #33321 should hopefully fix this. It’s not a trivial fix, but it should address the race condition you described while upholding the pre-v14.x Node.js API contract.

codebytere pushed a commit that referenced this issue May 16, 2020
Release `Buffer` and `ArrayBuffer` instances that were created through
our addon APIs and have finalizers attached to them only after V8 has
called the deleter callback passed to the `BackingStore`, instead of
relying on our own GC callback(s).

This fixes the following race condition:

1. Addon code allocates pointer P via `malloc`.
2. P is passed into `napi_create_external_buffer` with a finalization
   callback which calls `free(P)`. P is inserted into V8’s global array
   buffer table for tracking.
3. The finalization callback is executed on GC. P is freed and returned
   to the allocator. P is not yet removed from V8’s global array
   buffer table. (!)
4. Addon code attempts to allocate memory once again. The allocator
   returns P, as it is now available.
5. P is passed into `napi_create_external_buffer`. P still has not been
   removed from the v8 global array buffer table.
6. The world ends with `Check failed: result.second`.

Since our API contract is to call the finalizer on the JS thread on
which the `ArrayBuffer` was created, but V8 may call the `BackingStore`
deleter callback on another thread, fixing this requires posting
a task back to the JS thread.

Refs: #32463 (comment)
Fixes: #32463

PR-URL: #33321
Reviewed-By: James M Snell <jasnell@gmail.com>
@ggreco
Copy link
Contributor

ggreco commented May 21, 2020

Except for the race condition, so, with node 14.x (also the 14.3 that implements the fix), it's no (more?) correct to have multiple create_node_buffer pointing at the same c buffer?

I'm using a pool of frames for decoding and I used to create a new node buffer for every frame (not caring if the source frame was already used in another buffer).
In this particular case node crashes with the error of the original post as soon as I start reusing the first frame.
What is the correct way to handle this situation?
I have to keep a reference to the node buffers mapped to the frame data?

Something like:

std::map<uint8_t *, Napi::Reference<Napi::Buffer<uint8_t> > > frames;

auto it = frames.find(ptr);
if (it != frames.end())
    return it->second.Value();
else {
    frames_[ptr] =  Napi::Persistent(Napi::Buffer<uint8_t>::New(env, ptr, frame_size));
 }
so that I can give an existing buffer if the source data is the same?

@addaleax
Copy link
Member

Except for the race condition, so, with node 14.x (also the 14.3 that implements the fix), it's no (more?) correct to have multiple create_node_buffer pointing at the same c buffer?

Yes – it’s unfortunate but it’s what it is.

I have to keep a reference to the node buffers mapped to the frame data?

Well, that’s what I ended up doing – it’s not great, but it works. You’ll want to make sure that frames is accessed correctly and persisted for the duration of the addon (depending on your addon that might mean using a) napi_set_instance_data()/napi_get_instance_data() to store it in some struct that contains that map, or, if it is possible that the same Buffer base pointers end up being used across multiple threads, b) use a global for the map + mutexes for locking) and you’ll want to add a finalizer callback to Buffer::New() that removes the entry from the map to avoid memory leaks.

@davedoesdev
Copy link
Contributor

davedoesdev commented May 21, 2020

I ran into a problem keeping track of weak references to buffers, which I need because I hand them out to the Javascript app and let them be garbage collected.
The buffers would get collected (and thus become stale). However, the finalizer doesn't get run until it's removed from the backing store.
This means there's a gap between the buffer reference becoming unusable (it's been GC'ed but is still in the backing store) and when my finalizer gets called to remove it from my map.
More here, plus details of the horrible hack I've implemented to work around it:
https://github.com/davedoesdev/shared-memory-disruptor/blob/master/src/disruptor.cc#L468

@ggreco
Copy link
Contributor

ggreco commented May 22, 2020

@davedoesdev thanks for showing your implementation, I wonder how does it work with LTS (or older) node, where there is no guarantee of the release order of the backingstore.
I ended up doing an extra copy of the frame (not using the frame pool) and having a single persisted buffer passed to the javascript world. Also with 1080p videos it seems not much slower than the previous approach that avoided the copy (using a new buffer pointing to the decoded frame data, that does not work with node 14) and does not require the complexity of finalizers and mutexes that has the map version...
Still it's sad that the NAPI node interface that has been built to shield native modules against V8 changes is so vulnerable to v8 changes!

wilsonzlin added a commit to wilsonzlin/minify-html that referenced this issue Aug 7, 2021
@Brooooooklyn
Copy link

@addaleax should we reopen it since it doesn't actually be fixed.

@addaleax
Copy link
Member

@Brooooooklyn Can you provide more context than this?

@schmod
Copy link

schmod commented May 26, 2022

We're seeing this on Node 16.15.0 while running tests with Ava.

@Pablodotjs
Copy link

Getting this in Node v16.14.0 several times

@Ram4GB
Copy link

Ram4GB commented Jun 29, 2022

I got same issue in Node v16.15.1 when I run tests with Vitest.

# Fatal error in , line 0
# Check failed: result.second.
#
#
#
#FailureMessage Object: 0000002046CFB340
 1: 00007FF7F1B879CF v8::internal::CodeObjectRegistry::~CodeObjectRegistry+114207
 2: 00007FF7F1AA3E9F std::basic_ostream<char,std::char_traits<char> >::operator<<+65103
 3: 00007FF7F27826C2 V8_Fatal+162
 4: 00007FF7F21EDA3D v8::internal::BackingStore::Reallocate+637
 5: 00007FF7F24371B9 v8::ArrayBuffer::GetBackingStore+137
 6: 00007FF7F1B367EA node::Buffer::Data+58
 7: 00007FF7F1B0AC5C DSA_meth_get_flags+19404
 8: 00007FF7F2405BC6 v8::internal::Builtins::code_handle+172790
 9: 00007FF7F24057B9 v8::internal::Builtins::code_handle+171753
10: 00007FF7F2405A7C v8::internal::Builtins::code_handle+172460
11: 00007FF7F24058E0 v8::internal::Builtins::code_handle+172048
12: 00007FF7F24D8FE1 v8::internal::SetupIsolateDelegate::SetupHeap+494673
13: 000001DFF0D54323

@bnoordhuis
Copy link
Member

@Ram4GB Can you open a new issue and link back to this one? It's not clear whether this is in fact the same issue or just the same error message.

By the way, the stack trace is bogus (common problem on Windows.) It'd be helpful if you could test on Linux or macOS, where the stack trace is reliable.

@mpowell90
Copy link

@bnoordhuis not sure if this helps but rolling back to node v16.13.2 has stopped these intermittent fatal errors on my system (MacBook Pro M1).

@tony19
Copy link

tony19 commented Aug 10, 2022

FWIW, I can still reproduce the issue with Node v16.13.2.

sparist added a commit to Distributive-Network/node that referenced this issue Feb 3, 2023
This commit effectively reverts:
nodejs#32463

That change introduces an upward JS API dependency
on the Node API. Since that change fixed a bug, this
will need to be revisited to determine the best way to
fix the original bug without circular dependencies.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addons Issues and PRs related to native addons. buffer Issues and PRs related to the buffer subsystem. flaky-test Issues and PRs related to the tests with unstable failures on the CI. node-api Issues and PRs related to the Node-API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.