[v14.x] Backport environment teardown Node-API reference double free fixes #37802


gabrielschulhof
Contributor

These two fixes are needed to prevent an issue on v14.x where a GC during environment shutdown causes a double free of a Node-API reference.

@gabrielschulhof gabrielschulhof added c++ Issues and PRs that require attention from people who are familiar with C++. node-api Issues and PRs related to the Node-API. v14.x labels Mar 19, 2021
@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Mar 19, 2021
@gabrielschulhof gabrielschulhof changed the title Backport environment teardown Node-API reference double free fixes [v14.x] Backport environment teardown Node-API reference double free fixes Mar 19, 2021
@gabrielschulhof gabrielschulhof force-pushed the backport-teardown-gc-ref-fix-to-v14.x branch from 9083454 to e31b2e3 Compare March 19, 2021 04:26
Member

@mhdawson mhdawson left a comment

LGTM

Member

@mhdawson mhdawson left a comment

I think we should figure out nodejs/node-addon-api#906 before this lands.

@gabrielschulhof
Contributor Author

@legendecas can you verify that this fixes #37236? @mhdawson if so, we may want to land it for that alone.

@mhdawson
Member

mhdawson commented Mar 25, 2021

@gabrielschulhof nodejs/node-addon-api#906 has landed, so as long as the two land together we should be good to go.

EDIT: The PR that landed was #37876, and we should make sure that it lands along with this one in 14.x (i.e. either both or neither).

Member

@mhdawson mhdawson left a comment

LGTM

@legendecas
Member

legendecas commented Mar 26, 2021

@gabrielschulhof I cherry-picked @mhdawson's fix, ran it with a debug build, and occasionally get a segfault with this stack:

#0  0x00000000070c5630 in ?? ()
#1  0x0000000000f43cb8 in v8impl::(anonymous namespace)::Reference::SecondPassCallback (data=...) at ../../../src/js_native_api_v8.cc:405
#2  0x00000000014f0ea2 in v8::internal::GlobalHandles::PendingPhantomCallback::Invoke (this=this@entry=0x7ffd2bd85aa0, isolate=<optimized out>, type=type@entry=v8::internal::GlobalHandles::PendingPhantomCallback::kSecondPass)
    at ../../../deps/v8/src/handles/global-handles.cc:1500
#3  0x00000000014f0f27 in v8::internal::GlobalHandles::InvokeSecondPassPhantomCallbacks (this=0x68071c0) at ../../../deps/v8/src/handles/global-handles.cc:1367
#4  0x00000000014f0fd8 in v8::internal::GlobalHandles::InvokeSecondPassPhantomCallbacksFromTask (this=0x68071c0) at ../../../deps/v8/src/handles/global-handles.cc:1350
#5  0x00000000013ff9b9 in non-virtual thunk to v8::internal::CancelableTask::Run() ()
Python Exception <class 'gdb.error'> There is no member or method named _M_head_impl.:
#6  0x000000000107abb4 in node::PerIsolatePlatformData::RunForegroundTask (this=0x6806640, task=) at ../../../src/node_platform.cc:410
#7  0x000000000107b157 in node::PerIsolatePlatformData::FlushForegroundTasksInternal (this=0x6806640) at ../../../src/node_platform.cc:479
#8  0x0000000001079e61 in node::PerIsolatePlatformData::FlushTasks (handle=0x6806d50) at ../../../src/node_platform.cc:238
#9  0x0000000001e20d3c in uv__async_io (loop=0x59f7d60 <default_loop_struct>, w=0x59f7f28 <default_loop_struct+456>, events=1) at ../../../deps/uv/src/unix/async.c:163
#10 0x0000000001e38ea1 in uv__io_poll (loop=0x59f7d60 <default_loop_struct>, timeout=0) at ../../../deps/uv/src/unix/linux-core.c:462
#11 0x0000000001e21725 in uv_run (loop=0x59f7d60 <default_loop_struct>, mode=UV_RUN_ONCE) at ../../../deps/uv/src/unix/core.c:385
#12 0x0000000000f1d6d1 in node::Environment::CleanupHandles (this=0x689abf0) at ../../../src/env.cc:576
#13 0x0000000000f1da25 in node::Environment::RunCleanup (this=0x689abf0) at ../../../src/env.cc:627
#14 0x0000000000eb2447 in node::FreeEnvironment (env=0x689abf0) at ../../../src/api/environment.cc:385
#15 0x0000000001029556 in node::FunctionDeleter<node::Environment, &node::FreeEnvironment>::operator() (this=0x7ffd2bd89850, pointer=0x689abf0) at ../../../src/util.h:624
#16 0x000000000102925d in std::unique_ptr<node::Environment, node::FunctionDeleter<node::Environment, &node::FreeEnvironment> >::~unique_ptr (this=0x7ffd2bd89850, __in_chrg=<optimized out>) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/unique_ptr.h:268
#17 0x0000000001028a4d in node::NodeMainInstance::Run (this=0x7ffd2bd898e0) at ../../../src/node_main_instance.cc:114
#18 0x0000000000f65738 in node::Start (argc=4, argv=0x7ffd2bd89b78) at ../../../src/node.cc:1101
#19 0x0000000002665934 in main (argc=4, argv=0x7ffd2bd89b78) at ../../../src/node_main.cc:141

and the data of frame 2:

(gdb) print *data.parameter_
$4 = {<v8impl::(anonymous namespace)::RefBase> = {<v8impl::Finalizer> = {_env = 0x6855a90, _finalize_callback = 0x0, _finalize_data = 0x70cb6d0, _finalize_hint = 0x0, _finalize_ran = false, _has_env_reference = false}, <v8impl::RefTracker> = {_vptr.RefTracker = 0x6d8acf0,
      next_ = 0x0, prev_ = 0x0}, _refcount = 0, _delete_self = true}, _persistent = {<v8::PersistentBase<v8::Value>> = {val_ = 0x0}, <No data fields>}}

Not sure why it is crashing. I will dig into it to see whether the problem is related to the one we are resolving.

@gabrielschulhof
Contributor Author

@legendecas @mhdawson I have now cherry-picked #37876 on top. Please re-review!

@legendecas does @mhdawson's fix prevent the crash you are seeing? Please let us know if this additional patch fixes your issue, and whether this issue occurs in production for you. We need to know because there are currently no v14.x releases planned so, if this is an important fix, we may need to arrange for an additional v14.x release.

@legendecas
Member

@gabrielschulhof I found that V8 doesn't clear the second-pass callback if the weakness was cleared (see https://github.com/nodejs/node/blob/master/deps/v8/src/handles/global-handles.cc#L532). The fix in the PR itself is self-contained; however, V8 does call into invalidated callbacks (https://github.com/nodejs/node/blob/master/deps/v8/src/handles/global-handles.cc#L1387) because the callback has not been cleared. Maybe we need to file a fix with V8 first.

// Thus, we don't want any stray gc passes to trigger a second call to
// `Finalize()`, so let's reset the persistent here if nothing is
// keeping it alive.
if (is_env_teardown && _persistent.IsWeak()) {
Member

@legendecas legendecas Mar 31, 2021

After digging into v8::PersistentBase and the second-pass callbacks of GlobalHandles, I found that _persistent.IsWeak() is always false after the first-pass weak callback (in which we reset the persistent).

So this fix ultimately doesn't seem to apply to the issue at all 😵. Sorry for not picking this up earlier.

Anyway, the Persistent handle has to be reset in the first-pass weak callback, so the persistent no longer holds the address of the global handle, and the second-pass callback cannot be cancelled by any means after the first-pass callback. So here we have to ensure that the parameter in the weak info of the second-pass callback is kept alive until the second-pass callback has been invoked.

I've submitted an issue at https://bugs.chromium.org/p/v8/issues/detail?id=11608. However, I think that, given the nature of the Persistent, we may be able to split the lifetime of the weak parameter from that of v8impl::Reference. I'm trying to put together a fix so that we can determine whether the approach is acceptable.
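
To make the idea of splitting the weak parameter's lifetime from the reference more concrete, here is a rough sketch in plain C++ with no V8 involved. Everything in it is hypothetical and only models the ordering problem: `SecondPassQueue` stands in for V8's pending second-pass callbacks (which cannot be cancelled once queued), `WeakParam` is an assumed separately owned parameter, and `Reference` is a toy class, not `v8impl::Reference`. It is not the fix that was eventually proposed, just an illustration of the approach.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

class Reference;

// Hypothetical: heap state shared by the reference and any queued
// second-pass callback. It can outlive the reference itself.
struct WeakParam {
  Reference* owner = nullptr;  // cleared when the reference is destroyed
};

// Hypothetical stand-in for the engine's queue of second-pass callbacks.
// Once a callback is queued here it cannot be cancelled.
struct SecondPassQueue {
  std::vector<std::function<void()>> pending;

  void Flush() {
    std::vector<std::function<void()>> tasks = std::move(pending);
    pending.clear();
    for (auto& task : tasks) task();
  }
};

class Reference {
 public:
  Reference(SecondPassQueue* queue, int* finalize_count)
      : queue_(queue),
        finalize_count_(finalize_count),
        param_(std::make_shared<WeakParam>()) {
    param_->owner = this;
  }
  ~Reference() { param_->owner = nullptr; }

  void Finalize() { ++*finalize_count_; }

  // First pass: queue the second pass, handing it shared ownership of the
  // parameter instead of a raw pointer to this reference.
  void FirstPass() {
    std::shared_ptr<WeakParam> param = param_;
    queue_->pending.push_back([param] {
      // The parameter is still alive here even if the reference is gone.
      if (param->owner != nullptr) param->owner->Finalize();
    });
  }

 private:
  SecondPassQueue* queue_;
  int* finalize_count_;
  std::shared_ptr<WeakParam> param_;
};

// Teardown finalizes and deletes the reference before the queued second
// pass runs; the stale callback must then do nothing.
int FinalizeCountTeardownBeforeSecondPass() {
  SecondPassQueue queue;
  int finalize_count = 0;
  auto ref = std::make_unique<Reference>(&queue, &finalize_count);
  ref->FirstPass();  // GC first pass queues the second pass
  ref->Finalize();   // environment teardown runs the finalizer
  ref.reset();       // ... and deletes the reference
  queue.Flush();     // the second pass still fires afterwards
  return finalize_count;
}

// Regular GC: the second pass runs while the reference is still alive.
int FinalizeCountRegularGc() {
  SecondPassQueue queue;
  int finalize_count = 0;
  Reference ref(&queue, &finalize_count);
  ref.FirstPass();
  queue.Flush();
  return finalize_count;
}
```

In both scenarios the finalizer runs exactly once, and the late second-pass callback only ever dereferences the shared parameter, never the deleted reference.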

Gabriel Schulhof and others added 3 commits April 24, 2021 14:55
The finalizer normally never gets called while a reference is strong.
However, during environment shutdown all finalizers must get called. In
order to unify the deferring behavior with that of a regular
finalization, we must force the reference to be weak when we call its
finalizer during environment shutdown.

Fixes: nodejs#37236
Co-authored-by: Chengzhong Wu <legendecas@gmail.com>
PR-URL: nodejs#37303
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
A gc may happen during environment teardown. Thus, during finalization
initiated by environment teardown we must remove the V8 finalizer
before calling the Node-API finalizer.

Fixes: nodejs#37236
PR-URL: nodejs#37616
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
Refs: nodejs/node-addon-api#906
Refs: nodejs#37616

Fix crash introduced by nodejs#37616

Signed-off-by: Michael Dawson <mdawson@devrus.com>

PR-URL: nodejs#37876
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
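
The ordering described in the second commit message (remove the V8 finalizer before calling the Node-API finalizer) can be sketched with a small model in plain C++. This is an illustration under stated assumptions, not the code in `js_native_api_v8.cc`: `ModelReference`, `SimulateGc`, and `FinalizeOnTeardown` are hypothetical names, and the "GC" is simply a method call made from inside the user finalizer.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model of a reference whose weak (GC) callback and teardown
// path both lead to the same user finalizer.
class ModelReference {
 public:
  explicit ModelReference(std::function<void(ModelReference&)> user_finalizer)
      : user_finalizer_(std::move(user_finalizer)) {}

  // Stands in for the GC noticing that the referenced object is dead.
  void SimulateGc() {
    if (!weak_callback_armed_) return;
    weak_callback_armed_ = false;
    RunUserFinalizer();
  }

  // Stands in for finalization initiated by environment teardown.
  void FinalizeOnTeardown() {
    // Disarm the weak callback first, so that a GC triggered while the
    // user finalizer runs cannot finalize this reference a second time.
    weak_callback_armed_ = false;
    RunUserFinalizer();
  }

  int finalizer_calls() const { return finalizer_calls_; }

 private:
  void RunUserFinalizer() {
    ++finalizer_calls_;
    if (user_finalizer_) user_finalizer_(*this);
  }

  std::function<void(ModelReference&)> user_finalizer_;
  bool weak_callback_armed_ = true;
  int finalizer_calls_ = 0;
};

// The user finalizer triggers a (simulated) GC while teardown is running.
int CallsWhenGcRunsInsideFinalizer() {
  ModelReference ref([](ModelReference& self) { self.SimulateGc(); });
  ref.FinalizeOnTeardown();
  return ref.finalizer_calls();
}
```

If `FinalizeOnTeardown` ran the user finalizer before disarming the weak callback, the nested `SimulateGc` call in this model would invoke the finalizer a second time, which is the kind of double finalization the commit is guarding against.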
@targos targos force-pushed the backport-teardown-gc-ref-fix-to-v14.x branch from fecc3f4 to 180c33d Compare April 24, 2021 12:55
@nodejs-github-bot
Collaborator

@targos
Member

targos commented Apr 26, 2021

Landed in 5a3e12b...8413759

@targos targos closed this Apr 26, 2021
targos pushed a commit that referenced this pull request Apr 26, 2021
The finalizer normally never gets called while a reference is strong.
However, during environment shutdown all finalizers must get called. In
order to unify the deferring behavior with that of a regular
finalization, we must force the reference to be weak when we call its
finalizer during environment shutdown.

Fixes: #37236
Co-authored-by: Chengzhong Wu <legendecas@gmail.com>
PR-URL: #37303
Backport-PR-URL: #37802
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
targos pushed a commit that referenced this pull request Apr 26, 2021
A gc may happen during environment teardown. Thus, during finalization
initiated by environment teardown we must remove the V8 finalizer
before calling the Node-API finalizer.

Fixes: #37236
PR-URL: #37616
Backport-PR-URL: #37802
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Michael Dawson <midawson@redhat.com>
targos pushed a commit that referenced this pull request Apr 26, 2021
Refs: nodejs/node-addon-api#906
Refs: #37616

Fix crash introduced by #37616

Signed-off-by: Michael Dawson <mdawson@devrus.com>

PR-URL: #37876
Backport-PR-URL: #37802
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>