[v10.x] src: dispose of V8 platform in process.exit()
#26048
Conversation
Resume CI: https://ci.nodejs.org/job/node-test-pull-request/20737/console [Green]
Calling `process.exit()` calls the C `exit()` function, which in turn calls the destructors of static C++ objects. This can lead to race conditions with other concurrently executing threads; disposing of all Worker threads and then the V8 platform instance helps with this (although it might not be a full solution for all problems of this kind). Refs: #24403 Refs: #25007 Backport-PR-URL: #26048 PR-URL: #25061 Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Landed in
This commit breaks some repos for v10.x. For example
We are going to back it out of the v10.15.2 proposal; we can then run CITGM against this PR and figure out what is going on. FWIW, there were no CITGM breakages with this change on 11.x, so it is possible that some other changes are required. @gireeshpunathil I know that you wanted this to land on 10.x to fix some other related issues, so perhaps you want to dig in and try to get this back into the 10.15.2 proposal.
Needs new reviews as this PR broke 10.x
-1 until CITGM errors are resolved
@MylesBorins Do you think you could create a core dump/stack trace for the crash you are seeing? I can’t reproduce it locally.
@addaleax I'll dig back in a bit later this week. What command should I run with to get the core dump? 😅
@MylesBorins - also, can you tell on which platform the issue is seen?
@MylesBorins Err, I’m not sure how to do that reliably on macOS. Running …
Force-pushed from 13f9356 to 5711238
Unable to recreate locally. For anyone else interested, it looks like it is an easy recreate on macOS Mojave with this step (node v10.15.2-rc.0):
RCX is supposed to hold the value referenced at `node/deps/v8/src/parsing/parser.cc` line 2591 (commit b7bbd87).
OK, got one crash with this version (though this …). Here, the main thread has cleaned up the execution environment and is waiting for the workers to stop, but one of the workers does not seem to have stopped. I don't know if there is an associated PR that applies here. @addaleax will know!
That’s a static global variable, I’m not sure how accessing it could fail? This looks more like execution was stopped at some random point due to the signal?
What is the Worker thread doing at this point? Or is that the first stack trace in your comment?
Not that I know of, at least without knowing why the Worker hasn’t stopped.
@addaleax - yes, the first stack trace is for the worker.
I made two mistakes last time: (i) I used the master source; (ii) I blindly trusted the …
Moving on: I examined several dumps and got different patterns, but one thing in common is that worker threads are running while the main thread is waiting. Could it be that …
Able to get the original crash! (gulp-util, no worker in the picture)
There are only two threads, the other thread being … The mutex does not have a destructor, so it goes down only with the NodePlatform object; not sure what is happening here!
@gireeshpunathil Looks like the issue is that we’re trying to recursively lock … This reminds me of bafd808. We removed the … I’m wondering whether it would be okay to just remove that call, not replacing it with checks.
Wow, that matches! But further progression from … But on this very edge of termination, making sure …
I have done a test build on macOS, and removing the …
Did this end up getting landed? |
@MylesBorins, no. On merge, this either needs to remove the …
@BethGriggs do you want to push those changes to this PR?
Calling `process.exit()` calls the C `exit()` function, which in turn calls the destructors of static C++ objects. This can lead to race conditions with other concurrently executing threads; disposing of all Worker threads and then the V8 platform instance helps with this (although it might not be a full solution for all problems of this kind). Refs: nodejs#24403 Refs: nodejs#25007 PR-URL: nodejs#25061 Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Node first calls `Isolate::Dispose`, then `NodePlatform::UnregisterIsolate`. This again calls `PerIsolatePlatformData::Shutdown`, which (before this patch) called `FlushForegroundTasksInternal`, which might call `RunForegroundTask` if it finds foreground tasks to be executed. This will fail however, since `Isolate::GetCurrent` was already reset during `Isolate::Dispose`. Hence remove the check to `FlushForegroundTasksInternal` and add checks instead that no more foreground tasks are scheduled. Refs: v8#86 PR-URL: nodejs#25653 Reviewed-By: Gus Caplan <me@gus.host> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Force-pushed from 24f8e28 to 49674f7
@MylesBorins, I cherry-picked the commit we need and rebased the PR.
Calling `process.exit()` calls the C `exit()` function, which in turn calls the destructors of static C++ objects. This can lead to race conditions with other concurrently executing threads; disposing of all Worker threads and then the V8 platform instance helps with this (although it might not be a full solution for all problems of this kind). Refs: #24403 Refs: #25007 Backport-PR-URL: #26048 PR-URL: #25061 Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Node first calls `Isolate::Dispose`, then `NodePlatform::UnregisterIsolate`. This again calls `PerIsolatePlatformData::Shutdown`, which (before this patch) called `FlushForegroundTasksInternal`, which might call `RunForegroundTask` if it finds foreground tasks to be executed. This will fail however, since `Isolate::GetCurrent` was already reset during `Isolate::Dispose`. Hence remove the check to `FlushForegroundTasksInternal` and add checks instead that no more foreground tasks are scheduled. Refs: v8#86 Backport-PR-URL: #26048 PR-URL: #25653 Reviewed-By: Gus Caplan <me@gus.host> Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Landed on v10.x-staging
Calling `process.exit()` calls the C `exit()` function, which in turn calls the destructors of static C++ objects. This can lead to race conditions with other concurrently executing threads; disposing of all Worker threads and then the V8 platform instance helps with this (although it might not be a full solution for all problems of this kind).

Refs: #24403
Refs: #25007
PR-URL: #25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>