Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: dispose of V8 platform in process.exit() #25061

Closed
wants to merge 1 commit into from

Conversation

addaleax
Copy link
Member

This is essentially a re-do of #24828, with the addition of also shutting down worker threads before exiting. As pointed out by @Trott, this also is unlikely to fully resolve the issues we were seeing with test-cli-syntax, but even in the worse case this might help with figuring out what the root cause is.


Calling process.exit() calls the C exit() function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: #24403
Refs: #25007

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: nodejs#24403
Refs: nodejs#25007
@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Dec 15, 2018
@gireeshpunathil
Copy link
Member

@addaleax - I tested this patch upon most of the currently open issues with exit races, and concluded that this works indeed!

however, with the worker_threads specific test case reported 2 types of issues, when severely stressed, consistent in both AIX and Linux.

  1. pre-mature closure of main - worker communication port:
(gdb) info thr
  Id   Target Id         Frame 
  9    Thread 0x7f853911b700 (LWP 10944) 0x00007f853acf7945 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  8    Thread 0x7f853bd52700 (LWP 10946) 0x00007f853acf7945 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  7    Thread 0x7f853991c700 (LWP 10943) 0x00007f853acf7945 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  6    Thread 0x7f853a11d700 (LWP 10942) 0x00007f853acf7945 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  5    Thread 0x7f853891a700 (LWP 10945) 0x00007f853acf7945 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
  4    Thread 0x7f853a91e700 (LWP 10941) 0x00007f853aa1dd77 in epoll_pwait () from /lib64/libc.so.6
  3    Thread 0x7f8523fff700 (LWP 10948) 0x00007f853aa180b7 in mprotect () from /lib64/libc.so.6
  2    Thread 0x7f853bd32780 (LWP 10940) 0x00007f853acf4f47 in pthread_join ()
   from /lib64/libpthread.so.0
* 1    Thread 0x7f8522ffd700 (LWP 10950) 0x0000000000e5bb56 in node::worker::Message::Deserialize (
    this=0x7f8522ff9450, env=0x37dc140, context=...) at ../src/node_messaging.cc:93
(gdb) thr 2
[Switching to thread 2 (Thread 0x7f853bd32780 (LWP 10940))]
#0  0x00007f853acf4f47 in pthread_join () from /lib64/libpthread.so.0
(gdb) where
#0  0x00007f853acf4f47 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000fd8a09 in uv_thread_join (tid=0x3756738) at ../deps/uv/src/unix/thread.c:227
#2  0x0000000000ec3477 in node::worker::Worker::JoinThread (this=0x3756390) at ../src/node_worker.cc:310
#3  0x0000000000d7d5d7 in node::Environment::stop_sub_worker_contexts (this=0x7ffe5f8b1780)
    at ../src/env.cc:874
#4  0x0000000000d7d547 in node::Environment::Exit (this=0x7ffe5f8b1780, exit_code=0)
    at ../src/env.cc:861
#5  0x0000000000dbd95e in node::Exit (args=...) at ../src/node.cc:733
#6  0x000000000113e1cd in v8::internal::FunctionCallbackArguments::Call (
    this=this@entry=0x7ffe5f8b06e0, handler=handler@entry=0x2e023f322d09)
    at ../deps/v8/src/api-arguments-inl.h:140
#7  0x000000000113ee82 in v8::internal::(anonymous namespace)::HandleApiCallHelper<false> (
    isolate=isolate@entry=0x35c24c0, function=..., function@entry=..., new_target=..., 
    new_target@entry=..., fun_data=..., receiver=..., receiver@entry=..., args=...)
    at ../deps/v8/src/builtins/builtins-api.cc:109
#8  0x00000000011432ab in v8::internal::Builtin_Impl_HandleApiCall (args=..., 
    isolate=isolate@entry=0x35c24c0) at ../deps/v8/src/builtins/builtins-api.cc:139
#9  0x0000000001143cc1 in v8::internal::Builtin_HandleApiCall (args_length=6, 
    args_object=0x7ffe5f8b0918, isolate=0x35c24c0) at ../deps/v8/src/builtins/builtins-api.cc:127
#10 0x0000000002344515 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit ()
#11 0x00000080f108cb8e in ?? ()
#12 0x000023a7a0c025a1 in ?? ()
#13 0x00002e023f322d61 in ?? ()
#14 0x0000000600000000 in ?? ()
#15 0x000023a7a0c02681 in ?? ()
#16 0x0000000000000000 in ?? ()
(gdb) thr 1
[Switching to thread 1 (Thread 0x7f8522ffd700 (LWP 10950))]
#0  0x0000000000e5bb56 in node::worker::Message::Deserialize (this=0x7f8522ff9450, env=0x37dc140, 
    context=...) at ../src/node_messaging.cc:93
93	        port->Close();
(gdb) where
#0  0x0000000000e5bb56 in node::worker::Message::Deserialize (this=0x7f8522ff9450, env=0x37dc140, 
    context=...) at ../src/node_messaging.cc:93
#1  0x0000000000e5dd39 in node::worker::MessagePort::OnMessage (this=0x7f8518002ae0)
    at ../src/node_messaging.cc:547
#2  0x0000000000e5d10f in node::worker::MessagePort::<lambda(uv_async_t*)>::operator()(uv_async_t *) const (__closure=0x0, handle=0x7f8518002b50) at ../src/node_messaging.cc:414
#3  0x0000000000e5d12f in node::worker::MessagePort::<lambda(uv_async_t*)>::_FUN(uv_async_t *) ()
    at ../src/node_messaging.cc:415
#4  0x0000000000fc74e8 in uv__async_io (loop=0x37563c0, w=0x3756588, events=1)
    at ../deps/uv/src/unix/async.c:118
#5  0x0000000000fdc4b3 in uv__io_poll (loop=0x37563c0, timeout=-1)
    at ../deps/uv/src/unix/linux-core.c:375
#6  0x0000000000fc7f4b in uv_run (loop=0x37563c0, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:370
#7  0x0000000000ec2b7e in node::worker::Worker::Run (this=0x3756390) at ../src/node_worker.cc:202
#8  0x0000000000ec3d40 in node::worker::Worker::<lambda(void*)>::operator()(void *) const (
    __closure=0x0, arg=0x3756390) at ../src/node_worker.cc:419
#9  0x0000000000ec3d5f in node::worker::Worker::<lambda(void*)>::_FUN(void *) ()
    at ../src/node_worker.cc:419
#10 0x00007f853acf3dd5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f853aa1db3d in clone () from /lib64/libc.so.6
(gdb) p port
$2 = (node::worker::MessagePort *) 0x0
(gdb) 
  1. attempt to re-load internal module(s) that was unloaded?
(gdb) where
#0  0x00007f580ed4bf47 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000abdcee in uv_thread_join (tid=<optimized out>) at ../deps/uv/src/unix/thread.c:227
#2  0x00000000009fa480 in node::worker::Worker::JoinThread() ()
#3  0x0000000000920770 in node::Environment::stop_sub_worker_contexts() ()
#4  0x0000000000920803 in node::Environment::Exit(int) ()
#5  0x0000000000949e50 in node::Exit(v8::FunctionCallbackInfo<v8::Value> const&) ()
#6  0x0000000000bdd5c8 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#7  0x0000000000bddfe6 in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
#8  0x000000000194b8ae in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit ()
#9  0x00002b40e810816e in ?? ()
#10 0x00003f1e650825a1 in ?? ()
#11 0x00001acc695a2d49 in ?? ()
#12 0x0000000600000000 in ?? ()
#13 0x00003f1e65082681 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb) thr 1
[Switching to thread 1 (Thread 0x7f57d67fc700 (LWP 24413))]
#0  0x00007f580e9ac207 in raise () from /lib64/libc.so.6
(gdb) where
#0  0x00007f580e9ac207 in raise () from /lib64/libc.so.6
#1  0x00007f580e9ad8f8 in abort () from /lib64/libc.so.6
#2  0x00000000009784d1 in node::Abort() ()
#3  0x000000000097912c in node::OnFatalError(char const*, char const*) ()
#4  0x0000000000b4a01a in v8::Utils::ReportApiFailure(char const*, char const*) ()
#5  0x0000000000a302d5 in node::(anonymous namespace)::Initialize ()
#6  0x000000000095a54c in node::binding::GetInternalBinding(v8::FunctionCallbackInfo<v8::Value> const&)
    ()
#7  0x0000000000bdd5c8 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#8  0x0000000000bddfe6 in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
#9  0x000000000194b8ae in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit ()

these could be pre-existing conditions that arise only with worker_threads and can be investigated / incorporated independent of this PR, but worthwhile to see if this is something easily fixable?

@addaleax
Copy link
Member Author

@gireeshpunathil Yes, I assume those are pre-existing problems with calling worker.terminate() (which is essentially what this PR is doing)…

For the second one, do you have a way to tell which node::(anonymous namespace)::Initialize() the stack trace is referring to?

The first one is just me having written Obviously Broken™ code and you catching it. I’ll open a separate PR to fix that. 👍

@@ -857,10 +857,13 @@ void Environment::AsyncHooks::grow_async_ids_stack() {
uv_key_t Environment::thread_local_env = {};

void Environment::Exit(int exit_code) {
if (is_main_thread())
if (is_main_thread()) {
stop_sub_worker_contexts();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reviewing this method and its callee chains, and have this doubt:

  • Worker::Exit | Isolate::TerminateExecution - are these callable from another thread? Of course, it is possible to do so, and any thread is able to modify the data structures pertinent to a worker - but we are doing nothing to stop the execution of the thread that is represented by this worker? In other words, how / at what point are we physically stopping the worker thread?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worker::Exit | Isolate::TerminateExecution - are these callable from another thread?

Yes, they are essentially worker.terminate() – in particular, Isolate::TerminateExecution() is one of the few thread-safe V8 functions.

In other words, how / at what point are we physically stopping the worker thread?

We’re stopping the event loop and disallowing JS execution, waiting for everything to finish and then join the thread … as far as I could tell, that’s the best we can do here.

I guess this might be problematic when some threads have environment cleanup handlers registered via addons that block indefinitely? That would not really be a bug on our side, though…

@gireeshpunathil
Copy link
Member

For the second one, do you have a way to tell which node::(anonymous namespace)::Initialize() the stack trace is referring to?

It is uv

@addaleax
Copy link
Member Author

@gireeshpunathil I’m having trouble reproducing the second failure … do you have the stderr somewhere? Usually v8::Utils::ReportApiFailure makes us print something helpful for debugging

@gireeshpunathil
Copy link
Member

@addaleax - FATAL ERROR: v8::ToLocalChecked Empty MaybeLocal. is the message that it prints. Its reproduce frequency increases if you replace the interval mode messaging with a for loop of say 10K or so, in my test case. However, I was unable to recreate it with debug build, and hence the line no. etc. are missing in the call stack.

If you are looking for further information on that, I can attempt further, one of 2 things:

  • recreate with debug build
  • interpret the disassembly of Initialize to figure out which source line maps to the failure.

Pls let me know.

@addaleax
Copy link
Member Author

@gireeshpunathil I think the second option might work well enough? The function is probably huge due to the way we initialize the err_map object…

@gireeshpunathil
Copy link
Member

   0xa3027c <Initialize+23740>:	call   0xb4b310 <_ZN2v82V812ToLocalEmptyEv>
   0xa30281 <Initialize+23745>:	mov    rax,QWORD PTR [rbp-0x48]
   0xa30285 <Initialize+23749>:	jmp    0xa2c962 <Initialize+9122>
   0xa3028a <Initialize+23754>:	nop    WORD PTR [rax+rax*1+0x0]
   0xa30290 <Initialize+23760>:	call   0xb4b310 <_ZN2v82V812ToLocalEmptyEv>
   0xa30295 <Initialize+23765>: jmp    0xa2c945 <Initialize+9093>
   0xa3029a <Initialize+23770>:	nop    WORD PTR [rax+rax*1+0x0]
   0xa302a0 <Initialize+23776>:	mov    QWORD PTR [rbp-0x48],rax
   0xa302a4 <Initialize+23780>:	call   0xb4b310 <_ZN2v82V812ToLocalEmptyEv>
   0xa302a9 <Initialize+23785>:	mov    rax,QWORD PTR [rbp-0x48]
   0xa302ad <Initialize+23789>:	jmp    0xa2c906 <Initialize+9030>
   0xa302b2 <Initialize+23794>:	nop    WORD PTR [rax+rax*1+0x0]
   0xa302b8 <Initialize+23800>:	mov    QWORD PTR [rbp-0x48],rax
   0xa302bc <Initialize+23804>:	call   0xb4b310 <_ZN2v82V812ToLocalEmptyEv>
   0xa302c1 <Initialize+23809>:	mov    rax,QWORD PTR [rbp-0x48]
   0xa302c5 <Initialize+23813>:	jmp    0xa2c8e5 <Initialize+8997>
   0xa302ca <Initialize+23818>:	nop    WORD PTR [rax+rax*1+0x0]
=>   0xa302d0 <Initialize+23824>:	call   0xb4b310 <_ZN2v82V812ToLocalEmptyEv>
(gdb) x/30i 0xa2c8e5
   0xa2c8e5 <Initialize+8997>:	xor    edx,edx
   0xa2c8e7 <Initialize+8999>:	mov    ecx,0xffffffff
   0xa2c8ec <Initialize+9004>:	mov    esi,0x19e7696
   0xa2c8f1 <Initialize+9009>:	mov    rdi,rbx
   0xa2c8f4 <Initialize+9012>:	mov    QWORD PTR [rbp-0x40],rax
   0xa2c8f8 <Initialize+9016>:	call   0xb562c0 <_ZN2v86String14NewFromOneByteEPNS_7IsolateEPKhNS_13NewStringTypeEi>
   0xa2c8fd <Initialize+9021>:	test   rax,rax
   0xa2c900 <Initialize+9024>:	je     0xa302a0 <Initialize+23776>
   0xa2c906 <Initialize+9030>:	lea    rsi,[rbp-0x40]
   0xa2c90a <Initialize+9034>:	mov    edx,0x2
   0xa2c90f <Initialize+9039>:	mov    rdi,rbx
   0xa2c912 <Initialize+9042>:	mov    QWORD PTR [rbp-0x38],rax
   0xa2c916 <Initialize>+9046>:	call   0xb58440 <_ZN2v85Array3NewEPNS_7IsolateEPNS_5LocalINS_5ValueEEEm>
   0xa2c91b <Initialize+9051>:	mov    esi,0xfffffff3
   0xa2c920 <Initialize+9056>:	mov    r15,rax
   0xa2c923 <Initialize+9059>:	mov    rdi,rbx
   0xa2c926 <Initialize+9062>:	call   0xb5c9e0 <_ZN2v87Integer3NewEPNS_7IsolateEi>
   0xa2c92b <Initialize+9067>:	mov    rcx,r15
   0xa2c92e <Initialize+9070>:	mov    rdx,rax
   0xa2c931 <Initialize+9073>:	mov    rsi,r13
   0xa2c934 <Initialize+9076>:	mov    rdi,r14
   0xa2c937 <Initialize+9079>:	call   0xb6fc20 <_ZN2v83Map3SetENS_5LocalINS_7ContextEEENS1_INS_5ValueEEES5_>
   0xa2c93c <Initialize+9084>:	test   rax,rax
   0xa2c93f <Initialize+9087>:	je     0xa30290 <Initialize+23760>
(gdb) x/s 0x19e7696
0x19e7696:	"permission denied"
(gdb) 

with the macro inside the method, it has expanded into a mammoth code!

As far as I can tell, the macro expands into a huge code that creates a number of array objects each contain a mapping between an error code and an error string.

It fails while it composes EACCES | permission denied | UV_EACCESS combinations. (in one core file)

For me, this looks an arbitrary, and the whole point could be that isolate became NULL at some point of its execution?

addaleax added a commit to addaleax/node that referenced this pull request Dec 17, 2018
This can fail when terminating a Worker that loads
the `uv` binding at the same time.

Refs: nodejs#25061 (comment)
@addaleax
Copy link
Member Author

@gireeshpunathil I think the interesting bit would be to know what the function call before the je 0xa302d0 line (which is not in your disassemblies) is?

For me, this looks an arbitrary, and the whole point could be that isolate became NULL at some point of its execution?

Yes, it’s probably somewhat arbitrary, but I don’t think isolate can ever become NULL.

So far, I was under the assumption that Maybe/MaybeLocal calls wouldn’t fail unless they actually called into JS … it looks like this isn’t true for Map::Set (or, maybe, the method calls internal V8 JS code?):

diff --git a/src/uv.cc b/src/uv.cc
index 27daed556ecc..294cb8deeb51 100644
--- a/src/uv.cc
+++ b/src/uv.cc
@@ -64,6 +64,8 @@ void Initialize(Local<Object> target,
                 void* priv) {
   Environment* env = Environment::GetCurrent(context);
   Isolate* isolate = env->isolate();
+  isolate->TerminateExecution();
+  OnScopeLeave cancel([&]() { isolate->CancelTerminateExecution(); });
   target->Set(env->context(),
               FIXED_ONE_BYTE_STRING(isolate, "errname"),
               env->NewFunctionTemplate(ErrName)

This makes the issue reproducible with a debug build. I’m opening a PR to address this: #25079

(Side note: Thanks for going through all of these issues!)

@gireeshpunathil
Copy link
Member

@addaleax - thanks, with this, #25076 and #25079 all my tests are passing.

@Trott
Copy link
Member

Trott commented Dec 17, 2018

@Trott Trott added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Dec 17, 2018
@gireeshpunathil
Copy link
Member

please don't land this yet, I may have a test failing scenario that I suspect is root-caused by this. I may be wrong; but can confirm in couple of hours and then clarify with @addaleax too. thanks!

@Trott Trott added the wip Issues and PRs that are still a work in progress. label Dec 18, 2018
@Trott
Copy link
Member

Trott commented Dec 18, 2018

please don't land this yet, I may have a test failing scenario that I suspect is root-caused by this. I may be wrong; but can confirm in couple of hours and then clarify with @addaleax too. thanks!

I've added the work in progress (WIP) label. Feel free to remove it if you find that there is no problem and the code can land.

@gireeshpunathil
Copy link
Member

ok, I completed my testing. This was my suspected scenario:

(gdb) where
#0  0x0000000000954fe6 in std::_Rb_tree<std::string, std::pair<std::string const, node::UnionBytes>, std::_Select1st<std::pair<std::string const, node::UnionBytes> >, std::less<std::string>, std::allocator<std::pair<std::string const, node::UnionBytes> > >::_M_erase(std::_Rb_tree_node<std::pair<std::string const, node::UnionBytes> >*) ()
#1  0x00007fb38a7acb69 in __run_exit_handlers () from /lib64/libc.so.6
#2  0x00007fb38a7acbb7 in exit () from /lib64/libc.so.6
#3  0x00000000009206b1 in node::Environment::Exit(int) ()
#4  0x0000000000949e80 in node::Exit(v8::FunctionCallbackInfo<v8::Value> const&) ()
#5  0x0000000000bdd738 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#6  0x0000000000bde156 in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
#7  0x000000000194b8ae in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit ()
#8  0x0000091797c0816e in ?? ()
#9  0x000017a3001825a1 in ?? ()
#10 0x00003d89b76a2d41 in ?? ()
#11 0x0000000600000000 in ?? ()
#12 0x000017a300182681 in ?? ()
#13 0x0000000000000000 in ?? ()
(gdb) thr 1
[Switching to thread 1 (Thread 0x7fb35bfff700 (LWP 53215))]
#0  0x00007fb38a7a9207 in raise () from /lib64/libc.so.6
(gdb) where
#0  0x00007fb38a7a9207 in raise () from /lib64/libc.so.6
#1  0x00007fb38a7aa8f8 in abort () from /lib64/libc.so.6
#2  0x0000000000abdf34 in uv_mutex_lock (mutex=<optimized out>) at ../deps/uv/src/unix/thread.c:286
#3  0x0000000000978a52 in node::EnvGetter(v8::Local<v8::Name>, v8::PropertyCallbackInfo<v8::Value> const&) ()
#4  0x000000000109c15c in v8::internal::(anonymous namespace)::GetPropertyWithInterceptorInternal(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::InterceptorInfo>, bool*) ()
#5  0x00000000010ad8ef in v8::internal::Object::GetProperty(v8::internal::LookupIterator*, v8::internal::OnNonExistent) ()
#6  0x0000000000ff08dd in v8::internal::LoadIC::Load(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Name>) ()
#7  0x0000000000ff3ac7 in v8::internal::Runtime_LoadIC_Miss(int, v8::internal::Object**, v8::internal::Isolate*) ()
#8  0x000000000194b8ae in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit ()
#9  0x000000000197182a in Builtins_LdaNamedPropertyHandler ()

gireeshpunathil@6aba136 is the test case I used. testing with various scenarios I figured out that:

  • this failure is not caused by this PR changes
  • this failure is also not caused changes from e273abc and 9f7e3a4 . In short, this is a pre-existing scenario.

In summary, this PR addresses all the current issues reported against many test cases (in AIX and Linux mostly) while does not cause any new issues.

there could be corner cases that still need to be fixed, for which there is no identified CI failure yet, so this is good to go!

@gireeshpunathil gireeshpunathil removed the wip Issues and PRs that are still a work in progress. label Dec 18, 2018
@addaleax
Copy link
Member Author

there could be corner cases that still need to be fixed, for which there is no identified CI failure yet, so this is good to go!

This PR does have the potential to greatly amplify such problems, so I do think they should be looked into.

I also personally feel reluctant to land this without first having understood why e273abc introduced the issues we’re seeing, because this PR is most likely not a direct fix for it…

@gireeshpunathil
Copy link
Member

I have evidence that the issue this PR resolves exists in v10.x lines, as revealed through #25814 , but don't know the backport cadence rules. @nodejs/lts

@MylesBorins
Copy link
Contributor

@gireeshpunathil I've backported in #26048

General rule is if it is semver-patch and has been on current for more than 2 weeks without major regressions it can be backported to LTS

BethGriggs pushed a commit that referenced this pull request Feb 13, 2019
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: #24403
Refs: #25007

Backport-PR-URL: #26048
PR-URL: #25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@MylesBorins
Copy link
Contributor

This change seems to have cause a bunch of breakage in the 10.15.2 PR, we are backing it out for now and reopening the backport to track why

BethGriggs pushed a commit to MylesBorins/node that referenced this pull request Mar 27, 2019
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: nodejs#24403
Refs: nodejs#25007

PR-URL: nodejs#25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
BethGriggs pushed a commit that referenced this pull request Mar 28, 2019
Calling `process.exit()` calls the C `exit()` function, which in turn
calls the destructors of static C++ objects. This can lead to race
conditions with other concurrently executing threads; disposing of all
Worker threads and then the V8 platform instance helps with this
(although it might not be a full solution for all problems of
this kind).

Refs: #24403
Refs: #25007

Backport-PR-URL: #26048
PR-URL: #25061
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
BethGriggs pushed a commit that referenced this pull request Apr 29, 2019
A number of tests that were `flaked` recently are proved
to have failing reason identified in
#25007 and resolution
identified in #25061

Revoke flaky designation of all these tests as the said
PR is landed.

PR-URL: #25611
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com>
@BethGriggs BethGriggs mentioned this pull request May 1, 2019
BethGriggs pushed a commit that referenced this pull request May 10, 2019
A number of tests that were `flaked` recently are proved
to have failing reason identified in
#25007 and resolution
identified in #25061

Revoke flaky designation of all these tests as the said
PR is landed.

PR-URL: #25611
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com>
MylesBorins pushed a commit that referenced this pull request May 16, 2019
A number of tests that were `flaked` recently are proved
to have failing reason identified in
#25007 and resolution
identified in #25061

Revoke flaky designation of all these tests as the said
PR is landed.

PR-URL: #25611
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Beth Griggs <Bethany.Griggs@uk.ibm.com>
Trott pushed a commit to Trott/io.js that referenced this pull request Jul 26, 2019
This can fail when terminating a Worker that loads
the `uv` binding at the same time.

Refs: nodejs#25061 (comment)
Fixes: nodejs#25134
PR-URL: nodejs#25079
Backport-PR-URL: nodejs#28832
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
BethGriggs pushed a commit that referenced this pull request Jul 30, 2019
This can fail when terminating a Worker that loads
the `uv` binding at the same time.

Refs: #25061 (comment)
Fixes: #25134
PR-URL: #25079
Backport-PR-URL: #28832
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants