Node.js v10.15.0 segfault in BackgroundRunner → CancelableTask::Run → ConcurrentMarking::Run #25814

Closed
Cabalbl4 opened this issue Jan 30, 2019 · 28 comments

@Cabalbl4

Cabalbl4 commented Jan 30, 2019

  • Version: v10.15.0
  • Platform: docker with linux-alpine on centos
  • Subsystem: BackgroundRunner ?

Node.js v10.15.0 segfault in BackgroundRunner → CancelableTask::Run → ConcurrentMarking::Run

We are running Node.js in Docker on CentOS nodes:

$ uname -a 
Linux *redacted* 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux 
$ docker --version 
Docker version 18.06.1-ce, build e68fc7a
$ cat /etc/centos-release
CentOS Linux release 7.5.1804 (Core) 

Recently, we migrated our image to a newer Node.js version:

FROM node:8.12.0-alpine → FROM node:10.15.0-alpine

We started to observe lots of segfaults in prod:

[Wed Jan 30 13:16:31 2019] node[19293]: segfault at 55717a726770 ip 000055717a726770 sp 00007f7e965317f8 error 15 
[Wed Jan 30 13:16:31 2019] node[19292]: segfault at 55717a726770 ip 000055717a726770 sp 00007f7e96d347f8 error 15 


[Wed Jan 30 13:29:02 2019] node[2609]: segfault at 560f719ce130 ip 0000560f719ce130 sp 00007f895ffe17f8 error 15 
[Wed Jan 30 13:29:02 2019] node[2608]: segfault at 560f719ce130 ip 0000560f719ce130 sp 00007f89607e47f8 error 15 
[Wed Jan 30 13:29:02 2019] node[2607]: segfault at 560f719ce130 ip 0000560f719ce130 sp 00007f8960fe77f8 error 15 

[Wed Jan 30 13:29:02 2019] node[2610]: segfault at 560f719ce130 ip 0000560f719ce130 sp 00007f895f7de7f8 error 15 


[Wed Jan 30 13:42:49 2019] node[30532]: segfault at 55ef5d41a090 ip 000055ef5d41a090 sp 00007f7910e378e8 error 15
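
The kernel lines above can be decoded mechanically. The following is a minimal Python sketch, not part of the original report. It assumes the `error` field is printed in hexadecimal and uses the x86 page-fault error-code bit layout; the function name `decode_segfault` is hypothetical.

```python
import re

# x86 page-fault error-code bits (assumed layout; only set bits are reported).
PF_BITS = {
    0x01: "protection violation (page was present)",
    0x02: "write access",
    0x04: "user-mode access",
    0x08: "reserved bit set in page table",
    0x10: "instruction fetch",
}

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Parse one kernel 'segfault at ...' line; return None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    ip = int(m.group("ip"), 16)
    err = int(m.group("err"), 16)  # assumption: the kernel prints this field in hex
    flags = [text for bit, text in PF_BITS.items() if err & bit]
    return {
        "addr": addr,
        "ip": ip,
        "error": err,
        "flags": flags,
        # True when the faulting address is the instruction pointer itself.
        "fault_at_ip": addr == ip,
    }


line = ("[Wed Jan 30 13:16:31 2019] node[19293]: segfault at 55717a726770 "
        "ip 000055717a726770 sp 00007f7e965317f8 error 15")
info = decode_segfault(line)
print(info["flags"], info["fault_at_ip"])
```

Under those assumptions, every line above has the fault address equal to `ip`, and the instruction-fetch bit is set. That would be consistent with a jump to a non-executable address rather than a bad data access, though the log alone does not prove it.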

We use Node.js to spawn a lot of puppeteer scrapers (adding this because puppeteer/puppeteer#2872 may be related).

I was able to get a few core dumps from inside the container; here is the stack:


(llnode) bt all
* thread #1, name = 'node', stop reason = signal SIGSEGV
  * frame #0: 0x00007fa93ebebae0 node`node::PromiseWrap::~PromiseWrap()
    frame #1: 0x000055fcfe72a87f node`v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) + 9855
    frame #2: 0x000055fcfe42af7d node`v8::internal::CancelableTask::Run() + 61
    frame #3: 0x000055fcfe1b37fd node`node::BackgroundRunner(void*) + 317
  thread #2, stop reason = signal 0
    frame #0: 0x00007fa93e65e5e4 node
  thread #3, stop reason = signal 0
    frame #0: 0x00007fa93e65e5e4 node
  thread #4, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #5, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #6, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #7, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #8, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #9, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #10, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #11, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #12, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #13, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #14, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #15, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #16, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #17, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #18, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #19, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #20, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #21, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #22, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #23, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #24, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #25, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #26, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #27, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #28, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #29, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #30, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #31, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #32, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #33, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #34, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #35, stop reason = signal 0
    frame #0: 0x00007fa93e65e5e4 node
  thread #36, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #37, stop reason = signal 0
    frame #0: 0x00007fa93e65e5e4 node
  thread #38, stop reason = signal 0
    frame #0: 0x00007fa93e65e5e4 node
  thread #39, stop reason = signal 0
    frame #0: 0x00007fa93ebebae0 node`node::PromiseWrap::~PromiseWrap()
    frame #1: 0x000055fcfe72a87f node`v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) + 9855
    frame #2: 0x000055fcfe42af7d node`v8::internal::CancelableTask::Run() + 61
    frame #3: 0x000055fcfe1b37fd node`node::BackgroundRunner(void*) + 317
  thread #40, stop reason = signal 0
    frame #0: 0x00007fa93e62adc3 node
    frame #1: node`uv_run(loop=0xffffffffffffffff, mode=UV_RUN_DEFAULT) at core.c:370
    frame #2: 0x000055fcfe1b7029 node`node::BackgroundTaskRunner::DelayedTaskScheduler::Start()::'lambda'(void*)::_FUN(void*) + 137
  thread #41, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #42, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #43, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #44, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #45, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #46, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #47, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #48, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #49, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #50, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #51, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #52, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #53, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #54, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #55, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #56, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #57, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #58, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #59, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #60, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #61, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #62, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #63, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #64, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #65, stop reason = signal 0
    frame #0: 0x00007fa93ebebae0 node`node::PromiseWrap::~PromiseWrap()
    frame #1: 0x000055fcfe72a87f node`v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) + 9855
    frame #2: 0x000055fcfe42af7d node`v8::internal::CancelableTask::Run() + 61
    frame #3: 0x000055fcfe1b37fd node`node::BackgroundRunner(void*) + 317

Other core dumps also had ConcurrentMarking::Run as the last resolved frame; ~PromiseWrap was not always there.
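
To compare several dumps, it can help to tally threads by their top frame. The following is a minimal Python sketch, not part of the original report. It assumes the `bt all` text layout shown above; the function name `top_frame_histogram` and the shortened `SAMPLE` excerpt are illustrative.

```python
import re
from collections import Counter

# Matches only frame #0 lines, e.g.
#   frame #0: 0x00007fa93e65c636 node
#   * frame #0: 0x00007fa93ebebae0 node`node::PromiseWrap::~PromiseWrap()
FRAME0_RE = re.compile(r"frame #0: (0x[0-9a-f]+) node(?:`(.+))?$")


def top_frame_histogram(bt_all_text):
    """Count threads by the symbol (or raw address) of their frame #0."""
    counts = Counter()
    for raw in bt_all_text.splitlines():
        m = FRAME0_RE.search(raw.strip())
        if m:
            # Prefer the symbol name when one was resolved, else the address.
            counts[m.group(2) or m.group(1)] += 1
    return counts


# Shortened excerpt in the same shape as the output above.
SAMPLE = """\
* thread #1, name = 'node', stop reason = signal SIGSEGV
  * frame #0: 0x00007fa93ebebae0 node`node::PromiseWrap::~PromiseWrap()
    frame #1: 0x000055fcfe72a87f node`v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) + 9855
  thread #2, stop reason = signal 0
    frame #0: 0x00007fa93e65e5e4 node
  thread #4, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
  thread #5, stop reason = signal 0
    frame #0: 0x00007fa93e65c636 node
"""

for key, n in top_frame_histogram(SAMPLE).most_common():
    print(n, key)
```

Grouping this way separates the few threads sitting in the crashing frame from the many idle ones parked at the same address.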

Env parameters that may be useful:

PUPPETEER_NO_SANDBOX=1
--ulimit nofile=100000:100000
UV_THREADPOOL_SIZE=64

@Cabalbl4 changed the title from "Node.js v10.15.0 segfault in BackgroundRunner →CancelableTask::Run -> ConcurrentMarking::Run" to "Node.js v10.15.0 segfault in BackgroundRunner → CancelableTask::Run → ConcurrentMarking::Run" on Jan 30, 2019

@gireeshpunathil
Member

Just wondering where the main thread is! Its state at the time of the fault may be the key.

@gireeshpunathil
Member

@Cabalbl4 - could you do thread info to get the long list of threads, identify the one whose tid matches the pid of the faulted node process, switch to that thread, and do bt?

@Cabalbl4
Author

Core dump pid was 14894

thread #38: tid = 14894, 0x00007fa93e65e5e4 node, stop reason = signal 0
(llnode) thread select 38
(llnode) * thread #38, stop reason = signal 0
    frame #0: 0x00007fa93e65e5e4 node
->  0x7fa93e65e5e4: addb   %al, (%rax)
    0x7fa93e65e5e6: addb   %al, (%rax)
    0x7fa93e65e5e8: addb   %al, (%rax)
    0x7fa93e65e5ea: addb   %al, (%rax)
(llnode) bt
* thread #38, stop reason = signal 0
  * frame #0: 0x00007fa93e65e5e4 node
(llnode) frame info
frame #0: 0x00007fa93e65e5e4 node

@gireeshpunathil
Member

Thanks! But unfortunately that does not reveal anything: the main thread is supposed to have some frames (at least node::Start) on it while the helpers are alive.

Could you try with gdb please?

@Cabalbl4
Author

No, I cannot find node::Start in gdb:

Core was generated by `node --nouse_idle_notification --expose_gc --no-deprecation --no-warnings updat'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa93e898ae0 in ?? ()
[Current thread is 1 (LWP 14899)]
(gdb) thread apply all bt

Thread 65 (LWP 14896):
#0  0x00007fa93e898ae0 in ?? ()
#1  0x000055fcfe72a87f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x000055fcfe42af7d in v8::internal::CancelableTask::Run() ()
#3  0x000055fcfe1b37fd in node::BackgroundRunner(void*) ()
#4  0x00007fa93e65c6c2 in ?? ()
#5  0x0000000000000000 in ?? ()

Thread 64 (LWP 15051):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e645b2a in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x0000000200000001 in ?? ()
#4  0x00007fa93662ea90 in ?? ()
#5  0x00007fa93e644269 in ?? ()
#6  0x000001689e682d73 in ?? ()
#7  0x00007fa93662e9cc in ?? ()
#8  0x00007fa93662e9b0 in ?? ()
#9  0x00000000000009c4 in ?? ()
#10 0x00007fa93662ea88 in ?? ()
#11 0x00007fa93662eaa8 in ?? ()
#12 0x000009c400000004 in ?? ()
#13 0x00007fa93662e9e8 in ?? ()
#14 0x0000020000000004 in ?? ()
#15 0x0000000000000200 in ?? ()
#16 0x0000000000001388 in ?? ()
#17 0x00007fa93662e9cc in ?? ()
#18 0x00007fa93662e990 in ?? ()
#19 0x00007fa93662e98c in ?? ()
#20 0x00007fa93662e9cc in ?? ()
#21 0x0000003200000000 in ?? ()
#22 0x000000003662ea10 in ?? ()
#23 0x000000103e651b82 in ?? ()
#24 0x0001000100000013 in ?? ()
#25 0x00007fa93e64561f in ?? ()
#26 0x0000000000000013 in ?? ()
#27 0x0000000000000000 in ?? ()

Thread 63 (LWP 15052):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e645b2a in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x0000000200000001 in ?? ()
#4  0x00007fa935e2ba90 in ?? ()
#5  0x00007fa93e644269 in ?? ()
#6  0x000001689e683098 in ?? ()
#7  0x00007fa935e2b9cc in ?? ()
#8  0x00007fa935e2b9b0 in ?? ()
#9  0x00000000000009c4 in ?? ()
#10 0x00007fa935e2ba88 in ?? ()
#11 0x00007fa935e2baa8 in ?? ()
#12 0x000009c400000004 in ?? ()
#13 0x00007fa935e2b9e8 in ?? ()
#14 0x0000020000000004 in ?? ()
#15 0x0000000000000200 in ?? ()
#16 0x0000000000001388 in ?? ()
#17 0x00007fa935e2b9cc in ?? ()
#18 0x00007fa935e2b990 in ?? ()
#19 0x00007fa935e2b98c in ?? ()
#20 0x00007fa935e2b9cc in ?? ()
#21 0x0000003200000000 in ?? ()
#22 0x0000000035e2ba10 in ?? ()
#23 0x000000103e651b82 in ?? ()
#24 0x0001000100000013 in ?? ()
#25 0x00007fa93e64561f in ?? ()
#26 0x0000000000000013 in ?? ()
#27 0x0000000000000000 in ?? ()

Thread 62 (LWP 15045):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e645b2a in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x0000000200000001 in ?? ()
#4  0x00007fa939640a90 in ?? ()
#5  0x00007fa93e644269 in ?? ()
#6  0x000001689e682389 in ?? ()
#7  0x00007fa9396409cc in ?? ()
#8  0x00007fa9396409b0 in ?? ()
#9  0x00000000000009c4 in ?? ()
#10 0x00007fa939640a88 in ?? ()
#11 0x00007fa939640aa8 in ?? ()
#12 0x000009c400000004 in ?? ()
#13 0x00007fa9396409e8 in ?? ()
#14 0x0000020000000004 in ?? ()
#15 0x0000000000000200 in ?? ()
#16 0x0000000000001388 in ?? ()
#17 0x00007fa9396409cc in ?? ()
#18 0x00007fa939640990 in ?? ()
#19 0x00007fa93964098c in ?? ()
#20 0x00007fa9396409cc in ?? ()
#21 0x0000002200000000 in ?? ()
#22 0x0000000039640a10 in ?? ()
#23 0x000000103e651b82 in ?? ()
#24 0x0001000100000013 in ?? ()
#25 0x00007fa93e64561f in ?? ()
#26 0x0000000000000013 in ?? ()
#27 0x0000000000000000 in ?? ()

Thread 61 (LWP 15095):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 60 (LWP 15097):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 59 (LWP 15085):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 58 (LWP 15086):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 57 (LWP 15087):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 56 (LWP 15088):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 55 (LWP 15089):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 54 (LWP 15090):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 53 (LWP 15091):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 52 (LWP 15092):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 51 (LWP 15093):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 50 (LWP 15078):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 49 (LWP 15079):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 48 (LWP 15080):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 47 (LWP 15082):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 46 (LWP 15081):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 45 (LWP 15083):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 44 (LWP 15084):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 43 (LWP 15075):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 42 (LWP 15076):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 41 (LWP 15077):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 40 (LWP 14895):
#0  0x00007fa93e62adc3 in ?? ()
#1  0x000055fd0101fc88 in ?? ()
#2  0x000055fd0101fb00 in ?? ()
#3  0x00007fa93e0a4980 in ?? ()
#4  0x000055fcfe2b2103 in uv__io_poll (loop=loop@entry=0x55fd0101faa8, timeout=-1) at ../deps/uv/src/unix/linux-core.c:275
#5  0x000055fcfe2a118b in uv_run (loop=0x55fd0101faa8, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:370
#6  0x000055fcfe1b7029 in node::BackgroundTaskRunner::DelayedTaskScheduler::Start()::{lambda(void*)#1}::_FUN(void*) ()
#7  0x00007fa93e65c6c2 in ?? ()
#8  0x0000000000000000 in ?? ()

Thread 39 (LWP 14897):
#0  0x00007fa93e898ae0 in ?? ()
#1  0x000055fcfe72a87f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x000055fcfe42af7d in v8::internal::CancelableTask::Run() ()
#3  0x000055fcfe1b37fd in node::BackgroundRunner(void*) ()
#4  0x00007fa93e65c6c2 in ?? ()
#5  0x0000000000000000 in ?? ()

Thread 38 (LWP 14894):
#0  0x00007fa93e65e5e4 in ?? ()
#1  0x00007fa93e65bd2d in ?? ()
#2  0x00007fa93e89abd0 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 37 (LWP 14898):
#0  0x00007fa93e65e5e4 in ?? ()
#1  0x00007fa93e65bd2d in ?? ()
#2  0x00007fa93c89bb30 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 36 (LWP 15053):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e645b2a in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x0000000200000001 in ?? ()
#4  0x00007fa935628a90 in ?? ()
#5  0x00007fa93e644269 in ?? ()
#6  0x000001689e6831b0 in ?? ()
#7  0x00007fa9356289cc in ?? ()
#8  0x00007fa9356289b0 in ?? ()
#9  0x00000000000009c4 in ?? ()
#10 0x00007fa935628a88 in ?? ()
#11 0x00007fa935628aa8 in ?? ()
#12 0x000009c400000004 in ?? ()
#13 0x00007fa9356289e8 in ?? ()
#14 0x0000020000000004 in ?? ()
#15 0x0000000000000200 in ?? ()
#16 0x0000000000001388 in ?? ()
#17 0x00007fa9356289cc in ?? ()
#18 0x00007fa935628990 in ?? ()
#19 0x00007fa93562898c in ?? ()
#20 0x00007fa9356289cc in ?? ()
#21 0x0000003200000000 in ?? ()
#22 0x0000000035628a10 in ?? ()
#23 0x000000103e651b82 in ?? ()
#24 0x0001000100000013 in ?? ()
#25 0x00007fa93e64561f in ?? ()
#26 0x0000000000000013 in ?? ()
#27 0x0000000000000000 in ?? ()

Thread 35 (LWP 15054):
#0  0x00007fa93e65e5e4 in ?? ()
#1  0x00007fa93e65bd2d in ?? ()
#2  0x00007fa934e26b30 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 34 (LWP 15064):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 33 (LWP 15063):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 32 (LWP 15065):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 31 (LWP 15068):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 30 (LWP 15066):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 29 (LWP 15070):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 28 (LWP 15069):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 27 (LWP 15074):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 26 (LWP 15073):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 25 (LWP 15067):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 24 (LWP 15072):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 23 (LWP 15071):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 22 (LWP 15050):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e645b2a in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x0000000200000001 in ?? ()
#4  0x00007fa936e31a90 in ?? ()
#5  0x00007fa93e644269 in ?? ()
#6  0x000001689e682c2a in ?? ()
#7  0x00007fa936e319cc in ?? ()
#8  0x00007fa936e319b0 in ?? ()
#9  0x00000000000009c4 in ?? ()
#10 0x00007fa936e31a88 in ?? ()
#11 0x00007fa936e31aa8 in ?? ()
#12 0x000009c400000004 in ?? ()
#13 0x00007fa936e319e8 in ?? ()
#14 0x0000020000000004 in ?? ()
#15 0x0000000000000200 in ?? ()
#16 0x0000000000001388 in ?? ()
#17 0x00007fa936e319cc in ?? ()
#18 0x00007fa936e31990 in ?? ()
#19 0x00007fa936e3198c in ?? ()
#20 0x00007fa936e319cc in ?? ()
#21 0x0000003200000000 in ?? ()
#22 0x0000000036e31a10 in ?? ()
#23 0x000000103e651b82 in ?? ()
#24 0x0001000100000013 in ?? ()
#25 0x00007fa93e64561f in ?? ()
#26 0x0000000000000013 in ?? ()
#27 0x0000000000000000 in ?? ()

Thread 21 (LWP 15049):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e645b2a in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x0000000200000001 in ?? ()
#4  0x00007fa937634a90 in ?? ()
#5  0x00007fa93e644269 in ?? ()
#6  0x000001689e68299d in ?? ()
#7  0x00007fa9376349cc in ?? ()
#8  0x00007fa9376349b0 in ?? ()
#9  0x00000000000009c4 in ?? ()
#10 0x00007fa937634a88 in ?? ()
#11 0x00007fa937634aa8 in ?? ()
#12 0x000009c400000004 in ?? ()
#13 0x00007fa9376349e8 in ?? ()
#14 0x0000020000000004 in ?? ()
#15 0x0000000000000200 in ?? ()
#16 0x0000000000001388 in ?? ()
#17 0x00007fa9376349cc in ?? ()
#18 0x00007fa937634990 in ?? ()
#19 0x00007fa93763498c in ?? ()
#20 0x00007fa9376349cc in ?? ()
#21 0x0000002200000000 in ?? ()
#22 0x0000000037634a10 in ?? ()
#23 0x000000103e651b82 in ?? ()
#24 0x0001000100000013 in ?? ()
#25 0x00007fa93e64561f in ?? ()
#26 0x0000000000000013 in ?? ()
#27 0x0000000000000000 in ?? ()

Thread 20 (LWP 15048):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e645b2a in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x0000000200000001 in ?? ()
#4  0x00007fa937e37a90 in ?? ()
#5  0x00007fa93e644269 in ?? ()
#6  0x000001689e68289c in ?? ()
#7  0x00007fa937e379cc in ?? ()
#8  0x00007fa937e379b0 in ?? ()
#9  0x00000000000009c4 in ?? ()
#10 0x00007fa937e37a88 in ?? ()
#11 0x00007fa937e37aa8 in ?? ()
#12 0x000009c400000004 in ?? ()
#13 0x00007fa937e379e8 in ?? ()
#14 0x0000020000000004 in ?? ()
#15 0x0000000000000200 in ?? ()
#16 0x0000000000001388 in ?? ()
#17 0x00007fa937e379cc in ?? ()
#18 0x00007fa937e37990 in ?? ()
#19 0x00007fa937e3798c in ?? ()
#20 0x00007fa937e379cc in ?? ()
#21 0x0000003200000000 in ?? ()
#22 0x0000000037e37a10 in ?? ()
#23 0x000000103e651b82 in ?? ()
#24 0x0001000100000013 in ?? ()
#25 0x00007fa93e64561f in ?? ()
#26 0x0000000000000013 in ?? ()
#27 0x0000000000000000 in ?? ()

Thread 19 (LWP 15047):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e645b2a in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x0000000200000001 in ?? ()
#4  0x00007fa93863aa90 in ?? ()
#5  0x00007fa93e644269 in ?? ()
#6  0x000001689e6825eb in ?? ()
#7  0x00007fa93863a9cc in ?? ()
#8  0x00007fa93863a9b0 in ?? ()
#9  0x00000000000009c4 in ?? ()
#10 0x00007fa93863aa88 in ?? ()
#11 0x00007fa93863aaa8 in ?? ()
#12 0x000009c400000004 in ?? ()
#13 0x00007fa93863a9e8 in ?? ()
#14 0x0000020000000004 in ?? ()
#15 0x0000000000000200 in ?? ()
#16 0x0000000000001388 in ?? ()
#17 0x00007fa93863a9cc in ?? ()
#18 0x00007fa93863a990 in ?? ()
#19 0x00007fa93863a98c in ?? ()
#20 0x00007fa93863a9cc in ?? ()
#21 0x0000003200000000 in ?? ()
#22 0x000000003863aa10 in ?? ()
#23 0x000000103e651b82 in ?? ()
#24 0x0001000100000013 in ?? ()
#25 0x00007fa93e64561f in ?? ()
#26 0x0000000000000013 in ?? ()
#27 0x0000000000000000 in ?? ()

Thread 18 (LWP 15046):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e645b2a in ?? ()
#2  0x0000000000000002 in ?? ()
#3  0x0000000200000001 in ?? ()
#4  0x00007fa938e3da90 in ?? ()
#5  0x00007fa93e644269 in ?? ()
#6  0x000001689e682502 in ?? ()
#7  0x00007fa938e3d9cc in ?? ()
#8  0x00007fa938e3d9b0 in ?? ()
#9  0x00000000000009c4 in ?? ()
#10 0x00007fa938e3da88 in ?? ()
#11 0x00007fa938e3daa8 in ?? ()
#12 0x000009c400000004 in ?? ()
#13 0x00007fa938e3d9e8 in ?? ()
#14 0x0000020000000004 in ?? ()
#15 0x0000000000000200 in ?? ()
#16 0x0000000000001388 in ?? ()
#17 0x00007fa938e3d9cc in ?? ()
#18 0x00007fa938e3d990 in ?? ()
#19 0x00007fa938e3d98c in ?? ()
#20 0x00007fa938e3d9cc in ?? ()
#21 0x0000003200000000 in ?? ()
#22 0x0000000038e3da10 in ?? ()
#23 0x000000103e651b82 in ?? ()
#24 0x0001000100000013 in ?? ()
#25 0x00007fa93e64561f in ?? ()
#26 0x0000000000000013 in ?? ()
#27 0x0000000000000000 in ?? ()

Thread 17 (LWP 15108):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 16 (LWP 15107):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 15 (LWP 15106):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 14 (LWP 15105):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 13 (LWP 15104):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 12 (LWP 15103):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 11 (LWP 15102):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 10 (LWP 15101):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 9 (LWP 15100):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 8 (LWP 15099):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 7 (LWP 15098):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 6 (LWP 15096):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 5 (LWP 15094):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 4 (LWP 15062):
#0  0x00007fa93e65c636 in ?? ()
#1  0x00007fa93e65e5b1 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 3 (LWP 15055):
#0  0x00007fa93e65e5e4 in ?? ()
#1  0x00007fa93e65bd2d in ?? ()
#2  0x00007fa934623b30 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 2 (LWP 14900):
#0  0x00007fa93e65e5e4 in ?? ()
#1  0x00007fa93e65bd2d in ?? ()
#2  0x00007fa93e896b30 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 1 (LWP 14899):
#0  0x00007fa93e898ae0 in ?? ()
#1  0x000055fcfe72a87f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x000055fcfe42af7d in v8::internal::CancelableTask::Run() ()
#3  0x000055fcfe1b37fd in node::BackgroundRunner(void*) ()
#4  0x00007fa93e65c6c2 in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb) 
(gdb) 
(gdb) thread 38
[Switching to thread 38 (LWP 14894)]
#0  0x00007fa93e65e5e4 in ?? ()
(gdb) bt
#0  0x00007fa93e65e5e4 in ?? ()
#1  0x00007fa93e65bd2d in ?? ()
#2  0x00007fa93e89abd0 in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb) 
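
Finding the main thread in a long dump like this can also be scripted. The following is a minimal Python sketch, not part of the original thread. It assumes the `thread apply all bt` output was saved as text and relies on the Linux convention that the main thread's tid (LWP) equals the process pid. The helper names and the shortened `SAMPLE` excerpt are illustrative.

```python
import re

THREAD_RE = re.compile(r"^Thread (\d+) \(LWP (\d+)\):")


def split_threads(gdb_output):
    """Map LWP id -> list of frame lines from `thread apply all bt` text."""
    threads = {}
    current = None
    for line in gdb_output.splitlines():
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(2))
            threads[current] = []
        elif current is not None and line.startswith("#"):
            # Only frame lines start with '#'; pager prompts and blanks are skipped.
            threads[current].append(line.strip())
    return threads


def main_thread_frames(gdb_output, pid):
    """Return the frames of the thread whose LWP equals pid, or None if absent."""
    return split_threads(gdb_output).get(pid)


# Shortened excerpt in the same shape as the output above.
SAMPLE = """\
Thread 39 (LWP 14897):
#0  0x00007fa93e898ae0 in ?? ()
#1  0x000055fcfe72a87f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()

Thread 38 (LWP 14894):
#0  0x00007fa93e65e5e4 in ?? ()
---Type <return> to continue, or q <return> to quit---
#1  0x00007fa93e65bd2d in ?? ()
"""

for frame in main_thread_frames(SAMPLE, 14894) or []:
    print(frame)
```

Keying on the LWP rather than gdb's thread number matters here, because gdb's numbering (thread 38 in this dump) depends on the order in which it discovered the threads.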

@Cabalbl4
Author

Cabalbl4 commented Jan 30, 2019

I examined the other dumps I have and see the same picture, but with fewer threads.
Here is another example (pid 32247):

(gdb) thread apply all bt

Thread 7 (LWP 32249):
#0  0x00005583a584bb90 in ?? ()
#1  0x00005583a210087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x00005583a1e00f7d in v8::internal::CancelableTask::Run() ()
#3  0x00005583a1b897fd in node::BackgroundRunner(void*) ()
#4  0x00007f0e3cc0f6c2 in ?? ()
#5  0x0000000000000000 in ?? ()

Thread 6 (LWP 32247):
#0  0x00007f0e3cc115e4 in ?? ()
#1  0x00007f0e3cc0ed2d in ?? ()
#2  0x00007f0e3ce4dbd0 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 5 (LWP 32252):
#0  0x00005583a584bb90 in ?? ()
#1  0x00005583a210087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x00005583a1e00f7d in v8::internal::CancelableTask::Run() ()
#3  0x00005583a1b897fd in node::BackgroundRunner(void*) ()
#4  0x00007f0e3cc0f6c2 in ?? ()
#5  0x0000000000000000 in ?? ()

Thread 4 (LWP 32251):
#0  0x00007f0e3cc115e4 in ?? ()
#1  0x00007f0e3cc0ed2d in ?? ()
#2  0x00007f0e3ae4eb30 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 3 (LWP 32248):
#0  0x00007f0e3cbdddc3 in ?? ()
#1  0x00005583a56deb68 in ?? ()
#2  0x00005583a56de9e0 in ?? ()
#3  0x00007f0e3c657980 in ?? ()
#4  0x00005583a1c88103 in uv__io_poll (loop=loop@entry=0x5583a56de988, timeout=-1) at ../deps/uv/src/unix/linux-core.c:275
#5  0x00005583a1c7718b in uv_run (loop=0x5583a56de988, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:370
#6  0x00005583a1b8d029 in node::BackgroundTaskRunner::DelayedTaskScheduler::Start()::{lambda(void*)#1}::_FUN(void*) ()
#7  0x00007f0e3cc0f6c2 in ?? ()
#8  0x0000000000000000 in ?? ()
                                                                                                                                                                                                                                                                                                                                                                          
Thread 2 (LWP 32253):                                                                                                                                                                                                                                                                                                                                                     
#0  0x00007f0e3cc115e4 in ?? ()                                                                                                                                                                                                                                                                                                                                           
#1  0x00007f0e3cc0ed2d in ?? ()                                                                                                                                                                                                                                                                                                                                           
#2  0x00007f0e3ce49b30 in ?? ()                                                                                                                                                                                                                                                                                                                                           
#3  0x0000000000000000 in ?? ()                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                                                                          
Thread 1 (LWP 32250):                                                                                                                                                                                                                                                                                                                                                     
#0  0x00005583a584bb90 in ?? ()                                                                                                                                                                                                                                                                                                                                           
#1  0x00005583a210087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x00005583a1e00f7d in v8::internal::CancelableTask::Run() ()
#3  0x00005583a1b897fd in node::BackgroundRunner(void*) ()
#4  0x00007f0e3cc0f6c2 in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb) 

@gireeshpunathil
Member

@Cabalbl4 - thanks.

  1. The frames that are executing instructions in the range 0x00007f... most probably belong to the libc and libpthread libraries, and my gut feeling is that the debugger cannot show symbols because those libraries are either not loaded or not compatible with the core. Are you launching it on a different system from the one where it failed? (People usually don't debug in production, so that is fine.)

If so, could you please run ldd node in production, copy all of its dependencies to the debug box, and retry?

As for the main thread not showing any symbols, one reason could be that it is executing JIT-compiled code, but llnode is supposed to address that!

If we can ascertain that the main thread is indeed in JS land, that would eliminate one of the suspects I had for the crash. By any chance, are these crashes observed when the application is about to close? Or is it a web app with a service loop?

or

  1. If you can recreate it with minimal code and share that with me, I am happy to debug.

or

  1. Detecting the immediate reason for the crash from the context and walking backwards. This involves figuring out the jump target into frame 0 from frame 1 and dumping the disassembly from that point up to the crashing instruction. I would then map that to the source and make sense of it. It could be iterative work, and the effort will roughly depend on the offset of the failing code from the beginning of the routine.

Please let me know which way you want to go.

also pinging @nodejs/v8 to see if they have a better proposal.

@Cabalbl4
Author

@gireeshpunathil I will try to get all dependencies from production and load them into the debugger.

It is really hard to pinpoint the JS source of the problem, since the program is more than 10k lines :(

This will take some time (have some urgent tasks), I will ping you once all is ready. Most likely start of next week. Sorry for that.

@gireeshpunathil
Member

no issues, thanks! meanwhile we may also hear from others if they have a say on this.

@Cabalbl4
Author

Cabalbl4 commented Jan 31, 2019

@gireeshpunathil I re-created the container locally, installed gdb, and produced more useful stacks now that all production deps are loaded.

ldd /usr/local/bin/node 
       /lib/ld-musl-x86_64.so.1 (0x7f1ad0166000) 
       libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x7f1acdbbf000) 
       libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7f1acd9ad000) 
       libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f1ad0166000)

I attach those as files.
stack.pid476.txt
stack.pid14894.txt
stack.pid32247.txt

@Cabalbl4
Author

I see some "Backtrace stopped: previous frame inner to this frame (corrupt stack?)" in those new stacks.
Maybe some problem with pointers?

@gireeshpunathil
Member

to see if all 3 dumps show the same pattern / location, can you issue:
x/15i ($pc-40) on the failing thread, on each?

@Cabalbl4
Author

Cabalbl4 commented Feb 4, 2019

@gireeshpunathil

Looks like same problem for threads where backtrace failed:

PID: 14894

(gdb) thread 40
[Switching to thread 40 (LWP 14895)]
#0  0x00007fa93e62adc3 in epoll_pwait () from /lib/ld-musl-x86_64.so.1
(gdb) bt
#0  0x00007fa93e62adc3 in epoll_pwait () from /lib/ld-musl-x86_64.so.1
#1  0x000055fd0101fb00 in ?? ()
#2  0x00007fa93e0a4980 in ?? ()
#3  0x000055fcfe2b2103 in uv__io_poll (loop=0x55fd0101faa8, timeout=1818322490) at ../deps/uv/src/unix/linux-core.c:275
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) x/15i ($pc-40)
   0x7fa93e62ad9b <epoll_ctl+24>:       mov    %eax,%edi
   0x7fa93e62ad9d <epoll_ctl+26>:       callq  0x7fa93e629d61
   0x7fa93e62ada2 <epoll_ctl+31>:       pop    %rdx
   0x7fa93e62ada3 <epoll_ctl+32>:       retq
   0x7fa93e62ada4 <epoll_pwait>:        push   %rbp
   0x7fa93e62ada5 <epoll_pwait+1>:      push   %rbx
   0x7fa93e62ada6 <epoll_pwait+2>:      movslq %ecx,%rbx
   0x7fa93e62ada9 <epoll_pwait+5>:      movslq %edx,%rdx
   0x7fa93e62adac <epoll_pwait+8>:      movslq %edi,%rdi
   0x7fa93e62adaf <epoll_pwait+11>:     mov    %rbx,%r10
   0x7fa93e62adb2 <epoll_pwait+14>:     sub    $0x8,%rsp
   0x7fa93e62adb6 <epoll_pwait+18>:     mov    $0x8,%r9d
   0x7fa93e62adbc <epoll_pwait+24>:     mov    $0x119,%eax
   0x7fa93e62adc1 <epoll_pwait+29>:     syscall 
=> 0x7fa93e62adc3 <epoll_pwait+31>:     cmp    $0xffffffda,%eax
 
PID: 32247

(gdb) thread 3
[Switching to thread 3 (LWP 32248)]
#0  0x00007f0e3cbdddc3 in epoll_pwait () from /lib/ld-musl-x86_64.so.1
(gdb) bt
#0  0x00007f0e3cbdddc3 in epoll_pwait () from /lib/ld-musl-x86_64.so.1
#1  0x00005583a56de9e0 in ?? ()
#2  0x00007f0e3c657980 in ?? ()
#3  0x00005583a1c88103 in uv__io_poll (loop=0x5583a56de988, timeout=1296779298) at ../deps/uv/src/unix/linux-core.c:275
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) x/15i ($pc-40)
   0x7f0e3cbddd9b <epoll_ctl+24>:       mov    %eax,%edi
   0x7f0e3cbddd9d <epoll_ctl+26>:       callq  0x7f0e3cbdcd61
   0x7f0e3cbddda2 <epoll_ctl+31>:       pop    %rdx
   0x7f0e3cbddda3 <epoll_ctl+32>:       retq   
   0x7f0e3cbddda4 <epoll_pwait>:        push   %rbp
   0x7f0e3cbddda5 <epoll_pwait+1>:      push   %rbx
   0x7f0e3cbddda6 <epoll_pwait+2>:      movslq %ecx,%rbx
   0x7f0e3cbddda9 <epoll_pwait+5>:      movslq %edx,%rdx
   0x7f0e3cbdddac <epoll_pwait+8>:      movslq %edi,%rdi
   0x7f0e3cbdddaf <epoll_pwait+11>:     mov    %rbx,%r10
   0x7f0e3cbdddb2 <epoll_pwait+14>:     sub    $0x8,%rsp
   0x7f0e3cbdddb6 <epoll_pwait+18>:     mov    $0x8,%r9d
   0x7f0e3cbdddbc <epoll_pwait+24>:     mov    $0x119,%eax
   0x7f0e3cbdddc1 <epoll_pwait+29>:     syscall 
=> 0x7f0e3cbdddc3 <epoll_pwait+31>:     cmp    $0xffffffda,%eax


PID: 476

(gdb) thread 63
[Switching to thread 63 (LWP 477)]
#0  0x00007fd00de80dc3 in epoll_pwait () from /lib/ld-musl-x86_64.so.1
(gdb) bt
#0  0x00007fd00de80dc3 in epoll_pwait () from /lib/ld-musl-x86_64.so.1
#1  0x00005565f9b880c0 in ?? ()
#2  0x00007fd00d8fa980 in ?? ()
#3  0x00005565f7ff8103 in uv__io_poll (loop=0x5565f9b88068, timeout=21861) at ../deps/uv/src/unix/linux-core.c:275
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) x/15i ($pc-40)
   0x7fd00de80d9b <epoll_ctl+24>:       mov    %eax,%edi
   0x7fd00de80d9d <epoll_ctl+26>:       callq  0x7fd00de7fd61
   0x7fd00de80da2 <epoll_ctl+31>:       pop    %rdx
   0x7fd00de80da3 <epoll_ctl+32>:       retq   
   0x7fd00de80da4 <epoll_pwait>:        push   %rbp
   0x7fd00de80da5 <epoll_pwait+1>:      push   %rbx
   0x7fd00de80da6 <epoll_pwait+2>:      movslq %ecx,%rbx
   0x7fd00de80da9 <epoll_pwait+5>:      movslq %edx,%rdx
   0x7fd00de80dac <epoll_pwait+8>:      movslq %edi,%rdi
   0x7fd00de80daf <epoll_pwait+11>:     mov    %rbx,%r10
   0x7fd00de80db2 <epoll_pwait+14>:     sub    $0x8,%rsp
   0x7fd00de80db6 <epoll_pwait+18>:     mov    $0x8,%r9d
   0x7fd00de80dbc <epoll_pwait+24>:     mov    $0x119,%eax
   0x7fd00de80dc1 <epoll_pwait+29>:     syscall 
=> 0x7fd00de80dc3 <epoll_pwait+31>:     cmp    $0xffffffda,%eax

@gireeshpunathil
Member

@Cabalbl4 - sorry; but in this case none of the 3 threads are the failing threads! We are interested in the disassembly of only the failing one. Can you dump that? thanks!

@Cabalbl4
Author

Cabalbl4 commented Feb 4, 2019

@gireeshpunathil you need the threads with segfaults, right? Will do.

@Cabalbl4
Author

Cabalbl4 commented Feb 4, 2019

@gireeshpunathil

PID: 14894

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa93e898ae0 in __bss_start () from /lib/ld-musl-x86_64.so.1
[Current thread is 1 (LWP 14899)]
(gdb) bt
#0  0x00007fa93e898ae0 in __bss_start () from /lib/ld-musl-x86_64.so.1
#1  0x000055fcfe72a87f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x000055fcfe42af7d in v8::internal::CancelableTask::Run() ()
#3  0x000055fcfe1b37fd in node::BackgroundRunner(void*) ()
#4  0x00007fa93e65c6c2 in ?? () from /lib/ld-musl-x86_64.so.1
#5  0x0000000000000000 in ?? ()
(gdb) x/15i ($pc-40)
   0x7fa93e898ab8:      add    %al,(%rax)
   0x7fa93e898aba:      add    %al,(%rax)
   0x7fa93e898abc:      add    %al,(%rax)
   0x7fa93e898abe:      add    %al,(%rax)
   0x7fa93e898ac0:      sarb   (%rbx)
   0x7fa93e898ac2:      and    %al,(%rcx)
   0x7fa93e898ac4:      std    
   0x7fa93e898ac5:      push   %rbp
   0x7fa93e898ac6:      add    %al,(%rax)
   0x7fa93e898ac8:      nop
   0x7fa93e898ac9:      mov    $0x55fd010a,%eax
   0x7fa93e898ace:      add    %al,(%rax)
   0x7fa93e898ad0:      add    %al,(%rax)
   0x7fa93e898ad2:      add    %al,(%rax)
   0x7fa93e898ad4:      add    %al,(%rax)
(gdb) 


PID: 32247

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005583a584bb90 in ?? ()
[Current thread is 1 (LWP 32250)]
(gdb) bt
#0  0x00005583a584bb90 in ?? ()
#1  0x00005583a210087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x00005583a1e00f7d in v8::internal::CancelableTask::Run() ()
#3  0x00005583a1b897fd in node::BackgroundRunner(void*) ()
#4  0x00007f0e3cc0f6c2 in ?? () from /lib/ld-musl-x86_64.so.1
#5  0x0000000000000000 in ?? ()
(gdb) x/15i ($pc-40)
   0x5583a584bb68:      add    %ah,0x0(%rsi)
   0x5583a584bb6b:      add    %al,(%rax)
   0x5583a584bb6d:      add    %al,(%rax)
   0x5583a584bb6f:      jae    0x5583a584bb72
   0x5583a584bb71:      add    %eax,(%rax)
   0x5583a584bb73:      add    %al,(%rax)
   0x5583a584bb75:      add    %al,(%rax)
   0x5583a584bb77:      add    %ah,(%rcx)
   0x5583a584bb79:      add    %al,(%rax)
   0x5583a584bb7b:      add    %al,(%rax)
   0x5583a584bb7d:      add    %al,(%rax)
   0x5583a584bb7f:      add    %dh,0x6f(%rdx)
   0x5583a584bb82:      outsl  %ds:(%rsi),(%dx)
   0x5583a584bb83:      je     0x5583a584bb85
   0x5583a584bb85:      push   %rbp

   
PID: 476

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005565f9ddac70 in ?? ()
[Current thread is 1 (LWP 481)]
(gdb) bt
#0  0x00005565f9ddac70 in ?? ()
#1  0x00005565f847087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x00005565f8170f7d in v8::internal::CancelableTask::Run() ()
#3  0x00005565f7ef97fd in node::BackgroundRunner(void*) ()
#4  0x00007fd00deb26c2 in ?? () from /lib/ld-musl-x86_64.so.1
#5  0x0000000000000000 in ?? ()
(gdb)  x/15i ($pc-40)
   0x5565f9ddac48:      adc    %esi,(%rdx,%rbx,1)
   0x5565f9ddac4b:      movhps %xmm1,(%rdi)
   0x5565f9ddac4e:      add    %al,(%rax)
   0x5565f9ddac50:      test   $0x5,%eax
   0x5565f9ddac55:      add    %al,(%rax)
   0x5565f9ddac57:      add    %al,-0x1d1e1cba(%rdx)
   0x5565f9ddac5d:      add    %eax,(%rax)
   0x5565f9ddac5f:      add    %bh,0x7(%rax)
   0x5565f9ddac62:      sar    $0x65,%ecx
   0x5565f9ddac65:      push   %rbp
   0x5565f9ddac66:      add    %al,(%rax)
   0x5565f9ddac68:      je     0x5565f9ddac6a
   0x5565f9ddac6a:      add    %al,(%rax)
   0x5565f9ddac6c:      add    %al,(%rax)
   0x5565f9ddac6e:      add    %al,(%rax)

@gireeshpunathil
Member

Thanks for the quick reply; all three look like bad instruction sequences to me, so I suspect a wild branch from frame 1.

  • switch to frame 1
    (gdb) frame 1
  • dump last few instructions
    (gdb) x/15i ($pc-40)
  • locate the instruction for the callsite
    (gdb) x/i 0x00005565f847087f (for the last dump in your previous comment)
  • see which register holds the jump target (mostly $rax)
    (gdb) x/i $rax

sorry if it is tedious!

also
(gdb) info proc mappings
this will give the image bounds, and will help ascertain whether the target branch was good or not.
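
The bounds check itself can be done offline once the mappings are copied out of gdb. Below is a minimal Python sketch, assuming the `info proc mappings` rows have been reduced to (start, end, objfile) tuples. The ranges in `MAPPINGS` are hypothetical placeholders, not values from the dumps in this thread, and `find_mapping` and `is_plausible_branch_target` are illustrative names.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int    # inclusive start address
    end: int      # exclusive end address
    objfile: str  # backing file, or "" for anonymous memory such as the heap


# HYPOTHETICAL ranges for illustration only. Replace them with the rows
# printed by `info proc mappings` for the core being examined.
MAPPINGS: List[Mapping] = [
    Mapping(0x5565F7000000, 0x5565F9800000, "/usr/local/bin/node"),
    Mapping(0x5565F9800000, 0x5565FA000000, ""),
    Mapping(0x7FD00DE00000, 0x7FD00DF00000, "/lib/ld-musl-x86_64.so.1"),
]


def find_mapping(addr: int, mappings: List[Mapping]) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if it is unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


def is_plausible_branch_target(addr: int, mappings: List[Mapping]) -> bool:
    """Coarse check: code normally lives in a file-backed image."""
    m = find_mapping(addr, mappings)
    return m is not None and m.objfile != ""


call_site = 0x5565F847087F  # frame 1 return address reported for PID 476
crash_pc = 0x5565F9DDAC70   # frame 0 pc reported for PID 476

for label, addr in (("call site", call_site), ("crash pc", crash_pc)):
    m = find_mapping(addr, MAPPINGS)
    where = "unmapped" if m is None else (m.objfile or "anonymous memory")
    print(f"{label} {addr:#x}: {where}, "
          f"plausible={is_plausible_branch_target(addr, MAPPINGS)}")
```

Falling inside an image is necessary but not sufficient: a target can sit in an image's data section (as `__bss_start` does in the PID 14894 dump), so the mapping's permissions or section layout still need a look.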

@Cabalbl4
Author

Cabalbl4 commented Feb 4, 2019

No problem, will do once have time, today or tomorrow at worst.

@Cabalbl4
Author

Cabalbl4 commented Feb 4, 2019

@gireeshpunathil

PID: 14894

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa93e898ae0 in __bss_start () from /lib/ld-musl-x86_64.so.1
[Current thread is 1 (LWP 14899)]
(gdb) frame 1
#1  0x000055fcfe72a87f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
(gdb) x/15i ($pc-40)
   0x55fcfe72a857 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9815>:      callq  0x55fcfe0e1250 <_Znwm@plt>
   0x55fcfe72a85c <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9820>:      mov    -0x1138(%rbp),%rdx
   0x55fcfe72a863 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9827>:      movq   $0x0,0x8(%rax)
   0x55fcfe72a86b <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9835>:      mov    %rax,(%rdx)
   0x55fcfe72a86e <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9838>:      jmpq   0x55fcfe728853 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+1619>
   0x55fcfe72a873 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9843>:      lea    0xc6de7e(%rip),%rsi        # 0x55fcff3986f8
   0x55fcfe72a87a <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9850>:      mov    %rax,%rdi
   0x55fcfe72a87d <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9853>:      callq  *%rdx
=> 0x55fcfe72a87f <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9855>:      mov    %rax,%r13
   0x55fcfe72a882 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9858>:      jmpq   0x55fcfe72a2d9 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+8409>
   0x55fcfe72a887 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9863>:      movq   $0x0,-0x1080(%rbp)
   0x55fcfe72a892 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9874>:      movq   $0x0,-0x1078(%rbp)
   0x55fcfe72a89d <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9885>:      xor    %r14d,%r14d
   0x55fcfe72a8a0 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9888>:      callq  0x55fcfea87b10 <_ZN2v88internal7tracing16TraceEventHelper20GetTracingControllerEv>
   0x55fcfe72a8a5 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9893>:      mov    (%rax),%rdx
(gdb) bt
#0  0x00007fa93e898ae0 in __bss_start () from /lib/ld-musl-x86_64.so.1
#1  0x000055fcfe72a87f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x000055fcfe42af7d in v8::internal::CancelableTask::Run() ()
#3  0x000055fcfe1b37fd in node::BackgroundRunner(void*) ()
#4  0x00007fa93e65c6c2 in ?? () from /lib/ld-musl-x86_64.so.1
#5  0x0000000000000000 in ?? ()
(gdb) x/i 0x000055fcfe72a87f
=> 0x55fcfe72a87f <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9855>:      mov    %rax,%r13
(gdb) x/i $rax
   0x55fd0101ee20:      push   %rax

PID: 32247

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005583a584bb90 in ?? ()
[Current thread is 1 (LWP 32250)]
(gdb) frame 1
#1  0x00005583a210087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
(gdb)  x/15i ($pc-40)
   0x5583a2100857 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9815>:      callq  0x5583a1ab7250 <_Znwm@plt>
   0x5583a210085c <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9820>:      mov    -0x1138(%rbp),%rdx
   0x5583a2100863 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9827>:      movq   $0x0,0x8(%rax)
   0x5583a210086b <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9835>:      mov    %rax,(%rdx)
   0x5583a210086e <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9838>:      jmpq   0x5583a20fe853 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+1619>
   0x5583a2100873 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9843>:      lea    0xc6de7e(%rip),%rsi        # 0x5583a2d6e6f8
   0x5583a210087a <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9850>:      mov    %rax,%rdi
   0x5583a210087d <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9853>:      callq  *%rdx
=> 0x5583a210087f <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9855>:      mov    %rax,%r13
   0x5583a2100882 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9858>:      jmpq   0x5583a21002d9 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+8409>
   0x5583a2100887 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9863>:      movq   $0x0,-0x1080(%rbp)
   0x5583a2100892 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9874>:      movq   $0x0,-0x1078(%rbp)
   0x5583a210089d <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9885>:      xor    %r14d,%r14d
   0x5583a21008a0 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9888>:      callq  0x5583a245db10 <_ZN2v88internal7tracing16TraceEventHelper20GetTracingControllerEv>
   0x5583a21008a5 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9893>:      mov    (%rax),%rdx
(gdb) bt
#0  0x00005583a584bb90 in ?? ()
#1  0x00005583a210087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x00005583a1e00f7d in v8::internal::CancelableTask::Run() ()
#3  0x00005583a1b897fd in node::BackgroundRunner(void*) ()
#4  0x00007f0e3cc0f6c2 in ?? () from /lib/ld-musl-x86_64.so.1
#5  0x0000000000000000 in ?? ()
(gdb) x/i 0x00005583a210087f
=> 0x5583a210087f <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9855>:      mov    %rax,%r13
(gdb) x/i $rax
   0x5583a56ddd00:      loopne 0x5583a56ddcbc

PID: 476

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005565f9ddac70 in ?? ()
[Current thread is 1 (LWP 481)]
(gdb) frame 1
#1  0x00005565f847087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
(gdb) x/15i ($pc-40)
   0x5565f8470857 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9815>:      callq  0x5565f7e27250 <_Znwm@plt>
   0x5565f847085c <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9820>:      mov    -0x1138(%rbp),%rdx
   0x5565f8470863 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9827>:      movq   $0x0,0x8(%rax)
   0x5565f847086b <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9835>:      mov    %rax,(%rdx)
   0x5565f847086e <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9838>:      jmpq   0x5565f846e853 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+1619>
   0x5565f8470873 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9843>:      lea    0xc6de7e(%rip),%rsi        # 0x5565f90de6f8
   0x5565f847087a <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9850>:      mov    %rax,%rdi
   0x5565f847087d <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9853>:      callq  *%rdx
=> 0x5565f847087f <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9855>:      mov    %rax,%r13
   0x5565f8470882 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9858>:      jmpq   0x5565f84702d9 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+8409>
   0x5565f8470887 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9863>:      movq   $0x0,-0x1080(%rbp)
   0x5565f8470892 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9874>:      movq   $0x0,-0x1078(%rbp)
   0x5565f847089d <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9885>:      xor    %r14d,%r14d
   0x5565f84708a0 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9888>:      callq  0x5565f87cdb10 <_ZN2v88internal7tracing16TraceEventHelper20GetTracingControllerEv>
   0x5565f84708a5 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9893>:      mov    (%rax),%rdx
(gdb) bt
#0  0x00005565f9ddac70 in ?? ()
#1  0x00005565f847087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
#2  0x00005565f8170f7d in v8::internal::CancelableTask::Run() ()
#3  0x00005565f7ef97fd in node::BackgroundRunner(void*) ()
#4  0x00007fd00deb26c2 in ?? () from /lib/ld-musl-x86_64.so.1
#5  0x0000000000000000 in ?? ()
(gdb) x/i 0x5565f847087f
=> 0x5565f847087f <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9855>:      mov    %rax,%r13
(gdb) x/i $rax
   0x5565f9b873e0:      loopne 0x5565f9b873cc

@gireeshpunathil
Member

@Cabalbl4 - if you can just dump %rsi in frame #1 that would reveal a lot:
(gdb) x/s %rsi
or
(gdb) x/s 0x5565f90de6f8

If the crash is what I suspect, we should find a string value there: disabled-by-default-..
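
The literal address in the second command does not have to come from a register: gdb's `# 0x...` annotation on the `lea 0xc6de7e(%rip),%rsi` line is plain RIP-relative arithmetic, where the displacement is added to the address of the instruction that follows the `lea`. A small Python sketch that recomputes it for the three dumps posted earlier in this thread (the helper name `rip_relative_target` is illustrative; the 7-byte length is read off the listings, where the `lea` sits at `...873` and the next instruction at `...87a`):

```python
MASK64 = (1 << 64) - 1


def rip_relative_target(insn_addr: int, insn_len: int, disp: int) -> int:
    """Effective address of a RIP-relative operand on x86-64.

    RIP already points at the next instruction when the operand is
    evaluated, hence insn_addr + insn_len.
    """
    return (insn_addr + insn_len + disp) & MASK64


LEA_LEN = 7      # 0x...873 -> 0x...87a in the posted disassembly
DISP = 0xC6DE7E  # displacement shown in `lea 0xc6de7e(%rip),%rsi`

# pid -> (address of the lea, target address annotated by gdb)
DUMPS = {
    14894: (0x55FCFE72A873, 0x55FCFF3986F8),
    32247: (0x5583A2100873, 0x5583A2D6E6F8),
    476: (0x5565F8470873, 0x5565F90DE6F8),
}

for pid, (lea_addr, annotated) in DUMPS.items():
    target = rip_relative_target(lea_addr, LEA_LEN, DISP)
    print(f"PID {pid}: computed {target:#x}, matches gdb: {target == annotated}")
```

All three computed addresses agree with gdb's annotation, so `x/s` on the annotated address inspects the same memory that `%rsi` pointed to at the call.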

There is only one call site in the whole of the inlined method v8::internal::ConcurrentMarking::Run that the compiler does not devirtualize, because the receiver object is available only at runtime and only its base class is known statically:
v8::internal::tracing::TraceEventHelper::GetTracingController()->GetCategoryGroupEnabled()

The wild branch upon invoking GetCategoryGroupEnabled could only mean that the return from GetTracingController() was garbage.

This is reminiscent of the issue discovered in master (#25007), and the context matches #25007 (comment).

But I will wait for @Cabalbl4's output to confirm.
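
To make the suspected failure mode concrete, here is a toy Python model of an indirect (virtual) call. It is not V8 code: memory is a dict of 8-byte cells, and every address, the slot offset, and the name `virtual_call_target` are hypothetical. It only illustrates why a garbage receiver pointer turns `callq *%rdx` into a jump to arbitrary memory; whether the controller pointer here was stale or garbage for some other reason is not established by the dumps alone.

```python
from typing import Dict

Memory = Dict[int, int]  # address -> 8-byte value stored there


def virtual_call_target(memory: Memory, obj_addr: int, slot_offset: int) -> int:
    """Address an indirect call would branch to for a given receiver."""
    vtable_addr = memory[obj_addr]            # like `mov (%rax),%rdx`: load the vptr
    return memory[vtable_addr + slot_offset]  # load the method pointer from its slot


# All values below are made up for the illustration.
OBJ = 0x1000       # receiver object
VTABLE = 0x2000    # its vtable
SLOT = 0x10        # offset of the virtual method's slot (assumed)
METHOD = 0x400000  # real code address of the method

# Valid receiver: the first word is the vptr and the slot holds real code.
live: Memory = {OBJ: VTABLE, VTABLE + SLOT: METHOD}

# Garbage receiver: the same storage now holds unrelated data, so the "vptr"
# points at a data buffer and the "slot" holds whatever bytes are there.
stale: Memory = dict(live)
stale[OBJ] = 0x3000
stale[0x3000 + SLOT] = 0x3040

print(f"live target:  {virtual_call_target(live, OBJ, SLOT):#x}")
print(f"stale target: {virtual_call_target(stale, OBJ, SLOT):#x}")
```

In the model the second call lands on a data address rather than on code, which is the shape of what the frame 0 disassemblies show: the CPU decoding data bytes as instructions.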

@Cabalbl4
Author

Cabalbl4 commented Feb 4, 2019

@gireeshpunathil
PID 14894

[Current thread is 1 (LWP 14899)]
(gdb) frame 1
#1  0x000055fcfe72a87f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
(gdb)  x/s %rsi
A syntax error in expression, near `%rsi'.
(gdb)  x/15i ($pc-40)
   0x55fcfe72a857 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9815>:      callq  0x55fcfe0e1250 <_Znwm@plt>
   0x55fcfe72a85c <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9820>:      mov    -0x1138(%rbp),%rdx
   0x55fcfe72a863 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9827>:      movq   $0x0,0x8(%rax)
   0x55fcfe72a86b <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9835>:      mov    %rax,(%rdx)
   0x55fcfe72a86e <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9838>:      jmpq   0x55fcfe728853 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+1619>
   0x55fcfe72a873 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9843>:      lea    0xc6de7e(%rip),%rsi        # 0x55fcff3986f8
   0x55fcfe72a87a <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9850>:      mov    %rax,%rdi
   0x55fcfe72a87d <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9853>:      callq  *%rdx
=> 0x55fcfe72a87f <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9855>:      mov    %rax,%r13
   0x55fcfe72a882 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9858>:      jmpq   0x55fcfe72a2d9 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+8409>
   0x55fcfe72a887 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9863>:      movq   $0x0,-0x1080(%rbp)
   0x55fcfe72a892 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9874>:      movq   $0x0,-0x1078(%rbp)
   0x55fcfe72a89d <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9885>:      xor    %r14d,%r14d
   0x55fcfe72a8a0 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9888>:      callq  0x55fcfea87b10 <_ZN2v88internal7tracing16TraceEventHelper20GetTracingControllerEv>
   0x55fcfe72a8a5 <_ZN2v88internal17ConcurrentMarking3RunEiPNS1_9TaskStateE+9893>:      mov    (%rax),%rdx
(gdb) x/s 0x55fcff3986f8
0x55fcff3986f8: "disabled-by-default-v8.gc"

PID 32247

[Current thread is 1 (LWP 32250)]
(gdb) frame 1
#1  0x00005583a210087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
(gdb) info registers rsi
rsi            0x5583a2d6e6f8   94023861069560
(gdb) x/s  0x5583a2d6e6f8
0x5583a2d6e6f8: "disabled-by-default-v8.gc"

PID 476

(gdb) frame 1
#1  0x00005565f847087f in v8::internal::ConcurrentMarking::Run(int, v8::internal::ConcurrentMarking::TaskState*) ()
(gdb) info registers rsi
rsi            0x5565f90de6f8   93896458495736
(gdb) x/s 0x5565f90de6f8
0x5565f90de6f8: "disabled-by-default-v8.gc"

@gireeshpunathil
Member

Thanks @Cabalbl4 for the quick reply! That confirms the issue and concludes the debugging. Please look out for a recommendation on this thread.

So we have an exit race in v10.x! I am not sure which PRs we will need to backport other than #25061.

@addaleax @MylesBorins @targos @BethGriggs

@MylesBorins
Contributor

@gireeshpunathil #25061 (specifically 4da7e6e) has not landed on 10.x, was that a mistype?

@gireeshpunathil
Member

@MylesBorins - no, I meant #25061 itself. What I meant to say is: we want 4da7e6e for sure, but I am not sure what else would be needed, as we had a number of race-related issues and I don't have a mapping between each issue and the commit that resolved it.

gireeshpunathil added a commit to gireeshpunathil/node that referenced this issue Feb 8, 2019
Insert a NULLCHECK prior to return. Ideally we do this in the caller,
but the TraceController object is somewhat special as:
1. It is accessed by most threads
2. Its life cycle is managed by Agent::Agent
3. Its getter is invoked through Base Methods (upstream)

Refs: nodejs#25814
pull bot pushed a commit to Rachelmorrell/node that referenced this issue Feb 8, 2019
addaleax pushed a commit that referenced this issue Feb 8, 2019
Insert a NULLCHECK prior to return. Ideally we do this in the caller,
but the TraceController object is somewhat special as:
1. It is accessed by most threads
2. Its life cycle is managed by Agent::Agent
3. Its getter is invoked through Base Methods (upstream)

Refs: #25814

PR-URL: #25943
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Masashi Hirano <shisama07@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
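
For readers skimming the commit message above, the idea can be sketched roughly as follows. This is an illustration only and not the code from the PR: `Agent`, `Controller`, `Shutdown`, `HasController`, and `CheckNotNull` are simplified, hypothetical stand-ins, and the real change presumably relies on Node.js's own internal check macros. The point is that the getter itself refuses to hand out a null pointer, so a use after teardown fails immediately at a known location instead of as a wild branch in some distant caller.

```cpp
#include <cstdio>
#include <cstdlib>
#include <memory>

// Hypothetical stand-in for the tracing controller object.
struct Controller {
  bool initialized = true;
};

// Hypothetical helper: abort with a message if the pointer is null,
// otherwise pass the pointer through unchanged.
template <typename T>
T* CheckNotNull(T* ptr, const char* what) {
  if (ptr == nullptr) {
    std::fprintf(stderr, "FATAL: %s is null\n", what);
    std::abort();
  }
  return ptr;
}

// Hypothetical owner of the controller, loosely modeled on the
// "life cycle is managed by Agent" point in the commit message.
class Agent {
 public:
  Agent() : controller_(new Controller()) {}

  // Tear down the controller; later getter calls will hit the check.
  void Shutdown() { controller_.reset(); }

  bool HasController() const { return controller_ != nullptr; }

  // The null check lives in the getter because callers on many threads
  // reach it indirectly and cannot all be audited.
  Controller* GetController() const {
    return CheckNotNull(controller_.get(), "tracing controller");
  }

 private:
  std::unique_ptr<Controller> controller_;
};
```

Note that a check like this only turns a null dereference into a deterministic abort. It does not by itself remove a race between teardown and other threads that are still running.
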
@gireeshpunathil
Member

@Cabalbl4 - v10.15.x will have the fix for this, can you please try?

@genisd

genisd commented Jun 24, 2019

There is also a 10.16.0 release which could include the fix for this.
The changelog mentions some segfault fixes.

A service which we suspect had the same issue has been running without error on 10.16.0. But again, we cannot be certain it's the same issue.

@gireeshpunathil
Member

ping @Cabalbl4

@gireeshpunathil
Member

Inactive, closing. Please re-open if this is still outstanding.
