This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

mbgl-node segfaults in node 10 when outstanding http requests complete after map is GCed #12252

Closed
dBitech opened this issue Jun 28, 2018 · 16 comments
Labels
crash Node.js node-mapbox-gl-native

Comments

@dBitech

dBitech commented Jun 28, 2018

I am seeing random segmentation faults under Node 10, with both the latest release of mapbox-gl-native and the latest code from git. I do not have a self-contained test case, but I have attached the debugging information I do have. Concurrency does not appear to be a factor in reproducing it: sometimes it segfaults on the first run, and sometimes ab -c 800 -n100000 runs 500 times with no issue.

darcy@HOST:~/src$ node -v
v10.5.0

darcy@HOST:~/src$ uname -a
Linux HOST 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

PID 25796 received SIGSEGV for address: 0x0
/home/darcy/src/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2ed7)[0x7fbfaaab5ed7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fbfad476890]
node(_ZN4node16EmitAsyncDestroyEPN2v87IsolateENS_13async_contextE+0x15)[0x86a785]
/home/darcy/src/node_modules/@mapbox/mapbox-gl-native/lib/mbgl-node.abi-64.node(+0x565b0)[0x7fbf922325b0]
/home/darcy/src/node_modules/@mapbox/mapbox-gl-native/lib/mbgl-node.abi-64.node(+0x6b0c4)[0x7fbf922470c4]
/home/darcy/src/node_modules/@mapbox/mapbox-gl-native/lib/mbgl-node.abi-64.node(+0x69386)[0x7fbf92245386]
/home/darcy/src/node_modules/@mapbox/mapbox-gl-native/lib/mbgl-node.abi-64.node(+0x693b6)[0x7fbf922453b6]
/home/darcy/src/node_modules/@mapbox/mapbox-gl-native/lib/mbgl-node.abi-64.node(+0x57414)[0x7fbf92233414]
node(_ZN2v88internal13GlobalHandles31DispatchPendingPhantomCallbacksEb+0xc3)[0xe42a23]
node(_ZN2v88internal13GlobalHandles31PostGarbageCollectionProcessingENS0_16GarbageCollectorENS_15GCCallbackFlagsE+0x2a)[0xe42c4a]
node(_ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE+0x1eb)[0xe80d7b]
node(_ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE+0x194)[0xe81c74]
node(_ZN2v88internal4Heap20AllocateRawWithRetryEiNS0_15AllocationSpaceENS0_19AllocationAlignmentE+0x45)[0xe845a5]
node(_ZN2v88internal7Factory15NewFillerObjectEibNS0_15AllocationSpaceE+0x24)[0xe4cad4]
node(_ZN2v88internal26Runtime_AllocateInNewSpaceEiPPNS0_6ObjectEPNS0_7IsolateE+0x6e)[0x10ecdbe]
[0x3638ff7841bd]

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x000000000086a785 in node::EmitAsyncDestroy(v8::Isolate*, node::async_context) ()
(gdb) bt
#0 0x000000000086a785 in node::EmitAsyncDestroy(v8::Isolate*, node::async_context) ()
#1 0x00007fffdbaf85b0 in Nan::AsyncResource::~AsyncResource (this=0x50b51b0, __in_chrg=) at ../../headers/nan/2.10.0/nan.h:513
#2 0x00007fffdbb0d0c4 in Nan::AsyncWorker::~AsyncWorker (this=0x5156ed8, __in_chrg=) at ../../headers/nan/2.10.0/nan.h:1801
#3 0x00007fffdbb0b386 in node_mbgl::NodeRequest::~NodeRequest (this=0x5156ec0, __in_chrg=)
at ../../../platform/node/src/node_request.cpp:18
#4 0x00007fffdbb0b3b6 in node_mbgl::NodeRequest::~NodeRequest (this=0x5156ec0, __in_chrg=)
at ../../../platform/node/src/node_request.cpp:25
#5 0x00007fffdbaf9414 in Nan::ObjectWrap::WeakCallback (info=...) at ../../headers/nan/2.10.0/nan_object_wrap.h:126
#6 0x0000000000e42a23 in v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool) ()
#7 0x0000000000e42c4a in v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector, v8::GCCallbackFlags) ()
#8 0x0000000000e80d7b in v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) ()
#9 0x0000000000e81c74 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#10 0x0000000000e845a5 in v8::internal::Heap::AllocateRawWithRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) ()
#11 0x0000000000e4cad4 in v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) ()
#12 0x00000000010ecdbe in v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) ()
#13 0x00001729be5841bd in ?? ()
#14 0x00001729be584121 in ?? ()
#15 0x00007fffffff8ce0 in ?? ()
#16 0x0000000000000006 in ?? ()
#17 0x00007fffffff8d70 in ?? ()
#18 0x00001729be87a96e in ?? ()
#19 0x0000016800000000 in ?? ()
#20 0x0000000000000037 in ?? ()
---Type to continue, or q to quit---
#21 0x000000000000001a in ?? ()
#22 0x0000000000000037 in ?? ()
#23 0x0000000000000020 in ?? ()
#24 0x00000a89dd582201 in ?? ()
#25 0x00000a89dd582289 in ?? ()
#26 0x00000a89dd582381 in ?? ()
#27 0x00000a4317a1e219 in ?? ()
#28 0x000007d8dfe79869 in ?? ()
#29 0x00000a89dd5823b9 in ?? ()
#30 0x00000a89dd582411 in ?? ()
#31 0x00000a89dd582451 in ?? ()
#32 0x00007fffffff8da0 in ?? ()
#33 0x00001729be648f00 in ?? ()
#34 0x000005d6347022e1 in ?? ()
#35 0x000005d6347022e1 in ?? ()
#36 0x00000a89dd5824f9 in ?? ()
#37 0x00000a89dd582539 in ?? ()
#38 0x00007fffffff8df0 in ?? ()
#39 0x00001729be635ed7 in ?? ()
#40 0x000005d6347022e1 in ?? ()
#41 0x000005d6347022e1 in ?? ()
#42 0x0000000000000003 in ?? ()
#43 0x0000000002415398 in ?? ()
---Type to continue, or q to quit---
#44 0x00007fffffff8ef0 in ?? ()
#45 0x0000000000000003 in ?? ()
#46 0x00000a89dd582589 in ?? ()
#47 0x00000a89dd5825c9 in ?? ()
#48 0x00007fffffff8e28 in ?? ()
#49 0x00001729be58c5a3 in ?? ()
#50 0x000005d6347022e1 in ?? ()
#51 0x0000000000000000 in ?? ()

@kkaefer kkaefer added the Node.js node-mapbox-gl-native label Jul 31, 2018
@flurin

flurin commented Oct 23, 2018

I'm getting the same error, anything I can do to clarify the issue further?

@ChrisLoer
Contributor

This is a stab in the dark, but the call trace shows a segfault inside ~NodeRequest triggered by garbage collection, which reminds me of the crash mechanism we found in #11281.

Can you provide more detail on the version you're using, like a tag or version number? The latest release (4.1.0 at https://www.npmjs.com/package/@mapbox/mapbox-gl-native) has the fix to issue #11281, although this could definitely be a different crash following the same pattern.

@flurin

flurin commented Oct 23, 2018 via email

@flurin

flurin commented Oct 23, 2018

Ok, so here we go.

I'm generating the tiles within a microservice. I'm currently not even calling map.render(). Below you'll find the minimal demo I made. It crashes after 50 requests, give or take.

➜  ~ uname -a
Darwin MacBook-Pro.ph16 17.7.0 Darwin Kernel Version 17.7.0: Fri Jul  6 19:54:51 PDT 2018; root:xnu-4570.71.3~2/RELEASE_X86_64 x86_64
➜  ~ node -v
v10.11.0

Output of SegfaultHandler

PID 25514 received SIGSEGV for address: 0x3e8
0   segfault-handler.node               0x00000001071ea3c8 _ZL16segfault_handleriP9__siginfoPv + 312
1   libsystem_platform.dylib            0x00007fff52311f5a _sigtramp + 26
2   ???                                 0x0000000000000000 0x0 + 0
3   mbgl.node                           0x0000000108121db9 _ZN3Nan11AsyncWorkerD2Ev + 153
4   mbgl.node                           0x0000000108121ce5 _ZN9node_mbgl11NodeRequestD2Ev + 101
5   mbgl.node                           0x0000000108121e3e _ZN9node_mbgl11NodeRequestD0Ev + 14
6   node                                0x00000001004206ef _ZN2v88internal13GlobalHandles22PendingPhantomCallback6InvokeEPNS0_7IsolateE + 73
7   node                                0x0000000100420b80 _ZN2v88internal13GlobalHandles31DispatchPendingPhantomCallbacksEb + 116
8   node                                0x0000000100420d60 _ZN2v88internal13GlobalHandles31PostGarbageCollectionProcessingENS0_16GarbageCollectorENS_15GCCallbackFlagsE + 44
9   node                                0x000000010043f043 _ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE + 1361
10  node                                0x000000010043dfd8 _ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE + 652
11  node                                0x0000000100446527 _ZN2v88internal4Heap25AllocateRawWithLigthRetryEiNS0_15AllocationSpaceENS0_19AllocationAlignmentE + 61
12  node                                0x0000000100446570 _ZN2v88internal4Heap26AllocateRawWithRetryOrFailEiNS0_15AllocationSpaceENS0_19AllocationAlignmentE + 28
13  node                                0x0000000100426978 _ZN2v88internal7Factory15NewFillerObjectEibNS0_15AllocationSpaceE + 36
14  node                                0x0000000100600ae9 _ZN2v88internal26Runtime_AllocateInNewSpaceEiPPNS0_6ObjectEPNS0_7IsolateE + 100
15  ???                                 0x00003f89400dc01d 0x0 + 69858717712413
16  ???                                 0x00003f894016effa 0x0 + 69858718314490

Demo source (styleSourcePath points to the standard openmaptiles osm-bright-gl-style.json)

const mbgl = require('@mapbox/mapbox-gl-native')

const styleSourcePath = "..."

const mbglOptions = {
  request: function(req, callback) {
    callback(null, {data: Buffer.from("")})
  },
  ratio: 1
};

let counter = 0;

module.exports = (req,res) => {
  counter += 1;
  console.log(counter, "times")
  const map = new mbgl.Map(mbglOptions);

  map.load(require(styleSourcePath));

  return ""
}

@ChrisLoer
Contributor

Thanks @flurin, that's useful information. I haven't gotten a local reproduction of this working yet, and to be honest it takes me a long time whenever I have to re-wrap my head around node/v8 (and I haven't debugged this with node 10 before).

The immediate thing that jumps out at me is that the NodeMap object is eligible for GC immediately after you call map.load and return "", but map.load will trigger some asynchronous requests (for instance, it will request the sprite sheet for the style). The intended behavior is something like:

  • If the asynchronous callbacks complete before GC, they should work (even if they're effectively going to be thrown away).
  • If GC hits the NodeMap before the asynchronous callbacks run, they should abort.

Until one of us gets a chance to dig in to this, is it possible as a workaround for you to modify your code to hold onto the map references so they don't get GCed while there are outstanding requests?
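One way to picture the suggested workaround is the minimal sketch below. It assumes nothing about the library beyond the fact that a map object stays alive while JavaScript still holds a strong reference to it. The names liveMaps, retain and discard are hypothetical helpers invented for this illustration, not part of @mapbox/mapbox-gl-native.

```javascript
// Hypothetical helpers (not part of @mapbox/mapbox-gl-native).
// A module-level Set holds a strong reference to every map that may still
// have requests in flight, so the garbage collector cannot reclaim it.
const liveMaps = new Set();

// Register a map and hand it back, so the call can wrap the constructor.
function retain(map) {
  liveMaps.add(map);
  return map;
}

// Drop the reference. Call this only once the map is known to have no
// outstanding requests. Returns true if the map was being retained.
function discard(map) {
  return liveMaps.delete(map);
}
```

In the demo above this would be used as `const map = retain(new mbgl.Map(mbglOptions));`, with `discard(map)` called later, once the map is known to be idle.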

@flurin

flurin commented Oct 24, 2018

Thanks @ChrisLoer! However, I'm having no luck determining whether or not a request is still outstanding, especially since in the example above the callbacks are called immediately. Can you point me in the right direction?

I did manage to create a simpler reproduction, though, with no dependencies other than mapbox-gl-native. This example consistently fails after about 24 runs.

const mbgl = require('@mapbox/mapbox-gl-native')

const styleSourcePath = "..."

const delay = () => {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      resolve();
    }, 100);
  })
}

const mbglOptions = {
  request: function(req, callback) {
    callback(null, {data: Buffer.from("")})
  },
  ratio: 1
};

let count = 0;
const arr = new Array(2000).fill(1);

arr.reduce((acc, v) => {
  return acc.then(() => {
    count += 1;
    console.log("Run", count);
    const map = new mbgl.Map(mbglOptions);
    map.load(require(styleSourcePath));
  }).then(delay)
}, delay());

@ChrisLoer
Contributor

However, I'm having no luck determining whether or not a request is still outstanding.

The map has a loaded method that will tell you if the initial load has completed, so you'd have to poll that to check. I think if you make a render request it has a completion callback.
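The polling idea could be wrapped in a promise, as in the rough sketch below. The only library call it relies on is the loaded method mentioned above. The helper name waitUntilLoaded, the poll interval and the timeout are assumptions made for illustration. While polling is in progress, the timer closure keeps a reference to the map, which is what keeps it from being collected.

```javascript
// Hypothetical helper: resolves with the map once map.loaded() returns true.
// intervalMs and timeoutMs are illustrative defaults, not library settings.
function waitUntilLoaded(map, intervalMs = 50, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    const check = () => {
      if (map.loaded()) {
        resolve(map);
        return;
      }
      if (Date.now() - started >= timeoutMs) {
        reject(new Error('map did not finish loading in time'));
        return;
      }
      // The pending timer references `check`, which references `map`,
      // so the map stays reachable until this promise settles.
      setTimeout(check, intervalMs);
    };
    check();
  });
}
```

A caller would do something like `waitUntilLoaded(map).then(() => { /* safe to drop the reference */ })`.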

The way we use the gl-native node module when we do server side rendering is we keep a pool of map objects around, and when a request comes in we acquire one of the maps from the pool and use it to render. On top of sidestepping these lifetime issues, keeping the map objects around avoids lots of redundant work that would otherwise happen during the map object creation.
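A pool along those lines might look like the sketch below. It is an illustration only, not the pool used in production by the maintainers. MapPool, acquire and giveBack are hypothetical names, and the factory function is supplied by the caller.

```javascript
// Hypothetical fixed-size pool. The pool owns every map for its whole
// lifetime, so none of them becomes eligible for GC while the pool is alive.
class MapPool {
  constructor(createMap, size) {
    this.idle = [];     // maps ready to be handed out
    this.waiting = [];  // resolvers for callers waiting on a free map
    for (let i = 0; i < size; i += 1) {
      this.idle.push(createMap());
    }
  }

  // Resolves with an idle map, or waits until one is given back.
  acquire() {
    if (this.idle.length > 0) {
      return Promise.resolve(this.idle.pop());
    }
    return new Promise((resolve) => this.waiting.push(resolve));
  }

  // Return a map to the pool, handing it straight to a waiting caller if any.
  giveBack(map) {
    const next = this.waiting.shift();
    if (next) {
      next(map);
    } else {
      this.idle.push(map);
    }
  }
}
```

With the options object from the demo, a pool could be created as `new MapPool(() => new mbgl.Map(mbglOptions), 4)`. The size of 4 is arbitrary.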

Thanks for the even simpler reproduction case! Incidentally, is it easy for you to try your experiment with Node 8? I know there were changes related to the async interface in Node 10 (@kkaefer just recently updated our version of nan.h to be able to handle them), but I don't think we have any production experience running gl-native on Node 10.

@hkrutzer

Can reproduce in Docker node:10-stretch, not in node:8-stretch.

@flurin

flurin commented Oct 24, 2018

Using a pool of maps is an interesting idea. That may also reduce our render times. I'll look into that! Thanks for the tip!

As for the reproduction, it is as @hkrutzer says: the minimal script above runs fine in Node 8. So it's definitely something to do with Node 10.

Let me know if I can help out further in tracking this down.

root@f571f3db4238:/# uname -a
Linux f571f3db4238 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 GNU/Linux
root@f571f3db4238:/# node -v
v8.12.0

@ChrisLoer ChrisLoer changed the title from "random Segfaults in mbgl-node" to "mbgl-node segfaults in node 10 when outstanding http requests complete after map is GCed" Oct 24, 2018
@flurin

flurin commented Oct 24, 2018

As a workaround, using a pool and keeping the references works for now. It is also much faster this way. The only tricky bit is that you cannot call map.release().

@hkrutzer

Running the script above with --expose-gc and calling global.gc() after each map.load also prevents the crash.
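For anyone wanting to try the same experiment, a small guard such as the sketch below avoids a TypeError when the script is started without the flag. The helper name collectNow is made up for this example. global.gc is only defined when node is launched with --expose-gc.

```javascript
// Run with: node --expose-gc script.js
// Hypothetical helper: forces a collection if the flag was given.
// Returns true when a collection was requested, false otherwise.
function collectNow() {
  if (typeof global.gc !== 'function') {
    return false; // started without --expose-gc, nothing to do
  }
  global.gc();
  return true;
}
```

In the reproduction script this would be called right after `map.load(require(styleSourcePath));`.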

@flurin

flurin commented Oct 25, 2018

After further experimenting: Node 10 with a global pool of map instances still segfaults now and then (admittedly a lot less often). I'm switching back to Node 8 for now.

@stale

stale bot commented Apr 23, 2019

This issue has been automatically detected as stale because it has not had recent activity and will be archived. Thank you for your contributions.

@kkaefer
Contributor

kkaefer commented Apr 26, 2019

This hasn't been fixed yet.

@flippmoke
Member

@flurin does #14847 solve your issues?

@flippmoke
Member

I believe this is now fixed by #14847. Please re-open if issues continue.
