crash on mipsel64 from v12.x #31118

Closed
whhif opened this issue Dec 28, 2019 · 23 comments
Labels
mips Issues and PRs related to the MIPS architecture.

Comments

@whhif

whhif commented Dec 28, 2019

  • Version: 12.14.0
  • Platform: Linux localhost.localdomain 3.10.0 #1 SMP PREEMPT Mon Sep 16 21:11:02 CST mips64 mips64 mips64 GNU/Linux
  • Subsystem: gcc-4.9.3-3 gcc-7.3.0

I just built nodejs@12.14.0 on mips64el (Loongson 3A3000) with --with-snapshot, and the build succeeded. When I run nodejs, it crashes. The messages are as follows:

#

# Fatal error in ../deps/v8/src/execution/isolate.cc, line 232

# Embedded blob checksum verification failed. This indicates that the embedded blob has been modified since compilation time. A common cause is a debugging breakpoint set within builtin code.

#

#

#

#FailureMessage Object: 0xffffffacc0

Program received signal SIGTRAP, Trace/breakpoint trap.

v8::base::OS::Abort () at ../deps/v8/src/base/platform/platform-posix.cc:406

But nodejs@v10.x and nodejs@v11.x work fine. I found that when I build nodejs@v12.x with --without-snapshot, nodejs works fine. Maybe mips64el has some problems with the v8 snapshot.

The versions of v8 and nodejs are :

  1. nodejs@12.14.0 -> v8@7.7.299.13
  2. nodejs@11.15.1 -> v8@7.0.276.38
  3. nodejs@10.17.0 -> v8@6.8.275.32

According to the nodejs build documentation, gcc >= 6.3 is required, but I built it with gcc 4.9.3-3 without errors. After the crash, I rebuilt nodejs with gcc@7.3.0 and the result was the same.

I also tried building nodejs with the cross compilers gcc@4.9.4 and gcc@7.3.1, and nodejs crashed too. You can download them from

http://www.loongnix.org/index.php/Cross-compile (direct download URLs: gcc@4.9.4 and gcc@7.3.1), then put the data directory in /usr/local/

The cross-build script is as follows (host: Ubuntu 18.04 x86_64 with gcc@7.4.0):

#!/bin/bash
export PREFIX=/usr/local/mips-loongson-gcc4.9-linux-gnu/bin/mips-linux-gnu-
export CC=${PREFIX}"gcc -march=gs464e -mips64r2 -mabi=64"
export CXX=${PREFIX}"c++ -march=gs464e -mips64r2 -mabi=64"
export LINK=$CXX
export LD=${PREFIX}ld
export AR=${PREFIX}ar
export AS=${PREFIX}as
export RANLIB=${PREFIX}ranlib
export CROSS_COMPILE=mips-loongson
export ARCH=mips64el

# Native compilers
export AR_host="ar"
export CC_host="gcc"
export CXX_host="g++"
export LINK_host="g++"

export AR_HOST="ar"
export CC_HOST="gcc"
export CXX_HOST="g++"
export LINK_HOST="g++"

# extras for convenience.
export OBJD=${PREFIX}objdump
export GDB=${PREFIX}gdb
export RDE=${PREFIX}readelf

./configure --dest-cpu=mips64el --cross-compiling --with-mips-arch-variant=r2 --dest-os=linux --openssl-no-asm --verbose
make -j$(grep -c ^processor /proc/cpuinfo 2>/dev/null || 1)

Maybe nodejs crashed in v8 builtin code. But when I run ./configure xxxx with --gdb, the build fails at src\deps\v8\src\diagnostics\gdb-jit.cc, line 629: #error Unsupported target architecture.

  void WriteHeader(Writer* w) {
    DCHECK_EQ(w->position(), 0);
    Writer::Slot<ELFHeader> header = w->CreateSlotHere<ELFHeader>();
#if (V8_TARGET_ARCH_IA32 || V8_TARGET_ARCH_ARM)
    const uint8_t ident[16] = {0x7F, 'E', 'L', 'F', 1, 1, 1, 0,
                               0,    0,   0,   0,   0, 0, 0, 0};
#elif V8_TARGET_ARCH_X64 && V8_TARGET_ARCH_64_BIT || \
    V8_TARGET_ARCH_PPC64 && V8_TARGET_LITTLE_ENDIAN
    const uint8_t ident[16] = {0x7F, 'E', 'L', 'F', 2, 1, 1, 0,
                               0,    0,   0,   0,   0, 0, 0, 0};
#elif V8_TARGET_ARCH_PPC64 && V8_TARGET_BIG_ENDIAN && V8_OS_LINUX
    const uint8_t ident[16] = {0x7F, 'E', 'L', 'F', 2, 2, 1, 0,
                               0,    0,   0,   0,   0, 0, 0, 0};
#elif V8_TARGET_ARCH_S390X
    const uint8_t ident[16] = {0x7F, 'E', 'L', 'F', 2, 2, 1, 3,
                               0,    0,   0,   0,   0, 0, 0, 0};
#elif V8_TARGET_ARCH_S390
    const uint8_t ident[16] = {0x7F, 'E', 'L', 'F', 1, 2, 1, 3,
                               0,    0,   0,   0,   0, 0, 0, 0};
#else
#error Unsupported target architecture. <--
#endif

Hi @bnoordhuis:

Thank you for your help. Yes, mips(el) is not an officially supported architecture, but it is better than nothing.

@richardlau richardlau added the mips Issues and PRs related to the MIPS architecture. label Dec 29, 2019
@bnoordhuis
Member

That "Embedded blob checksum verification failed" error is a debug check, it's not enabled in release builds, so I'm kind of surprised you're getting that message. Where did you obtain the source code from?

@whhif
Author

whhif commented Dec 30, 2019

After nodejs crashed, I rebuilt it with the --debug flag, which is how I got the error message. But I cannot build it with the --gdb flag.

@bnoordhuis
Member

--gdb is for debugging jitted code (the machine code V8 emits when it compiles a JS method), you probably don't need that but if you do, you can probably make it work with minor tweaks to gdb-jit.cc.

You mention release builds crash? What does the backtrace look like in gdb?

@FlyGoat

FlyGoat commented Feb 26, 2020

I dug deeper into the issue. It crashes in node_mksnapshot.
GDB reports:

Thread 1 "node_mksnapshot" received signal SIGSEGV, Segmentation fault.
0x000000aaab6dace0 in Builtins_ConstructProxy () at ../../deps/v8/../../deps/v8/src/builtins/base.tq:412
412       elements: FixedArrayBase;

(gdb) info registers 
                  zero               at               v0               v1
 R0   0000000000000000 0000000000000001 000000aaab6dace0 0000000000010000 
                    a0               a1               a2               a3
 R4   000000aaaf2b1fe0 000000d3c5ec04b9 000000143ad03119 000000d3c5ec04b9 
                    a4               a5               a6               a7
 R8   0000000000000003 000000ffffff27f0 0000000000000000 0000000000000003 
                    t0               t1               t2               t3
 R12  0000000000000000 0000000000000000 0000000000000000 00000000000c0004 
                    s0               s1               s2               s3
 R16  000000aaaf2b1ee0 000000ffffff2560 000000aaaf259fe0 0000000000000005 
                    s4               s5               s6               s7
 R20  000000ffffff24d0 000000d3c5ec04b9 000000143ad03119 000000aaab6dace0 
                    t8               t9               k0               k1
 R24  0000000000000038 000000aaab6dace0 000000fff7bd0000 0000000000000000 
                    gp               sp               s8               ra
 R28  000000aaaf20e758 000000ffffff2480 000000ffffff2480 000000aaac33d630 
                status               lo               hi         badvaddr
      000000000400ccf3 0e1b9099b653a189 0000000000000001 000000aaac33d62f 
                 cause               pc
      0000000010000004 000000aaab6dace0 
                  fcsr              fir          restart
              000c0004         00f70501 0000000000000000 

There is no backtrace frame.

Disassembly at that point seems strange. It should be padding or something, not code.

   0x000000aaab6dac20 <+848>:   sd      v0,-48(s8)
   0x000000aaab6dac24 <+852>:   sd      a5,-64(s8)
   0x000000aaab6dac28 <+856>:   sd      a7,-72(s8)
   0x000000aaab6dac2c <+860>:   sd      a4,-80(s8)
   0x000000aaab6dac30 <+864>:   ld      t9,25640(s6)
   0x000000aaab6dac34 <+868>:   daddiu  t9,t9,63
   0x000000aaab6dac38 <+872>:   jalr    t9
   0x000000aaab6dac3c <+876>:   nop
   0x000000aaab6dac40 <+880>:   move    t0,v0
   0x000000aaab6dac44 <+884>:   ld      a6,-40(s8)
   0x000000aaab6dac48 <+888>:   ld      v0,-48(s8)
   0x000000aaab6dac4c <+892>:   ld      a5,-64(s8)
   0x000000aaab6dac50 <+896>:   ld      a7,-72(s8)
   0x000000aaab6dac54 <+900>:   ld      a4,-80(s8)
   0x000000aaab6dac58 <+904>:   b       0xaaab6daa84 <Builtins_ConstructProxy+436>
   0x000000aaab6dac5c <+908>:   nop
   0x000000aaab6dac60 <+912>:   daddiu  sp,sp,-16
   0x000000aaab6dac64 <+916>:   sd      v0,0(sp)
   0x000000aaab6dac68 <+920>:   li      a2,0x3b
   0x000000aaab6dac6c <+924>:   dsll32  a2,a2,0x1
   0x000000aaab6dac70 <+928>:   sd      a2,8(sp)
   0x000000aaab6dac74 <+932>:   ld      a1,9616(s6)
   0x000000aaab6dac78 <+936>:   daddiu  a0,zero,2
   0x000000aaab6dac7c <+940>:   ld      s7,-40(s8)
   0x000000aaab6dac80 <+944>:   ld      t9,30488(s6)
   0x000000aaab6dac84 <+948>:   daddiu  t9,t9,63
   0x000000aaab6dac88 <+952>:   jalr    t9
   0x000000aaab6dac8c <+956>:   nop
   0x000000aaab6dac90 <+960>:   break   0x150,0x321
   0x000000aaab6dac94 <+964>:   nop
   0x000000aaab6dac98 <+968>:   srav    zero,zero,zero
   0x000000aaab6dac9c <+972>:   srl     zero,zero,0x0
   0x000000aaab6daca0 <+976>:   sll     zero,zero,0x2
   0x000000aaab6daca4 <+980>:   sd      ra,-1(ra)
   0x000000aaab6daca8 <+984>:   sd      ra,-1(ra)
   0x000000aaab6dacac <+988>:   0x204
   0x000000aaab6dacb0 <+992>:   sd      ra,-1(ra)
   0x000000aaab6dacb4 <+996>:   sd      ra,-1(ra)
   0x000000aaab6dacb8 <+1000>:  dsll32  zero,zero,0xa
   0x000000aaab6dacbc <+1004>:  sd      ra,-1(ra)
   0x000000aaab6dacc0 <+1008>:  sd      ra,-1(ra)
   0x000000aaab6dacc4 <+1012>:  0x2dc
   0x000000aaab6dacc8 <+1016>:  sd      ra,-1(ra)
   0x000000aaab6daccc <+1020>:  sd      ra,-1(ra)
   0x000000aaab6dacd0 <+1024>:  0x348
   0x000000aaab6dacd4 <+1028>:  sd      ra,-1(ra)
   0x000000aaab6dacd8 <+1032>:  sd      ra,-1(ra)
   0x000000aaab6dacdc <+1036>:  tge     zero,zero,0xd
=> 0x000000aaab6dace0 <+1040>:  sd      ra,-1(ra)
   0x000000aaab6dace4 <+1044>:  sd      ra,-1(ra)
   0x000000aaab6dace8 <+1048>:  sll     zero,zero,0xf
   0x000000aaab6dacec <+1052>:  sd      ra,-1(ra)
   0x000000aaab6dacf0 <+1056>:  sd      ra,-1(ra)
   0x000000aaab6dacf4 <+1060>:  0x2001a8
   0x000000aaab6dacf8 <+1064>:  0x1b80000
   0x000000aaab6dacfc <+1068>:  0x1be0000
   0x000000aaab6dad00 <+1072>:  pref    0xc,0(a2)
   0x000000aaab6dad04 <+1076>:  pref    0xc,-13108(a2)
   0x000000aaab6dad08 <+1080>:  pref    0xc,-13108(a2)
   0x000000aaab6dad0c <+1084>:  pref    0xc,-13108(a2)
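
One way to check that the listing above is data rather than code: the repeated `sd ra,-1(ra)` lines are what a disassembler prints for the 32-bit word 0xffffffff. The following is a minimal sketch, assuming the standard MIPS I-type field layout; the helper name `decode_itype` is made up for illustration. It also compares the store address against the `ra` and `badvaddr` values from the register dump.

```python
def decode_itype(word: int) -> dict:
    """Split a 32-bit MIPS I-type instruction word into its fields."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F      # base register
    rt = (word >> 16) & 0x1F      # register being stored
    imm = word & 0xFFFF
    if imm & 0x8000:              # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {"opcode": opcode, "rs": rs, "rt": rt, "imm": imm}


SD_OPCODE = 0x3F  # MIPS64 "store doubleword"
RA = 31           # register number of ra

fields = decode_itype(0xFFFFFFFF)
print(fields)  # {'opcode': 63, 'rs': 31, 'rt': 31, 'imm': -1}

# Register values copied from the gdb dump in this comment.
ra = 0x000000AAAC33D630
badvaddr = 0x000000AAAC33D62F
print(hex(ra + fields["imm"]) == hex(badvaddr))  # True
```

So a run of all-ones data words decodes as `sd ra,-1(ra)`, and the faulting address in `badvaddr` is exactly `ra - 1`, which is consistent with the CPU executing one of those data words as a store.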

So I traced the RA register and got the real call path.

Thread 1 "node_mksnapshot" hit Breakpoint 2, 0x000000aaac33d4ac in v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) ()
(gdb) bt
#0  0x000000aaac33d4ac in v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) ()
#1  0x000000aaac33db2c in v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) ()
#2  0x000000aaac293918 in v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) ()
#3  0x000000aaababf584 in node::InitializePrimordials (context=...) at ../src/api/environment.cc:504
#4  0x000000aaababe90c in node::GetPerContextExports (context=...) at ../src/api/environment.cc:409
#5  0x000000aaababf264 in node::InitializePrimordials (context=...) at ../src/api/environment.cc:482
#6  0x000000aaababf06c in node::InitializeContextForSnapshot (context=...) at ../src/api/environment.cc:465
#7  0x000000aaababf670 in node::InitializeContext (context=...) at ../src/api/environment.cc:516
#8  0x000000aaababeabc in node::NewContext (isolate=0xaaaf2b1ee0, object_template=...) at ../src/api/environment.cc:422
#9  0x000000aaabab86d4 in node::SnapshotBuilder::Generate (args=std::vector of length 1, capacity 1 = {...}, exec_args=std::vector of length 0, capacity 0) at ../tools/snapshot/snapshot_builder.cc:90
#10 0x000000aaabab6dec in main (argc=2, argv=0xffffff30c8) at ../tools/snapshot/node_mksnapshot.cc:47

It looks like it jumped to a wrong address. Any further hints on debugging are appreciated.

Thanks.

@bnoordhuis

@FlyGoat

FlyGoat commented Feb 26, 2020

It appears to be a GCC only regression.
Clang build works fine, but there are some test failures with OpenSSL, like:

=== release test-tls-honorcipherorder ===                        
Path: parallel/test-tls-honorcipherorder
_tls_common.js:129
      c.context.setCert(cert);
                ^

Error: error:140AB18F:SSL routines:SSL_CTX_use_certificate:ee key too small
    at Object.createSecureContext (_tls_common.js:129:17)
    at Server.setSecureContext (_tls_wrap.js:1312:27)
    at new Server (_tls_wrap.js:1176:8)
    at Object.createServer (_tls_wrap.js:1219:10)
    at test (/home/flygoat/nodejs/test/parallel/test-tls-honorcipherorder.js:30:22)
    at Object.<anonymous> (/home/flygoat/nodejs/test/parallel/test-tls-honorcipherorder.js:60:1)
    at Module._compile (internal/modules/cjs/loader.js:1204:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1224:10)
    at Module.load (internal/modules/cjs/loader.js:1053:32)
    at Function.Module._load (internal/modules/cjs/loader.js:948:14) {
  library: 'SSL routines',
  function: 'SSL_CTX_use_certificate',
  reason: 'ee key too small',
  code: 'ERR_SSL_EE_KEY_TOO_SMALL'
}

Will do some investigation.

@bnoordhuis
Member

#31118 (comment) looks like a C++ -> JS function call gone wrong but I can't really tell more from the backtrace.

It looks like this function call:

node/src/api/environment.cc

Lines 462 to 471 in db125c5

  MaybeLocal<Function> maybe_fn =
      native_module::NativeModuleEnv::LookupAndCompile(
          context, *module, &parameters, nullptr);
  if (maybe_fn.IsEmpty()) {
    return false;
  }
  Local<Function> fn = maybe_fn.ToLocalChecked();
  MaybeLocal<Value> result =
      fn->Call(context, Undefined(isolate),
               arraysize(arguments), arguments);

cc @joyeecheung - that code seems to have undergone quite some changes recently so perhaps this is a known and already fixed issue?

@FlyGoat You might want to try out the HEAD of the v12.x-staging branch, see if that works better.

@FlyGoat

FlyGoat commented Feb 26, 2020

@bnoordhuis still happens on v12.x-staging with GCC.
But I can't understand why clang just works fine.

@joyeecheung
Member

joyeecheung commented Feb 26, 2020

that code seems to have undergone quite some changes recently so perhaps this is a known and already fixed issue?

Not that I know of, but I can't tell from the stack trace which part of the snapshot building was particularly relevant - I believe the call on the stack frames was just the first time V8 in our builds would ever try to call from C++ to JS through the API? It might help to just create a regular v8 context, compile a simple JS function and call into it in node::SnapshotBuilder::Generate() right after the registration of isolates to figure out if it's just general C++ -> JS call failure:

per_process::v8_platform.Platform()->RegisterIsolate(isolate,

See this test on how to do this:

TEST(CompileFunctionInContext) {

@joyeecheung
Member

And, you can also try building with the configure option --without-node-snapshot and see if the node binary fails when it starts

@FlyGoat

FlyGoat commented Feb 27, 2020

And, you can also try building with the configure option --without-node-snapshot and see if the node binary fails when it starts

Thanks for your suggestions, node binary without snapshot crashed at the same point.
It looks like a v8 issue. I'm going to run some v8 tests.

@xen0n

xen0n commented Mar 8, 2020

Bisected down to commit f579e11, v8 broke between 7.3.492.25 and 7.4.288.13. Need to investigate deeper.

@xwafish

xwafish commented Apr 14, 2020

On the MIPS platform, we have always built v8 with LLVM, so we did not notice this issue.
I will investigate this.

@xwafish

xwafish commented Apr 15, 2020

This appears to be a bug in the GCC assembler on MIPS.
The v8 snapshot uses the file "embedded.S" as an input file. If v8 does not insert source location information (".loc") into this file, everything works and all code is written as ".octa xxxxx". But if v8 inserts ".loc" into "embedded.S", the code is written as ".octa xxx" mixed with ".byte xxxx"; see https://github.com/nodejs/node/blob/v12.x/deps/v8/src/snapshot/embedded/embedded-file-writer.cc#L165

It looks like the GCC assembler does not handle this situation correctly, while clang does. I have reported this to our compiler team.

Not inserting source information into this file works around the issue for now; a patch is attached.
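
The mixing of directives described in this comment can be illustrated with a simplified model. This is only a sketch under stated assumptions, not V8's actual embedded-file-writer logic: the function name `emit_blob`, the `loc_marks` mapping (byte offset to source line), and the `.loc 1 <line>` form are hypothetical. It assumes full 16-byte chunks are written as `.octa` and any bytes left over before a `.loc` marker are flushed as `.byte`.

```python
from typing import Dict, List


def emit_blob(code: bytes, loc_marks: Dict[int, int]) -> List[str]:
    """Model of an assembly writer: 16-byte chunks become .octa, and bytes
    left over just before a .loc marker (or the end) are flushed as .byte."""
    lines: List[str] = []
    pos = 0
    stops = sorted(set(loc_marks) | {len(code)})
    for stop in stops:
        while stop - pos >= 16:
            chunk = code[pos:pos + 16]
            # Little-endian: the first byte in memory is the lowest digit.
            lines.append(".octa 0x" + chunk[::-1].hex())
            pos += 16
        for b in code[pos:stop]:
            lines.append(f".byte 0x{b:02x}")
        pos = stop
        if stop in loc_marks:
            lines.append(f".loc 1 {loc_marks[stop]}")
    return lines


blob = bytes(range(48))
plain = emit_blob(blob, {})            # no source locations
with_loc = emit_blob(blob, {20: 412})  # one marker at byte offset 20 (arbitrary)

print([line.split()[0] for line in plain])
print([line.split()[0] for line in with_loc])
```

In this model the blob without markers is three `.octa` lines, while the marker at offset 20 produces four `.byte` lines followed by an `.octa` that starts at offset 20, which is not a multiple of 16.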

@xen0n

xen0n commented Apr 15, 2020

It's very nice of you to finish the investigation! We'll see how to best fix things upstream.

@FlyGoat

FlyGoat commented Apr 15, 2020

Filed a v8 upstream issue

As it's currently restricted to Googlers, I'll paste content below.

The root cause is that embedded-file-writer inserts a .byte between two .octa directives, and the assembler's auto-align feature adds padding to align the .octa to a 128-bit boundary, which breaks the relative address offsets within the code.

What is auto-align on MIPS assembler?

Demo code.
.octa ~0x0
.word 0xdeadbeef
.octa ~0x0

On other architectures, the binary likely to be generated is:

addr   content
0x0    0xffffffff
0x4    0xffffffff
0x8    0xffffffff
0xc    0xffffffff
0x10   0xdeadbeef
0x14   0xffffffff
0x18   0xffffffff
0x1c   0xffffffff
0x20   0xffffffff

However, on MIPS, what will be generated is:
addr   content
0x0    0xffffffff
0x4    0xffffffff
0x8    0xffffffff
0xc    0xffffffff
0x10   0xdeadbeef
0x14   0x00000000
0x18   0x00000000
0x1c   0x00000000
0x20   0xffffffff
0x24   0xffffffff
0x28   0xffffffff
0x2c   0xffffffff

0x14~0x1c is auto-align padding added by the assembler and, unfortunately, we can't turn it off. It aligns the start of every directive to its natural boundary (128-bit for .octa).
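
The two address tables above can be reproduced with a small layout model. This is a sketch, assuming only what the comment states (each directive is padded up to a multiple of its own size when auto-align is active); the function name `layout` and the `SIZES` table are made up for illustration.

```python
SIZES = {".byte": 1, ".word": 4, ".octa": 16}


def layout(directives, auto_align):
    """Return the start offset of each directive.

    With auto_align=True, each directive is first padded up to a multiple
    of its own size, modelling the MIPS assembler behaviour described above.
    """
    offsets = []
    pos = 0
    for name in directives:
        size = SIZES[name]
        if auto_align:
            pos = (pos + size - 1) // size * size
        offsets.append(pos)
        pos += size
    return offsets


demo = [".octa", ".word", ".octa"]
print([hex(o) for o in layout(demo, auto_align=False)])  # ['0x0', '0x10', '0x14']
print([hex(o) for o in layout(demo, auto_align=True)])   # ['0x0', '0x10', '0x20']
```

The second `.octa` moves from 0x14 to 0x20, so anything that refers to it by a precomputed offset ends up 12 bytes short.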

My suggestion is to let embedded-file-writer use 32-bit .word instead of .octa. All MIPS (and most other RISC) instructions are 32-bit and .word has a 32-bit alignment boundary, so with .word the auto-align won't insert any padding that breaks our code.
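
A minimal sketch of the suggested approach, assuming little-endian code whose length is a multiple of 4 bytes. The helper names `emit_words` and `padding_needed` are hypothetical and are not taken from V8.

```python
import struct


def emit_words(code: bytes) -> list:
    """Emit one .word directive per 32-bit little-endian unit."""
    if len(code) % 4:
        raise ValueError("code length must be a multiple of 4 bytes")
    return [f".word 0x{w:08x}" for (w,) in struct.iter_unpack("<I", code)]


def padding_needed(offset: int, size: int) -> int:
    """Bytes of padding required to align `offset` to a multiple of `size`."""
    return -offset % size


print(emit_words(bytes.fromhex("efbeadde" "ffffffff")))
# ['.word 0xdeadbeef', '.word 0xffffffff']

# A .word following 32-bit units never needs padding ...
print(all(padding_needed(off, 4) == 0 for off in range(0, 64, 4)))  # True
# ... while an .octa at offset 0x14 needs 12 bytes, as in the demo above.
print(padding_needed(0x14, 16))  # 12
```

Because every emitted unit is 4 bytes and 4-byte aligned, the running offset always stays a multiple of 4 and the model never inserts padding.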

@FlyGoat

FlyGoat commented Apr 15, 2020

@bnoordhuis
Now that we've addressed the issue and, with the workaround, NodeJS passes most of the tests on mips64el, would it be possible to promote mips64el to an experimental or Tier 2 supported architecture?

@xen0n and I can provide help with general MIPS issues, @xwafish is maintaining MIPS v8 upstream, and @wzssyqa from Debian can help with toolchain & system environment issues.

We can also provide a mips64el CI machine hosted in China.

Thanks.

@bnoordhuis
Member

@FlyGoat If that machine can be set up in a way where it's managed by our Build WG in order that they can run Jenkins etc. on it, then promoting mips64el to experimental shouldn't be a problem.

Tier 2 status means test failures block releases but the MIPS user base isn't large enough to warrant that.

@FlyGoat

FlyGoat commented Apr 16, 2020

@bnoordhuis Who should I get in touch with about that?
Should I open a new issue?
Thanks.

@bnoordhuis
Member

@FlyGoat Can you open an issue over at https://github.com/nodejs/build/issues explaining you want to donate a machine, what specs it has, etc.? The build people will take it from there.

(Technically, I'm one of the build people but I'm no expert on how to provision machines.)

@FlyGoat

FlyGoat commented Apr 17, 2020

@bnoordhuis
Member

@FlyGoat You can open a back-port of the bug fix if you want. The process is outlined in https://github.com/nodejs/node/blob/master/doc/guides/maintaining-V8.md, specifically the "Backporting to Abandoned Branches" section.

@whhif
Author

whhif commented Apr 27, 2020

The v8 patch is working on v12.x. Thank you all. @FlyGoat should open a back-port of the bug.

@whhif whhif closed this as completed Apr 27, 2020
@kapouer
Contributor

kapouer commented May 31, 2020

It appears to be a GCC only regression.
Clang build works fine, but there are some test failures with OpenSSL, like:

=== release test-tls-honorcipherorder ===                        
Path: parallel/test-tls-honorcipherorder
_tls_common.js:129
      c.context.setCert(cert);
                ^

Error: error:140AB18F:SSL routines:SSL_CTX_use_certificate:ee key too small
    at Object.createSecureContext (_tls_common.js:129:17)
    at Server.setSecureContext (_tls_wrap.js:1312:27)
    at new Server (_tls_wrap.js:1176:8)
    at Object.createServer (_tls_wrap.js:1219:10)
    at test (/home/flygoat/nodejs/test/parallel/test-tls-honorcipherorder.js:30:22)
    at Object.<anonymous> (/home/flygoat/nodejs/test/parallel/test-tls-honorcipherorder.js:60:1)
    at Module._compile (internal/modules/cjs/loader.js:1204:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1224:10)
    at Module.load (internal/modules/cjs/loader.js:1053:32)
    at Function.Module._load (internal/modules/cjs/loader.js:948:14) {
  library: 'SSL routines',
  function: 'SSL_CTX_use_certificate',
  reason: 'ee key too small',
  code: 'ERR_SSL_EE_KEY_TOO_SMALL'
}

Will do some investigation.

About that SSL test failure: the tests are supposed to be run using the openssl.cnf distributed with nodejs,
which doesn't happen when building against a shared openssl. In that case one has to set:
OPENSSL_CONF=./deps/openssl/openssl/apps/openssl.cnf make test-js
