This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

malloc error under OS X when installed from source #3165

Closed
TrevorBurnham opened this issue Apr 23, 2012 · 27 comments

@TrevorBurnham

I experienced the exact error described here recently:

node(9913,0x7fff7b374960) malloc: *** error for object 0x125400618: incorrect checksum for freed object - object was probably modified after being freed.

Steps to replicate:

  1. brew install node (as of Node 0.6.15); this just does a simple install from source
  2. node -e "(require('fs')).watch('a.txt', function(){});" (in a folder where a.txt exists)

When Node is installed via the Mac installer, the error does not occur and fs.watch operates normally.
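
The one-liner in step 2 can also be written as a small script, which is easier to run under gdb or valgrind. This is only a sketch: `watchFile` is a hypothetical helper name, and creating the file when it is missing is a convenience that is not part of the original steps.

```javascript
var fs = require('fs');

// Watch `file` the same way the one-liner in step 2 does.
// Opening in append mode creates the file if it is missing and does not
// truncate an existing one (a convenience, not part of the original steps).
function watchFile(file) {
  fs.closeSync(fs.openSync(file, 'a'));
  return fs.watch(file, function (event) {
    // The original callback was empty; logging is added for visibility.
    console.log('fs.watch event:', event);
  });
}
```

Calling `var watcher = watchFile('a.txt');` starts the watch, and `watcher.close()` stops it so the process can exit.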

Related: #2061. Pinging @bnoordhuis.

@bnoordhuis
Member

Cannot reproduce.

$ brew info node
node 0.6.15
http://nodejs.org/
/usr/local/Cellar/node/0.6.15 (80 files, 7.6M) *
https://github.com/mxcl/homebrew/commits/master/Library/Formula/node.rb

==> Caveats
Homebrew has NOT installed npm. We recommend the following method of
installation:
  curl http://npmjs.org/install.sh | sh

After installing, add the following path to your NODE_PATH environment
variable to have npm libraries picked up:
  /usr/local/lib/node_modules
$ file /usr/local/bin/node
/usr/local/bin/node: Mach-O 64-bit executable x86_64
$ /usr/local/bin/node -v
v0.6.15

Can you post the output of brew install -v node and of xxx -v, where xxx is the compiler that's used to compile node? In my case that's the llvm-g++ that ships with Xcode 4.1 (which is probably outdated; brew is complaining about that).

$ /usr/bin/llvm-g++ -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2335.15~25/src/configure --disable-checking --enable-werror --prefix=/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2335.15~25/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)

@TrevorBurnham
Author

My compiler appears to be identical:

$ /usr/bin/llvm-g++ -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2335.15~25/src/configure --disable-checking --enable-werror --prefix=/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2335.15~25/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)

Here's the output from brew install -v node: https://gist.github.com/2480684

@bnoordhuis
Member

Is the compiled binary 32 or 64 bits?

@TrevorBurnham
Author

64, same as yours.


@bnoordhuis
Member

Okay, in that case it must be something with your local setup because it works for me. Valgrind is not complaining either.

@TrevorBurnham
Author

Alright, but it's not just me; Googling the exact error message yields several other reports. Here's what gdb gives me when I backtrace:

#0  0x00007fff962aace2 in __pthread_kill ()
#1  0x00007fff95a0f7d2 in pthread_kill ()
#2  0x00007fff95a00a7a in abort ()
#3  0x00007fff95a224ac in szone_error ()
#4  0x00007fff95a224e8 in free_list_checksum_botch ()
#5  0x00007fff95a2953e in tiny_malloc_from_free_list ()
#6  0x00007fff95a2a00e in szone_malloc_should_clear ()
#7  0x00007fff95a5f3c8 in malloc_zone_malloc ()
#8  0x00007fff95a601a4 in malloc ()
#9  0x00007fff9215068e in operator new ()
#10 0x00007fff921506db in operator new[] ()
#11 0x00000001001afb1e in v8::internal::LiteralBuffer::AddChar ()

I hope that helps.

@TrevorBurnham
Author

By the way, additional testing finds that the error is not deterministic. Running my simple fs.watch, I get the reported error ~70% of the time, a Segmentation fault: 11 about 20% of the time, and success about 10% of the time. So, you may want to try it several times in order to replicate.

Also, the error appears to occur only in node 0.6.12+.

@bnoordhuis
Member

Trevor, is that the whole backtrace?

> So, you may want to try it several times in order to replicate.

I ran it 100 times in a loop (added a one second timeout). It never segfaults or aborts.
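
A repeated run like this can be sketched as a small harness that tallies how each run ends. This is an illustration, not the script either participant used: `CHILD_SOURCE`, `describeResult`, and `tally` are hypothetical names, and the harness assumes it runs under a Node.js version that provides `child_process.spawnSync` (0.12 or later), with the binary under test passed in as a path.

```javascript
var childProcess = require('child_process');
var fs = require('fs');

// Child program: the fs.watch call from the report, plus a one-second
// timeout that closes the watcher so a healthy run exits on its own.
var CHILD_SOURCE =
  "var w = require('fs').watch('a.txt', function () {});" +
  "setTimeout(function () { w.close(); }, 1000);";

// Turn one spawnSync result into a short label.
function describeResult(result) {
  if (result.error) return 'spawn error: ' + result.error.message;
  if (result.signal) return 'signal ' + result.signal; // e.g. a crash
  return 'exit ' + result.status;
}

// Run the child `runs` times with `nodeBinary` and count each outcome.
function tally(nodeBinary, runs) {
  // Make sure a.txt exists in the current directory without truncating it.
  fs.closeSync(fs.openSync('a.txt', 'a'));
  var counts = {};
  for (var i = 0; i < runs; i++) {
    var result = childProcess.spawnSync(nodeBinary, ['-e', CHILD_SOURCE]);
    var label = describeResult(result);
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}
```

For example, `console.log(tally('/usr/local/bin/node', 100))` prints an object mapping each outcome to its count. A segmentation fault or an abort would be reported as a signal rather than an exit code.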

@TrevorBurnham
Author

Yes, that's the entire backtrace corresponding to the "incorrect checksum for freed object" error. Running it a few more times, here's another one:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00000010025c0aa0
0x00007fff95a29648 in tiny_malloc_from_free_list ()

(gdb) backtrace

#0  0x00007fff95a29648 in tiny_malloc_from_free_list ()
#1  0x00007fff95a2a00e in szone_malloc_should_clear ()
#2  0x00007fff95a5f3c8 in malloc_zone_malloc ()
#3  0x00007fff95a601a4 in malloc ()
#4  0x00007fff9215068e in operator new ()
#5  0x00007fff921506db in operator new[] ()
#6  0x00000001001afb1e in v8::internal::LiteralBuffer::AddChar ()

@bnoordhuis
Member

Okay, that's odd because v8::internal::LiteralBuffer::AddChar is only used when parsing script source or JSON. Can you try two things?

  1. Test a debug build. Please post the output of backtrace full.
  2. Run it through valgrind and check for invalid reads/writes.

I speculate that something corrupts the free list some time before the parser allocates scratch space.

@TrevorBurnham
Author

> Test a debug build.

Sorry, could you tell me precisely how you want me to do the build?

> Run it through valgrind

Done, check it out.

@bnoordhuis
Member

> Sorry, could you tell me precisely how you want me to do the build?

Run configure with --debug (you'll probably have to hack the brew formula for that).

> Done, check it out.

I think you have a busted valgrind install because it's failing inside valgrind itself.... That illegal opcode it's complaining about is an IRET (interrupt return), the address (0x1000) is conspicuous too.

@TrevorBurnham
Author

> I think you have a busted valgrind install because it's failing inside valgrind itself

Hmm, well it does report problems for even node -e "", but it doesn't report any errors for, say, node -h, so I'd say something's up. Here's another report.

@TrevorBurnham
Author

OK, I did a --debug build of Node 0.6.15; here's the gdb backtrace: https://gist.github.com/2485151

@bnoordhuis
Member

Trevor, I believe brew installs a node_g binary (that's the one with debug symbols) if you run the formula with --enable-debug. Could you try it with that?

@TrevorBurnham
Author

Huh, so it looks like there's a bug in Node's build script: the node_g binary isn't copied to the destination directory.

In any event, once I found node_g in the source directory, I was able to run gdb on it. Here's the output.

@bnoordhuis
Member

Thanks, Trevor. I've been going over the code but I can't find anything that could reasonably explain the behavior that you're seeing. Does it also happen when you build 0.6.15 from source yourself (i.e. from the nodejs.org tarball or the repository)?

@TrevorBurnham
Author

> Does it also happen when you build 0.6.15 from source yourself (i.e. from the nodejs.org tarball or the repository)?

Yes. I just tried cloning the repo, checking out 0.6.16, and running

$ ./configure --prefix=~/node-install --without-npm --debug
$ make
$ make install

Then I copied node_g to ~/node-install/bin, cd-ed to ~/node-install, touched a.txt, and had the following interaction:

$ ./bin/node_g -e "require('fs').watch('a.txt', function(){});"
Segmentation fault: 11
$ ./bin/node_g -e "require('fs').watch('a.txt', function(){});"
Segmentation fault: 11
$ ./bin/node_g -e "require('fs').watch('a.txt', function(){});"
node_g(86636,0x7fff7b374960) malloc: *** error for object 0x125b00098: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Would you like another gdb backtrace?

@bnoordhuis
Member

> Would you like another gdb backtrace?

Yes please. Is this version of node built with the same compiler?

@TrevorBurnham
Author

> Is this version of node built with the same compiler?

No, I found my way to Xcode 4.3.2's command line tools installer, which means my LLVM build is slightly newer than previously reported:

$ /usr/bin/llvm-g++ -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.9~22/src/configure --disable-checking --enable-werror --prefix=/Applications/Xcode.app/Contents/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.9~22/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00)

Here's the gdb backtrace for the malloc error: https://gist.github.com/2603860

And here's another that (I think) corresponds to the segmentation fault error: https://gist.github.com/2603829

@bnoordhuis
Member

This is something of a long shot but does -fno-strict-aliasing resolve the issue? Here is a patch:

diff --git a/deps/v8/SConstruct b/deps/v8/SConstruct
index fc67dc5..22e2242 100644
--- a/deps/v8/SConstruct
+++ b/deps/v8/SConstruct
@@ -928,6 +928,7 @@ def GuessVisibility(env):


 def GuessStrictAliasing(env):
+  return 'off'
   # There seems to be a problem with gcc 4.5.x.
   # See http://code.google.com/p/v8/issues/detail?id=884
   # It can be worked around by disabling strict aliasing.

Make sure you do a make distclean after applying it. Do you also see crashes with the current master?

@TrevorBurnham
Author

Interestingly, it does not happen on master (0.7.x), but it is still happening under 0.6.17. Trying the patch now...

@TrevorBurnham
Author

The patch had no apparent effect. Any other ideas?

@jcrugzz

jcrugzz commented Jul 4, 2012

Just encountered the same issue with node v0.8.1 installed from source (via Homebrew). I uninstalled it and used the OS X installer from nodejs.org, and it works fine. Any word as to what could have caused this?

@bnoordhuis
Member

I can speculate (different compilers, different compiler flags, etc.) but frankly, I've stopped second-guessing what brew does.

Second-guessing and supporting, really. I used to get a lot of bug reports from brew users whose problems went away the moment they started using the official build.

@TrevorBurnham
Author

@bnoordhuis But we've established that Homebrew isn't doing anything special; it just runs ./configure --prefix=/usr/local --without-npm and make install. (See https://github.com/lavoiesl/homebrew/blob/master/Library/Formula/node.rb.) I was able to replicate the issue by running those commands manually.

Having said that, I'm unable to replicate the issue as of 0.8.x. I've installed it via Homebrew with no apparent problems. @jcrugzz, can you be more specific about what you're encountering? It may be a separate issue.

@jcrugzz

jcrugzz commented Jul 5, 2012

@TrevorBurnham Yes, I did speak too soon: the problem did not go away as I wished it had, hah. I believe I found the root of my problem. I was doing the following:

var uuid = require('node-uuid'),
    mongoose = require('mongoose'),
    Buf = mongoose.Types.Buffer;

socket.on('set', function(data, callback) {
    var uId = uuid.v1();
    client.set('uId', uId);
    var newUser = new User({'_id': new Buf(uId).toObject(), data: data});
    newUser.save(); // etc...

All I needed to add was var nUid = uId; after uId was created, and to change the next line to client.set('uId', nUid); after that there was no issue. I did not know that the original string object was modified when creating the binary object, which caused it to fail.
