
Memory corruption sometimes. #1326

Open
vext01 opened this issue Jul 29, 2024 · 7 comments

@vext01
Contributor

vext01 commented Jul 29, 2024

With a release build of yklua, running the test suite can shake out various memory corruption crashes.

I've only observed this when a release build and YKD_SERIALISE_COMPILATION=0 are used together.

Example:

$ YKD_SERIALISE_COMPILATION=0 ../src/lua -e"_U=true" all.lua
...
free(): invalid size
[Thread 0x7fffe67fc6c0 (LWP 2471092) exited]

Thread 1 "lua" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff7c1ee8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff7bcffb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7bba472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff7c13430 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7d2d459 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5  0x00007ffff7c287aa in malloc_printerr (str=str@entry=0x7ffff7d2b0c9 "free(): invalid size") at ./malloc/malloc.c:5660
#6  0x00007ffff7c2a544 in _int_free (av=<optimized out>, p=0xe212c0, have_lock=have_lock@entry=0) at ./malloc/malloc.c:4439
#7  0x00007ffff7c2ce8f in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3385
#8  0x00007ffff7efb106 in __replace_stack () from /home/vext01/research/yk/bin/../target/release/deps/libykcapi.so
#9  0x00007ffff7a7dcf8 in ?? ()
#10 0x00007ffff7a7dd50 in ?? ()
#11 0x00007ffff7a7dd70 in ?? ()
#12 0x00007ffff7a7dc6c in ?? ()
#13 0x0000000000000000 in ?? ()

Discussed briefly with @ptersilie: this particular trace looks like free(3) aborting while releasing scratch memory used during stack reconstruction (__replace_stack in the backtrace).

@ltratt
Contributor

ltratt commented Jul 29, 2024

@vext01 When #1318 merges, can you retry this please? There is a fix (in ykllvm) which might resolve this problem.

@vext01
Contributor Author

vext01 commented Jul 29, 2024

Using yk master as of about 10 minutes ago (65e45bea3762a), I was able to get similar crashes in release mode.

The trace looks pretty much the same:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff7c1de8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff7bcefb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7bb9472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff7c12430 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7d2c459 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5  0x00007ffff7c277aa in malloc_printerr (str=str@entry=0x7ffff7d2ebb8 "munmap_chunk(): invalid pointer") at ./malloc/malloc.c:5660
#6  0x00007ffff7c2796c in munmap_chunk (p=<optimized out>) at ./malloc/malloc.c:3054
#7  0x00007ffff7c2bed8 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3375
#8  0x00007ffff7f19d76 in __replace_stack () from /home/vext01/research/yk/bin/../target/release/deps/libykcapi.so
#9  0x00007ffff7a7ccf8 in ?? ()
#10 0x00007ffff7a7cd50 in ?? ()
#11 0x00007ffff7a7cd70 in ?? ()
#12 0x00007ffff7a7cc6c in ?? ()
#13 0x00007ffff7a7c4b0 in ?? ()
#14 0x0000000000000000 in ?? ()

@ltratt
Contributor

ltratt commented Jul 31, 2024

I can't replicate this. I suggest trying (in order):

  1. Run make clean and then make to rebuild your yklua. If it was built with an old ykllvm, all bets are off.
  2. Start from a totally fresh yk + ykllvm clone.

My guess is that a stale build, which (1) addresses, is the most likely culprit, but I'm not really sure.

@vext01
Contributor Author

vext01 commented Jul 31, 2024

I always run make clean, but today I'll try to write a script that reproduces the problem.

@vext01
Contributor Author

vext01 commented Jul 31, 2024

I was unable to reproduce this using the following script:

#!/bin/sh

set -xeu

unset YKB_YKLLVM_BIN_DIR

if [ ! -e yk ]; then
	git clone --recurse-submodules https://github.com/ykjit/yk
fi

if [ ! -e yklua ]; then
	git clone https://github.com/ykjit/yklua
fi

if [ ! -e try_repeat ]; then
	git clone https://github.com/ltratt/try_repeat
fi

cd yk
cargo test --release
export PATH="$(pwd)/bin:${PATH}"
cd ..

cd yklua
export YK_BUILD_TYPE=release

make clean
make -j $(nproc)
cd tests
YKD_SERIALISE_COMPILATION=0 ../../try_repeat/try_repeat 100 \
	../src/lua -e"_U=true" all.lua
YKD_SERIALISE_COMPILATION=1 ../../try_repeat/try_repeat 100 \
	../src/lua -e"_U=true" all.lua
cd ..

# Repeat using YKB_YKLLVM_BIN_DIR in case it has to do with that.
make clean
export YKB_YKLLVM_BIN_DIR="$(pwd)/../yk/target/release/ykllvm/bin"
make -j $(nproc)
cd tests
YKD_SERIALISE_COMPILATION=0 ../../try_repeat/try_repeat 100 \
	../src/lua -e"_U=true" all.lua
YKD_SERIALISE_COMPILATION=1 ../../try_repeat/try_repeat 100 \
	../src/lua -e"_U=true" all.lua

So I can only assume something was wrong with the local setup I was using, or that a recent commit fixed it.

Will close.

@vext01
Contributor Author

vext01 commented Aug 2, 2024

We just saw some pointer corruption in a CI job. This time it was while running the awfy benchmarks. Reopening.

ykjit/yk-benchmarks#3 (comment)

@vext01 reopened this Aug 2, 2024
@vext01 changed the title from "Memory corruption in release runs of yklua test suite." to "Memory corruption sometimes." Aug 2, 2024
@ltratt
Contributor

ltratt commented Aug 2, 2024

I think try_repeat and gdb might be in order?
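One way to combine repeated runs with gdb, sketched under stated assumptions: repeat_until_fail below is a hypothetical stand-in for try_repeat (it does not reproduce try_repeat's actual options or output), it assumes a POSIX sh, and it relies on core dumps being enabled so that the first crashing run can be inspected in gdb afterwards. Where the core file ends up depends on the system's kernel.core_pattern setting; on systemd-based systems, coredumpctl gdb may be the easier route.

```shell
# Hypothetical helper (not part of try_repeat): run a command up to N times
# and stop at the first run that exits with a non-zero status. A run killed
# by a signal such as SIGABRT also counts as a failure, since the shell
# reports it as an exit status above 128.
repeat_until_fail() {
    rup_n=$1
    shift
    rup_i=1
    while [ "$rup_i" -le "$rup_n" ]; do
        if ! "$@"; then
            echo "run $rup_i of $rup_n failed" >&2
            return 1
        fi
        rup_i=$((rup_i + 1))
    done
    echo "all $rup_n runs succeeded" >&2
    return 0
}

# Example use from yklua's tests/ directory (assumes core dumps are written
# somewhere gdb can find them; see kernel.core_pattern):
#   ulimit -c unlimited
#   export YKD_SERIALISE_COMPILATION=0
#   repeat_until_fail 100 ../src/lua -e"_U=true" all.lua
#   gdb ../src/lua core    # then "bt" to print the backtrace
```

Stopping at the first failure keeps the core file from the crashing run rather than having it overwritten by later, successful runs.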
