Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug symbols missing when running with valgrind #896

Closed
andrewrk opened this issue Apr 5, 2018 · 9 comments
Closed

debug symbols missing when running with valgrind #896

andrewrk opened this issue Apr 5, 2018 · 9 comments
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. downstream An issue with a third party project that uses Zig.
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented Apr 5, 2018

pub fn main() void {
    foo().* += 1;
}
fn foo() *i32 {
    return @intToPtr(*i32, 10000000);
}
andy@xps ~/tmp> valgrind ./test
==15761== Memcheck, a memory error detector
==15761== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==15761== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==15761== Command: ./test
==15761== 
==15761== Invalid read of size 4
==15761==    at 0x21DDA8: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DD88: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DD1A: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DBAB: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DB2F: ??? (in /home/andy/tmp/test)
==15761==  Address 0x989680 is not stack'd, malloc'd or (recently) free'd
==15761== 
==15761== 
==15761== Process terminating with default action of signal 11 (SIGSEGV)
==15761==  Access not within mapped region at address 0x989680
==15761==    at 0x21DDA8: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DD88: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DD1A: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DBAB: ??? (in /home/andy/tmp/test)
==15761==    by 0x21DB2F: ??? (in /home/andy/tmp/test)
==15761==  If you believe this happened as a result of a stack
==15761==  overflow in your program's main thread (unlikely but
==15761==  possible), you can try to increase the size of the
==15761==  main thread stack using the --main-stacksize= flag.
==15761==  The main thread stack size used in this run was 8388608.
==15761== 
==15761== HEAP SUMMARY:
==15761==     in use at exit: 0 bytes in 0 blocks
==15761==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==15761== 
==15761== All heap blocks were freed -- no leaks are possible
==15761== 
==15761== For counts of detected and suppressed errors, rerun with: -v
==15761== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
fish: “valgrind ./test” terminated by signal SIGSEGV (Address boundary error)

however, with gdb the symbols are read just fine.

@andrewrk andrewrk added this to the 0.3.0 milestone Apr 5, 2018
@andrewrk
Copy link
Member Author

andrewrk commented Apr 5, 2018

I can reproduce this with trunk clang:

static int *foo(void) {
    return (int *)10000000;
}

void _start(void) {
    int *x = foo();
    *x += 1;
}
[nix-shell:~/tmp]$ ~/local/llvm7-debug/bin/clang -o test test.c  -nostdlib -g
[nix-shell:~/tmp]$ valgrind ./test
==18185== Memcheck, a memory error detector
==18185== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==18185== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==18185== Command: ./test
==18185== 
==18185== Invalid read of size 4
==18185==    at 0x400135: ??? (in /home/andy/tmp/test)
==18185==  Address 0x989680 is not stack'd, malloc'd or (recently) free'd

@bnoordhuis
Copy link
Contributor

Does it work if you run objcopy --decompress-debug-sections on the binary afterwards?

Ref: https://bugs.launchpad.net/ubuntu/+source/valgrind/+bug/1567219

@andrewrk
Copy link
Member Author

andrewrk commented Apr 5, 2018

Same behavior after --decompress-debug-sections.

More info: The above C code reproduces the problem with GCC as well as clang.

From #valgrind-dev on freenode:

<tomhughes> that is (more or less by accident) producing a statically linked executable and those don't generally work very well with valgrind
<tomhughes> in particular in this case I think something about it is stopping valgrind recognising it as ELF when it gets mapped so it never tries to read symbols or debug data from it

I believe this is a valgrind issue. Perhaps we can send a patch.

@andrewrk andrewrk added upstream An issue with a third party project that Zig uses. contributor friendly This issue is limited in scope and/or knowledge of Zig internals. labels Apr 5, 2018
@andrewrk
Copy link
Member Author

andrewrk commented Apr 5, 2018

I've narrowed this down to linking with LLD vs linking with binutils ld.

static int *foo(void) {
    return (int *)10000000;
}

int main(void) {
    int *x = foo();
    *x += 1;
}

Create an .o file:

~/local/llvm7-debug/bin/clang -c test.c -g

Find out link command zig is using:

zig build-exe test.zig  --library c --verbose-link

Use the same link command printed from above step, but replace LLD with binutils ld, and replace zig-cache/test.o with the test.o produced from clang above:

ld --gc-sections -m elf_x86_64 -o test /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/Scrt1.o /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/crti.o /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0/crtbegin.o -L /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib -L /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0 -dynamic-linker /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/ld-linux-x86-64.so.2 test.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lm -lgcc --as-needed -lgcc_s --no-as-needed /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0/crtend.o /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/crtn.o

Note that valgrind sees debug info:

==23143==    at 0x400577: main (test.c:7)

Now repeat link but with LLD:

~/local/llvm7-debug/bin/ld.lld --gc-sections -m elf_x86_64 -o test /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/Scrt1.o /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/crti.o /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0/crtbegin.o -L /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib -L /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0 -dynamic-linker /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/ld-linux-x86-64.so.2 test.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lm -lgcc --as-needed -lgcc_s --no-as-needed /nix/store/cm5znbhvylrh702hhsy9hnad7wk3v7xv-gcc-7.3.0/lib/gcc/x86_64-unknown-linux-gnu/7.3.0/crtend.o /nix/store/84h2zni7h805k0i1ys2bba3dsp1cqnhh-glibc-2.26-131/lib/crtn.o

Note that valgrind does not see debug info:

==23136==    at 0x2010F7: ??? (in /home/andy/tmp/test)

@andrewrk
Copy link
Member Author

andrewrk commented Apr 5, 2018

upstream bug report: https://bugs.llvm.org/show_bug.cgi?id=37021

@andrewrk
Copy link
Member Author

andrewrk commented Apr 5, 2018

Workaround suggested by llvm-dev mailing list:

diff --git a/src/link.cpp b/src/link.cpp
index 3c6e27e3..14e8cb52 100644
--- a/src/link.cpp
+++ b/src/link.cpp
@@ -217,6 +217,7 @@ static void construct_linker_job_elf(LinkJob *lj) {
         lj->args.append(g->linker_script);
     }
 
+    lj->args.append("--no-rosegment");
     lj->args.append("--gc-sections");
 
     lj->args.append("-m");

I'm considering if we should use this. I don't fully understand the implications yet, but it does fix valgrind's debug info.

@andrewrk
Copy link
Member Author

andrewrk commented Apr 5, 2018

--rosegment puts read-only data and executable text in a single segment and make it executable, so your read-only code becomes executable. I think that shouldn't be a default behavior. If valgrind needs that, we can't fix it by changing the default.

But I can't think of a reason that valgrind needs executable data segment. It is likely that valgrind assumes some specific segment layout that the GNU linker creates and can't handle other layout even if it is valid.

I think this makes sense. I'm now convinced this is a bug in valgrind and I will work with the valgrind devs to fix the issue.

@andrewrk
Copy link
Member Author

andrewrk commented Apr 8, 2018

https://sourceforge.net/p/valgrind/mailman/message/36286103/

It looks like the devs would appreciate a patch.

This patch does some of the work and should be a clue as to what further changes need to be made for valgrind to support executables compiled with clang and linked with LLD:

diff --git a/coregrind/m_debuginfo/debuginfo.c b/coregrind/m_debuginfo/debuginfo.c
index c8a6124a2..dbe6da10a 100644
--- a/coregrind/m_debuginfo/debuginfo.c
+++ b/coregrind/m_debuginfo/debuginfo.c
@@ -1126,9 +1126,7 @@ ULong VG_(di_notify_mmap)( Addr a, Bool allow_SkFileV, Int use_fd )
 #    error "Unknown platform"
 #  endif
 
-#  if defined(VGP_x86_darwin) && DARWIN_VERS >= DARWIN_10_7
    is_ro_map = seg->hasR && !seg->hasW && !seg->hasX;
-#  endif
 
 #  if defined(VGO_solaris)
    is_rx_map = seg->hasR && seg->hasX && !seg->hasW;
diff --git a/coregrind/m_debuginfo/readelf.c b/coregrind/m_debuginfo/readelf.c
index 70c28e629..106a6ec0c 100644
--- a/coregrind/m_debuginfo/readelf.c
+++ b/coregrind/m_debuginfo/readelf.c
@@ -1797,6 +1797,12 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di )
          TRACE_SYMTAB("rw_map:  avma %#lx   size %lu  foff %ld\n",
                       map->avma, map->size, map->foff);
    }
+   for (i = 0; i < VG_(sizeXA)(di->fsm.maps); i++) {
+      const DebugInfoMapping* map = VG_(indexXA)(di->fsm.maps, i);
+      if (map->ro)
+         TRACE_SYMTAB("ro_map:  avma %#lx   size %lu  foff %ld\n",
+                      map->avma, map->size, map->foff);
+   }
 
    if (phdr_mnent == 0
        || !ML_(img_valid)(mimg, phdr_mioff, phdr_mnent * phdr_ment_szB)) {
@@ -1877,7 +1883,7 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di )
                Bool loaded = False;
                for (j = 0; j < VG_(sizeXA)(di->fsm.maps); j++) {
                   const DebugInfoMapping* map = VG_(indexXA)(di->fsm.maps, j);
-                  if (   (map->rx || map->rw)
+                  if (   (map->rx || map->rw || map->ro)
                       && map->size > 0 /* stay sane */
                       && a_phdr.p_offset >= map->foff
                       && a_phdr.p_offset <  map->foff + map->size
@@ -1908,6 +1914,16 @@ Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di )
                            i, (UWord)item.bias);
                         loaded = True;
                      }
+                     if (map->ro
+                         && (a_phdr.p_flags & (PF_R | PF_W | PF_X))
+                            == PF_R) {
+                        item.exec = False;
+                        VG_(addToXA)(svma_ranges, &item);
+                        TRACE_SYMTAB(
+                           "PT_LOAD[%ld]:   acquired as ro, bias 0x%lx\n",
+                           i, (UWord)item.bias);
+                        loaded = True;
+                     }
                   }
                }
                if (!loaded) {

@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Apr 8, 2018
andrewrk added a commit that referenced this issue Apr 18, 2018
this provides a workaround for #896
until valgrind adds support for clang/LLD
(equivalent to gcc/gold -rosegment)
@andrewrk andrewrk added downstream An issue with a third party project that uses Zig. and removed upstream An issue with a third party project that Zig uses. labels Aug 22, 2018
@andrewrk andrewrk modified the milestones: 0.4.0, 0.5.0 Feb 7, 2019
@andrewrk
Copy link
Member Author

This issue appears to have been fixed with Valgrind 3.14 or LLVM 7 or the combination of both.

andrewrk added a commit that referenced this issue Feb 19, 2019
See #896

Zig 0.3.0+ and Valgrind 3.14+ do not need the workaround.
@andrewrk andrewrk modified the milestones: 0.5.0, 0.4.0 Apr 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor friendly This issue is limited in scope and/or knowledge of Zig internals. downstream An issue with a third party project that uses Zig.
Projects
None yet
Development

No branches or pull requests

2 participants