Debug symbols not recognized by Valgrind if .data section is omitted #15254

Open
squeek502 opened this issue Apr 12, 2023 · 3 comments
Labels: bug (Observed behavior contradicts documented or intended behavior), downstream (An issue with a third party project that uses Zig.)
Collaborator

squeek502 commented Apr 12, 2023

EDIT: Here's a potential workaround:

// force a .data section in the executable to ensure Valgrind debug info works,
// see https://github.com/ziglang/zig/issues/15254
export var foo: usize = 1;

pub fn main() !void {
    // ...
    foo += 1; // make sure the foo variable doesn't get optimized out
}

There might be a better way to go about it, but this has worked for me.


Zig Version

0.11.0-dev.2546+cb54e9a3c

Steps to Reproduce and Observed Behavior

Similar to #896, but it doesn't seem to have the same cause (the --no-rosegment workaround does not change anything).

Same test file as #896:

pub fn main() void {
    foo().* += 1;
}
fn foo() *i32 {
    return @intToPtr(*i32, 10000000);
}

Debug symbols do not work with Zig 0.11.0-dev.2546+cb54e9a3c and Valgrind 3.20.0. Older Valgrind versions (3.17.0 and 3.13.0) behave the same way, so this does not seem to be a Valgrind regression:

$ ~/Downloads/zig-linux-x86_64-0.11.0-dev.2546+cb54e9a3c/zig build-exe main.zig --verbose-link
LLD Link... ld.lld --error-limit=0 -O0 -z stack-size=16777216 --gc-sections -znow -m elf_x86_64 -static -o main main.o /home/ryan/.cache/zig/o/127027172500ef2ec1954339a99cad40/libc.a --as-needed /home/ryan/.cache/zig/o/56279d70bb76b973f322c635848360b7/libcompiler_rt.a
$ valgrind ./main
==2331200== Invalid read of size 4
==2331200==    at 0x20B504: ??? (in /home/ryan/Programming/zig/tmp/valgrind-test/main)
==2331200==    by 0x20AA75: ??? (in /home/ryan/Programming/zig/tmp/valgrind-test/main)
==2331200==    by 0x20A521: ??? (in /home/ryan/Programming/zig/tmp/valgrind-test/main)
==2331200==  Address 0x989680 is not stack'd, malloc'd or (recently) free'd
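
When comparing builds, the unresolved frames can be counted straight from the Valgrind log instead of eyeballing it. This is only a sketch: count_unresolved_frames is a hypothetical helper name, and the pattern assumes the "==PID==    at/by 0xADDR: ???" frame layout shown in the output above.

```shell
# Hypothetical helper (not part of Zig or Valgrind): count the stack frames
# that Valgrind printed as "???", i.e. frames it could not map to a symbol.
# Reads Valgrind log text on stdin and prints the count.
count_unresolved_frames() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so its exit status is ignored here.
    grep -Ec '^==[0-9]+== +(at|by) 0x[0-9A-Fa-f]+: \?\?\?' || true
}

# Usage (Valgrind writes its report to stderr by default):
#   valgrind ./main 2>&1 | count_unresolved_frames
```

A count of 0 only means that no frame was printed as ???; it does not check that file and line information is present.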

But debug symbols do work with Zig 0.8.0. This seems to be the last version where they worked; 0.9.0 contains the regression:

$ ~/Downloads/zig-linux-x86_64-0.8.0/zig build-exe main.zig --verbose-link
ld.lld -error-limit=0 -z stack-size=16777216 --gc-sections -m elf_x86_64 -static -o main src/zig-cache/o/a8d722ab2e9f87901a4eaa9324ec91db/main.o /home/ryan/.cache/zig/o/2f79cc6f403d84cbd92f58b396afde89/libc.a /home/ryan/.cache/zig/o/0c6fb8a0904a42c388dd6494d726db60/libcompiler_rt.a
$ valgrind ./main
==2867009== Invalid read of size 4
==2867009==    at 0x22A395: main (main.zig:2)
==2867009==  Address 0x989680 is not stack'd, malloc'd or (recently) free'd

However, when linking libc (statically with musl or dynamically with glibc), the debug symbols work again:

$ ~/Downloads/zig-linux-x86_64-0.11.0-dev.2546+cb54e9a3c/zig build-exe main.zig --verbose-link -target x86_64-linux-musl -lc
LLD Link... ld.lld --error-limit=0 -O0 -z stack-size=16777216 --gc-sections -znow -m elf_x86_64 -static -o main /home/ryan/.cache/zig/o/7e08d8a5a02fcdc41c637dfd258dfd9b/crt1.o /home/ryan/.cache/zig/o/2471e2ac2b5e9b845351f84f88b64476/crti.o main.o --as-needed /home/ryan/.cache/zig/o/908dbd08225e4f05289d64d63425d966/libc.a /home/ryan/.cache/zig/o/0778243584f6504ffef50f624aea0590/libcompiler_rt.a /home/ryan/.cache/zig/o/6563952968b463fb667b5a0882efdf52/crtn.o --allow-shlib-undefined
$ valgrind ./main
==3801512== Invalid read of size 4
==3801512==    at 0x20A081: main.main (main.zig:2)
==3801512==    by 0x20A5FF: callMain (start.zig:618)
==3801512==    by 0x20A5FF: initEventLoopAndCallMain (start.zig:562)
==3801512==    by 0x20A5FF: callMainWithArgs (start.zig:512)
==3801512==    by 0x20A5FF: main (start.zig:527)
==3801512==  Address 0x989680 is not stack'd, malloc'd or (recently) free'd

Expected Behavior

Debug symbols should work with Valgrind when not linking libc.

squeek502 added the bug label Apr 12, 2023
Collaborator Author

squeek502 commented Apr 12, 2023

Here's a zip with a working exe (compiled with 0.8.0), a non-working exe (compiled with 0.11.0-dev.2546+cb54e9a3c), and a working exe linked against musl (compiled with 0.11.0-dev.2546+cb54e9a3c), in case comparing them is helpful to anyone:

valgrind-regression-exes.zip
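
One way to compare binaries like these is to diff their section names. A sketch, assuming binutils readelf is installed; section_names is a hypothetical helper name, and the file names in the usage comment are placeholders, since the names inside the zip are not listed here.

```shell
# Hypothetical helper: print the section names found in `readelf -S -W`
# output (read from stdin), one per line, sorted.
section_names() {
    awk '
        # Section rows start with an index such as "[ 4]" or "[12]".
        /^ *\[[ 0-9]+\]/ {
            sub(/^ *\[[ 0-9]+\] */, "")
            # Row 0 has an empty name, so its first field is the type NULL.
            if ($1 != "NULL") print $1
        }
    ' | sort
}

# Usage (placeholder file names):
#   readelf -S -W ./working-exe | section_names > working.txt
#   readelf -S -W ./broken-exe  | section_names > broken.txt
#   comm -3 working.txt broken.txt   # sections present in only one of the two
```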

squeek502 changed the title from "Valgrind regression: Debug symbols not recognized unless exe linked against libc" to "Debug symbols not recognized by Valgrind unless exe linked against libc" Apr 12, 2023
Collaborator Author

squeek502 commented Apr 12, 2023

Found a reproduction without the libc linking. If you modify main.zig to:

export var bar: usize = 1;

pub fn main() void {
    foo().* += 1;
}
fn foo() *i32 {
    bar += 1;
    return @intToPtr(*i32, 10000000);
}

then the exe will include a .data section, which Valgrind seems to need for some reason, and the debug symbols work. If the .data section is omitted, the debug symbols are not recognized. So I think this is likely a downstream Valgrind bug.

(Zig 0.8.0 unconditionally added a .data section, which explains why it worked then. This also explains why another project of mine did not hit this problem with Zig versions newer than 0.8.0: it happened to need a .data section, so the debug info worked fine.)
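
Whether a given build ended up with a .data section can be checked before running Valgrind. A sketch, assuming binutils readelf is installed; has_section is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if `readelf -S -W` output (read from stdin)
# lists a section whose name is exactly "$1".
has_section() {
    awk -v name="$1" '
        # Section rows start with an index such as "[ 4]" or "[12]".
        /^ *\[[ 0-9]+\]/ {
            sub(/^ *\[[ 0-9]+\] */, "")
            # Compare the whole name so .data.rel.ro does not count as .data.
            if ($1 == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage:
#   if readelf -S -W ./main | has_section .data; then
#       echo ".data present"
#   else
#       echo ".data missing"
#   fi
```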


EDIT: It might be more complicated than just the .data section being omitted. If I compile a .c file with clang that ends up without a .data section (e.g. with -nostdlib -Wl,--gc-sections), Valgrind still finds the debug symbols. If Zig compiles the .c file, Valgrind does not find the debug symbols (and if I copy the exact commands from --verbose-cc and --verbose-link and run them directly with clang and ld.lld then I can reproduce the problem).

EDIT#2: Got a reproduction with clang/ld.lld:

// test.c
static int *foo(void) {
    return (int *)10000000;
}

int main(void) {
    int *x = foo();
    *x += 1;
}

__attribute__((force_align_arg_pointer))
void _start() {
    main();

    asm("movl $1,%eax;"
        "xorl %ebx,%ebx;"
        "int  $0x80"
    );
    __builtin_unreachable();
}

$ clang test.c -g -c -o test.o
$ ld.lld -o test test.o
$ readelf --hex-dump=.data test
readelf: Warning: Section '.data' was not dumped because it does not exist!
$ valgrind ./test
==3509349== Invalid read of size 4
==3509349==    at 0x2011B5: ??? (in /home/ryan/Programming/zig/tmp/valgrind-test/test)
==3509349==    by 0x2011EC: ??? (in /home/ryan/Programming/zig/tmp/valgrind-test/test)
==3509349==  Address 0x989680 is not stack'd, malloc'd or (recently) free'd

But if -pie is added to the ld.lld command:

$ clang test.c -g -c -o test.o
$ ld.lld -o test test.o -pie
$ readelf --hex-dump=.data test
readelf: Warning: Section '.data' was not dumped because it does not exist!
$ valgrind ./test
==3709531== Invalid read of size 4
==3709531==    at 0x1092A5: main (test.c:8)
==3709531==  Address 0x989680 is not stack'd, malloc'd or (recently) free'd

So it's something to do with .data and/or -pie and/or some combination of the two.

EDIT#3: It seems to be some combination (or some other confounding factor), as zig build-exe main.zig -fPIE without a .data section still has unrecognized debug symbols, and zig build-exe main.zig -fno-PIE with a .data section forced via export var has working debug symbols.
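
To keep the variables apart while testing combinations like these, it may help to record whether each binary was actually linked as PIE. A sketch, assuming binutils readelf is installed: a PIE executable has ELF type DYN, while a non-PIE executable has type EXEC. elf_type is a hypothetical helper name.

```shell
# Hypothetical helper: print the ELF type token (e.g. EXEC or DYN) found in
# `readelf -h` output read from stdin.
elf_type() {
    # The header line looks like "  Type:   EXEC (Executable file)".
    awk '$1 == "Type:" { print $2; exit }'
}

# Usage:
#   for exe in ./main ./test; do
#       printf '%s: %s\n' "$exe" "$(readelf -h "$exe" | elf_type)"
#   done
```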

squeek502 changed the title from "Debug symbols not recognized by Valgrind unless exe linked against libc" to "Debug symbols not recognized by Valgrind if .data section is omitted" Apr 12, 2023
andrewrk added the downstream label Jul 23, 2023
andrewrk added this to the 0.12.0 milestone Jul 23, 2023
@andrewrk
Member

I suppose we could solve this by automatically emitting a data section, even if it is empty, in the case of valgrind being enabled.

andrewrk modified the milestones: 0.14.0, 0.15.0 Feb 10, 2025