zig cc: frexp() segfaults on Windows with bad call address #9845
Good finds. Compiling with x86_64-windows-msvc also solves the problem for me. I've been poking at this in a debugger for a while and found that the address is being rewritten by a function called […]. dumpbin says:
On my machine, the executable loads at 00007FF7072C0000. On load the call instruction is
Adding one byte for the opcode gives an RVA for the address, 1068. We can look at memory located at […]
typedef struct {
    DWORD sym;
    DWORD target;
    DWORD flags;
} runtime_pseudo_reloc_item_v2;
In my case, sym = 00018B28 (RVA of frexp). Then the relocator looks at the call operand, does some bitwise magic, and writes a new operand. The instruction becomes
I suppose I'll try to raise this with the people over at mingw and see what they have to say.
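To make the "bitwise magic" concrete, here is a minimal C sketch of what the startup relocator does for a 32 bit field. It is modeled on my reading of mingw-w64's pseudo-reloc.c, not copied from it. Assumptions: the struct mirrors runtime_pseudo_reloc_item_v2 with DWORD written as uint32_t, `sym` is the RVA of the pointer-sized import slot that holds the real address, `flags` is 32, and both apply_pseudo_reloc32 and the byte buffer standing in for the loaded image are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the struct quoted above, with DWORD spelled uint32_t. */
typedef struct {
    uint32_t sym;    /* RVA of the 64 bit slot holding the imported address */
    uint32_t target; /* RVA of the field to patch (here: a call's rel32 operand) */
    uint32_t flags;  /* low 8 bits: field width in bits; assumed to be 32 here */
} runtime_pseudo_reloc_item_v2;

/* Hypothetical, simplified model of the 32 bit case of the relocator.
   `image` stands in for the loaded image, `base` is its load address.
   Returns the full-width value that should have been stored; only its
   low 32 bits are actually written back. */
int64_t apply_pseudo_reloc32(unsigned char *image, uint64_t base,
                             const runtime_pseudo_reloc_item_v2 *r)
{
    /* Real address of the imported symbol, read from its slot. */
    uint64_t addr_imp;
    memcpy(&addr_imp, image + r->sym, sizeof addr_imp);

    /* Current rel32 operand; assigning int32_t to int64_t sign-extends. */
    int32_t operand;
    memcpy(&operand, image + r->target, sizeof operand);
    int64_t reldata = operand;

    reldata -= (int64_t)(base + r->sym); /* remove the slot address the linker used */
    reldata += (int64_t)addr_imp;        /* aim at the real address instead */

    /* Only the low 32 bits fit in the field; anything above is silently lost. */
    uint32_t low = (uint32_t)reldata;
    memcpy(image + r->target, &low, sizeof low);
    return reldata;
}
```

With an image at 0x7FF7072C0000 and a real address a few KB away, the patched operand is exact. With a made-up far address such as 0x7FFE12340000, the required displacement (0x70B07FFDB) does not fit in 32 bits, so only its low half survives and the call lands somewhere else entirely, which is consistent with the bad call address in this issue.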
TL;DR
Zig's libmsvcrt.a is missing […]
Background/explanation of dllimport technicalities:
When referencing a symbol […]
When calling a function which wasn't declared dllimport in the source code (like […]
The same goes when accessing data:
When accessing a variable that the generated code didn't flag as dllimport, but which turns out to need to be imported from another DLL, one can't just jump through a linker-provided thunk. Instead, the linker adds entries to the runtime pseudo relocation list, which the mingw runtime then runs through at startup, rewriting the addresses. If the addresses are located in the text section, this requires remapping the text sections as read-write-execute for a while, doing the fixups, and then remapping them as read-only-execute again.

Also, if the code expected the target address to be within reach of a 32 bit RIP-relative offset (which is signed, so roughly ±2 GB from the source address), but the address actually turned out to be further away than that in the 64 bit address space, the runtime pseudo relocation can't fix that.

For data variables that might need to be dllimported, the compiler (both GCC and Clang) actually generates something to ease this:
So in this case, the full 64 bit address to […]
Back to the issue at hand
In the case of […] In that commit, a new statically linked […]
If using UCRT instead of msvcrt.dll, then the regular […]
Now when Zig's libmsvcrt.a didn't contain the […]
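The reach limit for RIP-relative addressing described above comes down to simple arithmetic: the operand stores the target minus the address of the next instruction, in a signed 32 bit field. A small C sketch of that check, where rel32_reaches is a hypothetical helper and the addresses used with it are illustrative only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can `target` be reached from the instruction that
   ends at `next_insn` with a signed 32 bit RIP-relative displacement?
   On success, stores the displacement in *disp. */
bool rel32_reaches(uint64_t next_insn, uint64_t target, int32_t *disp)
{
    /* User-mode x86_64 addresses are far below 2^63, so these casts are safe. */
    int64_t d = (int64_t)target - (int64_t)next_insn;
    if (d < INT32_MIN || d > INT32_MAX)
        return false; /* would need a 64 bit slot instead, e.g. a .refptr stub */
    *disp = (int32_t)d;
    return true;
}
```

A call within the same image (say from 0x7FF7072C1000 to 0x7FF7072C5000) is reachable with displacement 0x4000, while a target tens of GB away is not. That is the case where a pointer-sized indirection such as .refptr is needed, because a pseudo relocation into a 32 bit field cannot hold the result.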
TL;DR 2: There are lots of very subtle interactions between the import libs/def files and statically linked helper functions (scattered across libmingwex.a, libmsvcrt.a, libucrt.a, libws2_32.a, etc.), and the helpers are moved from one lib to another occasionally. AFAIK the strategy from Zig so far has been to just include the things that have empirically been noted as needed, adding more files one by one, but at this point I think it might be worthwhile to do a full sweep trying to match what mingw-w64-crt's Makefile.am does, to avoid needing to spend time on debugging them one by one. (Most of them are probably easier to debug and analyze, but this one was a bit non-obvious.)
Thank you for taking the time to write up such a detailed explanation; it turns out the fix was much simpler than tracking down the problem.
Alright, so after about a week of testing with mingw's frexp.c, this issue has appeared again (I haven't updated zig since then; I've just been using the patch in the above PR). I suspect the fix never worked and there's some element of randomness in how the linker goes about resolving the function, or I'm misunderstanding how zig's msvcrt is built.
If the mingw-w64-crt implementation of […]
For a build that fails, can you 1) double-check that there really is an object file that provides […]
For #1, how can you verify that an object file actually contains an implementation? Based on the linker output, I'm guessing it should be finding […]
As far as I can tell,
Linker output:
I'd suggest trying […]
Also, do note that you're not supposed to have frexp.c built into libmingwex; it's supposed to be in libmsvcrt: https://github.com/mingw-w64/mingw-w64/blob/master/mingw-w64-crt/Makefile.am#L183 (it's added to the list of object files that go into libmsvcrt.a). This is because libmingwex is shared across all CRT variants (msvcrt.dll and UCRT): with UCRT we don't want to provide the mingw-w64 version but just use the UCRT version, but with msvcrt.dll we want to bundle the frexp implementation along with it.
Alright, thanks for the guidance. I will try rebuilding zig with frexp.c as part of msvcrt. Presumably there's a difference in flags that could cause this behavior to occur. Here's the output of
Well, it won't affect this particular bug, but it could have other effects further down the line (like unnecessarily linking in the extra helper function if you switch to UCRT).
OK, so this shows that you've got an object file that doesn't export any public functions; here it should have e.g. […]
After recompiling, it works again. Now there is definitely an frexp implementation:
Unfortunately, I can't see a way to print the exact compile commands zig uses when building libc for the first time; the process takes place entirely within the zig compiler. I'll ask around in case you're still interested. Personally, I do not understand how building frexp.c into mingwex.lib does not include the implementation. Based on […]
That command produces an object file with an frexp implementation for me, so I don't know if there's some other reason it's failing.
@mstorsjo here's the compile command from the faulty build:
And the compile command from the working build for comparison:
[snip]
[snip]
The only difference between the two, modulo other random IDs, is that one of them has […]
Normally, when correctly configured, the pseudo relocations should be in fields that are large enough to hold the full target offset/address. But if the relocations nevertheless end up truncated, error out clearly instead of running into a hard to diagnose crash at runtime.

The pseudo relocations can be applied both on absolute pointers and relative offsets, so when writing an N bit number, we don't know if the limits for it are unsigned or signed. Thus carefully allow values from -(2^(N-1)) to (2^N)-1, covering the full range for both signed and unsigned N bit numbers.

This won't catch all cases where offsets are out of bounds, but should catch the vast majority, allowing a clearer error message in those situations.

By default, GCC builds for x86_64 with the medium code model, which adds .refptr stubs when referencing addresses that might end up autoimported (i.e. when referencing addresses that can be out of range for a 32 bit offset). Some users, who don't expect to be autoimporting any data symbols, might be building with -mcmodel=small [1], which avoids this extra indirection, but which then silently breaks things if actually ending up autoimporting data symbols from another DLL.

This can also happen if calling a function which is marked "DATA" in the def files as it's not meant to be called/used normally (because we provide a replacement in libmingwex or lib*crt* that we think should be used instead). If the function that is meant to be called is missing (this can happen in misconfigured builds where the libraries are lacking symbols that we expect to provide, see [2]), the linker can end up doing an autoimport of the function into a 32 bit RIP-relative offset. (This only happens with Clang; GCC creates a .refptr stub for the function in these cases, while Clang expects such stubs not to be needed for functions, only for data.)
[1] https://code.videolan.org/videolan/dav1d/-/commit/8f7af99687533d15a9b5d16abc7b9d7b0cd4dcd0
[2] ziglang/zig#9845
Signed-off-by: Martin Storsjö <martin@martin.st>
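The range check described in this commit message can be sketched in C as follows. This illustrates the stated rule only and is not the committed code; the name pseudo_reloc_fits is hypothetical, and the widths are assumed to be the usual 8, 16, 32 or 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: accept any value representable as either a signed or
   an unsigned N bit number, i.e. -(2^(N-1)) <= value <= 2^N - 1.
   `bits` is expected to be 8, 16, 32 or 64. */
bool pseudo_reloc_fits(int64_t value, unsigned bits)
{
    if (bits >= 64)
        return true; /* a full 64 bit field can hold any address */
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max = ((int64_t)1 << bits) - 1;
    return value >= min && value <= max;
}
```

Accepting the union of the signed and unsigned ranges lets one check serve both absolute pointers and relative offsets. The cost, as the commit message notes, is that some genuinely out-of-range values (for example a negative offset that only happens to look like a large unsigned one) are not caught.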
Platform: Windows 10 20H2 x64.
Zig 1f2f9f0 built with MSVC 16.10.2.30804 using llvm+clang+lld-12.0.1-rc1-x86_64-windows-msvc-release-mt from the wiki page.
Compiling with that same version of clang produces a working executable.
Code:
Compile with
zig cc -g test.c
This prints:
frexp at 00007ff5b19ee9d0
and then segfaults. It doesn't matter if the build is -O0 or -O3. lldb says:
The call right before the arrow is suspect,
callq 0x7ff5b19ee9d0
This address is not executable (and it is not mapped); note that all code exists above 0x7ff6089a0000.
I'm not really sure how to go about debugging this further; any assistance would be greatly appreciated.