Bad codegen (or calling convention?): Arguments pointed to on stack into tailcall #9703
Comments
Honestly, this is a nontrivial issue to solve imo. How it should be solved in stage2 probably needs investigating too, unless a decision has already been made to use 2 registers for slices there.
This whole thing is actually one of the places where Zig has aliasing problems. Simply fixing it so it works with slices just moves the pitfalls somewhere else. Register allocation and graph colouring are unfortunately hard problems. And I think if Zig wants to have fancy semantics with the whole automatic magic pointer passing optimization, we need to have a nice long think about the formal semantics of how to make it work. I'll see if I can come up with something that makes sense. No guarantees whatsoever though.
@AssortedFantasy @N00byEdge A properly working escape analysis can detect this, but at the price of compilation time and space. But maybe there are only some logic bugs. LLVM side Zig side Probably skipping the complexity by only checking for noalias annotations on a forced tail call (
Wait, isn't it as easy as never placing anything on the stack before a tailcall when allocating the arguments?
Are you suggesting to move everything to the heap instead? That would be slow. From my point of view there are 2 different types of stuff one needs to check:
The provided example uses property 2.2, which is not supported by LLVM, since it requires finding a linear chain that describes the bound behavior, which is closely related to solving termination, aka the halting problem. Unfortunately the internet seems to be cluttered with poor information on any advanced TCO, since even the phrasing is screwed up.
No, I'm suggesting the slice should be passed in two registers instead of on the stack, which would be even more performant.
@N00byEdge Yes. This boils down to a simplification of 2.2 for the use case of (simple) repeated slicing (the The compiler has no concept of register or stack until the register allocation phase.
I mean I don't know if it's even that complicated, as long as the number of registers used doesn't go too high. You could just pass everything in through registers, and you don't need to read back the new value (since arguments are immutable). Passing a pointer with argument type
If I understand the current implementation correctly, this uses something in LLVM called musttail, which significantly limits tailcalls (in particular, the type signature of the callee has to match the type signature of the caller). Ideally, the only thing that must match is the return type. What should happen is that the parameters on the stack/registers (including the return address) for the caller function are shuffled as necessary to match the callee, followed by a jump. This can be a little tricky even in the simple case where the stack/register size is the same, and it gets worse if there are fewer or more parameters in the callee.
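To make the signature restriction concrete, here is a minimal sketch (not from the issue; isEven and isOdd are hypothetical names) of two functions that tailcall each other. It assumes a Zig version with the @call(.always_tail, ...) form used in the repros in this thread, and a backend that supports forced tail calls. Both functions share the same parameter list and return type, which is the shape the musttail restriction described above asks for.

```zig
const std = @import("std");

// Hypothetical example functions. Caller and callee have identical
// signatures (fn (u32) bool), so the forced tail call does not need any
// reshuffling of the argument area.
fn isEven(n: u32) bool {
    if (n == 0) return true;
    return @call(.always_tail, isOdd, .{n - 1});
}

fn isOdd(n: u32) bool {
    if (n == 0) return false;
    return @call(.always_tail, isEven, .{n - 1});
}

pub fn main() void {
    // With forced tail calls the recursion depth should not grow the
    // stack, so a large n is fine here.
    std.debug.print("{}\n", .{isEven(1_000_000)});
}
```

If the two functions took different parameter lists, the shuffling described above would be needed, and as far as I can tell that is the case the matching-signature rule avoids.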
Yup. The Rust folks suggest rust-lang/rfcs#2691 (comment). Interestingly, the comment explains at length how that's no problem (it ignores generalized TCE), and the RFC also mentions it explicitly: "Supporting general tail calls, the current RFC restricts function signatures which can be loosened independently in the future." It does so without a plan for how to make that happen.
This issue was never fixed, it was just that the calling conventions for slices were changed. New repro:

const std = @import("std");

const T = struct {
    a: u32,
    pad: [32]u8 = undefined,
};

noinline fn thing(arg: T) u32 {
    @call(.never_inline, std.debug.print, .{"thing({d})\n", .{arg.a}});
    var next_arg = T{.a = arg.a};
    if(arg.a < 1) {
        @panic("Big bad!");
    }
    return @call(.always_tail, passer, .{next_arg});
}

noinline fn passer(arg: T) u32 {
    return @call(.always_tail, thing, .{arg});
}

pub fn main() void {
    _ = thing(.{.a = 1});
}
Another reproduction for 0.12 and 0.13.0-dev.211+6a65561e3. When run in ReleaseSafe this panics with access of inactive field inside

const std = @import("std");

const SideData = union {
    foo: u8,
    bar: u8,
};

const Inst = struct {
    func: Gadget,
    data: SideData,
};

const Emulator = struct {
    memory: [128]u8,
    code: [64]Inst,
    pc: u8 = 0,
};

const Gadget = *const fn (*Emulator, SideData) void;

fn foo(emulator: *Emulator, side_data: SideData) void {
    _ = emulator;
    std.log.info("foo {}", .{side_data.foo});
}

fn bar(emulator: *Emulator, side_data: SideData) void {
    _ = emulator;
    _ = side_data;
    @trap();
}

fn invalid(emulator: *Emulator, _: SideData) void {
    _ = emulator;
}

fn decode(emulator: *Emulator, _: SideData) void {
    const raw_instruction = std.mem.readInt(u16, emulator.memory[emulator.pc..][0..2], .big);
    const inst: Inst = switch ((raw_instruction >> 8) & 0xf) {
        0xf => .{ .func = &foo, .data = .{ .foo = @truncate(raw_instruction) } },
        0xb => .{ .func = &bar, .data = .{ .bar = @truncate(raw_instruction) } },
        else => .{ .func = &invalid, .data = undefined },
    };
    @call(.always_tail, inst.func, .{ emulator, inst.data });
}

pub fn main() void {
    var emulator = Emulator{
        .memory = undefined,
        .code = undefined,
    };
    @memset(&emulator.code, .{ .func = &decode, .data = undefined });
    const rom = [_]u8{ 0x0f, 0x23, 0x0b, 0x80 };
    @memcpy(emulator.memory[0..rom.len], &rom);
    emulator.code[emulator.pc].func(&emulator, emulator.code[emulator.pc].data);
}
I've provided a small example here of how tailcalling corrupts the argument, while calling it with never_tail makes it work: https://zig.godbolt.org/z/Erdj97ser
With .modifier = .always_tail, the argument is corrupted.
With .modifier = .never_tail, the argument survives just fine.
What happens is that the slice that's passed as the argument to insertionSort() when it tailcalls itself is placed on the stack, and just a pointer to it is passed. This is probably due to some internal assumption that the stack storage will still be available within called functions, but that's not the case when functions are tailcalled (as you restore your caller's stack frame before tailcalling).
It should probably be considered whether slices being backed by a pointer to a pointer and size in memory is hiding memory corruption bugs like this. If the slice just took up two registers, there would be no stack objects to be overwritten.