Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad codegen (or calling convention?): Arguments pointed to on stack into tailcall #9703

Open
N00byEdge opened this issue Sep 8, 2021 · 12 comments · Fixed by #16572
Open
Labels
bug Observed behavior contradicts documented or intended behavior miscompilation The compiler reports success but produces semantically incorrect code.
Milestone

Comments

@N00byEdge
Copy link
Contributor

N00byEdge commented Sep 8, 2021

I've provided a small example here how tailcalling corrupts the arugment, while calling it with never_tail makes it work: https://zig.godbolt.org/z/Erdj97ser

const std = @import("std");

noinline fn insertionSort(data: []u64) void {
    std.log.err("Sorting the array at {any} with length {d}", .{data.ptr, data.len});
    if(data.len > 1) {
        var least_i: usize = 0;
        var i: usize = 1;
        while(i < data.len) : (i += 1) {
            if(data[i] < data[least_i])
                least_i = i;
        }
        std.mem.swap(u64, &data[0], &data[least_i]);

        // `data[1..]` is created on the stack
        // and pointed to by the first argument register
        // then stack is invalidated by the tailcall and 
        // overwritten by callee
        return @call(.{.modifier = .always_tail}, insertionSort, .{data[1..]});

        // For comparison, this works
        //@call(.{.modifier = .never_tail}, insertionSort, .{data[1..]});
    }
}

pub fn main() void {
    var data = [_]u64 { 1, 6, 2, 7, 1, 9, 3};
    insertionSort(data[0..]);
}

With .modifier = .always_tail, the argument is corrupted:

error: Sorting the array at u64@4000800d10 with length 7
error: Sorting the array at u64@0 with length 0

With .modifier = .never_tail, the arugment survives just fine:

error: Sorting the array at u64@4000800d10 with length 7
error: Sorting the array at u64@4000800d18 with length 6
error: Sorting the array at u64@4000800d20 with length 5
error: Sorting the array at u64@4000800d28 with length 4
error: Sorting the array at u64@4000800d30 with length 3
error: Sorting the array at u64@4000800d38 with length 2
error: Sorting the array at u64@4000800d40 with length 1

What happens is that the slice that's passed as the argument to insertionSort() when it tailcalls itself is placed on the stack, and just a pointer is sent to it. This is probably due to some internal assumption that the stack storage will be available within called functions, but that's not the case when functions are tailcalled (as you restore your callers stack frame before tailcalling).

It should probably be considered if slices being backed by a pointer to a pointer and size in memory is hiding memory corruption bugs like this. If the slice just took up two registers, there would be no stack objects that were overwritten.

@Vexu Vexu added bug Observed behavior contradicts documented or intended behavior miscompilation The compiler reports success but produces semantically incorrect code. stage1 The process of building from source via WebAssembly and the C backend. labels Sep 9, 2021
@Vexu Vexu added this to the 0.10.0 milestone Sep 9, 2021
@N00byEdge
Copy link
Contributor Author

N00byEdge commented Sep 10, 2021

Honestly this is a nontrivial issue to solve imo, probably something that needs investigating into how it should be solved in stage2 too, unless there has been a decision made to use 2 registers for slices in there.

@AssortedFantasy
Copy link

This whole thing is actually one of the parts of Zig having aliasing problems. Simply fixing it so it works with slices just moves the pitfalls somewhere else.

Register allocation and graph colouring is unfortunately a hard problem. And I think if Zig wants to have fancy semantics with the whole automatic magic pointer passing optimization we need to have a nice long think about the formal semantics of how to make it work.

I'll see if I can come up with something that makes sense. No guarantees whatsoever though.

@matu3ba
Copy link
Contributor

matu3ba commented Oct 25, 2021

@AssortedFantasy @N00byEdge A proper working escape analysis can detect this, but it has a compilation-time + space as price. But maybe there are only some logical bugs.

LLVM side
For TRE/TCE/TCO (primitive/simple tail recursion elimination is a better phrase) LLVM uses this: https://llvm.org/doxygen/TailRecursionElimination_8cpp_source.html (already >1k LOC in LLVM and only provides simple things) and the problem undecidable for general tail recursive functions (more than the function calling itself).
"If it can prove that callees do not access their caller stack frame, they are marked as eligible for tail call elimination (by the code generator). ... The algorithm we use to detect if callees access their caller stack frames is very primitive."

Zig side
(If zig wants to provide some aliasing analysis)
Proving pointer aliasing properties in the general case requires shape analysis with user annotations (ie Rust borrow checking), which boils down to derivation trees (out of scope).
So then the question remains what aliasing checks are feasible, which boils down to implementing escape analysis (which has a performance + space tradeoff and overall is slowish).

Probably skipping the complexity by only checking for noalias annotations on a forced tail call (.always_tail) might be the best option here.

@N00byEdge
Copy link
Contributor Author

Wait, isn't it as easy as when doing allocation for arguments, to never place anything on the stack before a tailcall?

@matu3ba
Copy link
Contributor

matu3ba commented Oct 26, 2021

Wait, isn't it as easy as when doing allocation for arguments, to never place anything on the stack before a tailcall?

Are you suggesting to move everything to the heap instead? That would be slow.

From my point of view there are 2 different types of stuff one needs to check:

  1. without pointer as arguments and function return its rather trivial, as there can not be pointers that access the callee stack frame.
  2. with pointer as arguments one needs to check that either
  • 2.1 there is no access of the stack frame (which rules out passing slices) or
  • 2.2 access of callee and caller never alias.

The provided examples uses property 2.2 not supported by LLVM, since it requires to find a linear chain that describes the bound behavior, which is closely related to solving termination aka the halting problem.

Unfortunately internet seems to be cluttered with poor information on any advanceed TCO, since even the phrasing is screwed up.

@N00byEdge
Copy link
Contributor Author

@matu3ba

Are you suggesting to move everything to the heap instead? That would be slow.

No, I'm suggesting the slice should be passed in two registers instead of on the stack, which would be even more performant.

@matu3ba
Copy link
Contributor

matu3ba commented Oct 27, 2021

@N00byEdge Yes. This boils down to a simplification of 2.2 for the use case of (simple) repeated slicing (the data[1..] part), which the compiler must be able to derive (LLVM can not do this currently).

The compiler has no concept of register or stack until the register allocation phase.

@N00byEdge
Copy link
Contributor Author

N00byEdge commented Oct 27, 2021

I mean I don't know if it's even that complicated, as long as the number of registers used doesn't go too high. You could just pass everything in through registers, and you don't need to read back the new value (since arguments are immutable). Passing a pointer with argument type *T with a pointer to the local frame into a tailcall could be diagnosed with a compile error, as that can never work. AFAIK that covers all the cases as long as you have enough registers for the arguments.

@andrewrk andrewrk modified the milestones: 0.14.0, 0.11.0 Jul 22, 2023
@dvmason
Copy link

dvmason commented Jul 23, 2023

If I understand the current implementation correctly, this uses something in LLVM called musttail which significantly limits tailcalls (in particular the type signature of the callee has to match the type signature of the caller). The only thing that must match is the return type.

What should happen is that the parameters on the stack/registers (including the return address) for the caller function should be shuffled as necessary to match the callee, then a jump. This can be a little tricky even in the simple case where the stack/register size is the same, but gets worse if there are fewer or more parameters in the callee.

@matu3ba
Copy link
Contributor

matu3ba commented Jul 23, 2023

Yup. the Rust folks suggest rust-lang/rfcs#2691 (comment).

Interestingly, the comment explains widely how that's no problem (ignores generalized tce) and also the rfc mentions it explicitly "Supporting general tail calls, the current RFC restricts function signatures which can be loosened independently in the future." without a plan how to make that happen.

@N00byEdge
Copy link
Contributor Author

This issue was never fixed, it was just that the calling conventions for slices were changed.

New repro:

const std = @import("std");

const T = struct {
    a: u32,
    pad: [32]u8 = undefined,
};

noinline fn thing(arg: T) u32 {
    @call(.never_inline, std.debug.print, .{"thing({d})\n", .{arg.a}});
    var next_arg = T{.a = arg.a};
    if(arg.a < 1) {
        @panic("Big bad!");
    }
    return @call(.always_tail, passer, .{next_arg});
}

noinline fn passer(arg: T) u32 {
    return @call(.always_tail, thing, .{arg});
}

pub fn main() void {
    _ = thing(.{.a = 1});
}

View on Godbolt
I messaged back in 2022 about this: https://discord.com/channels/605571803288698900/785499283368706060/981557478187753523

@andrewrk andrewrk reopened this Jul 27, 2023
@andrewrk andrewrk removed the stage1 The process of building from source via WebAssembly and the C backend. label Jul 27, 2023
@andrewrk andrewrk modified the milestones: 0.11.0, 0.11.1 Jul 27, 2023
@andrewrk andrewrk modified the milestones: 0.11.1, 0.12.0, 0.13.0 Jan 29, 2024
@190n
Copy link
Contributor

190n commented May 16, 2024

Another reproduction for 0.12 and 0.13.0-dev.211+6a65561e3. When run in ReleaseSafe this panics with access of inactive field inside foo. inst.data inside decode has tag .foo but in foo the tag is 0x4c.

const std = @import("std");

const SideData = union {
    foo: u8,
    bar: u8,
};

const Inst = struct {
    func: Gadget,
    data: SideData,
};

const Emulator = struct {
    memory: [128]u8,
    code: [64]Inst,
    pc: u8 = 0,
};

const Gadget = *const fn (*Emulator, SideData) void;

fn foo(emulator: *Emulator, side_data: SideData) void {
    _ = emulator;
    std.log.info("foo {}", .{side_data.foo});
}

fn bar(emulator: *Emulator, side_data: SideData) void {
    _ = emulator;
    _ = side_data;
    @trap();
}

fn invalid(emulator: *Emulator, _: SideData) void {
    _ = emulator;
}

fn decode(emulator: *Emulator, _: SideData) void {
    const raw_instruction = std.mem.readInt(u16, emulator.memory[emulator.pc..][0..2], .big);
    const inst: Inst = switch ((raw_instruction >> 8) & 0xf) {
        0xf => .{ .func = &foo, .data = .{ .foo = @truncate(raw_instruction) } },
        0xb => .{ .func = &bar, .data = .{ .bar = @truncate(raw_instruction) } },
        else => .{ .func = &invalid, .data = undefined },
    };
    @call(.always_tail, inst.func, .{ emulator, inst.data });
}

pub fn main() void {
    var emulator = Emulator{
        .memory = undefined,
        .code = undefined,
    };
    @memset(&emulator.code, .{ .func = &decode, .data = undefined });
    const rom = [_]u8{ 0x0f, 0x23, 0x0b, 0x80 };
    @memcpy(emulator.memory[0..rom.len], &rom);
    emulator.code[emulator.pc].func(&emulator, emulator.code[emulator.pc].data);
}

190n added a commit to 190n/zip8 that referenced this issue May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior miscompilation The compiler reports success but produces semantically incorrect code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants