Bad codegen (or calling convention?): Arguments pointed to on stack into tailcall #9703

N00byEdge · 2021-09-08T09:02:26Z

I've provided a small example here how tailcalling corrupts the arugment, while calling it with never_tail makes it work: https://zig.godbolt.org/z/Erdj97ser

const std = @import("std");

noinline fn insertionSort(data: []u64) void {
    std.log.err("Sorting the array at {any} with length {d}", .{data.ptr, data.len});
    if(data.len > 1) {
        var least_i: usize = 0;
        var i: usize = 1;
        while(i < data.len) : (i += 1) {
            if(data[i] < data[least_i])
                least_i = i;
        }
        std.mem.swap(u64, &data[0], &data[least_i]);

        // `data[1..]` is created on the stack
        // and pointed to by the first argument register
        // then stack is invalidated by the tailcall and 
        // overwritten by callee
        return @call(.{.modifier = .always_tail}, insertionSort, .{data[1..]});

        // For comparison, this works
        //@call(.{.modifier = .never_tail}, insertionSort, .{data[1..]});
    }
}

pub fn main() void {
    var data = [_]u64 { 1, 6, 2, 7, 1, 9, 3};
    insertionSort(data[0..]);
}

With .modifier = .always_tail, the argument is corrupted:

error: Sorting the array at u64@4000800d10 with length 7
error: Sorting the array at u64@0 with length 0

With .modifier = .never_tail, the arugment survives just fine:

error: Sorting the array at u64@4000800d10 with length 7
error: Sorting the array at u64@4000800d18 with length 6
error: Sorting the array at u64@4000800d20 with length 5
error: Sorting the array at u64@4000800d28 with length 4
error: Sorting the array at u64@4000800d30 with length 3
error: Sorting the array at u64@4000800d38 with length 2
error: Sorting the array at u64@4000800d40 with length 1

What happens is that the slice that's passed as the argument to insertionSort() when it tailcalls itself is placed on the stack, and just a pointer is sent to it. This is probably due to some internal assumption that the stack storage will be available within called functions, but that's not the case when functions are tailcalled (as you restore your callers stack frame before tailcalling).

It should probably be considered if slices being backed by a pointer to a pointer and size in memory is hiding memory corruption bugs like this. If the slice just took up two registers, there would be no stack objects that were overwritten.

The text was updated successfully, but these errors were encountered:

N00byEdge · 2021-09-10T20:18:23Z

Honestly this is a nontrivial issue to solve imo, probably something that needs investigating into how it should be solved in stage2 too, unless there has been a decision made to use 2 registers for slices in there.

AssortedFantasy · 2021-09-10T23:42:14Z

This whole thing is actually one of the parts of Zig having aliasing problems. Simply fixing it so it works with slices just moves the pitfalls somewhere else.

Register allocation and graph colouring is unfortunately a hard problem. And I think if Zig wants to have fancy semantics with the whole automatic magic pointer passing optimization we need to have a nice long think about the formal semantics of how to make it work.

I'll see if I can come up with something that makes sense. No guarantees whatsoever though.

matu3ba · 2021-10-25T22:40:26Z

@AssortedFantasy @N00byEdge A proper working escape analysis can detect this, but it has a compilation-time + space as price. But maybe there are only some logical bugs.

LLVM side
For TRE/TCE/TCO (primitive/simple tail recursion elimination is a better phrase) LLVM uses this: https://llvm.org/doxygen/TailRecursionElimination_8cpp_source.html (already >1k LOC in LLVM and only provides simple things) and the problem undecidable for general tail recursive functions (more than the function calling itself).
"If it can prove that callees do not access their caller stack frame, they are marked as eligible for tail call elimination (by the code generator). ... The algorithm we use to detect if callees access their caller stack frames is very primitive."

Zig side
(If zig wants to provide some aliasing analysis)
Proving pointer aliasing properties in the general case requires shape analysis with user annotations (ie Rust borrow checking), which boils down to derivation trees (out of scope).
So then the question remains what aliasing checks are feasible, which boils down to implementing escape analysis (which has a performance + space tradeoff and overall is slowish).

Probably skipping the complexity by only checking for noalias annotations on a forced tail call (.always_tail) might be the best option here.

N00byEdge · 2021-10-26T09:40:35Z

Wait, isn't it as easy as when doing allocation for arguments, to never place anything on the stack before a tailcall?

matu3ba · 2021-10-26T13:37:20Z

Wait, isn't it as easy as when doing allocation for arguments, to never place anything on the stack before a tailcall?

Are you suggesting to move everything to the heap instead? That would be slow.

From my point of view there are 2 different types of stuff one needs to check:

without pointer as arguments and function return its rather trivial, as there can not be pointers that access the callee stack frame.
with pointer as arguments one needs to check that either

2.1 there is no access of the stack frame (which rules out passing slices) or
2.2 access of callee and caller never alias.

The provided examples uses property 2.2 not supported by LLVM, since it requires to find a linear chain that describes the bound behavior, which is closely related to solving termination aka the halting problem.

Unfortunately internet seems to be cluttered with poor information on any advanceed TCO, since even the phrasing is screwed up.

N00byEdge · 2021-10-26T18:16:54Z

@matu3ba

Are you suggesting to move everything to the heap instead? That would be slow.

No, I'm suggesting the slice should be passed in two registers instead of on the stack, which would be even more performant.

matu3ba · 2021-10-27T12:18:52Z

@N00byEdge Yes. This boils down to a simplification of 2.2 for the use case of (simple) repeated slicing (the data[1..] part), which the compiler must be able to derive (LLVM can not do this currently).

The compiler has no concept of register or stack until the register allocation phase.

N00byEdge · 2021-10-27T16:29:40Z

I mean I don't know if it's even that complicated, as long as the number of registers used doesn't go too high. You could just pass everything in through registers, and you don't need to read back the new value (since arguments are immutable). Passing a pointer with argument type *T with a pointer to the local frame into a tailcall could be diagnosed with a compile error, as that can never work. AFAIK that covers all the cases as long as you have enough registers for the arguments.

dvmason · 2023-07-23T12:51:51Z

If I understand the current implementation correctly, this uses something in LLVM called musttail which significantly limits tailcalls (in particular the type signature of the callee has to match the type signature of the caller). The only thing that must match is the return type.

What should happen is that the parameters on the stack/registers (including the return address) for the caller function should be shuffled as necessary to match the callee, then a jump. This can be a little tricky even in the simple case where the stack/register size is the same, but gets worse if there are fewer or more parameters in the callee.

matu3ba · 2023-07-23T13:31:08Z

Yup. the Rust folks suggest rust-lang/rfcs#2691 (comment).

Interestingly, the comment explains widely how that's no problem (ignores generalized tce) and also the rfc mentions it explicitly "Supporting general tail calls, the current RFC restricts function signatures which can be loosened independently in the future." without a plan how to make that happen.

closes #9703

N00byEdge · 2023-07-27T19:49:29Z

This issue was never fixed, it was just that the calling conventions for slices were changed.

New repro:

const std = @import("std");

const T = struct {
    a: u32,
    pad: [32]u8 = undefined,
};

noinline fn thing(arg: T) u32 {
    @call(.never_inline, std.debug.print, .{"thing({d})\n", .{arg.a}});
    var next_arg = T{.a = arg.a};
    if(arg.a < 1) {
        @panic("Big bad!");
    }
    return @call(.always_tail, passer, .{next_arg});
}

noinline fn passer(arg: T) u32 {
    return @call(.always_tail, thing, .{arg});
}

pub fn main() void {
    _ = thing(.{.a = 1});
}

View on Godbolt
I messaged back in 2022 about this: https://discord.com/channels/605571803288698900/785499283368706060/981557478187753523

190n · 2024-05-16T18:18:05Z

Another reproduction for 0.12 and 0.13.0-dev.211+6a65561e3. When run in ReleaseSafe this panics with access of inactive field inside foo. inst.data inside decode has tag .foo but in foo the tag is 0x4c.

const std = @import("std");

const SideData = union {
    foo: u8,
    bar: u8,
};

const Inst = struct {
    func: Gadget,
    data: SideData,
};

const Emulator = struct {
    memory: [128]u8,
    code: [64]Inst,
    pc: u8 = 0,
};

const Gadget = *const fn (*Emulator, SideData) void;

fn foo(emulator: *Emulator, side_data: SideData) void {
    _ = emulator;
    std.log.info("foo {}", .{side_data.foo});
}

fn bar(emulator: *Emulator, side_data: SideData) void {
    _ = emulator;
    _ = side_data;
    @trap();
}

fn invalid(emulator: *Emulator, _: SideData) void {
    _ = emulator;
}

fn decode(emulator: *Emulator, _: SideData) void {
    const raw_instruction = std.mem.readInt(u16, emulator.memory[emulator.pc..][0..2], .big);
    const inst: Inst = switch ((raw_instruction >> 8) & 0xf) {
        0xf => .{ .func = &foo, .data = .{ .foo = @truncate(raw_instruction) } },
        0xb => .{ .func = &bar, .data = .{ .bar = @truncate(raw_instruction) } },
        else => .{ .func = &invalid, .data = undefined },
    };
    @call(.always_tail, inst.func, .{ emulator, inst.data });
}

pub fn main() void {
    var emulator = Emulator{
        .memory = undefined,
        .code = undefined,
    };
    @memset(&emulator.code, .{ .func = &decode, .data = undefined });
    const rom = [_]u8{ 0x0f, 0x23, 0x0b, 0x80 };
    @memcpy(emulator.memory[0..rom.len], &rom);
    emulator.code[emulator.pc].func(&emulator, emulator.code[emulator.pc].data);
}

Vexu added bug Observed behavior contradicts documented or intended behavior miscompilation The compiler reports success but produces semantically incorrect code. stage1 The process of building from source via WebAssembly and the C backend. labels Sep 9, 2021

Vexu added this to the 0.10.0 milestone Sep 9, 2021

andrewrk modified the milestones: 0.14.0, 0.11.0 Jul 22, 2023

andrewrk added a commit that referenced this issue Jul 27, 2023

add behavior test for tail calls

7b0c25c

closes #9703

andrewrk mentioned this issue Jul 27, 2023

add behavior test coverage for stage1 issues #16572

Merged

andrewrk added a commit that referenced this issue Jul 27, 2023

add behavior test for tail calls

8796123

closes #9703

andrewrk closed this as completed in #16572 Jul 27, 2023

andrewrk reopened this Jul 27, 2023

andrewrk removed the stage1 The process of building from source via WebAssembly and the C backend. label Jul 27, 2023

andrewrk modified the milestones: 0.11.0, 0.11.1 Jul 27, 2023

andrewrk modified the milestones: 0.11.1, 0.12.0, 0.13.0 Jan 29, 2024

190n added a commit to 190n/zip8 that referenced this issue May 16, 2024

add ziglang/zig#9703 reproduction

a328071

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bad codegen (or calling convention?): Arguments pointed to on stack into tailcall #9703

Bad codegen (or calling convention?): Arguments pointed to on stack into tailcall #9703

N00byEdge commented Sep 8, 2021 •

edited

Loading

N00byEdge commented Sep 10, 2021 •

edited

Loading

AssortedFantasy commented Sep 10, 2021

matu3ba commented Oct 25, 2021 •

edited

Loading

N00byEdge commented Oct 26, 2021

matu3ba commented Oct 26, 2021 •

edited

Loading

N00byEdge commented Oct 26, 2021

matu3ba commented Oct 27, 2021

N00byEdge commented Oct 27, 2021 •

edited

Loading

dvmason commented Jul 23, 2023

matu3ba commented Jul 23, 2023

N00byEdge commented Jul 27, 2023

190n commented May 16, 2024

Bad codegen (or calling convention?): Arguments pointed to on stack into tailcall #9703

Bad codegen (or calling convention?): Arguments pointed to on stack into tailcall #9703

Comments

N00byEdge commented Sep 8, 2021 • edited Loading

N00byEdge commented Sep 10, 2021 • edited Loading

AssortedFantasy commented Sep 10, 2021

matu3ba commented Oct 25, 2021 • edited Loading

N00byEdge commented Oct 26, 2021

matu3ba commented Oct 26, 2021 • edited Loading

N00byEdge commented Oct 26, 2021

matu3ba commented Oct 27, 2021

N00byEdge commented Oct 27, 2021 • edited Loading

dvmason commented Jul 23, 2023

matu3ba commented Jul 23, 2023

N00byEdge commented Jul 27, 2023

190n commented May 16, 2024

N00byEdge commented Sep 8, 2021 •

edited

Loading

N00byEdge commented Sep 10, 2021 •

edited

Loading

matu3ba commented Oct 25, 2021 •

edited

Loading

matu3ba commented Oct 26, 2021 •

edited

Loading

N00byEdge commented Oct 27, 2021 •

edited

Loading