Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

illegal instruction inserted #8386

Closed
g-w1 opened this issue Mar 29, 2021 · 16 comments
Closed

illegal instruction inserted #8386

g-w1 opened this issue Mar 29, 2021 · 16 comments
Milestone

Comments

@g-w1
Copy link
Contributor

g-w1 commented Mar 29, 2021

First of all, sorry for the big test case, I had a very hard time reducing it down.
This test case fails:

const std = @import("std");

const ast = std.zig.ast;

// const util = @import("utils.zig");

var tree: ast.Tree = undefined;

const AnalysedDecl = struct {
    /// The doc comment of the decl
    /// Owned by this decl
    dc: ?[]const u8,

    pl: []const u8,

    sub_cont_ty: ?[]const u8 = null,

    md: ?[]AnalysedDecl = null,

    src: struct {
        line: usize,
        column: usize,
    },

    fn deinit(self: *@This(), ally: *std.mem.Allocator) void {
        if (self.md) |more| {
            for (more) |*item| {
                item.deinit(ally);
            }
            ally.free(more);
        }
        if (self.dc) |dc| {
            ally.free(dc);
        }
        self.* = undefined;
    }
};

const Args = struct {
    fname: [:0]const u8,
    docs_url: ?[:0]const u8 = null,
    type: enum { json, html } = .json,
};

pub fn main() anyerror!void {
    var general_pa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = general_pa.deinit();

    const ally = &general_pa.allocator;

    const zig_code = zig_source;

    tree = try std.zig.parse(ally, zig_code);
    defer tree.deinit(ally);
    const decls = tree.rootDecls();

    var anal_list = try recAnalListOfDecls(ally, decls);

    defer {
        for (anal_list) |*anal| {
            anal.deinit(ally);
        }
        ally.free(anal_list);
    }

    const stdout = std.io.getStdOut().writer();
    try std.json.stringify(anal_list, .{}, stdout);
}

fn recAnalListOfDecls(
    ally: *std.mem.Allocator,
    list_d: []const ast.Node.Index,
) error{OutOfMemory}![]AnalysedDecl {
    const node_tags = tree.nodes.items(.tag);
    const node_datas = tree.nodes.items(.data);
    const main_tokens = tree.nodes.items(.main_token);
    const token_starts = tree.tokens.items(.start);

    var list = std.ArrayList(AnalysedDecl).init(ally);

    var i: usize = 0;
    while (i < list_d.len) : (i += 1) {
        const member = list_d[i];
        const oof = node_tags[member];
        if (true) {
            const tag = node_tags[member];

            // we know it has to be a vardecl now
            const decl_addr = member;
            var doc: ?[]const u8 = null;
            if (tag == .container_field or tag == .container_field_align or tag == .container_field_init) {
                const ftoken = tree.firstToken(member);
                const ltoken = tree.lastToken(member);
                const start = token_starts[ftoken];
                const end = token_starts[ltoken + 1];
                try list.append(.{
                    .pl = tree.source[start..end],
                    .dc = doc,
                    .md = null,
                    .src = blk: {
                        const src = std.zig.findLineColumn(tree.source, start);
                        break :blk .{ .line = src.line, .column = src.column };
                    },
                });
                continue;
            } else if (tag == .fn_decl) {
                var d = try doFunction(ally, decl_addr);
                d.dc = doc;
                try list.append(d);
                continue;
            } else if (tag == .global_var_decl or
                tag == .local_var_decl or
                tag == .simple_var_decl or
                tag == .aligned_var_decl)
            {
                // handle if it is a vardecl
                const vardecl = (switch (tree.nodes.items(.tag)[decl_addr]) {
                    .global_var_decl => tree.globalVarDecl(decl_addr),
                    .local_var_decl => tree.localVarDecl(decl_addr),
                    .aligned_var_decl => tree.alignedVarDecl(decl_addr),
                    .simple_var_decl => tree.simpleVarDecl(decl_addr),
                    else => null,
                }).?;

                const name_loc = vardecl.ast.mut_token + 1;
                const name = tree.tokenSlice(name_loc);

                const init = node_datas[decl_addr].rhs;
                const rhst = node_tags[init];

                // we find if the var is a container, we dont wanna display the full thing if it is
                // then we recurse over it
                const cd = getContainer(node_datas[decl_addr].rhs);
                if (cd) |container_decl| {
                    const offset = if (container_decl.ast.enum_token != null)
                        if (rhst == .tagged_union_enum_tag or rhst == .tagged_union_enum_tag_trailing)
                            @as(u32, 7)
                        else
                            @as(u32, 4)
                    else
                        @as(u32, 1);
                    const more = try recAnalListOfDecls(ally, container_decl.ast.members);
                    try list.append(.{
                        .pl = tree.source[token_starts[tree.firstToken(member)]..token_starts[
                            main_tokens[init] + offset
                        ]],
                        .dc = doc,
                        .md = more,
                        .src = blk: {
                            const src = std.zig.findLineColumn(tree.source, token_starts[tree.firstToken(decl_addr)]);
                            break :blk .{ .line = src.line, .column = src.column };
                        },
                    });
                    continue;
                } else {
                    const sig: []const u8 = undefined;
                    var ad: AnalysedDecl = .{
                        .pl = try ally.dupe(u8, sig),
                        .dc = doc,
                        .md = null,
                        .src = blk: {
                            const src = std.zig.findLineColumn(tree.source, token_starts[tree.firstToken(decl_addr)]);
                            break :blk .{ .line = src.line, .column = src.column };
                        },
                    };
                    try list.append(ad);
                    continue;
                }
            } else if (tag == .fn_proto or
                tag == .fn_proto_multi or
                tag == .fn_proto_one or
                tag == .fn_proto_simple)
            {
                var sig: []const u8 = undefined;
                var ad: AnalysedDecl = .{
                    .pl = sig,
                    .dc = doc,
                    .md = null,
                    .src = blk: {
                        const src = std.zig.findLineColumn(tree.source, token_starts[tree.firstToken(decl_addr)]);
                        break :blk .{ .line = src.line, .column = src.column };
                    },
                };
                try list.append(ad);
                continue;
            } else {
                continue;
            }
            unreachable;
        }
    }
    return list.toOwnedSlice();
}

fn doFunction(ally: *std.mem.Allocator, decl_addr: ast.Node.Index) !AnalysedDecl {
    const node_tags = tree.nodes.items(.tag);
    const node_datas = tree.nodes.items(.data);
    const main_tokens = tree.nodes.items(.main_token);
    const token_starts = tree.tokens.items(.start);

    const full_source = tree.source[token_starts[tree.firstToken(decl_addr)] .. token_starts[tree.lastToken(decl_addr)] + 1];

    // TODO configure max
    if (nlGtMax(full_source, 5)) {
        // handle if it is a function and the number of lines is greater than max
        const proto = node_datas[decl_addr].lhs;
        const block = node_datas[decl_addr].rhs;
        var params: [1]ast.Node.Index = undefined;

        const sig: []const u8 = undefined;

        var sub_cont_ty: ?[]const u8 = null;
        const md = null;
        return AnalysedDecl{
            .pl = sig,
            // to be filled in later
            .dc = undefined,
            .md = md,
            .sub_cont_ty = sub_cont_ty,
            .src = blk: {
                const src = std.zig.findLineColumn(tree.source, token_starts[tree.firstToken(decl_addr)]);
                break :blk .{ .line = src.line, .column = src.column };
            },
        };
    } else {
        return AnalysedDecl{
            .pl = full_source,
            .dc = undefined,
            .md = null,
            .src = blk: {
                const src = std.zig.findLineColumn(tree.source, token_starts[tree.firstToken(decl_addr)]);
                break :blk .{ .line = src.line, .column = src.column };
            },
        };
    }
}

fn getContainer(decl_addr: ast.Node.Index) ?ast.full.ContainerDecl {
    const node_tags = tree.nodes.items(.tag);
    const node_datas = tree.nodes.items(.data);
    const main_tokens = tree.nodes.items(.main_token);
    const token_starts = tree.tokens.items(.start);

    const rhst = node_tags[decl_addr];

    // we find if the var is a container, we dont wanna display the full thing if it is
    // then we recurse over it
    const container = if (rhst == .container_decl or rhst == .container_decl_trailing) tree.containerDecl(decl_addr) else if (rhst == .container_decl_arg or rhst == .container_decl_arg_trailing)
        tree.containerDeclArg(decl_addr)
    else if (rhst == .container_decl_two or rhst == .container_decl_two_trailing) blk: {
        var buf: [2]ast.Node.Index = undefined;
        break :blk tree.containerDeclTwo(&buf, decl_addr);
    } else if (rhst == .tagged_union or rhst == .tagged_union_trailing)
        tree.taggedUnion(decl_addr)
    else if (rhst == .tagged_union_two or rhst == .tagged_union_two_trailing) blk: {
        var buf: [2]ast.Node.Index = undefined;
        break :blk tree.taggedUnionTwo(&buf, decl_addr);
    } else if (rhst == .tagged_union_enum_tag or rhst == .tagged_union_enum_tag_trailing)
        tree.taggedUnionEnumTag(decl_addr)
    else
        null;
    return container;
}

fn nlGtMax(str: []const u8, max: usize) bool {
    var n: usize = 0;
    for (str) |c| {
        if (c == '\n') n += 1;
        if (n > max) return true;
    }
    return false;
}

const zig_source =
    \\/// Here is our z struct
    \\pub const D = union(enum) {
    \\    /// Here is the index of the rust code
    \\    rust: u32,
    \\    /// This performs the z function. big functions dont get inlined, but small ones do
    \\    pub fn z(self: @This()) u32 {
    \\        return 1;
    \\    }
    \\    /// WOW: even more
    \\    pub const EvenMoreInner = struct {
    \\        /// This function should get inlined because it is small
    \\        pub fn v() void {
    \\            return;
    \\        }
    \\        /// HMM
    \\        pub const C = 1;
    \\    };
    \\};
;

with:

thread 30828 panic: index out of bounds
/home/jacob/dev/docgen/repro.zig:84:30: 0x23d5a3 in recAnalListOfDecls (repro)
        const oof = node_tags[member];
                             ^
/home/jacob/dev/docgen/repro.zig:142:56: 0x23cf6f in recAnalListOfDecls (repro)
                    const more = try recAnalListOfDecls(ally, container_decl.ast.members);
                                                       ^
/home/jacob/dev/docgen/repro.zig:142:56: 0x23cf6f in recAnalListOfDecls (repro)
                    const more = try recAnalListOfDecls(ally, container_decl.ast.members);
                                                       ^
/home/jacob/dev/docgen/repro.zig:57:43: 0x233f47 in main (repro)
    var anal_list = try recAnalListOfDecls(ally, decls);
                                          ^
/home/jacob/.local/zig/0.8.0-dev.1565+ab9324e60/files/lib/std/start.zig:345:37: 0x20b6a4 in std.start.posixCallMainAndExit (repro)
            const result = root.main() catch |err| {
                                    ^
/home/jacob/.local/zig/0.8.0-dev.1565+ab9324e60/files/lib/std/start.zig:163:5: 0x20b542 in std.start._start (repro)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
Aborted (core dumped)

We can see that member is in fact bogus:

#6  0x000000000023d7e5 in recAnalListOfDecls (ally=0x7fffffffa808, list_d=...) at /home/jacob/dev/docgen/repro.zig:85
85              const oof = node_tags[member];
(gdb) p member
$1 = 4294944776
(gdb) 

Note: running in ReleaseSafe mode produces illegal instruction?????

❯ zig run repro.zig -OReleaseSafe
Illegal instruction at address 0x21cc99
/home/jacob/.local/zig/0.8.0-dev.1565+ab9324e60/files/lib/std/mem.zig:159:19: 0x21cc99 in recAnalListOfDecls (repro)
        dest[i] = s;
                  ^
/home/jacob/dev/docgen/repro.zig:142:56: 0x21c927 in recAnalListOfDecls (repro)
                    const more = try recAnalListOfDecls(ally, container_decl.ast.members);
                                                       ^
/home/jacob/dev/docgen/repro.zig:142:56: 0x21c927 in recAnalListOfDecls (repro)
                    const more = try recAnalListOfDecls(ally, container_decl.ast.members);
                                                       ^
/home/jacob/dev/docgen/repro.zig:57:43: 0x209d90 in std.start.posixCallMainAndExit (repro)
    var anal_list = try recAnalListOfDecls(ally, decls);
                                          ^
/home/jacob/.local/zig/0.8.0-dev.1565+ab9324e60/files/lib/std/start.zig:163:5: 0x208bce in std.start._start (repro)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
Aborted (core dumped)

I am confused out of my mind. I think this is probably a miscompilation or of not a bug in std.zig.ast.
Note: commenting out these two sections fixes it:
image

@mikdusan
Copy link
Member

mikdusan commented Mar 29, 2021

two instances of this pattern: the memory of buf is retained by tree.containerDeclTwo and escapes:

var buf: [2]ast.Node.Index = undefined;
break :blk tree.containerDeclTwo(&buf, decl_addr);

and that same pattern also appears in std.zig.render.renderExpression() which is probably a bug waiting to surface but is probably not a bug; it looks like there is no escape.

@g-w1
Copy link
Contributor Author

g-w1 commented Mar 29, 2021

Hmm let me try allocating that, but I dont think that explains the illegal instruction.

@g-w1
Copy link
Contributor Author

g-w1 commented Mar 29, 2021

Ok, that fixes one of the problems, but now I still get an illegal instruction.
Diff:

133c133,134
<                 const cd = getContainer(node_datas[decl_addr].rhs);
---
>                 var buf: [2]u32 = undefined;
>                 const cd = getContainer(node_datas[decl_addr].rhs, &buf);
238c239
< fn getContainer(decl_addr: ast.Node.Index) ?ast.full.ContainerDecl {
---
> fn getContainer(decl_addr: ast.Node.Index, buf: *[2]ast.Node.Index) ?ast.full.ContainerDecl {
251,252c252
<         var buf: [2]ast.Node.Index = undefined;
<         break :blk tree.containerDeclTwo(&buf, decl_addr);
---
>         break :blk tree.containerDeclTwo(buf, decl_addr);
256,257c256
<         var buf: [2]ast.Node.Index = undefined;
<         break :blk tree.taggedUnionTwo(&buf, decl_addr);
---
>         break :blk tree.taggedUnionTwo(buf, decl_addr);
❯ zig run repro.zig  -OReleaseSafe
Illegal instruction at address 0x21cc57
/home/jacob/.local/zig/0.8.0-dev.1565+ab9324e60/files/lib/std/mem.zig:159:19: 0x21cc57 in recAnalListOfDecls (repro)                                                                
        dest[i] = s;
                  ^
/tmp/repro.zig:143:56: 0x21c8fb in recAnalListOfDecls (repro)
                    const more = try recAnalListOfDecls(ally, container_decl.ast.members);
                                                       ^
/tmp/repro.zig:143:56: 0x21c8fb in recAnalListOfDecls (repro)
                    const more = try recAnalListOfDecls(ally, container_decl.ast.members);
                                                       ^
/tmp/repro.zig:57:43: 0x209d90 in std.start.posixCallMainAndExit (repro)
    var anal_list = try recAnalListOfDecls(ally, decls);
                                          ^
/home/jacob/.local/zig/0.8.0-dev.1565+ab9324e60/files/lib/std/start.zig:163:5: 0x208bce in std.start._start (repro)                                                                  
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
Aborted (core dumped)

@g-w1
Copy link
Contributor Author

g-w1 commented Mar 29, 2021

Here is it in gdb:

┌──/tmp/repro.zig────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   151                                     const src = std.zig.findLineColumn(tree.source, token_starts[tree.firstToken(decl_addr)]);                                                   │
│   152                                     break :blk .{ .line = src.line, .column = src.column };                                                                                      │
│   153                                 },                                                                                                                                               │
│   154                             });                                                                                                                                                  │
│   155                             continue;                                                                                                                                            │
│   156                         } else {                                                                                                                                                 │
│   157                             const sig: []const u8 = undefined;                                                                                                                   │
│   158                             var ad: AnalysedDecl = .{                                                                                                                            │
│  >159                                 .pl = try ally.dupe(u8, sig),                                                                                                                    │
│   160                                 .dc = doc,                                                                                                                                       │
│   161                                 .md = null,                                                                                                                                      │
│   162                                 .src = blk: {                                                                                                                                    │
│   163                                     const src = std.zig.findLineColumn(tree.source, token_starts[tree.firstToken(decl_addr)]);                                                   │
│   164                                     break :blk .{ .line = src.line, .column = src.column };                                                                                      │
│   165                                 },                                                                                                                                               │
│   166                             };                                                                                                                                                   │
│   167                             try list.append(ad);                                                                                                                                 │
│   168                             continue;                                                                                                                                            │
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   0x21cc41 <recAnalListOfDecls+6289>      xor    esi,esi                                                                                                                               │
│   0x21cc43 <recAnalListOfDecls+6291>      vzeroupper                                                                                                                                   │
│   0x21cc46 <recAnalListOfDecls+6294>      call   0x2080f0 <std.builtin.default_panic>                                                                                                  │
│   0x21cc4b <recAnalListOfDecls+6299>      mov    edi,0x2060f0                                                                                                                          │
│   0x21cc50 <recAnalListOfDecls+6304>      xor    esi,esi                                                                                                                               │
│   0x21cc52 <recAnalListOfDecls+6306>      call   0x2080f0 <std.builtin.default_panic>                                                                                                  │
│  >0x21cc57 <recAnalListOfDecls+6311>      ud2                                                                                                                                          │
│   0x21cc59 <recAnalListOfDecls+6313>      lea    rdi,[rsp+0x90]                                                                                                                        │
│   0x21cc61 <recAnalListOfDecls+6321>      call   0x2102f0 <__zig_fail_unwrap>                                                                                                          │
│   0x21cc66                                    nop    WORD PTR cs:[rax+rax*1+0x0]                                                                                                       │
│   0x21cc70 <std.zig.ast.Tree.deinit>      push   r15                                                                                                                                   │
│   0x21cc72 <std.zig.ast.Tree.deinit+2>    push   r14                                                                                                                                   │
│   0x21cc74 <std.zig.ast.Tree.deinit+4>    push   r12                                                                                                                                   │
│   0x21cc76 <std.zig.ast.Tree.deinit+6>    push   rbx                                                                                                                                   │
│   0x21cc77 <std.zig.ast.Tree.deinit+7>    sub    rsp,0x138                                                                                                                             │
│   0x21cc7e <std.zig.ast.Tree.deinit+14>   mov    rax,QWORD PTR [rip+0x1f4eb]        # 0x23c170 <tree+24>                                                                               │
│   0x21cc85 <std.zig.ast.Tree.deinit+21>   test   al,0x3                                                                                                                                │
│   0x21cc87 <std.zig.ast.Tree.deinit+23>   jne    0x21cea0 <std.zig.ast.Tree.deinit+560>                                                                                                │
│   0x21cc8d <std.zig.ast.Tree.deinit+29>   mov    r15,rdi                                                                                                                               │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Seems like something is inserting an illegal instruction:

❯ objdump -d ./repro |rg ud2
  21cc57:    0f 0b                    ud2

@g-w1 g-w1 changed the title miscompilation in loop illegal instruction inserted Mar 29, 2021
@mikdusan
Copy link
Member

reduction for the illegal hardware instruction:

const std = @import("std");

pub fn main() !void {
    const sig: []const u8 = undefined;
    _ = try std.heap.c_allocator.dupe(u8, sig);
}

@SpexGuy
Copy link
Contributor

SpexGuy commented Mar 29, 2021

Ah, yep, this is UB since the len is undefined. I assume the illegal instruction comes from our equivalent of -fsanitize=undefined?

@g-w1
Copy link
Contributor Author

g-w1 commented Mar 29, 2021

It only triggers in ReleaseSafe, not debug though. This is what is confusing me?
I think this is something from llvm and not from zig as runtime safety is off

pub fn copy(comptime T: type, dest: []T, source: []const T) void {
    // TODO instead of manually doing this check for the whole array
    // and turning off runtime safety, the compiler should detect loops like
    // this and automatically omit safety checks for loops
    @setRuntimeSafety(false);
    assert(dest.len >= source.len);
    for (source) |s, i|
        dest[i] = s;
}

@SpexGuy
Copy link
Contributor

SpexGuy commented Mar 29, 2021

The optimizer needs to see a lot of code to realize that this is UB. Neither function on its own is UB, only the combination is. In ReleaseSafe, dupe is inlined and the compiler can see the UB and insert an illegal instruction.

@g-w1
Copy link
Contributor Author

g-w1 commented Mar 29, 2021

Should the assert be before the setRuntimeSafety so that this is caught?

@SpexGuy
Copy link
Contributor

SpexGuy commented Mar 29, 2021

That's unrelated, this is not a debug check that the zig compiler inserts. It's inserted by LLVM which is recognizing that a code path triggers unconditional UB.

@mikdusan
Copy link
Member

mikdusan commented Mar 29, 2021

yeah looks like llvm is doing its thing. I put the trigger code into an exported fn and this is OReleaseSafe .ll codegen; it must know it is ub to do this, right?

; Function Attrs: nobuiltin nounwind readnone sspstrong
define void @foobar() local_unnamed_addr #0 !dbg !850 {
UnwrapErrOk:
  ret void, !dbg !857
}

@SpexGuy
Copy link
Contributor

SpexGuy commented Mar 29, 2021

Yeah, ultimately the loop in copy is a branch on a condition derived from the undefined. Branch on undefined is UB, so that propagates all the way up to the closest conditional.

@g-w1
Copy link
Contributor Author

g-w1 commented Mar 29, 2021

Ah I see, is there anything we can do in zig to make this easier to catch and not depend on llvm?

@mikdusan
Copy link
Member

one point of confusion, I just took the simple reduction and overwrote zig's standard panic handler... no more illegal instruction:

pub fn panic(msg: []const u8, error_return_trace: ?*@import("builtin").StackTrace) noreturn {
    while (true) {}
}

@SpexGuy
Copy link
Contributor

SpexGuy commented Mar 29, 2021

is there anything we can do in zig to make this easier to catch and not depend on llvm?

#63 exists but in practice this will be very hard, maybe impossible in some cases when pointers are involved. We can handle a lot of cases though, including this one probably.

@g-w1
Copy link
Contributor Author

g-w1 commented Mar 29, 2021

Ok, as we have figured out what went wrong, I will close this. Thanks a lot @mikdusan @SpexGuy

@g-w1 g-w1 closed this as completed Mar 29, 2021
@andrewrk andrewrk added this to the 0.8.0 milestone Jun 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants