Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slice fixes #21426

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Slice fixes #21426

wants to merge 9 commits into from

Conversation

amp-59
Copy link
Contributor

@amp-59 amp-59 commented Sep 15, 2024

Summary

These fixes had been required to test the single-function panic interface.

Merging this patch will close the following issues:

Compatibility

The function added by this patch (Sema.analyzeSlice2) is intended as a drop-in replacement for Sema.analyzeSlice, with these exceptions:

Required by #19798: Will limit usage of comptime-known undefined pointer operands. This limitation is described by the third option suggested in the expected behaviour section of #19798. This behaviour is controlled by the call option allow_limited_slice_of_undefined and is enabled by default. Disabling this option will prevent slicing comptime-known undefined pointer operands entirely.

Will not implicitly propagate sentinels when slicing pointers-to-many without an end or length operand. It may be possible to restore this behaviour for ptr[0..] if ptr[start.. :sentinel] is removed entirely.

Performance

This patch reduces the (machine code) size of safety checked slice operations by 46 percent on average. This advantage is consistent between stage2_llvm (x86-64) and stage2_x86_64.

The size of this improvement may cause readers to question the thoroughness of this implementation, and the following sections are prepared to answer those concerns. The short answer is that this implementation only appears to be highly optimised because the current implementation is not optimal.

Background

The most significant unit of cost for safety checked operations is the conditional branch added for runtime safety checks. In the case of slices, these are produced for index bounds, sentinel values, null pointers, and in rare cases integer overflow. Each branch has a void success block and a fail block with a call to builtin.panic*. The sizes of these branches vary between 25B and 60B depending on the panic condition.

When the term 'variant' is used, this refers to the comptime state (known-ness) and values of all the operands, combined with the base syntax.

  • The operands are ptr, start, end (optional), len (optional), sentinel (optional).
  • The base syntax tags are slice_start (ptr[start..]), slice_end (ptr[start..end]), slice_sentinel (ptr[start..end:sentinel], ptr[start..:sentinel]), and slice_length (ptr[start..][0..len], ptr[start..][0..len:sentinel]).

The following sections will give examples to explain why this implementation adds far fewer conditional branches than the current implementation. The variants shown in these examples are not isolated cases of bad performance; they each represent broad subsets of the syntax.

Example A: Slice operation with one redundant call to panic

This example shows slicing a pointer-to-one array with comptime-known start and end operands.

The reduction to AIR instructions is 54 percent (117B to 54B), and the reduction to symbol size is 58 percent (46B to 19B).

Compile with zig build-obj -fstrip -fno-unwind-tables -fomit-frame-pointer -fformatted-panics one_unreachable.zig
one_unreachable.zig:

var slice: []u8 = undefined;
export fn fn0(src_ptr3: *[5]u8) void {
    slice = src_ptr3.*[0..5];
}

Output (master)

Air:

# Total AIR+Liveness bytes: 361B
# AIR Instructions:         13 (117B)
# AIR Extra Data:           24 (96B)
# Liveness tomb_bits:       8B
# Liveness Extra Data:      5 (20B)
# Liveness special table:   2 (16B)
  %0 = arg(*[5]u8)
  %1 = bitcast([*]u8, %0!)
  %2 = ptr_add([*]u8, %1!, @Air.Inst.Ref.zero_usize)
  %3 = bitcast(*[5]u8, %2!)
  %4 = cmp_lte(<usize, 5>, <usize, 5>)
  %7!= block(void, {
    %8!= cond_br(%4!, likely {
      %9!= br(%7, @Air.Inst.Ref.void_value)
    }, cold {
      %3!
      %5!= call(<fn (usize, usize) noreturn, (function 'panicOutOfBounds')>, [<usize, 5>, <usize, 5>])
      %6!= unreach()
    })
  } %4!)
  %10 = array_to_slice([]u8, %3!)
  %11!= store_safe(<*[]u8, one_unreachable.slice>, %10!)
  %12!= ret_safe(@Air.Inst.Ref.void_value)

Debug:

fn0:
	push	rax
	mov	qword ptr [rsp], rdi
	jmp	.LBB0_2
.LBB0_1:
	mov	rax, qword ptr [rsp]
	mov	qword ptr [rip + one_unreachable.slice], rax
	mov	qword ptr [rip + one_unreachable.slice+8], 5
	pop	rax
	ret
.LBB0_2:
	jmp	.LBB0_1
	mov	esi, 5
	mov	rdi, rsi
	call	builtin.panicOutOfBounds

Output (slices_1)

Air:

# Total AIR+Liveness bytes: 198B
# AIR Instructions:         6 (54B)
# AIR Extra Data:           8 (32B)
# Liveness tomb_bits:       8B
# Liveness Extra Data:      0 (0B)
# Liveness special table:   0 (0B)
  %0 = arg(*[5]u8)
  %1 = bitcast([*]u8, %0!)
  %2 = bitcast(*[5]u8, %1!)
  %3 = array_to_slice([]u8, %2!)
  %4!= store_safe(<*[]u8, one_unreachable.slice>, %3!)
  %5!= ret_safe(@Air.Inst.Ref.void_value)

Debug:

fn0:
	mov	qword ptr [rip + one_unreachable.slice], rdi
	mov	qword ptr [rip + one_unreachable.slice+8], 5
	ret

Explanation

Zig users might expect that no runtime check would be added for this operation, because the bounds had been validated at compile time. This is not caused by any obvious programming error; there is simply no attempt or mechanism to prevent redundant checks for this requirement.

This check will be optimised (removed) by Dead Code Elimination (DCE).

Example B: Slice operation with two redundant calls to panic

This example shows slicing a pointer-to-many with a runtime-known end operand.

The reduction to AIR instructions is 68 percent (171B to 54B), and the reduction to symbol size is 77 percent (109B to 25B).

Compile with zig build-obj -fstrip -fno-unwind-tables -fomit-frame-pointer -fformatted-panics two_unreachable.zig
two_unreachable.zig:

var dest_end: usize = 0;
var slice: []u8 = undefined;
export fn fn0(src_ptr15: *[*]u8) void {
    slice = src_ptr15.*[0..dest_end];
}

Output (master)

Air:

# Total AIR+Liveness bytes: 519B
# AIR Instructions:         19 (171B)
# AIR Extra Data:           39 (156B)
# Liveness tomb_bits:       16B
# Liveness Extra Data:      10 (40B)
# Liveness special table:   4 (32B)
  %0 = arg(*[*]u8)
  %1 = load(usize, <*usize, two_unreachable.dest_end>)
  %2 = load([*]u8, %0!)
  %3 = ptr_add([*]u8, %2!, @Air.Inst.Ref.zero_usize)
  %4 = cmp_lte(@Air.Inst.Ref.zero_usize, %1)
  %7!= block(void, {
    %8!= cond_br(%4!, likely {
      %9!= br(%7, @Air.Inst.Ref.void_value)
    }, cold {
      %3!
      %5!= call(<fn (usize, usize) noreturn, (function 'panicStartGreaterThanEnd')>, [@Air.Inst.Ref.zero_usize, %1!])
      %6!= unreach()
    })
  } %4!)
  %10 = cmp_lte(@Air.Inst.Ref.zero_usize, %1)
  %13!= block(void, {
    %14!= cond_br(%10!, likely {
      %15!= br(%13, @Air.Inst.Ref.void_value)
    }, cold {
      %3!
      %11!= call(<fn (usize, usize) noreturn, (function 'panicOutOfBounds')>, [@Air.Inst.Ref.zero_usize, %1!])
      %12!= unreach()
    })
  } %10!)
  %16 = slice([]u8, %3!, %1!)
  %17!= store_safe(<*[]u8, two_unreachable.slice>, %16!)
  %18!= ret_safe(@Air.Inst.Ref.void_value)

Debug:

fn0:
	sub	rsp, 24
	mov	rcx, qword ptr [two_unreachable.dest_end]
	mov	qword ptr [rsp + 8], rcx
	mov	rax, qword ptr [rdi]
	mov	qword ptr [rsp + 16], rax
	xor	eax, eax
	cmp	rax, rcx
	jbe	.LBB0_2
	jmp	.LBB0_3
.LBB0_1:
	mov	rcx, qword ptr [rsp + 8]
	xor	eax, eax
	cmp	rax, rcx
	jbe	.LBB0_5
	jmp	.LBB0_6
.LBB0_2:
	jmp	.LBB0_1
.LBB0_3:
	mov	rsi, qword ptr [rsp + 8]
	xor	eax, eax
	mov	edi, eax
	call	builtin.panicStartGreaterThanEnd
.LBB0_4:
	mov	rax, qword ptr [rsp + 16]
	mov	rcx, qword ptr [rsp + 8]
	mov	qword ptr [rip + two_unreachable.slice+8], rcx
	mov	qword ptr [rip + two_unreachable.slice], rax
	add	rsp, 24
	ret
.LBB0_5:
	jmp	.LBB0_4
.LBB0_6:
	mov	rsi, qword ptr [rsp + 8]
	xor	eax, eax
	mov	edi, eax
	call	builtin.panicOutOfBounds

ReleaseSafe:

fn0:
	mov	rax, qword ptr [rdi]
	mov	qword ptr [rip + two_unreachable.slice], rax
	ret

Output (slices_1)

Air:

# Total AIR+Liveness bytes: 206B
# AIR Instructions:         6 (54B)
# AIR Extra Data:           10 (40B)
# Liveness tomb_bits:       8B
# Liveness Extra Data:      0 (0B)
# Liveness special table:   0 (0B)
  %0 = arg(*[*]u8)
  %1 = load(usize, <*usize, two_unreachable.dest_end>)
  %2 = load([*]u8, %0!)
  %3 = slice([]u8, %2!, %1!)
  %4!= store_safe(<*[]u8, two_unreachable.slice>, %3!)
  %5!= ret_safe(@Air.Inst.Ref.void_value)

Debug:

fn0:
	mov	rcx, qword ptr [rip + two_unreachable.dest_end]
	mov	rax, qword ptr [rdi]
	mov	qword ptr [rip + two_unreachable.slice+8], rcx
	mov	qword ptr [rip + two_unreachable.slice], rax
	ret

ReleaseSafe:

fn0:
	mov	rax, qword ptr [rdi]
	mov	qword ptr [rip + two_unreachable.slice], rax
	ret

Explanation

Zig users might expect that no runtime safety checks would be added for this operation, because there is no upper bound and because 0 <= end is a fact.

These are the panic functions called by the assembly output by master for the example function fn0:

  1. builtin.panicStartGreaterThanEnd from start <= end: This check is added because the analysis does not check whether the start operand is known to be zero at compile time.
  2. builtin.panicOutOfBounds from start <= end: This check is added because the mechanism (var checked_start_lte_end: bool) intended to prevent duplicate checks for this requirement is not observed by the block adding the second check. It is also not engaged by the block adding the first check. It is only engaged by the compile time validation, and only observed by the block adding the first check. This means that even when both start and end operands are known at compile time the second check will still be added, and--yet again--the fail block of this check will be unreachable because any invalid result would have already failed compilation.

Both checks will be optimised by DCE, as shown by the ReleaseSafe assembly above.

Example C: Slice operation with five redundant calls to panic

This example shows slicing a slice with the only difference being the sentinel value.

The reduction to AIR instructions is 58 percent (324B to 135B), and the reduction to symbol size is 73 percent (304B to 82B).

Compile with zig build-obj -fstrip -fno-unwind-tables -fomit-frame-pointer -fformatted-panics five_unreachable.zig
five_unreachable.zig:

var slice: [:1]const u8 = undefined;
export fn fn2000(src_ptr112: *[:0]const u8) void {
    slice = src_ptr112.*[0.. :1];
}

Output (master)

Air:

# Total AIR+Liveness bytes: 908B
# AIR Instructions:         36 (324B)
# AIR Extra Data:           70 (280B)
# Liveness tomb_bits:       24B
# Liveness Extra Data:      28 (112B)
# Liveness special table:   8 (64B)
  %0 = arg(*[:0]const u8)
  %1 = load([:0]const u8, %0!)
  %2 = slice_ptr([*:0]const u8, %1)
  %3 = ptr_add([*:0]const u8, %2!, @Air.Inst.Ref.zero_usize)
  %4 = slice_len(usize, %1)
  %5 = cmp_lte(@Air.Inst.Ref.zero_usize, %4)
  %8!= block(void, {
    %9!= cond_br(%5!, likely {
      %10!= br(%8, @Air.Inst.Ref.void_value)
    }, cold {
      %1! %3!
      %6!= call(<fn (usize, usize) noreturn, (function 'panicStartGreaterThanEnd')>, [@Air.Inst.Ref.zero_usize, %4!])
      %7!= unreach()
    })
  } %5!)
  %11 = slice_len(usize, %1!)
  %12 = add_safe(%11!, @Air.Inst.Ref.one_usize)
  %13 = add_safe(%4, @Air.Inst.Ref.one_usize)
  %14 = cmp_lte(%13, %12)
  %17!= block(void, {
    %18!= cond_br(%14!, likely {
      %12! %13!
      %19!= br(%17, @Air.Inst.Ref.void_value)
    }, cold {
      %4! %3!
      %15!= call(<fn (usize, usize) noreturn, (function 'panicOutOfBounds')>, [%13!, %12!])
      %16!= unreach()
    })
  } %12! %13! %14!)
  %20 = cmp_lte(@Air.Inst.Ref.zero_usize, %4)
  %23!= block(void, {
    %24!= cond_br(%20!, likely {
      %25!= br(%23, @Air.Inst.Ref.void_value)
    }, cold {
      %3!
      %21!= call(<fn (usize, usize) noreturn, (function 'panicOutOfBounds')>, [@Air.Inst.Ref.zero_usize, %4!])
      %22!= unreach()
    })
  } %20!)
  %26 = slice([:1]const u8, %3!, %4)
  %27 = slice_elem_val(%26, %4!)
  %28 = cmp_eq(@Air.Inst.Ref.one_u8, %27)
  %31!= block(void, {
    %32!= cond_br(%28!, likely {
      %27!
      %33!= br(%31, @Air.Inst.Ref.void_value)
    }, cold {
      %26!
      %29!= call(<fn (u8, u8) noreturn, (function 'panicSentinelMismatch__anon_1335')>, [@Air.Inst.Ref.one_u8, %27!])
      %30!= unreach()
    })
  } %27! %28!)
  %34!= store_safe(<*[:1]const u8, five_unreachable.slice>, %26!)
  %35!= ret_safe(@Air.Inst.Ref.void_value)

Debug:

fn2000:
	sub	rsp, 56
	mov	rax, qword ptr [rdi]
	mov	qword ptr [rsp + 40], rax
	mov	rcx, qword ptr [rdi + 8]
	mov	qword ptr [rsp + 48], rcx
	xor	eax, eax
	cmp	rax, rcx
	jbe	.LBB0_2
	jmp	.LBB0_3
.LBB0_1:
	mov	rax, qword ptr [rsp + 48]
	add	rax, 1
	mov	qword ptr [rsp + 32], rax
	setb	al
	jb	.LBB0_4
	jmp	.LBB0_5
.LBB0_2:
	jmp	.LBB0_1
.LBB0_3:
	mov	rsi, qword ptr [rsp + 48]
	xor	eax, eax
	mov	edi, eax
	call	builtin.panicStartGreaterThanEnd
.LBB0_4:
	movabs	rdi, offset __anon_1328
	mov	esi, 16
	xor	eax, eax
	mov	edx, eax
	movabs	rcx, offset .L__unnamed_1
	call	builtin.default_panic
.LBB0_5:
	mov	rax, qword ptr [rsp + 48]
	add	rax, 1
	mov	qword ptr [rsp + 24], rax
	setb	al
	jb	.LBB0_6
	jmp	.LBB0_7
.LBB0_6:
	movabs	rdi, offset __anon_1328
	mov	esi, 16
	xor	eax, eax
	mov	edx, eax
	movabs	rcx, offset .L__unnamed_1
	call	builtin.default_panic
.LBB0_7:
	mov	rax, qword ptr [rsp + 24]
	mov	rcx, qword ptr [rsp + 32]
	cmp	rax, rcx
	jbe	.LBB0_9
	jmp	.LBB0_10
.LBB0_8:
	mov	rcx, qword ptr [rsp + 48]
	xor	eax, eax
	cmp	rax, rcx
	jbe	.LBB0_12
	jmp	.LBB0_13
.LBB0_9:
	jmp	.LBB0_8
.LBB0_10:
	mov	rsi, qword ptr [rsp + 32]
	mov	rdi, qword ptr [rsp + 24]
	call	builtin.panicOutOfBounds
.LBB0_11:
	mov	rcx, qword ptr [rsp + 48]
	mov	rax, qword ptr [rsp + 40]
	mov	rdx, rcx
	mov	qword ptr [rsp], rdx
	mov	qword ptr [rsp + 8], rax
	mov	cl, byte ptr [rax + rcx]
	mov	byte ptr [rsp + 23], cl
	mov	al, 1
	cmp	al, cl
	je	.LBB0_15
	jmp	.LBB0_16
.LBB0_12:
	jmp	.LBB0_11
.LBB0_13:
	mov	rsi, qword ptr [rsp + 48]
	xor	eax, eax
	mov	edi, eax
	call	builtin.panicOutOfBounds
.LBB0_14:
	mov	rax, qword ptr [rsp]
	mov	rcx, qword ptr [rsp + 8]
	mov	qword ptr [rip + five_unreachable.slice], rcx
	mov	qword ptr [rip + five_unreachable.slice+8], rax
	add	rsp, 56
	ret
.LBB0_15:
	jmp	.LBB0_14
.LBB0_16:
	mov	al, byte ptr [rsp + 23]
	mov	edi, 1
	movzx	esi, al
	call	builtin.panicSentinelMismatch__anon_1335

ReleaseSafe:

fn2000:
	push	rax
	mov	rax, qword ptr [rdi + 8]
	cmp	rax, -1
	je	.LBB0_3
	mov	rcx, qword ptr [rdi]
	movzx	edx, byte ptr [rcx + rax]
	cmp	dl, 1
	jne	.LBB0_2
	mov	qword ptr [rip + five_unreachable.slice], rcx
	mov	qword ptr [rip + five_unreachable.slice+8], rax
	pop	rax
	ret
.LBB0_3:
	mov	edi, offset __anon_1328
	mov	esi, 16
	call	builtin.default_panic
.LBB0_2:
	movzx	edi, dl
	call	builtin.panicSentinelMismatch__anon_1335

Output (slices_1)

Air:

# Total AIR+Liveness bytes: 407B
# AIR Instructions:         15 (135B)
# AIR Extra Data:           28 (112B)
# Liveness tomb_bits:       8B
# Liveness Extra Data:      8 (32B)
# Liveness special table:   2 (16B)
  %0 = arg(*[:0]const u8)
  %1 = load([:0]const u8, %0!)
  %2 = slice_len(usize, %1)
  %3 = slice_ptr([*:0]const u8, %1!)
  %4 = ptr_elem_ptr(*const u8, %3, %2)
  %5 = load(u8, %4!)
  %6 = cmp_eq(@Air.Inst.Ref.one_u8, %5)
  %9!= block(void, {
    %10!= cond_br(%6!, likely {
      %5!
      %11!= br(%9, @Air.Inst.Ref.void_value)
    }, cold {
      %2! %3!
      %7!= call(<fn (u8, u8) noreturn, (function 'panicSentinelMismatch__anon_1293')>, [@Air.Inst.Ref.one_u8, %5!])
      %8!= unreach()
    })
  } %5! %6!)
  %12 = slice([:1]const u8, %3!, %2!)
  %13!= store_safe(<*[:1]const u8, five_unreachable.slice>, %12!)
  %14!= ret_safe(@Air.Inst.Ref.void_value)

Debug:

fn2000:
	sub	rsp, 24
	mov	rax, qword ptr [rdi]
	mov	qword ptr [rsp], rax
	mov	rcx, qword ptr [rdi + 8]
	mov	qword ptr [rsp + 8], rcx
	mov	cl, byte ptr [rax + rcx]
	mov	byte ptr [rsp + 23], cl
	mov	al, 1
	cmp	al, cl
	je	.LBB0_2
	jmp	.LBB0_3
.LBB0_1:
	mov	rax, qword ptr [rsp]
	mov	rcx, qword ptr [rsp + 8]
	mov	qword ptr [rip + five_unreachable.slice+8], rcx
	mov	qword ptr [rip + five_unreachable.slice], rax
	add	rsp, 24
	ret
.LBB0_2:
	jmp	.LBB0_1
.LBB0_3:
	mov	al, byte ptr [rsp + 23]
	mov	edi, 1
	movzx	esi, al
	call	builtin.panicSentinelMismatch__anon_1293

ReleaseSafe:

fn2000:
	mov	rax, qword ptr [rdi]
	mov	rcx, qword ptr [rdi + 8]
	movzx	edx, byte ptr [rax + rcx]
	cmp	dl, 1
	jne	.LBB0_2
	mov	qword ptr [rip + five_unreachable.slice], rax
	mov	qword ptr [rip + five_unreachable.slice+8], rcx
	ret
.LBB0_2:
	push	rax
	movzx	edi, dl
	call	builtin.panicSentinelMismatch__anon_1293

Explanation

Zig users might expect that only one (sentinel) runtime safety check would be added for this operation.

This is the source code for five_unreachable.zig (again):

var slice: [:1]const u8 = undefined;
export fn fn2000(src_ptr112: *[:0]const u8) void {
    slice = src_ptr112.*[0.. :1];
}

These are the panic functions called by the assembly output by master for the example function fn2000:

  1. builtin.panicStartGreaterThanEnd from start <= end: This check is not what it appears to be, because there is no end operand for this slice operation. In this context end is equal to slice.len as shown by the AIR instructions (%4, %5), and the remainder of the function will ignore that. As in the previous example, the success condition 0 <= slice.len is a fact.
  2. builtin.default_panic from end + 1: This check is for the addition operation extending the end index, 'accounting' for the sentinel in the output type. The theory behind this technique is that pointers with sentinels have an 'actual length' equal to .len + 1, and these lengths are used to construct the conditions for bounds checks instead of the input bounds. Commitment to this technique is the most consistent--apparently deliberate--performance disadvantage of the current implementation for variants with sentinels. Using these values also produces panic messages which are inconsistent with the compile error messages for the same requirement.
  3. builtin.default_panic from slice.len + 1: This check is for the addition operation extending slice.len, 'accounting' for the sentinel in the input type. This is unnecessary because both input and output pointer types have sentinels. If the end index is out-of-bounds so is the sentinel index. The only case where slice.len + 1 is necessary is when the input pointer has a sentinel and the output pointer does not.
  4. builtin.panicOutOfBounds from end <= len: This check compares actual_end <= actual_len. These values are equal, as shown by the AIR instructions (%4, %11, %13, %14). The success condition slice.len + 1 <= slice.len + 1 is a fact.
  5. builtin.panicOutOfBounds from start <= end: This check is the second instance of start <= end. This is covered in more detail by the previous example.
  6. builtin.panicSentinelMismatch from ptr[end] == sentinel: This check is valid.

Four of these checks will be optimised by DCE, as shown by the ReleaseSafe assembly above. The optimised output for master retains a single overflow check (cmp rax, -1), but the addition itself is eliminated because the panic function (that would use the index as panic information) is unreachable.

Performance counters compiling 5620 unique success variants

These measurements are provided to give an indication of the compile time performance advantage of this patch. This implementation performs above average for this set, with a machine code size reduction of 54 percent per variant instead of the general average of 46 percent. This emerges from the fact that success variants will often not require any runtime checks, while panic variants will always require at least one runtime check.

Compiled with zig build-exe -fno-formatted-panics -fstrip -fomit-frame-pointer -fno-unwind-tables slice_success_variants.zig

Counters (master)

     6,596,208,647      instructions:u                   #    1.34  insn per cycle
                                                         #    0.14  stalled cycles per insn     (63.54%)
     4,912,981,950      cycles:u                         #    2.769 GHz                         (59.08%)
          1,773.97 msec task-clock:u                     #    1.102 CPUs utilized
            56,735      page-faults:u                    #   31.982 K/sec
       263,188,305      stalled-cycles-frontend:u        #    5.36% frontend cycles idle        (58.61%)
       950,606,083      stalled-cycles-backend:u         #   19.35% backend cycles idle         (61.67%)
       389,627,041      cache-references:u               #  219.636 M/sec                       (64.97%)
        94,014,673      cache-misses:u                   #   24.13% of all cache refs           (66.09%)
     1,291,407,625      branches:u                       #  727.978 M/sec                       (66.85%)
        25,959,105      branch-misses:u                  #    2.01% of all branches             (66.89%)

       1.610306908 seconds time elapsed

       1.321783000 seconds user
       0.412417000 seconds sys

Counters (slices_1)

     4,153,735,538      instructions:u                   #    1.42  insn per cycle
                                                         #    0.12  stalled cycles per insn     (59.31%)
     2,929,194,123      cycles:u                         #    2.590 GHz                         (58.16%)
          1,130.79 msec task-clock:u                     #    1.223 CPUs utilized
            25,722      page-faults:u                    #   22.747 K/sec
       160,299,259      stalled-cycles-frontend:u        #    5.47% frontend cycles idle        (58.11%)
       502,943,746      stalled-cycles-backend:u         #   17.17% backend cycles idle         (66.34%)
       207,185,552      cache-references:u               #  183.221 M/sec                       (69.18%)
        52,371,966      cache-misses:u                   #   25.28% of all cache refs           (70.22%)
       680,685,017      branches:u                       #  601.953 M/sec                       (69.79%)
        14,818,315      branch-misses:u                  #    2.18% of all branches             (69.04%)

       0.924530372 seconds time elapsed

       0.792225000 seconds user
       0.300937000 seconds sys

@andrewrk
Copy link
Member

andrewrk commented Sep 15, 2024

Thank you for taking the time to make this contribution tractable. I will be sure to give you a timely review.

edit: I will wait until it is passing the CI to review it, however

@amp-59
Copy link
Contributor Author

amp-59 commented Sep 16, 2024

The CI is failing because one of the previously omitted sentinel checks is catching an undefined null-terminator for a buffer sliced by buf[0..].

The error is only encountered on Windows, and the unexpected sentinel value is 43690, which is @as(u16, 0xaaaa).

This runtime check for implicitly propagated sentinels is new with this implementation, as implicitly propagated sentinels were previously only checked at compile time (under certain conditions). A similar bug was found in lib/std/tz.zig.

I searched for the uninitialised buffer, but was unable to find it or test anything. Without help from Windows contributors (@squeek502?) the only solution seems to be to re-disable the sentinel check.

Copy link
Member

@mlugg mlugg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not done a full review here, just skimmed some parts, but I have a few comments.

src/Sema.zig Show resolved Hide resolved
src/Sema.zig Show resolved Hide resolved
src/Sema.zig Show resolved Hide resolved
src/Sema.zig Outdated Show resolved Hide resolved
* Various superficial changes.

* Removed bounds checks for pointers with `comptime`-known upper bounds.
    1) Except when the actual upper bound may be used to rule out the
        existence of a sentinel.

    2) Except when those bounds are explicitly defined, such as by
       `comptime`-known slices `.len` or by array type information.

       Any available explicitly defined bounds will be used instead of
       actual bounds in all safety checks.
* Removed unused functions and fields.

* Added validation for start, end, and length operands when slicing
  `comptime`-only types.
const dest_sent_val: Value = val: {
if (dest_sent_opt != .none) {
sa.dest_sent = .known;
try checkSentinelType(sema, block, dest_sent_src, elem_ty);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with merge.


// Disable all bounds checks for pointers without explicit lengths.
if (!src_ptr_explicit_len and sa.eq_sentinel != .known) {
sa.end_le_len = .unknown;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with previous commits, as requested.

Comment on lines +38992 to +39012
if (try elem_ty.comptimeOnlySema(sema.pt)) require_comptime: {
assert(sa.src_ptr == .known);
const src_note = src_note: {
if (sa.dest_start == .variable) {
break :src_note .{ dest_start_src, "start index must be comptime-known" };
}
if (dest_len_opt != .none and sa.dest_len == .variable) {
break :src_note .{ dest_end_src, "length must be comptime-known" };
}
if (dest_end_opt != .none and sa.dest_end == .variable) {
break :src_note .{ dest_end_src, "end index must be comptime-known" };
}
break :require_comptime;
};
const msg: *Zcu.ErrorMsg = try sema.errMsg(src, "unable to resolve operand slicing comptime-only type '{}'", .{operand_ty.fmt(sema.pt)});
{
errdefer msg.destroy(sema.gpa);
try sema.errNote(src_note[0], msg, "{s}", .{src_note[1]});
}
return sema.failWithOwnedErrorMsg(block, msg);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation is not present in master, and because of this it is quite easy to crash the compiler by slicing comptime-only types with runtime-known operands.

For example, compile the example program with zig build-obj runtime_indices_slicing_comptime_only_pointers.zig
runtime_indices_slicing_comptime_only_pointers.zig:

export fn end() void {
    comptime var buf: [1]comptime_int = .{0};
    var dest_end: usize = 0;
    dest_end +%= 1;
    const slice: []const comptime_int = buf[0..dest_end];
    const ptr: [*]const comptime_int = slice.ptr;
    _ = ptr[0..5];
}
export fn start() void {
    comptime var buf: [1]comptime_int = .{0};
    var dest_start: usize = 0;
    dest_start +%= 1;
    const slice: []const comptime_int = buf[dest_start..][0..9];
    _ = slice[0];
}
export fn len() void {
    comptime var buf: [1]comptime_int = .{0};
    var dest_len: usize = 0;
    dest_len +%= 1;
    const slice: []const comptime_int = buf[0..][0..dest_len];
    _ = slice[0];
}
export fn end_const() void {
    const buf: [1]type = .{void};
    var dest_end: usize = 0;
    dest_end +%= 1;
    const slice: []const type = buf[0..dest_end];
    _ = slice[0];
}
export fn start_const() void {
    const buf: [1]type = .{void};
    var dest_start: usize = 0;
    dest_start +%= 1;
    const slice: []const type = buf[dest_start..][0..9];
    _ = slice[0];
}
export fn len_const() void {
    const buf: [1]type = .{void};
    var dest_len: usize = 0;
    dest_len +%= 1;
    const slice: []const type = buf[0..][0..dest_len];
    _ = slice[0];
}

Output (master)

zig build-obj runtime_indices_slicing_comptime_only_pointers.zig
Trace/breakpoint trap

This crash is produced by example function end. If end is rewritten to be like the other functions, the output is several apparently self-contradictory compile errors:

zig build-obj runtime_indices_slicing_comptime_only_pointers.zig
runtime_indices_slicing_comptime_only_pointers.zig:6:15: error: values of type '[]const comptime_int' must be comptime-known, but index value is runtime-known
    _ = slice[0];
              ^
runtime_indices_slicing_comptime_only_pointers.zig:13:15: error: values of type '[]const comptime_int' must be comptime-known, but index value is runtime-known
    _ = slice[0];
              ^
runtime_indices_slicing_comptime_only_pointers.zig:20:15: error: values of type '[]const comptime_int' must be comptime-known, but index value is runtime-known
    _ = slice[0];
              ^
runtime_indices_slicing_comptime_only_pointers.zig:27:15: error: values of type '[]const type' must be comptime-known, but index value is runtime-known
    _ = slice[0];
              ^
runtime_indices_slicing_comptime_only_pointers.zig:27:14: note: types are not available at runtime
    _ = slice[0];
        ~~~~~^~~
runtime_indices_slicing_comptime_only_pointers.zig:34:15: error: values of type '[]const type' must be comptime-known, but index value is runtime-known
    _ = slice[0];
              ^
runtime_indices_slicing_comptime_only_pointers.zig:34:14: note: types are not available at runtime
    _ = slice[0];
        ~~~~~^~~
runtime_indices_slicing_comptime_only_pointers.zig:41:15: error: values of type '[]const type' must be comptime-known, but index value is runtime-known
    _ = slice[0];
              ^
runtime_indices_slicing_comptime_only_pointers.zig:41:14: note: types are not available at runtime
    _ = slice[0];
        ~~~~~^~~

Output (slices_1)

This is what the new compile errors look like:

zig build-obj runtime_indices_slicing_comptime_only_pointers.zig
runtime_indices_slicing_comptime_only_pointers.zig:5:44: error: unable to resolve operand slicing comptime-only type '[1]comptime_int'
    const slice: []const comptime_int = buf[0..dest_end];
                                        ~~~^~~~~~~~~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:5:48: note: end index must be comptime-known
    const slice: []const comptime_int = buf[0..dest_end];
                                               ^~~~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:13:58: error: unable to resolve operand slicing comptime-only type '[1]comptime_int'
    const slice: []const comptime_int = buf[dest_start..][0..9];
                                        ~~~~~~~~~~~~~~~~~^~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:13:45: note: start index must be comptime-known
    const slice: []const comptime_int = buf[dest_start..][0..9];
                                            ^~~~~~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:20:49: error: unable to resolve operand slicing comptime-only type '[1]comptime_int'
    const slice: []const comptime_int = buf[0..][0..dest_len];
                                        ~~~~~~~~^~~~~~~~~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:20:53: note: length must be comptime-known
    const slice: []const comptime_int = buf[0..][0..dest_len];
                                                    ^~~~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:27:36: error: unable to resolve operand slicing comptime-only type '[1]type'
    const slice: []const type = buf[0..dest_end];
                                ~~~^~~~~~~~~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:27:40: note: end index must be comptime-known
    const slice: []const type = buf[0..dest_end];
                                       ^~~~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:34:50: error: unable to resolve operand slicing comptime-only type '[1]type'
    const slice: []const type = buf[dest_start..][0..9];
                                ~~~~~~~~~~~~~~~~~^~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:34:37: note: start index must be comptime-known
    const slice: []const type = buf[dest_start..][0..9];
                                    ^~~~~~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:41:41: error: unable to resolve operand slicing comptime-only type '[1]type'
    const slice: []const type = buf[0..][0..dest_len];
                                ~~~~~~~~^~~~~~~~~~~~~
runtime_indices_slicing_comptime_only_pointers.zig:41:45: note: length must be comptime-known
    const slice: []const type = buf[0..][0..dest_len];
                                            ^~~~~~~~

@amp-59
Copy link
Contributor Author

amp-59 commented Sep 25, 2024

The only remaining differences between this patch and master (besides the fixes) are:

  1. Machine code produced for slices is twice as efficient on average.
  2. Slicing comptime-only types with runtime-known operands is prevented (no issue).
  3. Behaviour is consistent with respect to comptime-known undefined pointers.

If (2) is not desirable, I will just remove it and create issues for the related crashes. If it is desirable I will add an appropriate test case.

If (3) is not desirable, #19798 should be closed, and I need to know which combinations of inputs should allow slicing comptime-known undefined pointers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment