BoundedArray segfaults for items larger than stack #19954

Open
Nairou opened this issue May 12, 2024 · 4 comments

@Nairou

Nairou commented May 12, 2024

Zig Version

0.12.0

Steps to Reproduce and Observed Behavior

Even when dealing with a std.BoundedArray that lives on the heap, the functions that remove items from the array can cause a segmentation fault due to stack overflow, since each of these functions fetches or returns the removed value through the stack.

const std = @import("std");

const LargeStruct1 = struct {
    value: [800000]u8 = undefined,
};
const LargeStruct2 = struct {
    items: std.BoundedArray( LargeStruct1, 16 ),
};

pub fn main() !void {
    const array = try std.heap.page_allocator.create( LargeStruct2 );
    array.items.len = 0; // create() leaves the memory undefined; initialize len before use
    try array.items.append( .{} );
    //_ = array.items.swapRemove( 0 );     // WILL SEGFAULT
    //_ = array.items.pop();               // WILL SEGFAULT
}

Expected Behavior

Ideally, manipulation of heap data stays on the heap.

At a deeper level, should the compiler be smart enough not to pull a struct onto the stack when its compile-time-known size is too large to fit there?

@Nairou Nairou added the bug Observed behavior contradicts documented or intended behavior label May 12, 2024
@jedisct1
Contributor

BoundedArray was designed for small arrays that are passed around as a single value on the stack.

What you're doing here is equivalent to declaring a type which is more than 12,800,000 bytes large.

Even if you reserve that much space on the heap, types are assumed to be small. You can't access a value so large that it wouldn't even fit on the stack.

@thrwawyacct

thrwawyacct commented May 17, 2024

> Ideally, manipulation of heap data stays on the heap.

That's not possible without an implicit heap allocation, which is obviously a no-go for Zig. The ideal solution would be to force a compilation error when the compiler is required to put very large values on the stack.

This issue is not unique to BoundedArray or even to Zig itself. If you are actually hitting this problem in real-world code, I would recommend using an array of pointers instead of values, as sketched below.
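
A minimal sketch of that array-of-pointers approach, assuming the same Zig 0.12 standard-library calls used in the original report (the page_allocator choice and the names are just illustrative):

const std = @import("std");

const LargeStruct1 = struct {
    value: [800000]u8 = undefined,
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // 16 pointers (~128 bytes) instead of 16 * 800000 bytes: the array itself
    // stays small, and pop()/swapRemove() only ever copy a pointer through the stack.
    var items = std.BoundedArray(*LargeStruct1, 16){};

    const item = try allocator.create(LargeStruct1);
    try items.append(item);

    const removed = items.swapRemove(0); // copies a pointer, not ~800 KB
    allocator.destroy(removed);
}

Since only pointers are stored, the BoundedArray itself can comfortably live on the stack or inside another struct, and the remove functions never copy the large payloads.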

@rohlem
Contributor

rohlem commented May 17, 2024

The actual issue (or rather, a design incompatible with this large-item usage) lies within the method implementations, for example in swapRemove:

const old_item = self.get(i); //tells the compiler to save a copy on the stack
self.set(i, self.pop()); //pop returns the last entry on the stack, set overwrites index i completely with this value
return old_item; //returns the previously-saved copy

It would technically be possible:

  • (1) for the compiler to recognize that the result is discarded via _ = and produce a "discarded-result-location" variant of the function,
    so that old_item is deduced not to need a stack allocation (liveness analysis figures out it is immediately "dead"/unused)
  • (2) for the implementation to copy the last value into index i in place, effectively using @memcpy,
    followed by a pop() that simply discards the value instead of saving it on the stack

While I think introducing such transformations would be nice in the long term, I also think the suggestion is likely to be rejected by maintainers in the name of language simplicity (keeping the code you write mapped more directly to the generated assembly).
If they are rejected, it would be up to the user to implement a popOverwrite method for (2) and a swapRemoveDiscard method for (1) by hand, along the lines sketched below.
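
For reference, a rough sketch of what such hand-written helpers could look like. The names come from the comment above and are hypothetical, not part of std; the sketch assumes BoundedArray's public buffer and len fields as in Zig 0.12:

const std = @import("std");

// Hypothetical helper: swap-remove index i without ever materializing the
// removed element on the stack. `self` is expected to be a
// *std.BoundedArray(T, capacity); this pokes its public buffer/len fields.
fn swapRemoveDiscard(self: anytype, i: usize) void {
    const last = self.len - 1;
    if (i != last) {
        // Copy the last element over index i directly in place.
        @memcpy(std.mem.asBytes(&self.buffer[i]), std.mem.asBytes(&self.buffer[last]));
    }
    self.len = last;
}

// Hypothetical helper: drop the last element without returning a copy of it.
fn popOverwrite(self: anytype) void {
    self.len -= 1;
}

With these, the original example could call swapRemoveDiscard(&array.items, 0) instead of _ = array.items.swapRemove(0), so no ~800 KB value is ever copied onto the stack.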

Either way, I do think the compiler should produce a compile error once it detects excessively high stack usage, i.e. not produce an executable that segfaults at run time because of this issue; a rough user-level approximation of such a check is sketched below.
This is in some ways related to the mechanisms behind #157.
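
As a sketch only (not compiler behavior): until such a check exists, a user-level approximation is a comptime size assertion. It only looks at a type's size, not actual stack usage, and the budget below is an arbitrary number chosen purely for illustration:

const std = @import("std");

// Refuse to compile when a type that will be copied by value exceeds an assumed budget.
fn assertSmallEnoughToCopy(comptime T: type, comptime max_size: usize) void {
    if (@sizeOf(T) > max_size) {
        @compileError(@typeName(T) ++ " is too large to copy by value");
    }
}

comptime {
    // Passes: a pointer to the large payload is tiny.
    assertSmallEnoughToCopy(*[800000]u8, 64 * 1024);
    // The same check on the issue's BoundedArray(LargeStruct1, 16) would
    // fail the build instead of overflowing the stack at run time.
}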

@jedisct1
Contributor

The maximum stack size can only be checked at runtime: it can be set per thread, stacks can be switched (very common for coroutines), and stacks can grow...

So the compiler could raise an error based on some arbitrary limit, just like it could refuse to compile a loop that might spin for too many iterations. But arbitrary limits are not great: they don't reflect reality, nor the unusual things developers may legitimately do.

The actual issue you are raising here is that there are cases where a value is unused, yet its computation is not optimized out in debug mode (your code example doesn't segfault in ReleaseSafe and ReleaseFast modes).

I suspect that trying to do that kind of optimization in debug mode, which intentionally doesn't optimize, would slow down compilation with little benefit. The stack overflow you are getting in debug mode highlights a design problem in the data structures the code uses, so it's actually useful.

@Vexu Vexu added the standard library This issue involves writing Zig code for the standard library. label May 20, 2024
@Vexu Vexu added this to the 1.0.0 milestone May 20, 2024