Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

materialized: attempt to produce nice backtraces on stack overflow #5549

Merged
merged 1 commit into from
Feb 2, 2021

Conversation

benesch
Copy link
Contributor

@benesch benesch commented Feb 1, 2021

Rust produces bad error messages on stack overflow, like

"thread 'foo' has overflowed its stack"

which provides very little insight into where the recursion that caused
the stack to overflow occurred.

See rust-lang/rust#51405 for details.

This commit adds a SIGSEGV handler that attempts to print a backtrace,
following the approach in the backtrace-on-stack-overflow crate. I
copied the code from that crate into Materialize and tweaked it because
it's a very small amount of code that we'll likely need to modify, and I
wanted to improve its error handling.

In my manual testing this produces a nice backtrace when Materialize
overflows its stack.


This change is Reviewable

@benesch benesch requested review from umanwizard and elindsey and removed request for umanwizard February 1, 2021 21:45
// squabble over a few megabytes.
const STACK_SIZE: usize = 2 << 20;

// x86_64 requires 16-byte alignment. Hopefully other platforms don't
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arm64 requires 16-byte as well AFAICT from a quick googling; you might want to mention that because my first thought here was "will this break mz-on-arm?"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, will do on the next pass through CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this one up too!

// Allocate a stack.
let buf_layout =
Layout::from_size_align(STACK_SIZE, STACK_ALIGN).expect("layout known to be valid");
let buf = unsafe { alloc::alloc(buf_layout) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New coding standard I just invented: you must leave a comment about why any usage of unsafe is sound. (Feel free to push back if you don't think it's a good rule, but seems reasonable to me).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I do try to follow that in general—I'm just not sure what it would say in this function. Like, there aren't invariants that must be upheld. We're just calling unsafe C functions. I can add comments like // SAFETY: calling a C function in the manner it which it is documented to be called but that doesn't seem very useful. Happy to take suggestions on what would be more valuable here though!

Copy link
Contributor

@umanwizard umanwizard Feb 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alloc::alloc does in fact have documented requirements: "SAFETY: See GlobalAlloc::alloc."

Which points to...

This function is unsafe because undefined behavior can result if the caller does not ensure that layout has non-zero size.

(Extension subtraits might provide more specific bounds on behavior, e.g., guarantee a sentinel address or a null pointer in response to a zero-size allocation request.)

The allocated block of memory may or may not be initialized.

So I think you can just mention that (1) you have ensured that the size is non-zero, and (2) that nobody is reading the uninitialized value (or at any rate, not doing so before it gets passed to C World and outside the purview of Rust's safety guarantees anyway).

Given your response, I think you might have instead been thinking of your later uses of unsafe , in the calls to sigaltstack and sigaction. For these use your judgment, you could write down your understanding of when POSIX/linux guarantees that calling these functions is allowed, or just a link to man pages, or nothing if you really think there's not anything useful to be said.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, took another stab at it.

@benesch benesch force-pushed the overflow-backtraces branch 2 times, most recently from 4995236 to 4576289 Compare February 1, 2021 23:13
// Chosen to match the default Rust thread stack size of 2MiB. Probably
// overkill, but we'd much rather have backtraces on stack overflow than
// squabble over a few megabytes.
const STACK_SIZE: usize = 2 << 20;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to use libc::SIGSTKSZ instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, that's a good point you bring up. The thing is that libc::SIGSTKSZ makes me a little nervous because I'm not sure that Backtrace::backtrace is equipped to run on an 8KiB stack. I mean, it should be, because a few stack frames should not take up 8KIB... but I figured it'd be better to be safe than sorry. Don't want some crazy change to how gimli works down the road causing a stack overflow while we're trying to handle a stack overflow.

Happy to be convinced otherwise, but for now I just added a comment that briefly explains this.

Copy link
Contributor

@elindsey elindsey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Left one minor comment - there's an existing constant for sigstack sizes you may want to use.

Rust produces bad error messages on stack overflow, like

    "thread 'foo' has overflowed its stack"

which provides very little insight into where the recursion that caused
the stack to overflow occurred.

See rust-lang/rust#51405 for details.

This commit adds a SIGSEGV handler that attempts to print a backtrace,
following the approach in the backtrace-on-stack-overflow crate. I
copied the code from that crate into Materialize and tweaked it because
it's a very small amount of code that we'll likely need to modify, and I
wanted to improve its error handling.

In my manual testing this produces a nice backtrace when Materialize
overflows its stack.
@benesch benesch force-pushed the overflow-backtraces branch from 4576289 to f11f749 Compare February 2, 2021 03:47
@benesch benesch merged commit a1438d9 into MaterializeInc:main Feb 2, 2021
@benesch benesch deleted the overflow-backtraces branch February 2, 2021 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants