Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add middleware for catching panics #265

Merged
merged 1 commit into from
Jun 5, 2019

Conversation

Nemo157
Copy link
Contributor

@Nemo157 Nemo157 commented May 27, 2019

Motivation and Context

cc #263, @prasannavl, @yoshuawuyts

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@Nemo157
Copy link
Contributor Author

Nemo157 commented May 27, 2019

One thing I'm unsure about post restructure is where relatively trivial middleware like this should go. Is it big enough to be worth its own crate? If not, should it be in the main tide crate like it is currently, or somewhere else?

@prasannavl
Copy link
Contributor

prasannavl commented May 27, 2019

@Nemo157 Nice! I was also just discussing this with @yoshuawuyts and I'm wondering if there're common cases that cause non-unwindable panic outside of alloc and interop situations.

If this is sufficient, there's probably no need to think about future executors or resurrecting worker threads for now.

Another question: If it isn't desired for this to be caught, should we still catch them and propagate them to tide somehow as an error and initiate a (partially clean?) shutdown instead of just crashing? (This could be a another middleware in the same crate or a config, may be?)


Thinking out loud here, on the where it goes bit:

  • I think we're probably going to have some 10-20 middlewares. And most of trivial ones will likely fall in some category - log, headers, auth, cookies, errors etc.
  • I think over time we can probably logically group them into one of the existing category. For instance today we have DefaultHeaders in tide-headers, we probably could have other trivial bits that have to do with headers in tide-headers -- say ForwardHeaders for adding support for proxy forwarded headers - just separated by feature flags. (or does this affect discoverability?)
  • Personally, I think we generally should avoid putting things in tide directly, as that defeats the purpose of splitting things up.
  • Anything that doesn't fit in, we probably could just let it be outside in personal repos.
  • If absolutely needed, may be add tide-extras and keep things in there?

That said, I personally think, error handling could be in it's own logical category and hence go into a crate.

@Nemo157
Copy link
Contributor Author

Nemo157 commented May 28, 2019

I'm wondering if there're common cases that cause non-unwindable panic outside of alloc and interop situations.

If this is sufficient, there's probably no need to think about future executors or resurrecting worker threads for now.

As far as I'm aware running out of memory and most panic-inside-FFI situations abort the process, so there's no way that Tide/the executor could recover.

Another question: If it isn't desired for this to be caught, should we still catch them and propagate them to tide somehow as an error and initiate a (partially clean?) shutdown instead of just crashing? (This could be a another middleware in the same crate or a config, may be?)

I have another branch adding a middleware for something like that along with examples for both this middleware and the one it adds: Nemo157/tide@catch-unwind...propagate-unwind (I haven't included it in this PR yet because I'm not certain whether it should resume_unwind like it does, or return the panic as an error, and I've noticed some weird behaviour of Tokio when panicking on the "main" task I want to investigate for the example).

That said, I personally think, error handling could be in it's own logical category and hence go into a crate.

How does tide-panic sound?

@prasannavl
Copy link
Contributor

prasannavl commented May 28, 2019

As far as I'm aware running out of memory and most panic-inside-FFI situations abort the process, so there's no way that Tide/the executor could recover.

I'm wondering if there are any other common cases which might abort the process directly beyond these two. However, alloc can also be recovered by this: rust-lang/rust#43596

Currently, this works:

#![feature(alloc_error_hook)]

use std::alloc::set_alloc_error_hook;
use std::alloc::Layout;

fn handle_alloc_error(layout: Layout) {
    panic!("alloc: failed to allocate {} bytes", layout.size());
}

fn main() {
    set_alloc_error_hook(handle_alloc_error);
    let result = std::panic::catch_unwind(|| {
        let v = Vec::<u8>::with_capacity(100000000000000);
        println!("{:p}", &v[..]);
    });
    println!("{:?}", result);
}

But interestingly, looks like drops don't work properly with release optimisations:

#![feature(alloc_error_hook)]

use std::alloc::set_alloc_error_hook;
use std::alloc::Layout;

fn handle_alloc_error(layout: Layout) {
    panic!("alloc: failed to allocate {} bytes", layout.size());
}

fn main() {
    set_alloc_error_hook(handle_alloc_error);
    let result = std::panic::catch_unwind(|| {
        let mut r = Resource { ptr: Box::into_raw(Box::new(0)) };
        let mut v = Vec::<usize>::with_capacity(100000000000000);
        println!("{:p}", &v[..]);
        println!("{:p}", r.ptr);
        r.ptr = v.as_mut_ptr();
        println!("{:p}", r.ptr);
    });
    println!("{:?}", result);
}

struct Resource { ptr: *mut usize }

impl Drop for Resource {
    fn drop(&mut self) {
        println!("dropping resource: {:p}", self.ptr);
    }
}
  • So, likely not a viable solution at this stage.

And tide-panic sounds great! - though I'll let others chime in on this.

std::panic::AssertUnwindSafe(next.run(cx))
.catch_unwind()
.unwrap_or_else(move |err| (self.f)(err))
.boxed()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only returns 500 for a single request, right? My initial understanding was that all requests would be terminated with 500, and then the panic would continue. But I guess the intended behavior here is different?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to terminate all requests? panic on the request path doesn't need to prevent other working code paths, as long as it's safely unwindable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the “I think my application will continue working if a single request panics” middleware. See the link in my other comment to the “fail-fast” version I haven’t finished yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking with the other Runtime maintainers yesterday, and if a future breaks inside of runtime somewhere (e.g. runtime::spawn), it'll default to terminating its thread, and we have no good way to restore that thread. There might not even be one tbh. I believe Tokio adopted a similar model recently too (on request of the Deno project).

If Tide starts catching panics rather than propagating them, we will start losing threads without having a way to spin them back up; which will severely limit throughput without failing the process entirely. Which seems like not a great situation to be in. I feel this would be particularly hard to debug when not intimately familiar with all moving parts up and down the stack (and even then it's not easy).

I guess what I'm saying is that this approach is making me feel quite nervous, and perhaps we should start with Nemo157/tide@catch-unwind...propagate-unwind first?

@Nemo157
Copy link
Contributor Author

Nemo157 commented May 29, 2019

Note from chat: make documentation more explicit that using this can lead to other issues and that you really need to understand how unwinding could affect the rest of your code, and investigate whether it should require a bound like State: (Ref)UnwindSafe to make it more explicit.

@Nemo157
Copy link
Contributor Author

Nemo157 commented Jun 1, 2019

I've moved this to a new tide-panic crate and extended the documentation. I've added a State: RefUnwindSafe bound for now, since the panic may occur while there is an &State accessible, and all other data from the context should be request local.

Copy link
Member

@yoshuawuyts yoshuawuyts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks heaps!

@yoshuawuyts yoshuawuyts merged commit 2504828 into http-rs:master Jun 5, 2019
@Nemo157 Nemo157 deleted the catch-unwind branch June 6, 2019 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants