Add middleware for catching panics #265

Nemo157 · 2019-05-27T17:08:53Z

Motivation and Context

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

Nemo157 · 2019-05-27T17:09:57Z

One thing I'm unsure about post restructure is where relatively trivial middleware like this should go. Is it big enough to be worth its own crate? If not, should it be in the main tide crate like it is currently, or somewhere else?

prasannavl · 2019-05-27T21:19:03Z

@Nemo157 Nice! I was also just discussing this with @yoshuawuyts and I'm wondering if there're common cases that cause non-unwindable panic outside of alloc and interop situations.

If this is sufficient, there's probably no need to think about future executors or resurrecting worker threads for now.

Another question: If it isn't desired for this to be caught, should we still catch them and propagate them to tide somehow as an error and initiate a (partially clean?) shutdown instead of just crashing? (This could be a another middleware in the same crate or a config, may be?)

Thinking out loud here, on the where it goes bit:

I think we're probably going to have some 10-20 middlewares. And most of trivial ones will likely fall in some category - log, headers, auth, cookies, errors etc.
I think over time we can probably logically group them into one of the existing category. For instance today we have DefaultHeaders in tide-headers, we probably could have other trivial bits that have to do with headers in tide-headers -- say ForwardHeaders for adding support for proxy forwarded headers - just separated by feature flags. (or does this affect discoverability?)
Personally, I think we generally should avoid putting things in tide directly, as that defeats the purpose of splitting things up.
Anything that doesn't fit in, we probably could just let it be outside in personal repos.
If absolutely needed, may be add tide-extras and keep things in there?

That said, I personally think, error handling could be in it's own logical category and hence go into a crate.

Nemo157 · 2019-05-28T06:28:14Z

I'm wondering if there're common cases that cause non-unwindable panic outside of alloc and interop situations.

If this is sufficient, there's probably no need to think about future executors or resurrecting worker threads for now.

As far as I'm aware running out of memory and most panic-inside-FFI situations abort the process, so there's no way that Tide/the executor could recover.

Another question: If it isn't desired for this to be caught, should we still catch them and propagate them to tide somehow as an error and initiate a (partially clean?) shutdown instead of just crashing? (This could be a another middleware in the same crate or a config, may be?)

I have another branch adding a middleware for something like that along with examples for both this middleware and the one it adds: Nemo157/tide@catch-unwind...propagate-unwind (I haven't included it in this PR yet because I'm not certain whether it should resume_unwind like it does, or return the panic as an error, and I've noticed some weird behaviour of Tokio when panicking on the "main" task I want to investigate for the example).

That said, I personally think, error handling could be in it's own logical category and hence go into a crate.

How does tide-panic sound?

prasannavl · 2019-05-28T19:38:07Z

As far as I'm aware running out of memory and most panic-inside-FFI situations abort the process, so there's no way that Tide/the executor could recover.

I'm wondering if there are any other common cases which might abort the process directly beyond these two. However, alloc can also be recovered by this: rust-lang/rust#43596

Currently, this works:

#![feature(alloc_error_hook)]

use std::alloc::set_alloc_error_hook;
use std::alloc::Layout;

fn handle_alloc_error(layout: Layout) {
    panic!("alloc: failed to allocate {} bytes", layout.size());
}

fn main() {
    set_alloc_error_hook(handle_alloc_error);
    let result = std::panic::catch_unwind(|| {
        let v = Vec::<u8>::with_capacity(100000000000000);
        println!("{:p}", &v[..]);
    });
    println!("{:?}", result);
}

But interestingly, looks like drops don't work properly with release optimisations:

#![feature(alloc_error_hook)]

use std::alloc::set_alloc_error_hook;
use std::alloc::Layout;

fn handle_alloc_error(layout: Layout) {
    panic!("alloc: failed to allocate {} bytes", layout.size());
}

fn main() {
    set_alloc_error_hook(handle_alloc_error);
    let result = std::panic::catch_unwind(|| {
        let mut r = Resource { ptr: Box::into_raw(Box::new(0)) };
        let mut v = Vec::<usize>::with_capacity(100000000000000);
        println!("{:p}", &v[..]);
        println!("{:p}", r.ptr);
        r.ptr = v.as_mut_ptr();
        println!("{:p}", r.ptr);
    });
    println!("{:?}", result);
}

struct Resource { ptr: *mut usize }

impl Drop for Resource {
    fn drop(&mut self) {
        println!("dropping resource: {:p}", self.ptr);
    }
}

So, likely not a viable solution at this stage.

And tide-panic sounds great! - though I'll let others chime in on this.

yoshuawuyts · 2019-05-29T11:01:14Z

src/catch_unwind.rs

+        std::panic::AssertUnwindSafe(next.run(cx))
+            .catch_unwind()
+            .unwrap_or_else(move |err| (self.f)(err))
+            .boxed()


This only returns 500 for a single request, right? My initial understanding was that all requests would be terminated with 500, and then the panic would continue. But I guess the intended behavior here is different?

Is there a need to terminate all requests? panic on the request path doesn't need to prevent other working code paths, as long as it's safely unwindable?

Yes, this is the “I think my application will continue working if a single request panics” middleware. See the link in my other comment to the “fail-fast” version I haven’t finished yet.

I was talking with the other Runtime maintainers yesterday, and if a future breaks inside of runtime somewhere (e.g. runtime::spawn), it'll default to terminating its thread, and we have no good way to restore that thread. There might not even be one tbh. I believe Tokio adopted a similar model recently too (on request of the Deno project).

If Tide starts catching panics rather than propagating them, we will start losing threads without having a way to spin them back up; which will severely limit throughput without failing the process entirely. Which seems like not a great situation to be in. I feel this would be particularly hard to debug when not intimately familiar with all moving parts up and down the stack (and even then it's not easy).

I guess what I'm saying is that this approach is making me feel quite nervous, and perhaps we should start with Nemo157/tide@catch-unwind...propagate-unwind first?

Nemo157 · 2019-05-29T12:28:20Z

Note from chat: make documentation more explicit that using this can lead to other issues and that you really need to understand how unwinding could affect the rest of your code, and investigate whether it should require a bound like State: (Ref)UnwindSafe to make it more explicit.

Nemo157 · 2019-06-01T13:48:50Z

I've moved this to a new tide-panic crate and extended the documentation. I've added a State: RefUnwindSafe bound for now, since the panic may occur while there is an &State accessible, and all other data from the context should be request local.

yoshuawuyts

Excellent, thanks heaps!

yoshuawuyts reviewed May 29, 2019

View reviewed changes

Add middleware for catching panics

a081bfe

Nemo157 force-pushed the catch-unwind branch from 181cd19 to a081bfe Compare June 1, 2019 13:32

yoshuawuyts approved these changes Jun 5, 2019

View reviewed changes

yoshuawuyts merged commit 2504828 into http-rs:master Jun 5, 2019

Nemo157 deleted the catch-unwind branch June 6, 2019 08:29

yoshuawuyts mentioned this pull request Oct 31, 2019

Tide development plan #336

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add middleware for catching panics #265

Add middleware for catching panics #265

Nemo157 commented May 27, 2019

Nemo157 commented May 27, 2019

prasannavl commented May 27, 2019 •

edited

Loading

Nemo157 commented May 28, 2019

prasannavl commented May 28, 2019 •

edited

Loading

yoshuawuyts May 29, 2019

prasannavl May 29, 2019

Nemo157 May 29, 2019

yoshuawuyts May 29, 2019

Nemo157 commented May 29, 2019

Nemo157 commented Jun 1, 2019

yoshuawuyts left a comment

Add middleware for catching panics #265

Add middleware for catching panics #265

Conversation

Nemo157 commented May 27, 2019

Motivation and Context

Types of changes

Checklist:

Nemo157 commented May 27, 2019

prasannavl commented May 27, 2019 • edited Loading

Nemo157 commented May 28, 2019

prasannavl commented May 28, 2019 • edited Loading

yoshuawuyts May 29, 2019

Choose a reason for hiding this comment

prasannavl May 29, 2019

Choose a reason for hiding this comment

Nemo157 May 29, 2019

Choose a reason for hiding this comment

yoshuawuyts May 29, 2019

Choose a reason for hiding this comment

Nemo157 commented May 29, 2019

Nemo157 commented Jun 1, 2019

yoshuawuyts left a comment

Choose a reason for hiding this comment

prasannavl commented May 27, 2019 •

edited

Loading

prasannavl commented May 28, 2019 •

edited

Loading