-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add middleware for catching panics #265
Conversation
One thing I'm unsure about post restructure is where relatively trivial middleware like this should go. Is it big enough to be worth its own crate? If not, should it be in the main |
@Nemo157 Nice! I was also just discussing this with @yoshuawuyts and I'm wondering if there're common cases that cause non-unwindable panic outside of alloc and interop situations. If this is sufficient, there's probably no need to think about future executors or resurrecting worker threads for now. Another question: If it isn't desired for this to be caught, should we still catch them and propagate them to tide somehow as an error and initiate a (partially clean?) shutdown instead of just crashing? (This could be a another middleware in the same crate or a config, may be?) Thinking out loud here, on the
That said, I personally think, error handling could be in it's own logical category and hence go into a crate. |
As far as I'm aware running out of memory and most panic-inside-FFI situations abort the process, so there's no way that Tide/the executor could recover.
I have another branch adding a middleware for something like that along with examples for both this middleware and the one it adds: Nemo157/tide@catch-unwind...propagate-unwind (I haven't included it in this PR yet because I'm not certain whether it should
How does |
I'm wondering if there are any other common cases which might abort the process directly beyond these two. However, alloc can also be recovered by this: rust-lang/rust#43596 Currently, this works: #![feature(alloc_error_hook)]
use std::alloc::set_alloc_error_hook;
use std::alloc::Layout;
fn handle_alloc_error(layout: Layout) {
panic!("alloc: failed to allocate {} bytes", layout.size());
}
fn main() {
set_alloc_error_hook(handle_alloc_error);
let result = std::panic::catch_unwind(|| {
let v = Vec::<u8>::with_capacity(100000000000000);
println!("{:p}", &v[..]);
});
println!("{:?}", result);
} But interestingly, looks like drops don't work properly with release optimisations: #![feature(alloc_error_hook)]
use std::alloc::set_alloc_error_hook;
use std::alloc::Layout;
fn handle_alloc_error(layout: Layout) {
panic!("alloc: failed to allocate {} bytes", layout.size());
}
fn main() {
set_alloc_error_hook(handle_alloc_error);
let result = std::panic::catch_unwind(|| {
let mut r = Resource { ptr: Box::into_raw(Box::new(0)) };
let mut v = Vec::<usize>::with_capacity(100000000000000);
println!("{:p}", &v[..]);
println!("{:p}", r.ptr);
r.ptr = v.as_mut_ptr();
println!("{:p}", r.ptr);
});
println!("{:?}", result);
}
struct Resource { ptr: *mut usize }
impl Drop for Resource {
fn drop(&mut self) {
println!("dropping resource: {:p}", self.ptr);
}
}
And |
src/catch_unwind.rs
Outdated
std::panic::AssertUnwindSafe(next.run(cx)) | ||
.catch_unwind() | ||
.unwrap_or_else(move |err| (self.f)(err)) | ||
.boxed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only returns 500 for a single request, right? My initial understanding was that all requests would be terminated with 500, and then the panic
would continue. But I guess the intended behavior here is different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need to terminate all requests? panic on the request path doesn't need to prevent other working code paths, as long as it's safely unwindable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the “I think my application will continue working if a single request panics” middleware. See the link in my other comment to the “fail-fast” version I haven’t finished yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was talking with the other Runtime maintainers yesterday, and if a future breaks inside of runtime somewhere (e.g. runtime::spawn
), it'll default to terminating its thread, and we have no good way to restore that thread. There might not even be one tbh. I believe Tokio adopted a similar model recently too (on request of the Deno project).
If Tide starts catching panics rather than propagating them, we will start losing threads without having a way to spin them back up; which will severely limit throughput without failing the process entirely. Which seems like not a great situation to be in. I feel this would be particularly hard to debug when not intimately familiar with all moving parts up and down the stack (and even then it's not easy).
I guess what I'm saying is that this approach is making me feel quite nervous, and perhaps we should start with Nemo157/tide@catch-unwind...propagate-unwind first?
Note from chat: make documentation more explicit that using this can lead to other issues and that you really need to understand how unwinding could affect the rest of your code, and investigate whether it should require a bound like |
I've moved this to a new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thanks heaps!
Motivation and Context
cc #263, @prasannavl, @yoshuawuyts
Types of changes
Checklist: