Panic within a handler should return an HTTP 500 #1501

sazzer · 2020-05-11T12:19:22Z

(Apologies if this is already covered anywhere. I did look and couldn't find anything, but I may very well have missed it!)

Expected Behavior

If anything goes catastrophically wrong within a handler, it would be good to return an HTTP 500 response to the client.

Even better would be the option to customize what the error response is.

Current Behavior

Currently, the server just hangs up the network connection.
It does not seem to break the threading - I can make many more panicking requests to the server than it is running threads and they all connect just fine. I just don't get a decent error.

Steps to Reproduce (for bugs)

Write a handler
Call - e.g. unwrap() on an Err value from within the handler

Context

There are certain classes of errors that are catastrophic in nature. For example, the database not being present. Adding deliberate error propagation for these scenarios only serves to bloat the codebase - all of a sudden I've got to propagate the errors all the way from the bottom layer up to the handlers in order to return an error.

Alternatively, I can just not care about these. Just call pool.get().await.unwrap() to get a connection from the connection pool, for example. If I get a connection then that's great. If I don't then I have a catastrophic failure on my hands anyway, so panicking isn't unreasonable.

I could wrap every single handler in panic::catch_unwind() and handle it myself, but this is then a lot of duplication of effort, and has the need to return appropriate catastrophic errors from every handler. (I'm also not sure how to use catch_unwind in an async context, but that's my problem :) )

Alternatively, if Actix-web can do this then it only needs to be done once, and can have guaranteed consistent responses no matter which handler had the problem.

Obviously, this should only apply to actual catastrophic failures. Business errors, validation errors, etc are the domain of the application and not of Actix-web and should remain as such.

Your Environment

Rust Version (I.e, output of rustc -V): rustc 1.45.0-nightly (a08c47310 2020-05-07)
Actix Web Version:

-> % cargo tree | grep actix
├── actix-cors v0.2.0
├── actix-http v1.0.1 (*)
├── actix-rt v1.1.1 (*)
├── actix-web v2.0.0 (*)

The text was updated successfully, but these errors were encountered:

robjtede · 2020-05-11T14:04:31Z

I would argue that terminating a connection is a suitable response to a handler panic. You should endevor to employ proper error handling at all levels. Things that return a Result do so because their errors are recoverable and should be either handled or bubbled.

Using pool.get().await is certainly common in handler code. Taking advantage of the ? operator is the best and cleanest way to handle this pool.get().await? where your handler's Error type impls From<PoolError> and can be converted to an error response (eg impl actix_web::ResponseError). There a plethora of error helper crates to make this pattern clean and easy to implement but using a plain global error enum with manual impls is also reasonably simple.

On catch_unwind, trying to employ a system that relies on catch_unwind has all sorts of gotchas including considerations of things that are or are not unwind-safe; it is not equivalent to try-catch in other languages that are built to handle clean up.

sazzer · 2020-05-11T14:25:53Z

I appreciate that catch_unwind is not a trivial thing to go for, and that it's not an equivalent to try/catch in other languages.

However, I'm also not suggesting that this is something to do lightly. You're absolutely right that returning a Result and using the ? operator is correct behaviour most of the time. But I'm not convinced it's correct behaviour all of the time.

Specifically I am talking about things that really are catastrophic failures. Things like talking to a database that just isn't there. Not reading a database record that is not found, but the entire database being inaccessible.

And yes, this can be done using Result and propagating an error value. That works, but it gets very tedious and repetitive when you have a multi-layered architecture. For example, in my app right now I'm following the Clean Architecture pattern (Which works remarkably well in Rust), which means I have:

Handler function lookup_username() -> impl Responder
Calls UserService.lookup_username() -> bool
Calls UserRepository.find_user_by_username() -> Option<UserModel>
Calls Database.checkout() -> PooledConnection (Where Database is a wrapper around bb8)
Calls Pool.get() -> Result<PooledConnection, Error>

Right now, this works great. And the only way this can at all fail is in a catastrophic sense. If the database is missing, or the schema is corrupt. Anything else will cause a graceful result to be returned and all is good. (This design was based on reading around in blog posts, the Rust Book and The Rustonomicon.)

Now, I could return a Result here. That would mean that every step of the way I've got to have a Result type, with an appropriate Error type for that level - bearing in mind that other calls through could have business errors involved as well, so you'd now be mixing these catastrophic failures with normal business errors - and then have to handle it in every single handler function to correctly return an HTTP 500 indicating that something was seriously wrong.

As such, it seemed reasonable to have this level of handling be something that was more common and shared so that everyone can benefit automatically. :)

Since these are the level of errors that means that everything is broken anyway, I'm not totally invested in whether it is decided to go ahead with this or not. If you still deem it better to leave it alone then you are significantly more experienced than I am and I bow to that wisdom :)

Cheers

Lesiuk · 2020-05-11T15:22:51Z

I am writing my code without using any panics but some external libraries are using panics in some cases so returning 500 if it happens should be better idea than just closing connection. I will research this possibility (https://docs.rs/futures/0.3.5/futures/future/trait.FutureExt.html#method.catch_unwind) this weekend and maybe create my first PR.

robjtede · 2020-05-11T15:50:00Z

Anyone from @actix/contributors got experiences or opinions on handler panicking?

Dowwie · 2020-05-11T15:52:37Z

I'm sorry but I don't suppose this proposal. Rust offers first-class error handling to manage this situation. Impress upon the authors of 3rd party libraries to manage errors rather than panic.

robjtede · 2020-05-11T15:57:07Z

@sazzer if you want the feeling of throwing from functions this crate gets you pretty close to the syntax: https://docs.rs/fehler/1.0.0/fehler/attr.throws.htm

fafhrd91 · 2020-05-11T16:02:22Z

catch unwind is not possible with actix, period.

sazzer · 2020-05-11T16:06:21Z

@robjtede It's not that I want exceptions or throws. I actually like the Result<> handling that Rust gives us (Despite coming from a predominantly overwhelming Java background!)

What I wasn't keen on was the need to do that for error conditions that are truly catastrophic, and that wouldn't need to have a Result<> return for any reason other than this catastrophic case.

For example, my case above with "lookup_username". Assuming that the database is present, the network is working and the schema is valid, this can not fail. Putting in multiple levels of Result<> to handle a case that should never happen just feels like busywork for no real benefit. That's why I've got it panicking in that case - if the database isn't there then there really is nothing that can be done within the code to deal with that. (As opposed to, for example, trying to persist a record with a duplicate username which really should return a Result<> since that's an error that is expected and can be handled)

However, based on the comment from @fafhrd91 that turned up as I was typing this, it's all a non-starter anyway so I'm going to shut up now :)

Cheers

mamcx · 2020-05-12T17:10:35Z

Exist another very simple reason to support this.

Is the expected behavior. Specially on development, where the browser IS the main interface.

In any other web framework I have used (in .net, python, delphi,...) is how things are done. To the point that when this happened to me, I was wonder why?? Is something I haven't done?.

Exist many quality-of-life stuff that come for "free" in things like https://flask.palletsprojects.com/ that in actix and other rust frameworks are "missing".

I'm totally 100% in agreement that this must be done with correctness in mind. This is what I have bet on rust for rewrite a very large .net web backend, and the payoff is very good. But also I think that provide this small touches (maybe as additions or as part of "extras" crate or something like) will make rust even more attractive to the crow of python/.net/ruby...

fafhrd91 · 2020-05-12T17:19:56Z

This is not matter of new feature. This is just not possible

Lesiuk · 2020-05-12T17:27:51Z

@fafhrd91 Could you explain us why catch_unwind don't work for actix so we all can learn something? Just curious.

fafhrd91 · 2020-05-12T17:29:37Z

Did you check definition?

onelson · 2020-05-12T17:34:58Z

I was about to write a thoughtful response to this, but #1501 (comment) pretty much sums up my point of view.

Signaling failure with Result is a win-win and is well supported. 90% of my error handling code comes in the form of an impl actix_web::ResponseError. At that point, my handler code uses ? to focus on the "happy path" and the rest is automatic.

Using things like catch_unwind in my mind is a last resort, probably most useful in stuff like test runners or what have you.

robjtede · 2020-05-12T18:32:40Z

From std::panic::catch_unwind docs:

It is not recommended to use this function for a general try/catch mechanism.

From futures::future::FutureExt::catch_unwind docs:

It's not recommended to use this for error handling.

From the Rust book ch 9.3:

When code panics, there’s no way to recover.

From the edition guide:

Panics thus abort execution up to some "isolation boundary", with code on the other side of the boundary still able to run, and perhaps to "recover" from the panic in some very coarse-grained way.

Fundamentally, panicking is Rust's solution to prevent memory errors/corruption and generally prevent invalid states in situations where recovery is not possible. Such cases are exceeding rare in HTTP request handling, therefore Result should be used where possible.

The "isolation boundary" should be well defined and not let states become invalid. In the case of actix-web, since workers are naturally isolated by their arbiter/thread, it is sensible for that boundary to be the thread and nothing more granular.

fafhrd91 · 2020-05-12T18:36:48Z

It is simpler, Rc and Arc are not catch unwind

mamcx · 2020-05-12T18:50:24Z

Such cases are exceeding rare in HTTP request handling, therefore Result should be used where possible.

Rare? I wish. Loss database connections, the database schema change, the upstream server/apis break (my world, all the time), etc is not rare at ALL. It happens in production and is handled corrected by ANY decent web server or router or whatever in the stack.

As I say, this lack of feature is VERY surprising to me and I think anyone that come from any other web framework below the sun. Where else this is like this?

I get if this in rust is harder. So what else can be done? Is possible to wrap the code, fork it then handle the crash in the fork or something like that? Actix MUST be put behind another web server like nginx for production? If is the case, put the idea in the docs is important (not assume everyone put nginx or similar in front of their APIs).

If rust is mean to be better at handle problems, then we can't just say "let it crash" and not respond with the proper error 500 just because this. As I say, if a workaround must be implemented, is ok. But "blind" crash is not good.

fafhrd91 · 2020-05-12T18:55:37Z

Then you need to fix dB connector or api access code. Panic is not exception, you can have a lot of bad consequences after panic. For example leaked memory.

It is better to use “panic = abort” in your cargo.toml

onelson · 2020-05-12T19:01:15Z

Yeah, exactly. For all intents and purposes, panics just shouldn't happen in production systems, least of all due to an unwrap() in your own code.

If you're in a situation where some code you're calling panics without giving you the opportunity to handle the error (as a Result), I'd look to report it as a bug in whatever that library is.

mamcx · 2020-05-12T19:03:29Z

@fafhrd91,@onelson

Sure, but when the problem happen, the user get a "connection reset", so the tech support blame it on the internet or something (totally believable on my case, my users roam the country) and not report it to me, instead of a "error 500: Internal server error" where now I sure the problem is in the server.

Again, this behavior is not what is expected. So look like I need to workaround to put things behind nginx (that look like a non issue for regular websites but are another complexity for deploy intranet things).

fafhrd91 · 2020-05-12T19:06:43Z

If you put rust code with panics into production then your have to use nginx, no options.

mamcx · 2020-05-12T19:07:27Z

@fafhrd91

That is fair, is something good to know. Thanks for the help.

robjtede · 2020-05-12T19:14:15Z

where recovery is not possible

These are the exceedingly rare cases referred to in the previous sentence. Recoverable errors are obviously commonplace.

Loss database connections, the database schema change, the upstream server/apis break...

These are all recoverable.

onelson · 2020-05-12T19:57:16Z

Sure, but when the problem happen, the user get a "connection reset"

This would not happen if your handler returned a Result where the Err is converted to a 500 status response via impl actix_web::ResponseError. They'd get a 500 response, and you'd have one less chance of leaking memory. You could even retry your database connection a couple times if you wanted.

ssokolow · 2020-12-31T04:06:19Z

I still think it's better UX for something like this to return an Error 500 to the user if the invariant is broken:

        thumb_path.push(format!("{:x}.png", md5::compute(
            Url::from_file_path(path).expect("path should be absolute by this point").as_str())));

(It's MD5 because it's from an in-development implementation of the XDG Thumbnail Managing Standard for a miniserve-like tool for quickly sharing image sets with friends... a tool where the whole point is that you don't need to set up something like nginx.)

phiresky · 2023-07-05T12:49:13Z

~~I want to mention that the (at least with tokio::main), actix already seems to catch panics and continue on uninterrupted:~~ (I guess this is what the OP is talking about with "threading continues uninterrupted")

actix seems barely affected by panics:

...
thread 'actix-server worker 0' panicked at 'called `Option::unwrap()` on a `None` value', /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/markdown-it-0.5.0/src/plugins/extra/smartquotes.rs:518:51
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
... (it just goes on here)

So far so good, but the issue with this is that if actix is behind nginx, the current behaviour of actix causes nginx to log these errors:

2023/07/05 11:46:30 [error] 2158443#2158443: *721527 upstream prematurely closed connection while reading response header from upstream, client: 47.149.14.66, server: lemmy.world, request: "GET /c/xxx HTTP/1.1", upstream: "http://0.0.0.0:8536/c/xxx", host: "lemmy.world"

which (by default) causes nginx to mark the upstream down for ten seconds, the same as it would if the upstream was actually down. This causes nginx to not serve requests to the server anymore even though it is still up. This causes a DOS because one panicking request handler effectively takes the server down for 10s . Sadly this behaviour isn't even cleanly configurable in nginx.

So I'm not sure about the exact implications, but it would be great if actix would not instantly close the connection if one of my dependencies does something dumb. I don't want a configurable error message, just a 500 response.

robjtede · 2023-07-05T13:05:02Z

Reluctantly, I created a CatchPanic middleware some time ago. It is documented with big warnings about usage and a link to this thread to understand why it's not recommended for use without careful consideration.

If you're interested in reporting panics and not catching them then the alternative PanicReporter middleware is a solution.

As for the built-in behavior of Actix Web, our view about making this behavior default is extremely unlikely to change unless the project maintainership changes significantly.

robjtede mentioned this issue May 12, 2020

How to catch and redirect to 500 error in a crash on actix 2? #1504

Closed

robjtede closed this as completed May 12, 2020

davidpdrsn mentioned this issue Jan 31, 2022

Add CatchPanic tower-rs/tower-http#214

Merged

2 tasks

actix locked as too heated and limited conversation to collaborators Jul 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Panic within a handler should return an HTTP 500 #1501

Panic within a handler should return an HTTP 500 #1501

sazzer commented May 11, 2020

robjtede commented May 11, 2020 •

edited

Loading

sazzer commented May 11, 2020

Lesiuk commented May 11, 2020

robjtede commented May 11, 2020

Dowwie commented May 11, 2020

robjtede commented May 11, 2020

fafhrd91 commented May 11, 2020

sazzer commented May 11, 2020

mamcx commented May 12, 2020

fafhrd91 commented May 12, 2020

Lesiuk commented May 12, 2020 •

edited

Loading

fafhrd91 commented May 12, 2020

onelson commented May 12, 2020

robjtede commented May 12, 2020

fafhrd91 commented May 12, 2020

mamcx commented May 12, 2020 •

edited

Loading

fafhrd91 commented May 12, 2020

onelson commented May 12, 2020

mamcx commented May 12, 2020 •

edited

Loading

fafhrd91 commented May 12, 2020

mamcx commented May 12, 2020

robjtede commented May 12, 2020

onelson commented May 12, 2020

ssokolow commented Dec 31, 2020 •

edited

Loading

phiresky commented Jul 5, 2023 •

edited

Loading

robjtede commented Jul 5, 2023 •

edited

Loading

Panic within a handler should return an HTTP 500 #1501

Panic within a handler should return an HTTP 500 #1501

Comments

sazzer commented May 11, 2020

Expected Behavior

Current Behavior

Steps to Reproduce (for bugs)

Context

Your Environment

robjtede commented May 11, 2020 • edited Loading

sazzer commented May 11, 2020

Lesiuk commented May 11, 2020

robjtede commented May 11, 2020

Dowwie commented May 11, 2020

robjtede commented May 11, 2020

fafhrd91 commented May 11, 2020

sazzer commented May 11, 2020

mamcx commented May 12, 2020

fafhrd91 commented May 12, 2020

Lesiuk commented May 12, 2020 • edited Loading

fafhrd91 commented May 12, 2020

onelson commented May 12, 2020

robjtede commented May 12, 2020

fafhrd91 commented May 12, 2020

mamcx commented May 12, 2020 • edited Loading

fafhrd91 commented May 12, 2020

onelson commented May 12, 2020

mamcx commented May 12, 2020 • edited Loading

fafhrd91 commented May 12, 2020

mamcx commented May 12, 2020

robjtede commented May 12, 2020

onelson commented May 12, 2020

ssokolow commented Dec 31, 2020 • edited Loading

phiresky commented Jul 5, 2023 • edited Loading

robjtede commented Jul 5, 2023 • edited Loading

robjtede commented May 11, 2020 •

edited

Loading

Lesiuk commented May 12, 2020 •

edited

Loading

mamcx commented May 12, 2020 •

edited

Loading

mamcx commented May 12, 2020 •

edited

Loading

ssokolow commented Dec 31, 2020 •

edited

Loading

phiresky commented Jul 5, 2023 •

edited

Loading

robjtede commented Jul 5, 2023 •

edited

Loading