Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error Handling #2

Merged
merged 23 commits into from
Mar 16, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
393 changes: 393 additions & 0 deletions proposals/0001.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,393 @@
* Title: *Error Handling in Rust*
* Champion: *@FintanH, fintan.halpenny@gmail.com*
* References:
* [Yoshua Wuyats error handling survey][survey]
* [failure][failure]
* [snafu][snafu]
* [anyhow][anyhow]
* [thiserror][thiserror]

**Note**: The links to the packages are to versions that are at the latest as of 31/1/2019.
xla marked this conversation as resolved.
Show resolved Hide resolved

# Introduction

As good Rust developers we persevere to utilise and provide good error
handling. This is a nuanced statement as it may cover library errors,
command-line errors, structured recoverable errors, unstructured recoverable
errors, and unrecoverable errors. This proposal will examine and contrast the
different methods and libraries that exist, ultimately deciding as to which
methods and library(ies) our Rust developers should use when writing our
libraries and applications.

For a quick recap as to what's available to us out of the box see [The std](#the-std). In [Types of
Errors](#types-of-errors) we discuss when we should use `Option`, `Result`, and what kind of errors
we should use. In [Packages](#packages) we discuss potential libraries to use to aid error handling
boiler-plate and help structure our packages and applications.

# Explanation

## The std

To ensure that we are all on the same page, it is worth covering what we mean by error handling and
what exists for us when working with the `std` library out of the box. When we talk about error
handling we are referring to the possibility of failures in a function that can be handled by us,
the author of the code. Are we writing a function that will fail if a `Vec` is empty? Then we
return a value that represents whether the function was successful if the `Vec` contained elements
and failure if not.

When we say "return a value", what do we mean? The two types that encapsulate a function failing are
`Option` and `Result`. Without going through too much detail, `Option` will convey that a function
xla marked this conversation as resolved.
Show resolved Hide resolved
failed and we do not provide any information as to why it failed. On the other hand, `Result`
carries information, the `E` in `Result<T, E>`. We will discuss what `E` should be in [Types of
Errors](#types-of-errors).

## Types of Errors

With the above in mind we begin to discuss the more opionated nuances of this proposal. Here we will
discuss when we should use `Option` and when we should use `Result`, as well as what should be used
in the place of the `E` type parameter in `Result`.

The guiding principle of using `Option` over `Result` should be the thought of conveying information
to the caller of the function. When the error case can be inferred then it is safe to use `Option`.
This generally means, that there is a single mode of failure for the function or a limited scope of
success and all other cases we expect to fail.

For example:

```rust
/// We know that the only way to fail this operation is
/// when `m` is `0`.
fn divide(n: f64, m: f64) -> Option<f64>
```

```rust
/// Get the `Foo` contained in `MyEnum` and fail otherwise.
fn get_foo(my_enum: MyEnum) -> Option<Foo> {
match my_enum {
MyEnum::Foo(foo) => Some(foo),
_ => None,
}
}
```

If the error needs more context, then the caller can use
[`ok_or`](https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or) and pass in more
information.

To follow, when we want to convey information about what went wrong in our function we should
use `Result`. For example, say we cared about what case `MyEnum` was in `get_foo` when it was not
`Foo`, we might then return an error that contained the case.

```rust
enum MyError {
What(MyEnum),
}

/// Get the `Foo` contained in `MyEnum` and fail with `MyError`.
fn get_foo(my_enum: MyEnum) -> Result<Foo, MyError> {
match my_enum {
MyEnum::Foo(foo) => Ok(foo),
e => Err(MyError::What(e)),
}
}
```

The next thing we need to decide on is what should we use for `E`?

### `E` is for Error

In Rust we essentially have three things we would want to return in a `Result`. We can return what
is colloquially called a structured error (an author defined `enum` or `struct`), a `String`, or a
`dyn trait` such as [`Error`](https://doc.rust-lang.org/std/error/trait.Error.html).

The thing we need to keep in mind is that in a function our error types for `Result` must be
compatible, which is either achieved by using the same type for `E`, or being able to convert to the
target type (e.g. using `?` and `into` under-the-hood).

Let's consider what happens when using the three categories and what pros & cons we run into.

#### Structured Errors

Structured errors can convey a lot of information, naturally documenting particular cases where
failure might occur. They can also hold information within each case so that the user of the type
can access this information. As well as this, it allows the user of the type to match (if made
public) on the error and handle it as they like.

For example, let us imagine we are writing an error type for things that may go wrong when reading a
file:

```rust
enum FileError {
/// The file being accessed does not exist
DoesNotExist(PathBuf),
/// The file being accessed cannot be accessed by the user
NoAccess(PathBuf, User),
}
```

Here we can report that the file we tried to access does not exist, or we do not have access. We
pack the information on the given file path for both cases and the user for the access case. This
way we can convey as much information as possible in an application so our users can make informed
decisions as to how to remedy the error (e.g. pass in the correct file name).

Where structured errors become less manageable is what we will dub "structural hell" here. If we
structure our errors we will necessarily end up with nested `enum`s. This is since we
would like to localise the error cases that can happen in one component. Then as we move our
component hierarchy, combining sub-components we end up combining the error cases. We can
demonstrate with a two-level hierarchy, but it's not hard to imagine how this can grow.

```rust
// Defined in file.rs
enum FileError {
DoesNotExist(PathBuf),
NoAccess(PathBuf, User),
}

// Defined in user.rs
enum UserError {
UserNotFound(UserId),
SessionExpired(User),
AuthenticationError,
}

// Defined in lib.rs
enum AppError {
File(FileError),
User(UserError)
}
```

We can note that there is an alternative to flattening out the errors into on giant error type. This
means that we lose granularity when returning errors because we mix user errors with file errors.
Can a file error occur when trying to find a user and vice versa?

#### Strings

`String`s are an easy option to use. They can be created on the fly and if you have a `Result<T,
String>` you can chain errors further by returning another `Err("Something went wrong")`. `String`
can almost be seen as an "unstructured error" since we can place whatever we like between
the double-quotes. We can reverse the pros and cons in contrast to structured errors. We avoid
"structural hell" but we can't inspect any of the information (unless you parse, but please don't)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless you parse, but please don't

this!

and we cannot gauge the category of the error from the type information.

#### `dyn Error` (or something like `Error`)

In this case we are returning anything that has an implementation of a trait, here we will discuss
in terms of `Error`. This is halfway between structured errors and `String`s. We are imposed by the
structure of the trait methods, i.e. this type must implement these methods, but we can also unify
different errors into one "type" of thing, `Error`, so no nested structure. This means that we
cannot match on a structure and handle errors in our way. Instead we can only call the methods
defined in the trait.

These trade-offs must be weighed carefully in the context of what kind of package we are working
with: a library or an application. The heuristic recommendations will be covered in
[Recommendation](#recommendation).

## Packages

In this section we view the three packages[0][failure],[1][snafu],[2][anyhow] that are currently the
most mature in terms of features. The idea is to give us an overview of the capabilities of each
package so that we can inform our decision in [Recommendation](#recommendation).

The table is a distillation of [Yoshua Wuyats error handling survey][survey] in terms of what
features each package provides.

**Legend**:
* ✔️: Is supported by the library
* ❌: Is not supported by the library

| Library | assertion string | assertion structured | causes | backtraces | errors from strings | early returns | context | custom errors |
| :--------------: | :--------------: | :------------------: | :----: | :--------: | :-----------------: | :-----------: | :-----: | :-----------: |
| `failure` | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| `snafu` | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
| `anyhow` | ✔️ | ✔️ | ✔️ | ✔️* | ✔️ | ✔️ | ✔️ | ✔️\*\* |

\* Enabled via [backtrace](https://doc.rust-lang.org/std/backtrace/index.html) being available on nightly
\*\* Enabled via [thiserror][thiserror]

### Features

**assertion**: allows the author to make an assertion and return an error if it fails. The
**string** variant means that the author can only use string values as the error, whereas the
**structured** means that the author can only use a structured error. In the case of `anyhow` the
author is allowed to use either.

**causes**: allows the author to iterate over the sources of errors.

**backtraces**: allows the author to get a back-trace of the failure calls, provided they turn on
`RUST_BACKTRACE`.

**errors from strings**: allows the author to throw an anonymous error from a string value.

**early returns**: allows the author to return from a function early via a macro call.

**context**: allows the author to add contextual information to a function that can cause an error.

**custom errors**: allows the author to derive the library's custom error type along with special
`Display` attributions.

`snafu` seems to be the most opinionated in terms of types of errors that should be returned. It only
allows structured errors for a lot of its functionality. It would seem that it makes the most sense
for library packages but not applications.

`failure` and `anyhow` are very similar in their functionality if we account for `thiserror` and
experimental features augmenting `anyhow`. However, `anyhow` does allow for structured and stringly
typed errors in its assertion functions.

It is also noteworthy that a library could get enough functionality from `thiserror` while
applications can build on top of this with another library such as `anyhow` or `failure` to get the
full functionality of error handling.

# Recommendation

In this section, I will recommend what types should be used when encountering error handling in their
Rust code. Following this, I will recommend which packages to use in our libraries and applications.

In [Types of Errors](#types-of-errors) and [`E` is for Error](#e-is-for-error), we outlined
heuristics that we can follow when encountering error handling situations. To reiterate the
following rules should be followed when weighing up whether to use `Option` or `Result`:

1. Use `Option` when the failure case is obvious to the caller. This generally means that there is one
code path that can't handle the input, or we know the one code path that should be executed. See the examples above to understand these two cases.
2. Use `Result` when there is more than one failure case in the code path or this function is
expected to be composed with more `Result` returning functions. Useful information is expected to
be held in the `Err` variant of the `Result`.

Going forward the following recommendations for structuring errors are proposed.
An `enum` or a `struct` will be created to represent the custom error type. In
the case of creating a library, these can be left public (`pub`) if they are
fine to expose. In the case of opaque errors for which the library author
wishes to prevent the end-user from doing any control flow, a combination of a
`pub struct` and hidden fields is recommended. I believe this gives maximal
control to the library while also preventing the end-user from inspecting the
error. As well as this, we will use
[`non_exhaustive`](https://doc.rust-lang.org/1.35.0/unstable-book/language-features/non-exhaustive.html)
to tell library consumers that these error cases may expand in the future and
our changes to these types will not break their implementations. To
demonstrate all of this lets use our previous example of a file error but also
introduce a deserialization error.

```rust
pub struct DeserializationError(SomeDeserializationError);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok. I think I would in practice have an internal error enum, and unify that at a higher level in a variant ContentsOfSausage(Box<dyn Error>). Because, with this solution you would end up with a lot of standalone structs only for the purpose of hiding the underlying error, or no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the struct can hide a private enum of opaque errors. So whatever is in the opaque struct is up to you. Whether it's dyn Error or something else, all it needs is the Error derivation.

My point for having this opaque struct method is to still allow the library to handle the errors if it thinks it can, otherwise it'll eventually become a String.

Would you agree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree, but I suspect that I’d be too lazy to exercise this diligently XD.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, we can always amend ;)


#[non_exhaustive]
pub enum FileError {
/// The file being accessed does not exist
DoesNotExist(PathBuf),
/// The file being accessed cannot be accessed by the user
NoAccess(PathBuf, User),
// The file was opened but could not be deserialized
Deserialization(DeserializationError),
}
```

It is recommended that we avoid a plain `String` where possible because this does not convey
information to others, even fellow library authors.

In the case of applications the recommendation is that control-flow matching is used where it makes
sense, and so it is based on each case. Otherwise the use of `dyn Error` to propagate up to the
end-user and reporting the exception is recommended.

Regarding which libraries to use, due to the information in [`Packages`](#packages), it is
recommended that we use a `thiserror` for our libraries, while applications can use the combination
of `thiserror` and `anyhow`. Libraries should use `thiserror` on their structured errors to reduce
boiler-plate and aid the consumers of the library. Applications can then use `anyhow` in
conjunction with `thiserror` to access the macro abilities for bailing early in functions and
assertion based errors. These can be used the `anyhow::Result` type which serves as the `dyn Error`
way of error propagation.

To bring this all together, here is an example of how a library may use `thiserror`, continuing with
the file error example:

```rust
use thiserror::Error;

/// An opaque deserialization error.
/// This payload is hidden from the library consumer but still accessible
/// to the library itself.
#[derive(Error, Debug)]
#[error("A deserialization error occurred: {0}")]
pub struct DeserializationError(pub(crate) SomeDeserializationError);

#[derive(Error, Debug)]
pub enum FileError {
/// The file being accessed does not exist
#[error("The path {0} does not exist")]
DoesNotExist(PathBuf),
/// The file being accessed cannot be accessed by the user
#[error("The user {0} does not have access to {1}")]
NoAccess(PathBuf, User),
/// The file was opened but could not be deserialized
#[error("A deserialization error occurred: {0}")]
Deserialization(DeserializationError),
}
```

While an application would look like the following:

```rust
use file_error::FileError;
use thiserror::Error;

#[derive(Error, Debug)]
enum UserError {
#[error("The user {0} was not found")]
NotFound(UserId),
#[error("The session expired for user: {0}")]
SessionExpired(User),
#[error("You are not authenticated for this action. Please check with an admin.")]
AuthenticationError,
}

#[derive(Error, Debug)]
enum AppError {
#[error("A file error occurred: {0}")]
File(FileError),
#[error("A user error occurred: {0}")]
User(UserError)
}

fn find_user(db: &Db, user_id: &UserId) -> Result<User> {
let user = db.lookup(user_id).ok_or(UserError::NotFound(user_id))?;
Ok(user)
}

fn is_authenticated(user: &User, permissions: &Permissions) -> Result<()> {
ensure!(user.permissions.match(permissions), UserError::AuthenticationError)
Ok(())
}
```

There are other niceties that `anyhow` provides, so it is recommended that the documentation of the
library is checked out since they will not be fully covered here.

# Drawbacks

Some immediate drawbacks can be seen:

1. Some of us have already used `failure` so this would need refactoring work.
2. Limitations of the libraries are only discovered when getting into the weeds a lot of the time.
3. `anyhow` relies on nightly experimentation features for `backtraces` which does not guarantee it
will reach a stable version.

# Open Questions

N/A

# Future possibilities

The landscape of error handling seems to be a tumultuous one. There are many explorations made in
the space with opinionated sets of features. However, most of the libraries seem to be converging
on a useful set of features in general and it is only small differences that set them apart. We may
want to revisit this recommendation in the future as packages come out so that we can survey whether
it is ever worth changing our decision here.

# Status

_Pending_

[survey]: https://blog.yoshuawuyts.com/error-handling-survey/
[failure]: https://docs.rs/failure/0.1.6/failure/
[snafu]: https://docs.rs/snafu/0.6.2/snafu/
[anyhow]: https://docs.rs/anyhow/1.0.26/anyhow/
[thiserror]: https://docs.rs/thiserror/1.0.10/thiserror/