Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Revamp Error types #229

Closed
therealprof opened this issue Jun 21, 2020 · 123 comments · Fixed by #296
Closed

[RFC] Revamp Error types #229

therealprof opened this issue Jun 21, 2020 · 123 comments · Fixed by #296

Comments

@therealprof
Copy link
Contributor

As demonstrated by #181 and other issues there's a real world usability issue with the Error type of the Result of operations being an associated type. An associated type means that a driver or user of an HAL impl is required to adapt to the custom Error type; while possible for applications this makes universal drivers impossible.

In my estimation the main reason for this was probably caused by embedded-hal predating the improvements to Rust type and specifically Error handling. Now we do have better possibilities and with the upcoming big 1.0 breakage it should be possible to also revamp the Error types without too much grief.

Currently our trait definition e.g. for I2C looks as follows:

pub trait Read {
    /// Error type
    type Error;
    ...
    fn try_read(&mut self, address: u8, buffer: &mut [u8]) -> Result<(), Self::Error>;
}

My proposal would be to instead provide a specific Error type such as:

#[non_exhaustive]
pub enum Error {
    /// An unspecific bus error occured
    BusError,
    /// The arbitration was lost, e.g. electrical problems with the clock signal
    ArbitrationLoss,
    /// A bus operation received a NACK, e.g. due to the addressed device not being online
    NoAcknowledge,
    /// The peripheral buffer was overrun
    Overrun,
    /// The peripheral buffer ran out of data
    Underrun,
}

and then redefine all traits as:

pub trait Read {
    fn try_read(&mut self, address: u8, buffer: &mut [u8]) -> Result<(), Error>;
}

The #[non_exhaustive] flag (available from Rust 1.40.0, cf. https://blog.rust-lang.org/2019/12/19/Rust-1.40.0.html) allows us to easily extend the reasons in the feature without being a breaking change because it requires implementations which care about the actual error to also have a wildcard match branch, which is a good idea anyway.

By having the Error types defined in embeded-hal we do some advantages:

  • Drivers can now universally check the error kind reported by a HAL implementation (cf. I²C Polling (detecting NACKs) #181 where a chip might return a NACK due to currently not being able to handle more requests, yet the driver cannot make use of this information and retry later) and act accordingly
  • More uniform and exhaustive documentation of available error conditions next to the API itself
  • Better guidelines for HAL authors which cases might be work distinguishing
  • Improved portability of embedded applications between different hardware

Dowsides:

  • More breakage right now
  • MRSV 1.40
@ryankurte
Copy link
Contributor

ryankurte commented Jun 22, 2020

Yeah #181 is an annoying case, and it would definitely be nice to improve the error handling situation. From quick read / chat I have a few of thoughts:

  • Is this a case for any drivers other than I2C?
  • #[non_exhaustive] looks like a super useful thing, is this an acceptable bump in MSRV? (seems okay to me)
  • How do we communicate underlying errors (for example, a network failure with an I2C-over-ethernet bridge) without erasing useful information?
    • In the standard flat-error case I believe this is usually achieved via boxing inner types
    • If using the e-h error types is optional we create a green/blue driver situation

In my estimation the main reason for this was probably caused by embedded-hal predating the improvements to Rust type and specifically Error handling

I might not be totally up to date on this but as far as I am aware std::error::Error is not exposed in core (rust-lang/rust#33149), and there was intent to move it to alloc which also would seem unsuitable. It'd be great to pick up something like anyhow but even the no_std support for this would appear to require a global allocator, so AFAIK the best options we have are failure ?, ? and enums.

There are a couple of possible alternatives that would solve the i2c problem (though could be used in other similar cases) without erasing types / require global allocation:

A generic error type with NACK

#[non_exhaustive]
pub enum Error<E> {
    /// An unspecific bus error occured
    BusError<E>,
    /// The arbitration was lost, e.g. electrical problems with the clock signal
    ArbitrationLoss,
    /// A bus operation received a NACK, e.g. due to the addressed device not being online
    NoAcknowledge,
    ...
}

pub trait I2cRead {
    /// Error type
    type Error;
    ...
    fn try_read(&mut self, address: u8, buffer: &mut [u8]) -> Result<(), Error<Self::Error>>;
}

This would work now, allows drivers to handle the i2c appropriate errors, and is consistent with nb traits and other things, though is somewhat unwieldy.

An is_nack trait and bound (or more generic I2cError if there are other methods) on the I2C Error Type

pub trait I2cError {
  fn is_nack(&self) -> bool;
  ...
}

pub trait I2cRead {
    /// Error type
    type Error: I2cError;
    ...
    fn try_read(&mut self, address: u8, buffer: &mut [u8]) -> Result<(), Error<Self::Error>>;
}

This requires Generic Associated Types (much like the work on async wrappers), so, isn't yet stable / available. It'd be neat to combine this with `core::error::Error` but, 

I'm not really sure what the best option here would be, I prefer the latter but it's not currently available and a future change to this would be breaking, and whatever we do propose I think we should demonstrate reasonably thoroughly in the same manner as hal traits?

@eldruin
Copy link
Member

eldruin commented Jun 22, 2020

A real error type for I2C sounds like a good improvement.

FWIW, the only error that I have actually needed is NoAcknowledge due to the exact same issue described in #181 (in the eeprom24x driver waiting for a write to finish). I also remember seeing the possibility for I2C polling in some other device but I cannot remember exactly which one it was.
With this I could transform NoAcknowledge into nb::Error::WouldBlock and bubble up anything else, so that would be great.

Regarding the suitability of a similar Error type for other protocols than I2C, I am unsure. I would have said I2C is the only "real" protocol that e-h supports which has errors of its own and not just kind of "hand off some bits, no strings attached" :)
Nevertheless, maybe someone here has more experience on this and can think of some errors for SPI or so.

My biggest concern with the OP proposal is the point about transportation of underlying errors as @ryankurte already mentioned. I think the BusError<E> is a fair mechanism.
An alternative I could also think of would be an additional Other<E> instead:

#[non_exhaustive]
pub enum Error<E> {
    /// An unspecific bus error occurred
    BusError,
    // ...
    /// Something else happened
    Other<E>,
}

This would distinguish itself from something which is wrong with the bus. However, I think this is not a great solution:

  • I would need to see examples of BusError but I would have said they should be quite rare/unclear and I can imagine somebody wanting to add more information to BusError as otherwise this gives zero guidance on an error recovery strategy.
  • Two variants BusError<E1> and Other<E2> seem like an overkill and not worth the additional parameter type.

That is why I think that BusError<E> is a fair solution offering some extensibility.

Of course, (at least at first sight) I like the GAT version from @ryankurte the most due to the transparent extensibility but we have been waiting for GAT for a long time and I think I would rather not do that.

I am also a bit concerned about the MSRV bump but I personally have no need for an older compiler. I would say #[non_exahustive]'s avoidance of future breaking changes is definitely worth it.

@therealprof
Copy link
Contributor Author

therealprof commented Jun 22, 2020

Thanks for your responses. I don't have time to read them all in detail right now but here's one paragraph which stuck out in the notifcation going by:

Regarding the suitability of a similar Error type for other protocols than I2C, I am unsure. I would have said I2C is the only "real" protocol that e-h supports which has errors of its own and not just kind of "hand off some bits, no strings attached" :)
Nevertheless, maybe someone here has more experience on this and can think of some errors for SPI or so.

I can see other applications as well. For instance if you have an application running e.g. over a UART it would be very helpful being able to see buffer overruns so a driver could start over. For GPIO (where we're discussing things like tri-state GPIOs and changing directions) it might be useful to know that a pin is in a wrong mode for some operation.

But even other than in drivers, whenever you're doing real error handling in an application (other than the unfortunately common plain unwrapping) this is currently not at all portable so whenever you want to run an application on a different MCU and properly handle different error situations you'd have to write MCU specific error handling -- which is not great.

@dbrgn
Copy link
Contributor

dbrgn commented Jun 22, 2020

I only found this RFC now, but independently suggested the same solution as @ryankurte / @eldruin:

#181 (comment)

I think it's important that the implementation can provide custom errors if it wants to, but that we can have implementation-independent error variants for things like Nack / ArbitrationLoss / etc so that I²C device drivers can use polling instead of fixed sleep times (when waiting for something to be processed).

This should probably be decided before the 1.0.0 release, right?

@therealprof
Copy link
Contributor Author

I think it's important that the implementation can provide custom errors if it wants to, but that we can have implementation-independent error variants for things like Nack / ArbitrationLoss / etc so that I²C device drivers can use polling instead of fixed sleep times (when waiting for something to be processed).

Do you have an example for a custom error which would be worth the additional complexity of generic parameter? If we say we would like to entertain the idea of custom errors I'd rather see something like a Custom enum type with a static error string but TBH if we have non-exhaustive errors we can extend them any time if the need arises and everyone gets to profit. Otherwise you'd still have the same handling mess as now and people could decide to rather keep their own proprietary error types instead of shared error types.

This should probably be decided before the 1.0.0 release, right?

Yes.

@dbrgn
Copy link
Contributor

dbrgn commented Jun 22, 2020

From looking through the different HALs, error types that exist right now:

  • STM32L0: Overrun, Nack, PECError, BusError, ArbitrationLost
  • STM32L1: OVERRUN, NACK
  • STM32L4: Bus, Arbitration, Nack
  • STM32F0: OVERRUN, NACK, BUS
  • STM32F1: Bus, Arbitration, Acknowledge (Nack), Overrun
  • STM32F3: Bus, Arbitration
  • STM32F4: OVERRUN, NACK
  • STM32F7: Bus, Arbitration, Acknowledge, Overrun, Busy
  • STM32G0: Overrun, Nack, PECError, BusError, ArbitrationLost
  • nRFx (TWIM): TxBufferTooLong, RxBufferTooLong, Transmit, Receive, DMABufferNotInDataMemory
  • LPC8xx: EventTimeout, MasterArbitrationLoss, MasterStartStopError, MonitorOverflow, SclTimeout
  • Linux: Nix(nix::Error), Io(io::Error)
  • TM4C: Bus, Arbitration, DataAck, AdrAck

I don't know enough about the different implementations (and SMBus, which seems to add some other error types) to know what all those variants mean, but it feels a bit wrong to throw away error information for error types that we as embedded-hal developers did not think of, or where platform-specific error handling could be added. Maybe some people with more experience can judge that.

@therealprof
Copy link
Contributor Author

I'm not proposing to throw away error types at all, my proposal is to add all error types (which make sense in some form) into the shared error type. In fact most of the HAL impls only have enum kinds for the errors they're detecting right now, they could provide more of them...

@therealprof
Copy link
Contributor Author

I also should add that unlike for std applications the focus here should be on the ability to distinguish and handle different kinds of errors in a useful manner. Presentation of an error to a user requires a bespoke implementation implementation anyway so investing extra effort to provide as much detail in the errors as possible is actually counterproductive.

@ryankurte
Copy link
Contributor

hmm, some other things I didn't really consider in the last reply:

  • is there a difference between a recoverable-error and a non-recoverable error?
  • is a NACK actually an error, or, normal part of bus interaction?
  • could this normal-failure be represented in a clearer way on the interface? (Result<Option<...>, Self::Error> or I2cResult::Ok|Nack|Whatever|Err as crude examples)
    • are there other errors that are perhaps, not errors, or is this the only case (impacting whether this is viable)?
    • would an API change of this nature resolve both the specific (i2c) and general case problems in a satisfactory way
  • should retry etc. (which is part of the justification for non-NACK errors) be driver or application level (and is that a decision we can/should make here)

I totally agree that we are missing some API detail, but am still not swayed by the exhaustive (yet non-exhaustive) enum approach. Aside from the other complaints it seems like introducing a bunch of application detail into an otherwise abstract crate and I don't know how we would choose to accept or not-accept enum variants.

@ryankurte
Copy link
Contributor

Oh wait, the trait bound is only blocked on GATs where lifetimes are involved (ie. with futures and buffers), so you can pretty trivially implement the second of the options in #229 (comment) on stable.

Demonstrated in embedded-hal and stm32f4-hal, should probably try with a consumer / example but, it's time for bed here.

@therealprof
Copy link
Contributor Author

  • is there a difference between a recoverable-error and a non-recoverable error?

As far as I am concerned it's up to the application to decide what is recoverable and what is not.

is a NACK actually an error, or, normal part of bus interaction?

Both. Originally it was supposed to signal that a device does not exist on the bus but nowadays is sometimes is used to signal that the device is busy.

could this normal-failure be represented in a clearer way on the interface?

You do have a strange interpretation of "clearer"... 😅

should retry etc. (which is part of the justification for non-NACK errors) be driver or application level (and is that a decision we can/should make here)

I think both. E.g. if you have a driver talking over some interface and it is waiting for a certain stream of data to arrive and the HAL signals an overrun or a jammed line, the driver might be able to retry the last communication in order to fix the issue. If you don't have a driver but your application is handling the data directly you similarly might want to be able to handle such situations (which is currently not possible in a generic way).

totally agree that we are missing some API detail, but am still not swayed by the exhaustive (yet non-exhaustive) enum approach. Aside from the other complaints it seems like introducing a bunch of application detail into an otherwise abstract crate

I absolutely disagree. Having a common set of error types for general use is one the achievements of the latest iterations of e.g. libstd. I think it is fundamental for portable and composable software to have a common set of error kinds to rely on and that's also the reason why custom approaches like failure are now deprecated.

and I don't know how we would choose to accept or not-accept enum variants.

Easy: if it makes sense, is not covered by an existing error kind we'll accept it. There should be a very low bar to including new error kinds and since additions are non-breaking I would not worry about that at all.

@therealprof
Copy link
Contributor Author

Demonstrated in embedded-hal and stm32f4-hal, should probably try with a consumer / example but, it's time for bed here.

I don't think that solves any of the issues and adds more convolution to the API.

@therealprof
Copy link
Contributor Author

@ryankurte @eldruin So I was looking into your proposal to have a generic custom error kind. I do see a few problems with that approach:

  • You still have to parametrize each type use with the HAL impl specific Error which funny enough makes the implementation less universal despite being more generic. The current associated type does not have suffer from the problem this approach would introduce
  • It essentially allows a HAL impl to opt out of using the standard types by wrapping it in a Custom or Other type rendering the plan to a good deal toothless; we absolutely want to encourage implementers to use new shared error kinds and extend them within e-h if they're not sufficient (I so think that with a good look at some powerful implementation we can get a very good idea of what should be included in a first throw)
  • I don't think nesting errors makes a lot of sense for embedded applications. std applications are moving away from nesting due to useability issues and crates like anyhow making it really easy to get a nice chain of errors iff error variants are not manually nested and embedded applications are even more simplistic than that due to the requirement that the application actually needs to know about the used peripherals and even initialise them specifically; there's no such thing as a library hidden several layers down magically initialising an peripheral and doing fancy stuff without the application knowing anything about that

I think with the new fallible-first approach it is very important to deliver a decent set of standard error kinds, too, and not leave it to impls to come up with a wild variety of custom errors (or none at all).

@ryankurte
Copy link
Contributor

ryankurte commented Jun 23, 2020

@therealprof i agree with most your points except for the error nesting bit, ime the flat approach only works in std because they nest Box<dyn Error> so you can handle flat cases without erasing useful underlying information. The critiques of the alternatives I proposed are totally valid, however, as noted above none of the modern rust error handling stuff works without alloc, which i do not think an acceptable compromise for us :-(

there's no such thing as a library hidden several layers down magically initialising an peripheral and doing fancy stuff without the application knowing anything about that

I think it's also important to remember that this is not only used in no_std contexts? I think linux-embedded-hal is a pretty good example, especially where async implementations are calling out to libc and mio.

I don't think that solves any of the issues and adds more convolution to the API.

This solves the, there is a specific error that may occur during normal operation of the interface that a driver may wish to handle, problem. Which may be specific to I2C, and in a way that doesn't restrict error types or delete error information, which afaik would resolve #181, and is generalisable to other cases similar to this.

Easy: if it makes sense, is not covered by an existing error kind we'll accept it. There should be a very low bar to including new error kinds and since additions are non-breaking I would not worry about that at all.

The other question about this is how you ensure people use the correct error types, and how you interpret and use those both at the driver and application level? Say we end up with BusFaultA and BusFaultB or something, we loose abstraction in that the driver author now needs to be aware of what an underlying impl might return, and any addition to this enum that should be handled at the driver level requires a driver update, and the application author now needs to know both what the underlying impl can return, and what the driver handles / forwards. Some of this is implicit to having defined errors, some of this is a facet of the flat approach I think.

I think with the new fallible-first approach it is very important to deliver a decent set of standard error kinds, too, and not leave it to impls to come up with a wild variety of custom errors (or none at all).

A single flat non-generic set would simplify things like this which is nice, but at the same time, seems to make it functionally impossible to report meaningful errors (device disconnect, permissions, pin not mounted, etc.) at the application level, which does not seem to me to be tenable.

It would be great to improve the situation, but, I would posit that any improvement should not be worse / less-expressive than the current approach, even if only for a subset of uses :-/

@therealprof
Copy link
Contributor Author

This solves the, there is a specific error that may occur during normal operation of the interface that a driver may wish to handle, problem. Which may be specific to I2C, and in a way that doesn't restrict error types or delete error information, which afaik would resolve #181, and is generalisable to other cases similar to this.

By adding another layer (the Option) you're forcing everyone to care about the possibility of a NACK being returned (which normally is an error) on the happy path in addition to regular error handling.

The other question about this is how you ensure people use the correct error types, and how you interpret and use those both at the driver and application level?

To be totally honest: I fully expect some people continuing to blindly unwrap the Result and panic, while others will continue to to wildcard matching on the Error and just use a best general error handling. What changes is that they now actually have the chance to do better because now they can for the first time have universal error kinds.

I do not expect that applications/drivers will handle each error individually but go "hey, I have a nice way of dealing with this of error (like NACK)" and just use some generic "let's start over" for anything else.

Say we end up with BusFaultA and BusFaultB or something, we loose abstraction in that the driver author now needs to be aware of what an underlying impl might return, and any addition to this enum that should be handled at the driver level requires a driver update, and the application author now needs to know both what the underlying impl can return, and what the driver handles / forwards. Some of this is implicit to having defined errors, some of this is a facet of the flat approach I think.

Yes, we should not allow have duplicate types and the description should be very clear so there's no confusion what's what.

However any problem around the error kinds in HAL impls, drivers and applications will be trivial to fix in a non-breaking way and can easily done by community contributions. Currently in most impls any change around errors is a breaking change.

A single flat non-generic set would simplify things like this which is nice, but at the same time, seems to make it functionally impossible to report meaningful errors (device disconnect, permissions, pin not mounted, etc.) at the application level, which does not seem to me to be tenable.

I am not proposing a single flat of set of errors, I'm proposing a set of errors per trait. I do not quite see how that makes reporting of any of the above impossible, certainly not any less possible than it is right now.

@eldruin
Copy link
Member

eldruin commented Jun 23, 2020

Let's take a look at a real public example I know of:
In the e.ziclean-cube the MCU (stm32f101) is connected via GPIO to an I2C accelerometer (KXCJ9).
In order to interact with the accelerometer, a bitbanging I2C implementation must be used (bitbang-hal).
If the MCU hal returns an error when trying to set or read a pin during an I2C operation, how would you give that information to the application?

I do not know if it would be fine to just start over, but I think if this comes up frequently, it would be useful to the application to be able to know that the MCU pins are acting up, and not the accelerometer, for example, and that is the reason for some failures. This would immediately simplify diagnostics.
I think it would be useful to make something like this somehow possible.

@therealprof
Copy link
Contributor Author

therealprof commented Jun 23, 2020

In order to interact with the accelerometer, a bitbanging I2C implementation must be used (bitbang-hal).
If the MCU hal returns an error when trying to set or read a pin during an I2C operation, how would you give that information to the application?

The bitbang-hal would take the digital::Error and turn it into an appropriate i2c::Error which the application can then handle. I would expect the accelerometer driver (if any) to properly relay the error to the application, in such a case it's totally fine if the accelerometer driver defines its own Error type with one of the enum variants being I2CError(i2c::Error) in order to relay any low level problems to the application.

I do not know if it would be fine to just start over, but I think if this comes up frequently, it would be useful to the application to be able to know that the MCU pins are acting up, and not the accelerometer, for example, and that is the reason for some failures. This would immediately simplify diagnostics.
I think it would be useful to make something like this somehow possible.

Absolutely.

@therealprof
Copy link
Contributor Author

After reading the comments in addition to the trait specific error types which should cover the peripheral specific interaction with a peripheral it might also make sense to have a general error type for peripheral initialisation to handle the specific errors which usually occur during initialisation only as mentioned by @ryankurte.

@eldruin
Copy link
Member

eldruin commented Jun 23, 2020

In order to interact with the accelerometer, a bitbanging I2C implementation must be used (bitbang-hal).
If the MCU hal returns an error when trying to set or read a pin during an I2C operation, how would you give that information to the application?

The bitbang-hal would take the digital::Error and turn it into an appropriate i2c::Error which the application can then handle. I would expect the accelerometer driver (if any) to properly relay the error to the application, in such a case it's totally fine if the accelerometer driver defines its own Error type with one of the enum variants being I2CError(i2c::Error) in order to relay any low level problems to the application.

Maybe there is a misunderstanding here.

The bitbang-hal I2C instance contains the GPIOs. If some kind of digital::Error is returned when setting a pin, the bitbang-hal I2C impl must transform this into some kind of i2c::Error. Let's say simplify and say just i2c::BusError.
The accelerometer driver will see i2c::BusError, wrap it into its own Error::I2C(i2c::BusError) and pass that back to the application.

The point is that it would be interesting for the application to be able to get the the original digital::Error, which was the source of the whole thing, and not a plain BusError, which contains no further information.

@therealprof
Copy link
Contributor Author

The point is that it would be interesting for the application to be able to get the the original digital::Error, which was the source of the whole thing, and not a plain BusError, which contains no further information.

We could support this with this proposal. I do not think this is very practical though to allow passthrough errors and also encourage implementations to make use of them because you have an explosion of possible error cases and the whole point of the proposal is to unify them.

What would happen to your application if we did that? Well, if your application uses I2C with a bitbang-hal implementation a read might return i2c::Error(digital::Error(BogusPin)) while any other I2C peripheral implementation would return i2c::Error(BusError) (or a more specific Error) for the very same thing so your error handling needs to cater for all possible cases.

@eldruin
Copy link
Member

eldruin commented Jun 23, 2020

What would happen to your application if we did that? Well, if your application uses I2C with a bitbang-hal implementation a read might return i2c::Error(digital::Error(BogusPin)) while any other I2C peripheral implementation would return i2c::Error(BusError) (or a more specific Error) for the very same thing so your error handling needs to cater for all possible cases.

Hmm, there is a mismatch in the types. I am not sure if it is intended.
The bitbang-hal::i2c::Read::try_read() method would return embedded_hal::i2c::Error::BusError(embedded_hal::digital::Error(BogusPin)).
A hardware I2C peripheral would return embedded_hal::i2c::Error::BusError(Something else or just nothing).

If somebody is not interested in causes for the BusError and just wants to retry, they would write this:

use embedded_hal::i2c;
match i2c.try_read() {
  Err(i2c::Error::BusError(_)) => // retry or so, no dependencies on whatever is in there
  // ...
}

If somebody is interested in some underlying causes, then of course they have a dependency to whatever is inside, but this dependency is already there. e.g. they already know they are using bitbang-hal and that the error might be from the pins the application itself put into it.
Here the code:

use embedded_hal::i2c;
match i2c.try_read() {
  Err(i2c::Error::BusError(digital::Error(pin))) => // log diagnostics or so
  Err(i2c::Error::BusError(e)) => // something else happened
  // ...
}

So the point is that code that does not care about underlying errors can just ignore them and have no dependencies and code that cares about them can use them.

@dbrgn
Copy link
Contributor

dbrgn commented Jun 23, 2020

I think there are two cases that we need to consider: A generic driver (which will not be able to deal with any platform-specific or project-specific error) and a hardware-specific application.

A generic driver will be very happy about unified errors because it allows it to handle certain cases like NACKs. I think nobody disagrees with the idea that there should be a list of predefined error variants.

However, an application like the vacuum robot may want to handle different kinds of bus errors differently.

If the e-h trait does not allow nesting errors (and thus the error information is not available by using the e-h API), then the driver will need to be modified with a custom non-e-h-compatible interface that returns the needed error information. For example:

// Pseudocode

// This is the hardware-specific API with more error information
impl Driver {
  fn write_with_error_details(&mut self, bytes: &[u8]) -> Result<(), DriverError> {
    ...
  }
}

// This is the generic driver API
impl Write for Driver {
  fn write(&mut self, bytes: &[u8]) -> Result<(), I2cError> {
    self.write_with_error_details(bytes)
      .map_err(|e| match e {
        DriverError::Gpio(_) => I2cError::BusError,
        DriverError::I2c(_) => I2cError::BusError,
      })
  }
}

The problem is that now the driver contains "prorietary" APIs that need to be called by the application. The developer of the application would need to fork or copy-paste the driver code in order to modify it.

If instead the e-h error enum allows wrapping a generic error, then the driver crate could simply pass through the underlying error, and the application could handle it if desired.

Of course this does not mean that we should not try to cover all possible error cases, but it still offers an escape hatch for developers that don't want to go through the "propose-change-to-e-h > get-change-accepted > wait-for-release > update-driver > wait-for-release > update-application" cycle.

@therealprof
Copy link
Contributor Author

Hmm, there is a mismatch in the types. I am not sure if it is intended.

Sorry, just me being sloppy.

Err(i2c::Error::BusError(digital::Error(pin))) => // log diagnostics or so

That will not work. You either need have a dedicated type for the BusError which includes all possible variants, like digital::Error or you need to make it a generic type which is incredibly painful to use.

@therealprof
Copy link
Contributor Author

therealprof commented Jun 23, 2020

However, an application like the vacuum robot may want to handle different kinds of bus errors differently.

I still don't see it. If you have a universal application then you have no idea (and shouldn't!) what to do about a GPIO error that causes a bit-banging implementation to fail, if the bit-banging driver fails to work around the problem (e.g. by retrying the transaction) then it's game over anyway and you can only tell the application: "Sorry, something's wrong with the I2C bus."

If you want to have a proprietary application which cares about those implementation specific details then you can still offer an API in your bitbang-hal driver to provide more detailed information about the problem and ways to recover but that's off the path for what embedded-hal can provide IMHO.

@eldruin
Copy link
Member

eldruin commented Jun 23, 2020

Hmm, there is a mismatch in the types. I am not sure if it is intended.

Sorry, just me being sloppy.

No problem.

Err(i2c::Error::BusError(digital::Error(pin))) => // log diagnostics or so

That will not work. You either need have a dedicated type for the BusError which includes all possible variants, like digital::Error or you need to make it a generic type which is incredibly painful to use.

Yes, we would need a generic error variant like already proposed:

#[non_exhaustive]
pub enum Error<E> {
    /// An unspecific bus error occured
    BusError(E),
    // ...
}

This is nothing really new, all drivers already define a generic Error<E> type and to me it has been quite fine so far.
For example, in a driver:

pub enum Error<E> {
  I2C(E),
  InvalidInputData,
}

Would now become:

pub enum Error<E> {
  I2C(i2c::Error<E>),
  InvalidInputData,
}

We now have an unified error layer in between but otherwise the driver code would see very few changes.

@dbrgn
Copy link
Contributor

dbrgn commented Jun 23, 2020

If you want to have a proprietary application which cares about those implementation specific details then you can still offer an API in your bitbang-hal driver to provide more detailed information about the problem and ways to recover

Yes, that's precisely the problem. A user would need to fork all driver crates and modify them with a custom API in order to provide detailed error information.

Especially if we want to promote embedded-hal in a commercial environment, it seems a bit weird to first advertise a "production-ready ecosystem of mature generic and compatible drivers" but then say "if you want to do error handling for your hardware-specific project, you will need to fork all generic drivers and adjust them to your needs".

but that's off the path for what embedded-hal can provide IMHO

No, embedded-hal can definitely provide a solution for this if we want (the solution with a generic parameter that @eldruin and others proposed). The only question is whether we want to sacrifice this use case in order for the syntax to look slightly more clean (without the type parameter).

@therealprof
Copy link
Contributor Author

Well, at the moment it has to be generic because the error is in associated type within the HAL impl. I would like to get rid of that, that's part of my proposal.

The problem with the generic being all the way generic is that now the application needs to know implementation details which makes the approach less universal. I planned to get rid of the associated error type in the impls, your suggestion would turn that into an inner error type which now needs to be carried around anywhere and quite frankly makes things a lot worse.

@therealprof
Copy link
Contributor Author

Yes, that's precisely the problem. A user would need to fork all driver crates and modify them with a custom API in order to provide detailed error information.

That is not correct. We already have proprietary methods on HAL impls, e.g. to construct them or initialise or make them change something internally (like switching modes on GPIO pins). You will always have implementation details which cannot be governed by e-h. Figuring out why bitbang-hal failed is nothing that should concern e-h at all.

No, embedded-hal can definitely provide a solution for this if we want (the solution with a generic parameter that @eldruin and others proposed). The only question is whether we want to sacrifice this use case in order for the syntax to look slightly more clean (without the type parameter).

I strongly disagree. We're not sacrificing any usecase here, it's merely a matter of the amount of precision we want to provide out of the box at the cost of complexity. I am of the opinion that simplicity should be the goal. The vast majority of usecases will benefit from the simple approach and only very few specific usecases might actually benefit from the ability to nest errors; do we really want to complicate the implementation and add boilerplate for every user because of some obscure usecases which also could be easily implemented in a different way?

@eldruin
Copy link
Member

eldruin commented Sep 30, 2020

I don't follow. Currently there's no plan to include anything in embedded-hal, it would be merely a user of the error types provided by embedded-error, [...]. As such ImplError automatically becomes an integral part of the concept.

I already highlighted here what I think the problem with ImplError is. Here adapted to the current proposal:
For ImplError to be useful, the information transmitted by it must be very specific to the implementation problem. This naturally leads to a high amount of errors.

  • If we accept these, after some time several similar variants of an error may be available so that users do not know what could actually be returned when calling a method, then they need to look into the implementation thus defeating its purpose.
  • To avoid this, slightly similar variants could be united into one, but then implementations return the same error for different situations, thus defeating the purpose again since the original error information needs to be processed anyway.

It seems like it is either overwhelming and thus difficult to use or a small extension of the available variants and thus imprecise.

Since generic drivers have no use for ImplError either, why bother adding it?

I would simply leave ImplError out of embedded-hal (in whatever integration form. through dependency or own code) and let applications/HALs define their own.
I would add a simple enum with the error variants from the protocols and make implementations return I2cError::Other in kind() for whatever else. If the user is interested, the original information can still be taken out.

@therealprof
Copy link
Contributor Author

It seems like it is either overwhelming and thus difficult to use or a small extension of the available variants and thus imprecise.

I do not agree. Have you looked at the available kinds? Any kinds you would add from the top of your head?

Since generic drivers have no use for ImplError either, why bother adding it?

Actually they do. Examples include things like handling (or not being able to handle) NACKs, signalling Timeouts, disconnected devices, discovery problems... Those are even more relevant when stacking implementations, e.g. the driver for a GPIO expander connected via SPI would like to signal that a connectivity issue is the problem for your pin toggling problems.

let applications/HALs define their own.

That is an declared anti-goal of this approach, pretty much the only thing I am not okay with.

If some system wants to define their own proprietary errors to allow more fine grained handling or better diagnostics that is okay but it should be exception and not the norm.

We want to give HAL/driver implementers a solid and consistens toolbox to communicate errors and drivers as well as applications universal and consistent information to allow them to handle any situation in an appropriate way (if they so desire).

@eldruin
Copy link
Member

eldruin commented Sep 30, 2020

It seems like it is either overwhelming and thus difficult to use or a small extension of the available variants and thus imprecise.

I do not agree. Have you looked at the available kinds? Any kinds you would add from the top of your head?

I am sure people will come up with more.

Since generic drivers have no use for ImplError either, why bother adding it?

Actually they do. Examples include things like handling (or not being able to handle) NACKs, signalling Timeouts, disconnected devices, discovery problems... Those are even more relevant when stacking implementations, e.g. the driver for a GPIO expander connected via SPI would like to signal that a connectivity issue is the problem for your pin toggling problems.

Things like NACKs are already part of the I2cError enum, not ImplError. Same thing with I2cError::Timeout (although that is part of SMBus, not I2C).

A driver getting an OutputPin should not care / cannot care about whether the pin comes from the MCU or through an I2C multiplexer. It really cannot handle ImplError::PowerDown.

While ImplError allows to get a bit more of information, it cannot tell exactly what went wrong without making ImplError overwhelming (because only the implementation knows), which is precisely my point.

Since it does not provide a complete solution and is also not useful for generic drivers, I would avoid it and just let I2C methods return I2C-proper errors, or Other.

let applications/HALs define their own.

That is an declared anti-goal of this approach, pretty much the only thing I am not okay with.
If some system wants to define their own proprietary errors to allow more fine grained handling or better diagnostics that is okay but it should be exception and not the norm.

MCU HALs can reuse an unified error definition like ImplError by importing it themselves form a central crate like embedded-error. I have nothing against that. Similar to the rand_core dependency, for example.

However, I would not make ImplError part of I2cError.

@therealprof
Copy link
Contributor Author

I am sure people will come up with more.

Which is encouraged. But if you can't come up with additions in a hurry I have my doubts that we're going to be swamped with an unwieldy number of requests. 😏

A driver getting an OutputPin should not care / cannot care about whether the pin comes from the MCU or through an I2C multiplexer. It really cannot handle ImplError::PowerDown.

Maybe not PowerDown but it may want to retry a few times if it receives an ImplError::Timeout.

While ImplError allows to get a bit more of information, it cannot tell exactly what went wrong without making ImplError overwhelming (because only the implementation knows), which is precisely my point.

It doesn't need to tell exactly what went wrong but if it can point into the right direction that is a huge help. Without it all a non-device specific application can only do a best-effort problem resolution (or just crash), with a useful ImplError it has a ton more options.

@eldruin
Copy link
Member

eldruin commented Sep 30, 2020

I am sure people will come up with more.

Which is encouraged. But if you can't come up with additions in a hurry I have my doubts that we're going to be swamped with an unwieldy number of requests. 😏

My ability to come up with examples right on the spot is beside the point. A high amount of requests within a short period of time (leading to "swamping") is also beside the point.
The point is that the number of ImplError error variants is expected to grow, which adds to my point of an unwieldy number of ever-so-slightly-different error variants.

A driver getting an OutputPin should not care / cannot care about whether the pin comes from the MCU or through an I2C multiplexer. It really cannot handle ImplError::PowerDown.

Maybe not PowerDown but it may want to retry a few times if it receives an ImplError::Timeout.

Again, Timeout is already part of I2cError, not ImplError. A generic driver would be able to handle it without needing ImplError.

While ImplError allows to get a bit more of information, it cannot tell exactly what went wrong without making ImplError overwhelming (because only the implementation knows), which is precisely my point.

It doesn't need to tell exactly what went wrong but if it can point into the right direction that is a huge help. Without it all a non-device specific application can only do a best-effort problem resolution (or just crash), with a useful ImplError it has a ton more options.

If it can only point into the right direction, you will still need to process the implementation-specific error if you want to fix it or know what actually happened.
We may as well cut the middle-man and just encourage applications to choose ImplError as their implementation error type if they are satisfied with the level of information. That would keep the errors simple.

Why force an additional set of somewhat unrelated errors into every error definition if we already have a mechanism to get the underlying information?

@therealprof
Copy link
Contributor Author

Why force an additional set of somewhat unrelated errors into every error definition

They are not unrelated, they are general system errors which apply to any peripheral which is exactly why they are in every error definition.

if we already have a mechanism to get the underlying information?

That mechanism is useless in the vast majority of cases because you're back to having device specific error handling which is exactly what we need/want to get away of. I have honestly no idea why some people are insisting on having it but for me this is a compromise I'm willing to make as long as we don't loose the ability to do universal error handling.

@TeXitoi
Copy link

TeXitoi commented Oct 1, 2020

To define the set of enum, what about checking a few existing errors in hal, and see how we would transfer it to the corresponding ErrorKind.

For example, the error used in https://github.com/stm32-rs/stm32f4xx-hal/pull/218/files look quite protocol related and should fit in the ErrorKind type.

OTOH, the DMABufferNotInDataMemory error in nrf-hal doesn't seem appropriate for a dedicated entry in kind, and, I think, would enter in the Other case.

linux embedded hal is not really helpful as it seems to directly use std::io::Error as its error type.

Other seems to be the right name, as it is also used in std::io::ErrorKind::Other https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.Other

So, maybe we can begin with a small set of variant in the embedded-error kind types, try to use it in a few hal, and see if it need additions. It would be semver compatible.

The only thing to really choose are:

  • if we add some "generic" error that apply to every errors (as OS related errors, as permission denied and the like, corresponding to the actual ImplError) and its name. ImplError is not really talkative, maybe GenericError or something like that?
  • if we use a "generic" error, does the Other variant belong to the generic error, or to the specific errors.

@therealprof
Copy link
Contributor Author

@TeXitoi I don't think what HAL impls are doing right now is indicative for what they will look like after we make the change. Since all the moment all error types are bespoke there's no right or wrong choice.

So, maybe we can begin with a small set of variant in the embedded-error kind types, try to use it in a few hal, and see if it need additions. It would be semver compatible.

That's exactly what I did: Considered a few usecases and defined the variants for them.

if we add some "generic" error that apply to every errors (as OS related errors, as permission denied and the like, corresponding to the actual ImplError) and its name. ImplError is not really talkative, maybe GenericError or something like that?

The thought behind the name ImplError was to suggest implementation specific errors. I'm not attached to the name, only to the concept. 😉

I'd be okay adding an Other variant to ImplError but I'd rather call it something like Custom or Extended to indicate that there would is additional custom information available requiring a custom implementation to extract.

@TeXitoi
Copy link

TeXitoi commented Oct 1, 2020

@TeXitoi I don't think what HAL impls are doing right now is indicative for what they will look like after we make the change. Since all the moment all error types are bespoke there's no right or wrong choice.

But it gives a good idea of the practical errors. I agree that it doesn't show anything in error analysis.

So, maybe we can begin with a small set of variant in the embedded-error kind types, try to use it in a few hal, and see if it need additions. It would be semver compatible.

That's exactly what I did: Considered a few usecases and defined the variants for them.

As @eldruin thinks there is too much kinds, I propose to begin with a really minimal set. For example, I didn't find a concrete example of GpioError::WrongMode and ImplError::OutOfMemory for example. We can always add them later, that's not a breaking change.

if we add some "generic" error that apply to every errors (as OS related errors, as permission denied and the like, corresponding to the actual ImplError) and its name. ImplError is not really talkative, maybe GenericError or something like that?

The thought behind the name ImplError was to suggest implementation specific errors. I'm not attached to the name, only to the concept. wink

impl has a meaning in rust, thus I didn't understand the name. As it is added to every error kinds, I thought it represents some kind of error that are applicable everywhere.

@therealprof
Copy link
Contributor Author

As @eldruin thinks there is too much kinds, I propose to begin with a really minimal set. For example, I didn't find a concrete example of GpioError::WrongMode and ImplError::OutOfMemory for example. We can always add them later, that's not a breaking change.

My understanding of the concern is that in the future there will be too many kinds, which I totally disagree with -- there cannot be too many error kinds: when an error can occur and is not represented by a similar kind yet then it should be added and there should be a very low barrier to getting it in.

Indeed there's no concrete examples of those kinds yet, mostly because there're only very few implementations at the moment. It makes little sense to me to cut down on error kinds which have been already identified only to add them again later when someone complains about them missing. I do already have plans to use GpioError::WrongMode in a GPIO port expander driver.

impl has a meaning in rust, thus I didn't understand the name. As it is added to every error kinds, I thought it represents some kind of error that are applicable everywhere.

Yes, implementation specific errors are applicable everywhere and it's the only common variant on all peripheral specific kinds to express problems caused by HAL implementations or drivers. If you can think of a better name, bikeshed away. 😅

@TeXitoi
Copy link

TeXitoi commented Oct 1, 2020

I do already have plans to use GpioError::WrongMode in a GPIO port expander driver.

Shouldn't it be handled by type state?

If you can think of a better name, bikeshed away.

I have GenericError, but I'm not convinced.

@eldruin
Copy link
Member

eldruin commented Oct 6, 2020

We have barely started using embedded-error (AFAIK) and right now, embedded_error::I2cError already has two different timeout error variants:

Even written differently.
I believe this can be confusing.

@therealprof
Copy link
Contributor Author

We have barely started using embedded-error (AFAIK) and right now,

Sure. We're kind of in a gridlock (again, sic!). For reference, I was trying to introduce embedded-error into various HAL impls, starting with stm32f4xx-hal when someone restarted the discussion: stm32-rs/stm32f4xx-hal#218 . It doesn't make sense to me to implement embedded-error everywhere when people are still requesting major changes.

embedded_error::I2cError has already two different timeout error variants:

Even written differently.
I believe this can be confusing.

They do have a different purpose. Anything in the peripheral specific Error is, well, peripheral specific (although we could discuss if we actually want to have a separate Error type for SMBus), anything in ImplError is implementation specific. Timeout is coming coming from the an actual implementation (and I think it might be called that way in the reference manual, too), TimedOut was made up by me; I don't have any real preference though I don't think calling them the same would make things clearer but I'm open to suggestions.

@eldruin
Copy link
Member

eldruin commented Oct 6, 2020

We have barely started using embedded-error (AFAIK) and right now,

Sure [...]

No criticism there. I am trying to highlight that despite a small number of errors (so far), there is already a case of similar-but-actually-different error variants.
I understand these two timeout error variants. However, as I wrote, I believe they can be be confusing, depending on your familiarity with HAL implementations/Rust (embedded) experience.

When more variants are added, this may become more complicated.

there cannot be too many error kinds: when an error can occur and is not represented by a similar kind yet then it should be added and there should be a very low barrier to getting it in.

Exactly because of this, I believe more similar-but-slightly-different error variants will be available in the future and error handling on the user side will become more confusing.

I believe it would be better for everybody if operations only return their own error variants and use Other for any arbitrary "implementation/generic" errors.

@therealprof
Copy link
Contributor Author

I believe it would be better for everybody if operations only return their own error variants and use Other for any arbitrary "implementation/generic" errors.

As mentioned before that's an absolute no-go for me: the only thing I deeply care about is the ability to handle common errors generically and Other takes exactly that ability away. If someone doesn't care for those error kinds they can simply do a wildcard match and get exactly the same effect, the other way around is not possible.

@eldruin
Copy link
Member

eldruin commented Oct 6, 2020

I believe what most people need is to handle protocol-own errors, especially I2C's NACK, not underlying errors.

ImplError started as a later addition to forward underlying errors. @therealprof wrote:

Since there seems to be an interest in relaying underlying error causes to a user, how about we add a universal ImplError enum with a number of common error cases and add this to each of the trait specific Error variants?

@therealprof
Copy link
Contributor Author

I believe what most people need is to handle protocol-own errors, especially I2C's NACK, not underlying errors.

I don't. Quite a few people actually declared it to be a deal breaker not to have the ability to declare custom errors which is even a step further in that direction (and something I do not agree with but I'm willing to compromise on).

ImplError started as a later addition to forward underlying errors. @therealprof wrote:

That is correct, I missed that in the initial proposal and added it after a ton of discussion and requests. So?

@eldruin
Copy link
Member

eldruin commented Oct 6, 2020

I am one of the persons that thinks that forwarding underlying error information must be possible.
That is possible without ImplError.
ImplError is only about standardisation of underlying error causes. That is a second level of standardisation.

I believe error types would be more flexible without attempting to represent every possible hardware failure situation inside each error type. This information can be transported separately with the approach from proposal 3.

ImplError started as a later addition to forward underlying errors. @therealprof wrote:

That is correct, I missed that in the initial proposal and added it after a ton of discussion and requests. So?

So, standardisation of underlying errors was not the most pressing need in the community.

@therealprof
Copy link
Contributor Author

I am one of the persons that thinks that forwarding underlying error information must be possible.

I am aware. And again that is a compromise I'm willing to make.

I believe error types would be more flexible without attempting to represent every possible hardware failure situation inside each error type. This information can be transported separately with the approach from proposal 3.

If you can get to the underlying error, you can get to the underlying error no matter what the error value is. So there's absolutely no difference in flexibility.

So, standardisation of underlying errors was not the most pressing need in the community.

Err, you certainly have a way of twisting words in your favour. 🙃

@dbrgn
Copy link
Contributor

dbrgn commented Oct 6, 2020

Err, you certainly have a way of twisting words in your favour. 🙃

Please refrain from personal attacks.

@therealprof
Copy link
Contributor Author

@dbrgn That's not an attack but an observation. A lot of stuff was added after the initial approach after a ton of discussion; I don't see why this specific addition has any lesser value than the downcasting which was suggested even later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants