[SUGGESTION] Value based exceptions (Zero-cost exceptions) #111

Closed
redradist opened this issue Nov 10, 2022 · 26 comments

Comments

@redradist

redradist commented Nov 10, 2022

This feature is based on https://youtu.be/ARYP83yNAWk?t=2227
A long time ago I wrote you an email about it, but it got lost somewhere in the history of civilisation ...
Anyway, I just want to extend your idea to use value-based exceptions, as Rust does with its Result<,> type and ? syntax.
See the code:

run: (file: &std::string) -> int32_t throws Error1, Error2 {
    file := File::open(file).try;
    contents := std::string{};
    file.read_to_string(contents&).try;
    return contents.trim().parse().try;
}

This function can throw two exceptions: Error1 or Error2.
It could be implemented using a union plus a value describing the type of the error.
See a possible underlying implementation on godbolt: https://godbolt.org/z/a_vbNw
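
One way to picture the union-plus-tag idea in today's C++ is std::variant, which is a tagged union. The sketch below is only an illustration and is not the code behind the godbolt link: Error1, Error2, parse_number, and run are hypothetical stand-ins, and the early return marks roughly where a .try expression would hand the error back to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical error types standing in for Error1 and Error2 from the example.
struct Error1 { std::string reason; };    // e.g. nothing to read
struct Error2 { std::string bad_text; };  // e.g. text is not a number

// Tagged union: the int32_t result, or one of the two errors.
using RunResult = std::variant<std::int32_t, Error1, Error2>;

// Hypothetical helper: parses a non-negative decimal number.
inline RunResult parse_number(const std::string& text) {
    if (text.empty()) return Error1{"empty input"};
    std::int32_t value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return Error2{text};
        value = value * 10 + (c - '0');
    }
    return value;
}

// Roughly what a `.try` call site could expand to: on error, return it
// to the caller unchanged; otherwise continue with the value.
inline RunResult run(const std::string& text) {
    RunResult parsed = parse_number(text);
    if (!std::holds_alternative<std::int32_t>(parsed)) return parsed;
    return std::get<std::int32_t>(parsed) + 1;
}

// Small usage check for the sketch.
inline void demo_tagged_union() {
    assert(std::get<std::int32_t>(run("41")) == 42);
    assert(std::holds_alternative<Error1>(run("")));
    assert(std::holds_alternative<Error2>(run("4x")));
}
```

The caller inspects the active alternative to learn which error occurred, and no heap allocation or RTTI is needed for the error path itself in this sketch.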

I suggest converting all functions without the throws keyword (in cpp2 syntax) to noexcept functions (in current C++ syntax).

The list of exceptions could also be generated automatically by analyzing which new cpp2-syntax functions are called from the current context:

run: (file: &std::string) -> int32_t throws {
    file := File::open(file).try;
    contents := std::string{};
    file.read_to_string(contents&).try;
    return contents.trim().parse().try;
}

could be converted to:

run: (file: &std::string) -> int32_t throws Error1, Error2 {
    file := File::open(file).try;
    contents := std::string{};
    file.read_to_string(contents&).try;
    return contents.trim().parse().try;
}

Because the cpp2 compiler knows that, for example, File::open(file).try and file.read_to_string(contents&).try could throw Error1 or Error2.

@switch-blade-stuff

To add to this, it could be implemented via std::expected<T, std::error_code>.
std::expected is C++23, but is very easy to backport if necessary.
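
A backport along those lines could start from something like the following C++17 sketch. mini_expected is a hypothetical, heavily reduced stand-in built on std::variant; it does not reproduce the std::expected interface, and parse_digit exists only to exercise it.

```cpp
#include <cassert>
#include <system_error>
#include <utility>
#include <variant>

// Hypothetical minimal stand-in for C++23 std::expected<T, E>.
template <typename T, typename E>
class mini_expected {
public:
    // Implicit construction from a value means success.
    mini_expected(T value) : storage_(std::in_place_index<0>, std::move(value)) {}

    // Named constructor for the failure case.
    static mini_expected failure(E error) {
        return mini_expected(std::in_place_index<1>, std::move(error));
    }

    bool has_value() const noexcept { return storage_.index() == 0; }
    const T& value() const { return std::get<0>(storage_); }
    const E& error() const { return std::get<1>(storage_); }

private:
    mini_expected(std::in_place_index_t<1> tag, E error)
        : storage_(tag, std::move(error)) {}

    std::variant<T, E> storage_;  // index 0: value, index 1: error
};

// Hypothetical function returning either a digit value or an error code.
inline mini_expected<int, std::error_code> parse_digit(char c) {
    if (c < '0' || c > '9') {
        return mini_expected<int, std::error_code>::failure(
            std::make_error_code(std::errc::invalid_argument));
    }
    return c - '0';
}

// Small usage check for the sketch.
inline void demo_mini_expected() {
    assert(parse_digit('7').has_value() && parse_digit('7').value() == 7);
    assert(!parse_digit('x').has_value());
    assert(parse_digit('x').error() == std::errc::invalid_argument);
}
```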

@redradist
Author

@switch-blade-stuff

To add to this, it could be implemented via std::expected<T, std::error_code>. std::expected is C++23, but is very easy to backport if necessary.

Completely agree, std::expected<T, std::error_code> is very similar to Result<,> in Rust

@fluffinity

Also a list of exceptions could be generated automatically by analyzing which new cpp2 syntax functions is called from current context:

I do not think we want this behavior. The kinds of errors a function can throw should be considered part of its contract. So, going with the current example, every time the try operator is used the error can only be Error1 or Error2, but not Error3. This way the possible errors become visible at the signature level, which not only makes reasoning about functions easier for programmers but also lets tools handle them more easily, as they do not have to scan the body to obtain this information.

In the case of functions provided by external dynamic libraries this is actually the only way to know which error values can even be thrown as the function body may not be available at compile time. They may not be here in cpp2 now but have to be considered for this feature.

@JohelEGP
Contributor

I think I remember a speaker mentioning otherwise, just like C++1 moved away from throws. I think they also mentioned how new languages don't go down that path. Do you have proof against that?

@fluffinity

Rust does essentially this with its Result<T, E> type. The second type argument E encodes the error type that can be returned from the function, so the function can only return errors of this type E.

@JohelEGP
Contributor

Right, it remains in the signature. I think I got confused with how throws originally worked. So OP is suggesting repurposing it for a C++2 with built-in support for value-based exceptions.

@fluffinity

Exactly.

@redradist
Author

@JohelEGP

I think I remember a speaker mentioning otherwise, just like C++1 moved away from throws. I think they also mentioned how new languages don't go down that path. Do you have proof against that?

The original throws was not deduced from context, but the original idea was not bad, because it shows all exceptions.
I suggest bringing back the throws keyword with an optional manual list of exceptions, but this feature would be most powerful with compiler deduction, for example:

run: (file: &std::string) -> int32_t throws {
    file := File::open(file).try;
    contents := std::string{};
    file.read_to_string(contents&).try;
    return contents.trim().parse().try;
}

The compiler could deduce that run throws only Error1 and Error2 and add additional mangling to the signature, like run_std::string_Error1_Error2 (pseudo-mangling).
I think the original feature with a list of exceptions is good, but only if the list of exceptions is deduced by the compiler and made part of the signature.

@fluffinity

@redradist

Original throws was not deduced from context, but original idea was not bad, because it shows all exceptions.
I suggest to return throws keyword with potential manual list of exceptions, but the most powerfully this feature would be in case of compiler deduction, for example:

I do not think it is good to allow for this deduction by default for the reasons I have given in this comment. However it would be possible to use this deduction logic to offer a tool that generates the throws exception list for you. Such a tool would make migrations to cpp2 code easier while still having explicit exception lists.

@hsutter
Owner

hsutter commented Nov 19, 2022

Specifically about listing/deducing the exceptions that can be thrown:

@redradist Here's one brain dump of the issues: https://herbsutter.com/2007/01/24/questions-about-exception-specifications/

Probably the most fundamental issue is that listing a specific set of exceptions is not composable... here are two main aspects of that:

  1. It's brittle: It always exposes implementation detail, and does it virally (e.g., what if you change your implementation to call a different function that has a different exception specification), which makes code brittle. This is a major reason why Java programs generally disable checked exceptions with throws Exception, even before Java got generics.
  2. It's nongeneric: With generics it's even harder, because what exception specification do you write on myfunc<T>? The answer would be to immediately invent a throws() expression to get access to the things that could be thrown, and have myfunc<T> declared as throws ( /*...?...*/ ) where "?" contains every expression its body could transitively call using a T object. Which of course also immediately exposes implementation detail and has all the problems of (1) above plus requires a lot of ugly boilerplate.

Note that (1) is still true even if the specification is deduced by the compiler (which also only works if you have the full source code for the entire call tree, which we won't always have and is another reason it's not composable).

AIUI the languages that chose to list exceptions have either backed off and deprecated/removed that feature or else have had their users de facto disable it in practice (e.g., Java throws Exception). But my information may be dated... if anyone knows of large-scale experience with listed exceptions/errors in a function interface, including for generic functions, please reply here with an example showing its use at scale in a composable way. Thanks!

@JohelEGP
Contributor

FYI, @redradist, the proposal linked at https://github.com/hsutter/cppfront#2019-zero-overhead-deterministic-exceptions-throwing-values, IIRC, allows for value-based exceptions of arbitrary load with a single type.

@redradist
Author

@hsutter

Specifically about listing/deducing the exceptions that can be thrown:

@redradist Here's one brain dump of the issues: https://herbsutter.com/2007/01/24/questions-about-exception-specifications/

Probably the most fundamental issue is that listing a specific set of exceptions is not composable... here are two main aspects of that:

1. It's brittle: It always exposes implementation detail, and does it virally (e.g., what if you change your implementation to call a different function that has a different exception specification), which makes code brittle. This is a major reason why Java programs generally disable checked exceptions with `throws Exception`, even before Java got generics.

I partially agree with that, but in Java the issue was that the list of exceptions had to be updated manually.
I think that to fix this problem we just need to let the compiler add the list of exceptions as part of the signature (mangling); if a function starts using a new exception, the existing code would just need to be recompiled, without manually modifying the list of exceptions.

Note that (1) is still true even if the specification is deduced by the compiler (which also only works if you have the full source code for the entire call tree, which we won't always have and is another reason it's not composable).

No, it is possible if the list of exceptions is part of the signature, e.g. run_std::string_Error1_Error2 (pseudo-mangling).

AIUI the languages that chose to list exceptions have either backed off and deprecated/removed that feature or else have had their users de facto disable it in practice (e.g., Java throws Exception). But my information may be dated... if anyone knows of large-scale experience with listed exceptions/errors in a function interface, including for generic functions, please reply here with an example showing its use at scale in a composable way. Thanks!

Okay, one additional idea: throws could mean the maximum size of an error that could be stored on the stack. See this example:

main: void = {
    run("Example string");
}

run: (file: &std::string) -> int32_t throws {
    file := File::open(file).try;
    contents := std::string{};
    file.read_to_string(contents&).try;
    return contents.trim().parse().try;
}

will be converted to:

main: void = {
    error_buf: std::array<uint8_t, 2035> = {};
    run("Example string", error_buf);
}

run: (file: &std::string, error_buf: &std::array<uint8_t, 2035>) -> int32_t {
    file := File::open(file).try;
    contents := std::string{};
    file.read_to_string(contents&).try;
    return contents.trim().parse().try;
}

An error would be stored in error_buf without exposing the implementation details.
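
In current C++, the buffer idea might look roughly like the sketch below. Everything here is an assumption made for illustration: ParseError, ErrorBuffer, parse_digits, and the 64-byte bound are hypothetical, and the error object is constructed in the caller's buffer with placement new.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical error type; trivially destructible, so no cleanup is needed.
struct ParseError { std::size_t position; };

// Assumed upper bound on the size of any error the callee may store.
constexpr std::size_t kMaxErrorSize = 64;

// Caller-owned, suitably aligned stack storage for an error object.
struct ErrorBuffer {
    alignas(std::max_align_t) unsigned char bytes[kMaxErrorSize];
};

// Returns true on success; on failure constructs a ParseError in error_buf.
inline bool parse_digits(const std::string& text, int& out, ErrorBuffer& error_buf) {
    static_assert(sizeof(ParseError) <= kMaxErrorSize, "error must fit in the buffer");
    int value = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] < '0' || text[i] > '9') {
            ::new (static_cast<void*>(error_buf.bytes)) ParseError{i};
            return false;
        }
        value = value * 10 + (text[i] - '0');
    }
    out = value;
    return true;
}

// Small usage check for the sketch.
inline void demo_error_buffer() {
    ErrorBuffer buf;
    int value = 0;
    assert(parse_digits("123", value, buf) && value == 123);
    assert(!parse_digits("12x", value, buf));
    const ParseError* err =
        std::launder(reinterpret_cast<const ParseError*>(buf.bytes));
    assert(err->position == 2);
}
```

Note that in this sketch the caller still has to know that the buffer holds a ParseError in order to read it back; the buffer hides the size of the error, not its type.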

@fluffinity

@redradist

No, it is possible if list of exceptions would be a part of signature run_std::string_Error1_Error2 (pseudo-mangling)

This will not work with extern functions. The moment you change the exception list you change the symbol name and so we run into the problem that code compiled with a header file or module file of an old version of the function will not work when given a newer version of the function. For example:

run: (file: &std::string) -> int32_t throws Error1, Error2;

becomes run_std::string_Error1_Error2. Now if a newer version of the library comes out and gets shipped with the following version of the function:

run: (file: &std::string) -> int32_t throws Error2, Error1;

This function would become run_std::string_Error2_Error1. The symbol names are different and so we run into an ABI problem. From an API perspective both versions of run should be identical as their signature only differs in the order of exception declarations. Even if the order stays the same it is sufficient to add/remove exceptions from the list to break code.

Okay, one additional idea, throws could mean a max size of error that could be stored on stack, see example:

This does not solve the fundamental issue of this approach as you still need information about the exception types for a function. Again, if the source code is not available, you can not perform this analysis. Even if it is you need to re-compile your code if somewhere in the call chain an exception type changes. For example if deep in the call tree of file.read_to_string(contents&) an exception type gets changed the analysis has to be re-done if every function in the chain relies on this implicit exception list. And this would be a likely scenario because these implicit lists are easier to write than the explicit ones so people have to do extra work just to make their code robust against internal changes.

@fluffinity

Going back to std::expected should work best, although instead of always using std::expected<T, std::error_code>, std::expected<T, E> should be the default. This allows for returning arbitrary error information. For example, if we had the code:

send_string: (in s: std::string) -> std::expected<size_t, send_error<std::string>>;

In case send_string fails a general exception type allows for transferring back the string so we can continue using it. The string would only get consumed fully if the function succeeds. std::error_code is too low level for such operations.
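
A rough C++17 sketch of that idea follows, using std::variant as a stand-in for std::expected. The send_error layout, the send_result alias, and the link_up flag that simulates a transport failure are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error type: an error code plus the payload that was not sent.
template <typename Payload>
struct send_error {
    int code;
    Payload unsent;
};

// Stand-in for std::expected<size_t, send_error<std::string>>.
using send_result = std::variant<std::size_t, send_error<std::string>>;

// Hypothetical sender; `link_up` simulates whether the transport is available.
inline send_result send_string(std::string s, bool link_up) {
    if (!link_up) {
        return send_error<std::string>{-1, std::move(s)};  // hand the string back
    }
    return s.size();  // pretend all bytes were sent
}

// Small usage check: recover the string after a failure and retry.
inline void demo_send_string() {
    send_result failed = send_string("hello", false);
    assert(std::holds_alternative<send_error<std::string>>(failed));
    std::string recovered =
        std::move(std::get<send_error<std::string>>(failed).unsent);
    assert(recovered == "hello");
    send_result ok = send_string(std::move(recovered), true);
    assert(std::get<std::size_t>(ok) == 5);
}
```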

@switch-blade-stuff

To me, using std::expected would work best here: there would be no need to invent some third kind of error handling mechanism, and it solves the issue of name mangling too.

As for multiple alternative error types, you can use std::variant for that.

IMO it should not be a fully generic system that can propagate any kind of error implicitly, because then it is no better than the classic exceptions (and will require the runtime generic object handling overhead too).

std::error_code is too low level for such operations.

I agree, after all the purpose of std::error_code is to be a generic error code, not a universal error type. I just used it as an example for this kind of error handling since you usually see error codes used for this.

@fluffinity

fluffinity commented Nov 20, 2022

I have checked the impact that switching to this kind of error handling as the default would have on the Error Handling part of the C++ Core Guidelines:

  • E.15: handle errors by value
  • E.18: remove, because even pass-through of errors becomes explicit
  • E.25-E.31, excluding E.28: remove, because heap allocation and RTTI get eliminated (https://youtu.be/ARYP83yNAWk)
  • E.28: may be removed if users can not use C headers by default. To my knowledge C++ does not make public use of errno, correct me if I am wrong here
  • For all guidelines replace the occurrences of try/catch with returning the error case/handling the error case
  • Add one guideline stating the error handling strategy and when to use which form (std::expected vs unwinding vs std::abort)

Based on this error handling strategy more changes could be implemented.

  • Make functions noexcept by default. When unwinding is reserved for severe error conditions, like violated preconditions, code may be supposed to fail fast anyways
  • Make constructors noexcept. If they can fail in some way use factory functions that use std::expected instead

I expect the first point to require the most amount of discussion and changes to make the idea work.
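
The factory-function suggestion could look something like this C++17 sketch. Port and its valid range are hypothetical, and std::variant<Port, std::errc> stands in for std::expected<Port, E>.

```cpp
#include <cassert>
#include <system_error>
#include <variant>

// Hypothetical type whose invariant (1..65535) cannot be checked by a
// noexcept constructor alone, so construction goes through a factory.
class Port {
public:
    // Returns a Port on success or an error value on failure; never throws.
    static std::variant<Port, std::errc> make(int number) noexcept {
        if (number < 1 || number > 65535) return std::errc::invalid_argument;
        return Port(number);
    }

    int number() const noexcept { return number_; }

private:
    // The constructor itself cannot fail; validation happened in make().
    explicit Port(int number) noexcept : number_(number) {}
    int number_;
};

// Small usage check for the sketch.
inline void demo_factory() {
    auto ok = Port::make(8080);
    assert(std::holds_alternative<Port>(ok) && std::get<Port>(ok).number() == 8080);
    auto bad = Port::make(0);
    assert(std::get<std::errc>(bad) == std::errc::invalid_argument);
    static_assert(noexcept(Port::make(1)), "the factory is declared noexcept");
}
```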

@Krzmbrzl

Krzmbrzl commented Nov 23, 2022

Maybe a less informed user point of view: if the compiler deduced what kinds of exceptions a given function may throw, it could warn about exceptions that have not been handled.
That should make it harder to write code that accidentally lets exceptions pass by unhandled, causing mayhem further up in the call hierarchy.

In terms of calling functions whose exception specs are not known, cpp2 could assume that the user writing the code knows the possible exceptions and give them a way to handle those explicitly (with shortcuts for "ignore" and "abort if encountered"); if anything passes through this handling, we abort (essentially wrapping the unknown call in a noexcept-labelled wrapper function that also contains some error handling for that function).

@jcanizales

jcanizales commented Nov 24, 2022

It's brittle: It always exposes implementation detail, and does it virally (e.g., what if you change your implementation to call a different function that has a different exception specification), which makes code brittle.

I disagree with this. If I change my implementation to call a different function that has a different exception specification, then it's my responsibility to make sure I don't propagate them if my callers don't expect arbitrary unspecified errors. This is true whether the language helps me here or not.

Consider e.g. that same situation with error codes:

/** Returns 0 on success, -1 if the directory already exists. */
int CreateDirectory(std::string name) {
  // ... do things
  int error = CallNewFunction();  // returns -2 if a network connection can't be established
  if (error) { return error; }
  // ... do things
}

I have exposed implementation details and broken my callers. The only difference is the language didn't help me prevent it. The virality of it happens if I decide the solution is to add "can also return -2" to the documentation; which is the wrong thing to do here. And if propagating is the right thing to do, the language is not helping my callers by not signaling the breaking change: They have to stumble upon it in human language. But error specification (of the type "expected situation outside the caller's control") is part of my function interface, whether I describe them in C++ or in English.

Of course, sometimes we're writing a function that's not part of any module's interface, e.g. just some internal wrapper around another function call. Similarly to how in that case I might want to specify auto as the return type, I probably don't want to enumerate the errors it propagates and just let the compiler deduce it. Like, throws auto (modulo bikeshedding). The same two solutions (return type auto, throw specifier auto) seem to me to be applicable to template code. Which is @hsutter 's second concern:

It's nongeneric: With generics it's even harder, because what exception specification do you write on myfunc? The answer would be to immediately invent a throws() expression to get access to the things that could be thrown, and have myfunc declared as throws ( /...?.../ ) where "?" contains every expression its body could transitively call using a T object. Which of course also immediately exposes implementation detail and has all the problems of (1) above plus requires a lot of ugly boilerplate.

Same deal, no? What return type do I write for myfunc<T>? I can use decltype and declval, and expose my implementation details with ugly boilerplate; or I can just let the compiler infer it with auto.

The parallelism between return types void, MyType, auto and declaring that you never fail, you might fail with error E, or "same as whatever I'm calling" is not a coincidence.
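
For comparison, here is how the two spellings look in current C++, as a small sketch with hypothetical function names. C++ can already compute an exception specification from an expression with noexcept(noexcept(...)), and it can deduce a return type with auto, but it has no deduced exception specification.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Return type and exception specification both spelled out from the
// expression used in the body (the "boilerplate" form).
template <typename T>
auto twice_explicit(const T& x) noexcept(noexcept(x + x)) -> decltype(x + x) {
    return x + x;
}

// Return type deduced by the compiler. There is no matching deduction for
// the exception specification, so this is potentially throwing for every T.
template <typename T>
auto twice_deduced(const T& x) {
    return x + x;
}

static_assert(std::is_same_v<decltype(twice_deduced(1)), int>);
static_assert(noexcept(twice_explicit(1)));
static_assert(!noexcept(twice_explicit(std::declval<const std::string&>())));
static_assert(!noexcept(twice_deduced(1)));

// Small usage check for the sketch.
inline void demo_twice() {
    assert(twice_explicit(21) == 42);
    assert(twice_deduced(std::string("ab")) == "abab");
}
```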

@fluffinity

One of the first ideas in this issue was to use std::expected. This would be the simplest solution to this error specification as the current type system is sufficient and you only want a shorthand syntax for getting the value out of the expected if it is there and return the error otherwise. This is the try syntax in the original comment.
This is compatible with generic code as you can name the error type. There is only one. If multiple different errors can be returned you have to wrap them in a std::variant. Importantly, if you then choose to call a different function that may return a different error than you specified you will get a compile error. This means such changes can not have silent effects making the code more robust against them.

Of course, sometimes we're writing a function that's not part of any module's interface, e.g. just some internal wrapper around another function call. Similarly to how in that case I might want to specify auto as the return type, I probably don't want to enumerate the errors it propagates and just let the compiler deduce it. Like, throws auto (modulo bikeshedding).

This is the viral part of this exception list approach. Once you call one function with a list specified with auto you have to make your exception list auto as well because you can not know what errors that function will return. The behavior you want, however, is that only the caller needs to change their code if the callee changes their error type. This keeps the effect local to the calling code without transitive effects.

@JohelEGP
Contributor

I think Boost.LEAF can achieve what's desired here (that is, arbitrary errors from the callee which the caller has to handle):

If we want to ensure that all possible failures are handled, we use leaf::try_handle_all instead of leaf::try_handle_some:

U r = leaf::try_handle_all(

  []() -> leaf::result<U>
  {
    BOOST_LEAF_AUTO(v1, f1());
    BOOST_LEAF_AUTO(v2, f2());

    return g(v1, v2);
  },

  []( leaf::match<err1, err1::e1> ) -> U
  {
    // Handle err1::e1
  },

  []( err1 e ) -> U
  {
    // Handle any other err1 value
  },

  []() -> U
  {
    // Handle any other failure
  } );

The leaf::try_handle_all function enforces at compile time that at least one of the supplied error handlers takes no arguments (and therefore is able to handle any failure). In addition, all error handlers are forced to return a valid U, rather than a leaf::result<U>, so that leaf::try_handle_all is guaranteed to succeed, always.

@Krzmbrzl

Once you call one function with a list specified with auto you have to make your exception list auto as well because you can not know what errors that function will return.

I don't think that is true. You could handle all errors and only let specific errors pass. E.g.

try {
    func_with_auto_throws_decl();
} catch (const std::invalid_argument &) {
    throw;
} catch (...) {
    std::abort();
}

which would only let errors of type std::invalid_argument escape, regardless of what the exact exception types of that function are. This could be made more efficient by handling the errors that you know the function can throw; if you forget any (or the function changes at some point and may now throw more or different exceptions), the compiler would be able to flag that and force you to handle them. After all, which kinds of exceptions can be thrown is known at compile time.

@fluffinity

@Krzmbrzl
I can not argue against that at the moment. But something else that concerns me about this approach is that programmers can no longer determine the possible exceptions by looking at the signature of a function. This may encourage people to write such catch-all clauses just so they do not have to keep that mental burden around. The C++ Core Guidelines warn against that for a good reason: you are catching exceptions you may not be able to handle at all.
This is another thing which std::expected covers. It is explicit so you know the error type just by looking at the signature. The toolability remains the same while being more readable.

@Krzmbrzl

Yes, that I absolutely agree with. Though the mental burden would not really be that big, as the compiler could (should!) tell you explicitly which exceptions you have not handled yet, so you can then handle just those explicitly.

@hsutter
Owner

hsutter commented Nov 26, 2022

Thanks for all the comments, everyone. Yes, the intent is for Cpp2 to use value-based exceptions, throwing by value using a single type (some two-word type similar to std::error_condition), and not to list the specific values (which is still an open-ended set since we can use a value-based exception to wrap any existing dynamically typed exception).

programmers can no longer determine the possible exceptions by looking at the signature of a function.

I understand, but let's agree to disagree. I don't think that's desirable for the reasons I gave above (exposes implementation detail, brittle, not composable, not generic), and I'll add one thing: A major point of error/exception handling is that it's desirable to separate the error handling code from normal control flow (which makes the normal control flow clearer), and not only is the handler typically further up the call tree than the immediate call site (i.e., the call site often doesn't know what to do with the error), but it has been commented that "the value of the exception increases with distance thrown" as the distant higher code has more context and this is where automatic propagation really shines (as opposed to having to manually propagate Result or expected values).

It's still fine for a function to return an explicit status code for an immediate caller, especially for common situations (which IMO aren't "errors" -- an "error" should mean that the function could not accomplish what it promised, which does not include success-with-info). Real errors (whether delivered by codes or exceptions) are typically not for the immediate call site to handle -- typically the closest they're handled is in a batch later in the calling function (see C's goto ERROR; style), and often in a possibly distant calling function.

That's why IMHO it's a major weakness of Result and expected types that they require call sites to see the errors and pass them along, and languages/styles that do that keep reinventing ways to make that less painful. I want to preserve the automatic propagation of exceptions, just make them more affordable... for much more detail, see the references at README.md > 2019: Zero-overhead deterministic exceptions: Throwing values.

Thanks!
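
As a rough illustration of automatic propagation with a single by-value error type, the sketch below throws a std::error_code (itself an int plus a category pointer) through intermediate functions that contain no error handling. It uses today's dynamic exception machinery, so it shows only the shape of the code and not the proposed zero-overhead implementation; all function names are hypothetical.

```cpp
#include <cassert>
#include <system_error>

// Deepest function: reports failure by throwing a small value.
inline int read_setting(bool present) {
    if (!present) {
        throw std::make_error_code(std::errc::no_such_file_or_directory);
    }
    return 7;
}

// Intermediate functions: no error handling code at all; the error
// passes through them automatically.
inline int load_config(bool present) { return read_setting(present) * 2; }
inline int start_app(bool present) { return load_config(present) + 1; }

// Distant caller: the only place that knows what to do with the error.
inline std::error_code try_start(bool present, int& result) {
    try {
        result = start_app(present);
        return {};  // default-constructed error_code means success
    } catch (const std::error_code& ec) {
        return ec;
    }
}

// Small usage check for the sketch.
inline void demo_propagation() {
    int result = 0;
    assert(!try_start(true, result) && result == 15);
    assert(try_start(false, result) == std::errc::no_such_file_or_directory);
}
```

With Result or expected return types, load_config and start_app would each need to inspect and forward the error explicitly.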

@hsutter hsutter closed this as completed Nov 26, 2022
@redradist
Author

redradist commented Nov 26, 2022

@hsutter and all others,

Let me add a few points on why I think bringing back the throws keyword in the signature is a good idea:

  1. It improves readability and maintainability
    Readability, because it requires an exception to be handled somewhere, as Java does. I do like Java exceptions when it comes to readability and maintainability!! You can see the exception flow, and you do not forget to handle exceptions.
    For example, in C++ as well as in C# I do not like that at runtime I can catch exceptions that were not described in the documentation, and that the compiler did not point out to me that some function throws exceptions ...
    In Java I know that the code is "safe" because I handled all exceptions.
    The only issue with Java exceptions is how they were implemented.
  2. It could bring to Cpp2 and to Cpp1 exceptions that are based on value semantics, which are predictable
    If only Cpp1 had had the following syntax from the beginning:
int main() {
    run("Example string");
}

int32_t run(const std::string& path) throws {
    auto file = File::open(path);
    auto contents = std::string{};
    file.read_to_string(&contents);
    return contents.trim().parse();
}

and if, instead of dynamically allocating memory on the heap, the compiler knew the exception flow from the throws keyword in the signature, then current C++ exceptions could be implemented by allocating memory for the biggest exception on the stack instead of the heap ... The only reason we allocate exceptions on the heap is that we do not know the maximum exception size at that point. The throws keyword could fix this:

int main() {
    std::array<uint8_t, 2035> exception_buffer = {};
    run("Example string", exception_buffer);
}

int32_t run(const std::string& path, std::array<uint8_t, 2035>& exception_buffer) {
    auto file = File::open(path);
    auto contents = std::string{};
    file.read_to_string(&contents);
    return contents.trim().parse();
}
  3. Listing exceptions is not bad: it shows only the largest exception that could be thrown.
    The list of exceptions could be sorted by the compiler to prevent the issue with throws Error1, Error2 versus throws Error2, Error1.
    If the user does not want to expose implementation details, then the user could omit the list of exceptions and the compiler would add only the maximum buffer size for the stored exception to the signature:
run: (file: &std::string) -> int32_t throws Error1, Error2 {
    file := File::open(file).try;
    contents := std::string{};
    file.read_to_string(contents&).try;
    return contents.trim().parse().try;
}

or

run: (file: &std::string) -> int32_t throws {
    file := File::open(file).try;
    contents := std::string{};
    file.read_to_string(contents&).try;
    return contents.trim().parse().try;
}

In the first case the compiler could mangle the function as run_std::string_throws_Error1_Error2.
In the second case, as run_std::string_throws_size_2035.
The list of exceptions would add a self-documenting part to the new syntax.

I believe the list of exceptions, a visible exception flow, and value semantics could increase Cpp2 "safety", performance and determinism by a lot!!
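
The sorting suggestion can be sketched with a toy function. pseudo_mangle is hypothetical and only mimics the pseudo-mangling used in this thread; it is not how any real compiler mangles names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Builds a pseudo-mangled symbol name from a function name, one parameter
// type, and the list of error types, sorted so declaration order is irrelevant.
inline std::string pseudo_mangle(const std::string& name, const std::string& param,
                                 std::vector<std::string> errors) {
    std::sort(errors.begin(), errors.end());
    std::string symbol = name + "_" + param + "_throws";
    for (const std::string& error : errors) {
        symbol += "_" + error;
    }
    return symbol;
}

// Small usage check for the sketch.
inline void demo_pseudo_mangle() {
    const std::string a = pseudo_mangle("run", "std::string", {"Error1", "Error2"});
    const std::string b = pseudo_mangle("run", "std::string", {"Error2", "Error1"});
    assert(a == "run_std::string_throws_Error1_Error2");
    assert(a == b);  // order of declaration no longer matters

    const std::string c =
        pseudo_mangle("run", "std::string", {"Error1", "Error2", "Error3"});
    assert(a != c);  // adding an error still changes the symbol
}
```

Sorting removes the ordering problem, while adding or removing an error type still produces a different symbol, which is the ABI concern raised earlier in the thread.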

@Krzmbrzl

I believe that part of the discussion here might be better covered in a suggestion of its own, one that does not require compiler enforcement. I have created #144 as an attempt to factor out the discussion focused on code readability and toolability from this suggestion, which seems to be mainly concerned with finding a more performant way of actually implementing exceptions themselves.
