Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Enabled improved panic error reporting by default #12763

Conversation

koute
Copy link
Contributor

@koute koute commented Nov 23, 2022

This PR will make it so that any runtime built with require the abort_on_panic host function first introduced in #10741.

(This still needs PRs for polkadot and cumulus bumping the version of substrate they use for this to take effect; I'll make the relevant PRs after this one gets merged, as I don't really want to deal with the jankiness of companions if I don't have to.)

@koute koute added A0-please_review Pull request needs code review. J0-enhancement An additional feature request. B7-runtimenoteworthy C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit labels Nov 23, 2022
@koute koute requested a review from a team November 23, 2022 05:47
@bkchr
Copy link
Member

bkchr commented Nov 23, 2022

While this doesn't introduce the host function, still lets tag it with the label to make people aware that we will use the new host function.

Regarding companions, I think we can just wait until some other change is merged in Polkadot/Cumulus that will pull in this pr.

@tomaka
Copy link
Contributor

tomaka commented Nov 23, 2022

Before we're shipping this into production forever, I must say that I'm personally really unhappy with this function, for three reasons:

  1. There's no reason why abort_on_panic itself cannot stop the execution. Why did we go for the solution to magically add an unreachable opcode when the function returns? Can the executor not stop the execution when the function is called?

  2. The name says _on_panic, giving the impression that it can only be called in case of a Rust panic. This isn't true. The function can be called at any time. The word "panic" here is very Rust-specific.

  3. The function is called abort but doesn't abort. It's almost deliberately misleading. The function just sets the reason why we're going to intentionally put an unreachable opcode later. If we don't change its behavior, then it should at least be called something like gravestone_set (name not suggested by me) or set_exit_reason or something.

If I had some sort of veto right I would 100% veto this.

@bkchr
Copy link
Member

bkchr commented Nov 23, 2022

I understand the concerns regarding the naming. Fine by me to rename this. So what would be the naming AbortHandler_abort?

I'm also okay with making the abort function directly abort. However, we will still need to insert some unreachable! statement after the actual call to keep the compiler happy or does someone has some better idea?

@koute WDYT?
Also pulling @pepyakin to give him some chance to leave a comment.

@bkchr
Copy link
Member

bkchr commented Nov 23, 2022

Discussion moved here: w3f/polkadot-spec#587

@koute
Copy link
Contributor Author

koute commented Nov 23, 2022

  1. Why did we go for the solution to magically add an unreachable opcode when the function returns? Can the executor not stop the execution when the function is called?

WASM FFI has no concept of functions that never return. And we use this inside of a #[panic_handler] and #[alloc_error_handler], both of which must return !. So we must have an unreachable on the WASM side there anyway.

2. The name says _on_panic, giving the impression that it can only be called in case of a Rust panic. This isn't true. The function can be called at any time. The word "panic" here is very Rust-specific.

True, it can be called at any time, if you manually plomp down an extern block. But I disagree that "panic" is very Rust-specific. There are other languages which use "panic" to specify that the execution is going to abort, and the term itself goes back at least 47 years back where it was used inside of the Unix kernel and was (and I quote) "panic is called on unresolvable fatal errors".

If I had some sort of veto right I would 100% veto this.

With all due respect, I kinda wish you'd veto'd this, err, 15 versions of Polkadot ago, instead of just now when it's already available on essentially all of the nodes. (:

I don't really care if we change this. You're right that this might be slightly confusing for people implementing Polkadot hosts. (Runtime writers, I believe, are not going to care, as they won't be directly calling the underlying host function.) As I've said in the original PR, I'm fine with going with whatever the consensus is. Just please note that changing this will further delay enabling this feature.

@tomaka
Copy link
Contributor

tomaka commented Nov 23, 2022

WASM FFI has no concept of functions that never return. And we use this inside of a #[panic_handler] and #[alloc_error_handler], both of which must return !. So we must have an unreachable on the WASM side there anyway.

The fact that this unreachable is needed is an implementation detail of the Rust language, which shouldn't have any influence in the runtime <-> host interface.

True, it can be called at any time, if you manually plomp down an extern block.

Similarly, this is a Rust-specific concern. How convenient it is to call a function in source code shouldn't have any influence in the design of the runtime <-> host interface.

But I disagree that "panic" is very Rust-specific. There are other languages which use "panic" to specify that the execution is going to abort, and the term itself goes back at least 47 years back where it was used inside of the Unix kernel and was (and I quote) "panic is called on unresolvable fatal errors".

There are two different ways to interpret the name abort_on_panic:

  • One interpretation is "abort execution when an unrecoverable error happens", which is wrong because this function can be called at any time.
  • The other interpretation is "abort execution, [this function is called] when execution reaches the code path corresponding to a [Rust] panic", which has a Rust-specific definition of "panic".

With all due respect, I kinda wish you'd veto'd this, err, 15 versions of Polkadot ago, instead of just now when it's already available on essentially all of the nodes. (:

We have absolutely no concept of which runtime functions are stable and which are unstable. Conceptually, to me, a runtime function becomes stable when it starts being used, as this is pragmatically speaking the moment when we can't change it anymore. I cannot know, at the moment when a function is added to the source code, whether it will be changed before it's stabilized.

We have a lot of things available on all the nodes and that aren't used, another example being #12639 and we removed it without any problem.

@pepyakin
Copy link
Contributor

Left my feedback w3f/polkadot-spec#587 (comment)

@stale
Copy link

stale bot commented Dec 24, 2022

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Dec 24, 2022
@stale stale bot closed this Jan 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit J0-enhancement An additional feature request.
Projects
Status: done
Development

Successfully merging this pull request may close these issues.

4 participants