Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling in hooks instead of panics #2520

Open
2 tasks
Anmol1696 opened this issue Aug 26, 2022 · 12 comments
Open
2 tasks

Error handling in hooks instead of panics #2520

Anmol1696 opened this issue Aug 26, 2022 · 12 comments

Comments

@Anmol1696
Copy link

Anmol1696 commented Aug 26, 2022

Background

Avoiding raising panic on error for hooks by handling the errors instead of recovering from panic in hooks interfaces. Currently a panic from any of the hooks are first recovered and then handled, we should be just handling the error instead

This is possible now since we pass cacheCtx to hooks functions instead of ctx.

Note: All module hooks functions would need to be modified. Each module would be a seperate PR.

Suggested Design

Change the interface for hook functions to return error, and handle the error before write() to context.

// in panicCatchingEpochHooks, executing the hooks functions
...
cacheCtx, write := ctx.CacheContext()
err := hookFn(cacheCtx, epochIdentifier, epochNumber)
if err != nil {
    // Cancel the cacheCtx
    return
}
write()

Example

Change in EpochHooks interface would be something like:

type EpochHooks interface {
    // the first block whose timestamp is after the duration is counted as the end of the epoch
    AfterEpochEnd(ctx sdk.Context, epochIdentifier string, epochNumber int64) error
    // new epoch is next block of epoch end block
    BeforeEpochStart(ctx sdk.Context, epochIdentifier string, epochNumber int64) error
}

Acceptance Criteria

  • Make sure the tests pass
  • All keeper implementation for the EpocHooks interface changed (passing tests)
@puneet2019
Copy link
Member

ah, sweet #2519 already makes this very straight forward..

@ValarDragon
Copy link
Member

This is a great idea, lets do it!

@Anmol1696
Copy link
Author

Lets scope this issue only for the epoch module, but i would want to create these issues for other hooks as well. It would be best to have error handling over panic recovery for functions.

@puneet2019
Copy link
Member

not really @Anmol1696 , it's do it for all, will be weird to have it included in different versions.
I mean you can create smaller sub issues for each module hooks. but every hook should be taken care of

@Anmol1696
Copy link
Author

Yup agreed. I was planning to create smaller issues for all the other hooks as well and then tackle it. Cool let me just add the PRs to this issue itself then.

@ValarDragon
Copy link
Member

Ah my bad, I thought you were doing it for just epochs my bad. I don't think this is something we should do for all hooks, several should have the ability to halt the action. I guess we can always leave it to hook caller level to panic

@puneet2019
Copy link
Member

@ValarDragon
I just wanted it for epochs. But why not for all? Even returning the error halts the action. more or less same thing because we are using ApplyFuncOfNoError, just the keyword panic will be gone. Panic recovery is still a good to have for random store instances that might panic (this should never happen tho)

func ApplyFuncIfNoError(ctx sdk.Context, f func(ctx sdk.Context) error) (err error) {
   // Add a panic safeguard
   defer func() {
   	if recoveryError := recover(); recoveryError != nil {
   		PrintPanicRecoveryError(ctx, recoveryError)
   		err = errors.New("panic occurred during execution")
   	}
   }()
   // makes a new cache context, which all state changes get wrapped inside of.
   cacheCtx, write := ctx.CacheContext()
   err = f(cacheCtx)
   if err != nil {
   	ctx.Logger().Error(err.Error())
   } else {
   	// no error, write the output of f
   	write()
   	// Temporary, should be removed once: https://github.com/cosmos/cosmos-sdk/issues/12912
   	ctx.EventManager().EmitEvents(cacheCtx.EventManager().Events())
   }
   return err
}

@Anmol1696
Copy link
Author

Ideally we should have the ability for the hooks to return an error. They can always panic if there is a usecase for it. Currently for all hooks there is no error handling, just panic recoverying (if at all).

@ValarDragon
Copy link
Member

I see, that makes a lot of sense

@ValarDragon
Copy link
Member

I definitely agree, the semantic for hooks should be return an error, rather than panicking.

But whether the chain should:

  • continue on error and just log
  • halt on error (for beginblock/endblock code)
  • revert action on error (for txs)
  • revert action on error (and at what scope) for beginblock/endblock code

Is pretty subtle thing to navigate. Would rather not do this for the particularly complex / sensitive cases right now.

To mitigate complexity in these, I would like to make some SDK level framework for navigating and defining this tradeoff space. Most tx code should just be revert action on error (for txs).

But Beginblock/Endblock code and what scope we revert at actually takes more intense thought. I'd rather reason about a more general framework for defining this, and if {thing A} errors/gets reverted, ensuring that later things in the pipeline are aware and can not trigger accordingly.

@ValarDragon
Copy link
Member

Right now, reading the code, the x/mint one is super easy to reason about

But others PR's you made were probably very close, but I'm worried about it being subtly off and want to spend more time on that. But the more general case + architecture for a framework dealing with this is going to be the more effective way (that also solves many more isolation issues)

@Anmol1696
Copy link
Author

Makes sense, I would be curious to keep an eye on how you are planing to redesign the system. Could you point me to where the discussion is happening?
Agreed, better to handle it from the more generic perspective than this. I will clean up x/mint PR for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Needs Triage 🔍
Development

No branches or pull requests

3 participants