
feat: Optimistic Execution #16581

Merged: 39 commits, merged Sep 18, 2023

@facundomedica (Member) commented Jun 15, 2023

Description

RFC: #16499
Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and please add links to any relevant follow-up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@github-actions bot removed the C:x/gov label Jun 15, 2023
baseapp/abci.go (fixed)
baseapp/abci.go (fixed)
@facundomedica (Member, Author):

I think we need to be especially careful when OE (optimistic execution) is aborted, because finalizeBlockState could be left in a bad state:

  1. InitChain happens, finalizeBlockState is not nil
  2. During FinalizeBlock we check whether finalizeBlockState == nil; if it isn't, we reuse it (because we want to be able to access any state written during InitChain)
  3. If we abort, we need to be able to reset to the state that we had right after InitChain.

Maybe, to keep it simple, we could execute OE only if height > initialHeight. That way we can always clear the state completely.
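The proposal above can be sketched as follows: OE is never started for the first block, so an aborted optimistic run can always be undone by discarding the finalize state outright. This is a minimal sketch under assumptions; `app`, `blockState`, `shouldStartOE`, `abortOE` and `finalizeBlock` are hypothetical simplifications, not baseapp's actual types or methods.

```go
package main

import "fmt"

// blockState is a hypothetical stand-in for baseapp's finalizeBlockState.
type blockState struct {
	writes map[string]string
}

// app is a hypothetical, heavily simplified stand-in for BaseApp.
type app struct {
	initialHeight      int64
	finalizeBlockState *blockState
}

// shouldStartOE skips optimistic execution for the first block, whose
// finalize state may still hold writes made during InitChain.
func (a *app) shouldStartOE(height int64) bool {
	return height > a.initialHeight
}

// abortOE discards everything an aborted optimistic run wrote. Clearing the
// state entirely is safe only because OE never runs at initialHeight.
func (a *app) abortOE() {
	a.finalizeBlockState = nil
}

// finalizeBlock reuses an existing state (e.g. one populated by InitChain)
// or creates a fresh one.
func (a *app) finalizeBlock(height int64) {
	if a.finalizeBlockState == nil {
		a.finalizeBlockState = &blockState{writes: map[string]string{}}
	}
	a.finalizeBlockState.writes[fmt.Sprintf("block/%d", height)] = "done"
}

func main() {
	a := &app{initialHeight: 1}
	// InitChain leaves state behind for the first block.
	a.finalizeBlockState = &blockState{writes: map[string]string{"genesis": "ok"}}

	fmt.Println(a.shouldStartOE(1)) // false: first block runs without OE
	a.finalizeBlock(1)
	a.finalizeBlockState = nil // block committed, state cleared

	fmt.Println(a.shouldStartOE(2)) // true
	a.finalizeBlock(2)              // optimistic run
	a.abortOE()                     // proposal changed: drop everything
	fmt.Println(a.finalizeBlockState == nil) // true
}
```

The trade-off is that the first block never benefits from OE, in exchange for not having to snapshot and restore the post-InitChain state.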

baseapp/abci.go (outdated)
Comment on lines 492 to 498
	// processed the first block, as we want to avoid overwriting the finalizeState
	// after state changes during InitChain.
	if req.Height > app.initialHeight {
		// abort any running OE
		if app.oeEnabled && app.oeInfo != nil && app.oeInfo.Running() {
			app.oeInfo.Abort()
			_, _ = app.oeInfo.WaitResult() // ignore the result
Contributor:

Change potentially affects state.

Call sequence:

(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).ProcessProposal (baseapp/abci.go:467)

baseapp/abci.go (fixed)
baseapp/optimistic_execution.go (fixed)
baseapp/optimistic_execution.go (fixed)
baseapp/oe/optimistic_execution.go (fixed)
baseapp/oe/optimistic_execution.go (fixed)
		oe.logger.Error("OE aborted due to hash mismatch", "oe_hash", hex.EncodeToString(oe.request.Hash), "req_hash", hex.EncodeToString(reqHash), "oe_height", oe.request.Height)
		oe.cancelFunc()
		return true
	} else if oe.abortRate > 0 && rand.Intn(100) < oe.abortRate {

Check failure (Code scanning / gosec): Use of weak random number generator (math/rand instead of crypto/rand)
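The excerpt aborts OE in two cases: the block being finalized is not the one that was executed optimistically (hash mismatch), or a test-only abort rate fires. A minimal sketch of that decision, assuming a hypothetical `oeRun` type that records the hash and abort rate it was started with. The random draw only decides whether to throw away speculative work (the block is then executed normally), never what the block's result is, which is presumably why math/rand is acceptable despite the gosec alert.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// oeRun is a hypothetical summary of an optimistic execution in flight.
type oeRun struct {
	hash      []byte // hash of the proposal executed optimistically
	abortRate int    // percentage of runs to abort artificially (tests only)
}

// abortIfNeeded returns true when the optimistic result must be discarded.
func (r *oeRun) abortIfNeeded(reqHash []byte) bool {
	if !bytes.Equal(r.hash, reqHash) {
		// A different block is being finalized: the speculative result is useless.
		return true
	}
	// Test-only fault injection. math/rand is used because the draw only
	// decides whether to redo work, not what the block's result is.
	if r.abortRate > 0 && rand.Intn(100) < r.abortRate {
		return true
	}
	return false
}

func main() {
	r := &oeRun{hash: []byte{0xAA}}
	fmt.Println(r.abortIfNeeded([]byte{0xAA})) // false: same block, abortRate 0
	fmt.Println(r.abortIfNeeded([]byte{0xBB})) // true: hash mismatch
}
```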
@facundomedica (Member, Author):

I tried cleaning up the concurrency model, with little success. Now I'm passing a context to internalFinalizeBlock, which keeps it unaware of the whole OE mechanism; it uses this context only to know whether it needs to return early.

Using channels is a cleaner way to do this, but the problem is that the function we pass as our FinalizeBlockFn has access to baseapp and its properties, so we only have partial control over what happens to the state. This resulted in wrong app hashes from time to time.

A great improvement would be to have FinalizeBlock use only what is passed to it, with little to no access to baseapp. This way we could run the entire block concurrently by passing only cached copies of the contexts and values, and when we abort we just discard all of them.
On the other hand, this could complicate things considerably, given that we'd need to make sure no other process modifies the base contexts and values being passed. Also, I don't know whether it's possible at all without a major refactor of baseapp.
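The idea described above amounts to running the block against a private cached copy of state and writing it back only if the run was not aborted. A minimal sketch with a plain map standing in for the store; in the SDK a cache-wrapped multistore would play this role, and the `store`, `cacheWrap`, `execute` and `commitIfValid` names are hypothetical. As the comment warns, this only holds if nothing else mutates the base store while the copy is in use.

```go
package main

import "fmt"

// store is a hypothetical stand-in for a key/value multistore.
type store map[string]int

// cacheWrap returns a private copy so speculative writes cannot leak
// into the base store.
func (s store) cacheWrap() store {
	c := make(store, len(s))
	for k, v := range s {
		c[k] = v
	}
	return c
}

// execute runs a block using only what it is given: no access to app globals.
func execute(cache store, txs []string) {
	for _, tx := range txs {
		cache[tx]++
	}
}

// commitIfValid writes the cache back to base unless the run was aborted;
// on abort the cache is simply dropped. (Deletions are ignored in this sketch.)
func commitIfValid(base, cache store, aborted bool) {
	if aborted {
		return
	}
	for k, v := range cache {
		base[k] = v
	}
}

func main() {
	base := store{"a": 1}

	cache := base.cacheWrap()
	execute(cache, []string{"a", "b"})
	commitIfValid(base, cache, true)  // aborted: base untouched
	fmt.Println(base["a"], base["b"]) // 1 0

	cache = base.cacheWrap()
	execute(cache, []string{"a", "b"})
	commitIfValid(base, cache, false)
	fmt.Println(base["a"], base["b"]) // 2 1
}
```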

Comment on lines +101 to +110
	go func() {
		start := time.Now()
		resp, err := oe.finalizeBlockFunc(ctx, oe.request)
		oe.mtx.Lock()
		executionTime := time.Since(start)
		oe.logger.Debug("OE finished", "duration", executionTime.String(), "height", req.Height, "hash", hex.EncodeToString(req.Hash))
		oe.response, oe.err = resp, err
		close(oe.stopCh)
		oe.mtx.Unlock()
	}()

Check notice (Code scanning / CodeQL): Spawning a Go routine may be a possible source of non-determinism
	oe.initialized = true

	go func() {
		start := time.Now()

Check warning (Code scanning / CodeQL): Calling the system time may be a possible source of non-determinism
baseapp/abci.go (outdated, resolved)
facundomedica and others added 2 commits August 29, 2023 12:18
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
@facundomedica (Member, Author):

I've added a check so that we don't start optimistic execution when ProcessProposal returns "reject".
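With this check, OE starts only for proposals the node accepts, since there is little point executing a block this node has just voted against. A minimal sketch of the resulting decision, also folding in the oeEnabled flag and the `req.Height > app.initialHeight` guard visible in the ProcessProposal excerpt; `proposalStatus` and `shouldStartOE` are hypothetical stand-ins for the real response status and code path.

```go
package main

import "fmt"

// proposalStatus is a hypothetical stand-in for the ProcessProposal result.
type proposalStatus int

const (
	statusAccept proposalStatus = iota
	statusReject
)

// shouldStartOE reports whether to begin executing the block optimistically
// right after ProcessProposal.
func shouldStartOE(enabled bool, status proposalStatus, height, initialHeight int64) bool {
	if !enabled {
		return false // OE is opt-in
	}
	if status != statusAccept {
		return false // this node voted against the proposal; skip the speculative work
	}
	return height > initialHeight // skip the block right after InitChain
}

func main() {
	fmt.Println(shouldStartOE(true, statusAccept, 5, 1)) // true
	fmt.Println(shouldStartOE(true, statusReject, 5, 1)) // false
	fmt.Println(shouldStartOE(true, statusAccept, 1, 1)) // false
}
```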

@julienrbrt (Member) left a comment:

mostly doc nits.

@@ -241,7 +242,7 @@ func NewSimApp(
 		voteExtHandler := NewVoteExtensionHandler()
 		voteExtHandler.SetHandlers(bApp)
 	}
-	baseAppOptions = append(baseAppOptions, voteExtOp)
+	baseAppOptions = append(baseAppOptions, voteExtOp, baseapp.SetOptimisticExecution())
Member:

Is it intentional that only simapp uses optimistic execution? Or should we add it here:

baseapp.SetQueryGasLimit(cast.ToUint64(appOpts.Get(FlagQueryGasLimit))),

alongside the other default options?

@facundomedica (Member, Author):

I'd rather not add a flag for this, because it would give node operators the option of not running optimistic execution even when the developers chose to enable it (they can still modify the code, but that takes some extra effort).
Could we add a flag on simapp only? wdyt?

@julienrbrt (Member) Sep 18, 2023:

Yeah, a flag wouldn't make sense.
I was talking about adding it to our default baseapp options.
But I think I remember a conversation about having it disabled by default at first 🤔, so it makes sense to add it only in simapp.
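The outcome is that OE stays opt-in: it is off unless an app passes baseapp.SetOptimisticExecution() among its BaseApp options, as the simapp diff does, and because it is set in code rather than by a flag, operators cannot switch it off at runtime. A minimal sketch of that functional-option pattern, with hypothetical `baseApp`, `option`, `withOptimisticExecution` and `newBaseApp` names; it illustrates the pattern only and is not the SDK's implementation.

```go
package main

import "fmt"

// baseApp is a hypothetical stand-in for BaseApp.
type baseApp struct {
	name      string
	oeEnabled bool // zero value: OE disabled by default
}

// option mirrors the func(*BaseApp) options style used by baseapp.
type option func(*baseApp)

// withOptimisticExecution is a hypothetical analogue of
// baseapp.SetOptimisticExecution(): apps opt in explicitly, in code.
func withOptimisticExecution() option {
	return func(app *baseApp) { app.oeEnabled = true }
}

// newBaseApp applies options in order on top of the defaults.
func newBaseApp(name string, opts ...option) *baseApp {
	app := &baseApp{name: name}
	for _, opt := range opts {
		opt(app)
	}
	return app
}

func main() {
	plain := newBaseApp("chain-a")
	sim := newBaseApp("simapp", withOptimisticExecution())
	fmt.Println(plain.oeEnabled, sim.oeEnabled) // false true
}
```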

baseapp/baseapp.go (resolved)
baseapp/oe/optimistic_execution.go (resolved)
baseapp/oe/optimistic_execution.go (resolved)
@julienrbrt (Member) left a comment:

utACK!

@facundomedica added this pull request to the merge queue Sep 18, 2023
Merged via the queue into main with commit 8df065b Sep 18, 2023
@facundomedica deleted the facu/oe branch September 18, 2023 14:12
@tac0turtle (Member):

@Mergifyio backport release/v0.50.x

@mergify bot (Contributor) commented Oct 23, 2023:

backport release/v0.50.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Oct 23, 2023
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
(cherry picked from commit 8df065b)

# Conflicts:
#	CHANGELOG.md
#	baseapp/abci_test.go
julienrbrt pushed a commit that referenced this pull request Oct 23, 2023
Co-authored-by: Facundo Medica <14063057+facundomedica@users.noreply.github.com>
Co-authored-by: Facundo <facundomedica@gmail.com>
@alexanderbez (Contributor):

Great work @facundomedica 👏

@faddat mentioned this pull request Nov 8, 2024