Problem: iterator on deeply nested cache contexts is extremely slow #617
Codecov Report
@@            Coverage Diff             @@
##             main     #617      +/-   ##
==========================================
- Coverage   51.76%   51.72%   -0.05%
==========================================
  Files          65       65
  Lines        5461     5477      +16
==========================================
+ Hits         2827     2833       +6
- Misses       2474     2482       +8
- Partials      160      162       +2
Closes: evmos#616
Solution:
- flatten cache contexts before doing `GetTxLogsTransient`
dd0a9e1 to 997c987
Please provide more context for this change (comments, docs, etc). This PR should also include a benchmark test to check the effect of this change with regard to performance.
@@ -60,6 +60,27 @@ func (cs *ContextStack) Commit() {
	cs.cachedContexts = []cachedContext{}
}

// CommitToRevision commit the cache after the target revision,
// to improve efficiency of db operations.
func (cs *ContextStack) CommitToRevision(target int) {
Please add a unit test for this function.
// flatten the cache contexts to improve efficiency of following db operations
k.ctxStack.CommitToRevision(revision)
How does this relate to `k.CommitCachedContexts()`? Why is this called here instead of inside the `!res.Failed` condition? I need more context to understand the rationale of this change.
For more context, we discovered a bug on our testnet that caused the network to halt.
Some Uniswap calls trigger a lot of internal calls, and trying to replay the tx ends up in an infinite loop here:
https://github.com/tharsis/ethermint/blob/main/x/evm/keeper/keeper.go#L181
https://github.com/cosmos/cosmos-sdk/blob/master/store/cachekv/mergeiterator.go#L206
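One way to picture why the depth of nested cache contexts matters is the toy model below. It is not the SDK's merge iterator: `baseIter`, `layerIter`, and `baseCalls` are hypothetical names, and the assumption that each layer consults its parent twice per `Valid` call is chosen only to illustrate how repeated parent checks compound with depth.

```go
package main

import "fmt"

// iterator is a minimal stand-in for a store iterator; only Valid is modeled.
type iterator interface {
	Valid() bool
}

// baseIter stands for the iterator of the underlying store.
// It counts how many times it is asked whether it is valid.
type baseIter struct{ calls *int }

func (b baseIter) Valid() bool {
	*b.calls++
	return true
}

// layerIter wraps a parent iterator, the way one cache layer wraps the
// store below it.
// ASSUMPTION (illustrative only): each layer asks its parent twice per call.
type layerIter struct{ parent iterator }

func (l layerIter) Valid() bool {
	first := l.parent.Valid()
	second := l.parent.Valid()
	return first && second
}

// baseCalls builds a stack of the given depth and reports how many times the
// base iterator is reached by a single Valid call on the top layer.
func baseCalls(depth int) int {
	calls := 0
	var it iterator = baseIter{calls: &calls}
	for i := 0; i < depth; i++ {
		it = layerIter{parent: it}
	}
	it.Valid()
	return calls
}

func main() {
	for _, d := range []int{1, 5, 10, 20} {
		fmt.Printf("depth=%2d base calls=%d\n", d, baseCalls(d))
	}
}
```

Under this assumption the work doubles with every extra layer, so a transaction with many nested internal calls can look like a hang even though each step eventually terminates. With a single layer the cost stays constant, which is the motivation for flattening before iterating.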
The change here doesn't provide context on the following questions:
- How does flattening improve efficiency? Do you have benchmark results?
- Is it just to commit changes and avoid the infinite loop?
- What's the threshold (i.e., max number) of internal calls that the EVM supports without crashing?
- Can we add a test to check the max number of internal calls supported?
- Do we still need to call `CommitCachedContexts()` if the changes are being committed regardless of whether it fails or not? Can this logic commit an invalid result or affect the revert logic as well?
Since the underlying issue is from the SDK, you should open an issue there to get it fixed in the long term
Please add relevant comments and tests based on the questions above
The hang is in Keeper.ApplyTransaction, at the call to GetTxLogsTransient:
logs := k.GetTxLogsTransient(txHash)
Inside it, creating the iterator is the exact blocking point; it hangs forever here:
iter := store.Iterator(txHash.Bytes(), end)
defer iter.Close()
// Commit commits all the cached contexts from top to bottom in order and clears the stack by setting an empty slice of cache contexts.
func (cs *ContextStack) Commit() {
// commit in order from top to bottom
for i := len(cs.cachedContexts) - 1; i >= 0; i-- {
// keep all the cosmos events
cs.initialCtx.EventManager().EmitEvents(cs.cachedContexts[i].ctx.EventManager().Events())
if cs.cachedContexts[i].commit == nil {
panic(fmt.Sprintf("commit function at index %d should not be nil", i))
} else {
cs.cachedContexts[i].commit()
}
}
cs.cachedContexts = []cachedContext{}
}
Commit is a recursive function.
So in this PR, before Commit is called, the cached contexts above the target revision are committed first, linearly.
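A minimal sketch of that flattening step, on a toy stack rather than the real `ContextStack`: the types `store`, `layer`, and `stack`, and the helpers `snapshot`, `set`, and `demo`, are hypothetical and only mirror the shape of the loop in `Commit` above, stopping at a target revision instead of at the bottom.

```go
package main

import "fmt"

// store is a toy key-value store standing in for a cache-wrapped context.
type store map[string]string

// layer is one cached context: its pending writes plus a commit function
// that flushes those writes into the layer directly below it.
type layer struct {
	writes store
	commit func()
}

// stack is a hypothetical, simplified context stack.
type stack struct {
	base   store
	layers []layer
}

// snapshot pushes a new layer and returns its revision id (its index).
func (s *stack) snapshot() int {
	parent := s.base
	if n := len(s.layers); n > 0 {
		parent = s.layers[n-1].writes
	}
	w := store{}
	s.layers = append(s.layers, layer{
		writes: w,
		commit: func() {
			for k, v := range w {
				parent[k] = v
			}
		},
	})
	return len(s.layers) - 1
}

// set writes a key into the top layer.
func (s *stack) set(k, v string) {
	s.layers[len(s.layers)-1].writes[k] = v
}

// commitToRevision folds every layer above target into the target layer,
// from top to bottom, and leaves target as the new top of the stack.
func (s *stack) commitToRevision(target int) {
	if target < 0 || target >= len(s.layers) {
		panic(fmt.Sprintf("invalid revision %d", target))
	}
	for i := len(s.layers) - 1; i > target; i-- {
		s.layers[i].commit()
	}
	s.layers = s.layers[:target+1]
}

// demo nests three internal calls on top of revision 0, flattens them, and
// returns the layer count, the pending writes in the target, and the base size.
func demo() (int, int, int) {
	s := &stack{base: store{}}
	rev := s.snapshot()
	s.set("a", "1")
	for i := 0; i < 3; i++ {
		s.snapshot()
		s.set(fmt.Sprintf("k%d", i), "v")
	}
	s.commitToRevision(rev)
	return len(s.layers), len(s.layers[0].writes), len(s.base)
}

func main() {
	fmt.Println(demo()) // prints "1 4 0"
}
```

In this toy model the flattened writes land in the target layer, not in the base store, so a later read or iteration sees one layer instead of four while the target layer can still be committed or discarded as a whole. Whether the same holds for events and reverts in the real keeper is exactly what the review questions above ask to have documented and tested.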
Cherry-picked two bug fixes:
- [iterator on deeply nested cache contexts is extremely slow](evmos/ethermint#617)
- [tx log attribute value not parsable by some client](evmos/ethermint#615)
@yihuang @thomas-nguy I think this code is not very clear about the fix or the bug itself, and it seems more like a short-term patch. The issue should ultimately be resolved in the SDK. As for this repo, we should consider longer-term alternatives that can address these issues.
I'm happy to merge it, but you should address these points in your comments or in follow-up issues and PRs.
@yihuang please open an issue and follow up on the questions from above
@yihuang can you fix the tests?
* Problem: iterator on deeply nested cache contexts is extremely slow
  Closes: #616
  Solution:
  - flatten cache contexts before doing `GetTxLogsTransient`
* Update x/evm/keeper/context_stack.go
* changelog
Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>
Co-authored-by: Federico Kunze <federico.kunze94@gmail.com>
* docs: v0.6.0 changelog (#605) (#606)
  * docs: v0.6.0 changelog
  * update codeowners
* build(deps): bump github.com/cosmos/cosmos-sdk from 0.44.0 to 0.44.1 (#610)
  * build(deps): bump github.com/cosmos/cosmos-sdk from 0.44.0 to 0.44.1
    Bumps [github.com/cosmos/cosmos-sdk](https://github.com/cosmos/cosmos-sdk) from 0.44.0 to 0.44.1.
    - [Release notes](https://github.com/cosmos/cosmos-sdk/releases)
    - [Changelog](https://github.com/cosmos/cosmos-sdk/blob/v0.44.1/CHANGELOG.md)
    - [Commits](cosmos/cosmos-sdk@v0.44.0...v0.44.1)
    ---
    updated-dependencies:
    - dependency-name: github.com/cosmos/cosmos-sdk
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...
    Signed-off-by: dependabot[bot] <support@github.com>
  * changelog
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  Co-authored-by: Federico Kunze <federico.kunze94@gmail.com>
* cmd: use config on genaccounts (#483)
  * cmd: use config on genaccounts
  * update
  * c++
* rpc: fix panic (#611)
  * rpc: fix panic
  * fix
  * c++
* rpc: restructure JSON-RPC directory and rename server config (#612)
  * Restructure ethermint/rpc repo structure and change import statements
  * Add #400 to changelog
  * fix filepath in util and json_rpc
  * Move #400 to unreleased section
* evm: refactor `traceTx` (#613)
  * DNM: debug traceTx
  * c++
* deps: bump IBC-go (#621)
  * deps: bump IBC-go
  * changelog
* evm, rpc: fix tx log attribute value is not parsable by some client (#615)
  * Problem: tx log attribute value not parsable by some client
    Closes: #614
    Solution:
    - encode the value to json string rather than bytes
    Apply suggestions from code review
  * rm cdc and changelog
  Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>
  Co-authored-by: Federico Kunze <federico.kunze94@gmail.com>
* evm: fix iterator on deeply nested cache contexts (#617)
  * Problem: iterator on deeply nested cache contexts is extremely slow
    Closes: #616
    Solution:
    - flatten cache contexts before doing `GetTxLogsTransient`
  * Update x/evm/keeper/context_stack.go
  * changelog
  Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>
  Co-authored-by: Federico Kunze <federico.kunze94@gmail.com>
* evm: add benchmark for deep context stack (#627)
  * Problem: deep context stack efficiency is not benchmarked
    Closes: #626
    Solution:
    - add a benchmark to demonstrate an extreme inefficiency in deep context stack
  * Update x/evm/keeper/benchmark_test.go
  * prefix storage is irrelevant
  * add comment to state_transition.go
  Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>
* rpc: support personal apis with different keyring backends (#591)
  * UPDATE Unlock keyring on start
  * ADD comment
  * ADD validation
  Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>
* conflicts
* changelog
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burckhardt <daniel.m.burckhardt@gmail.com>
Co-authored-by: yihuang <huang@crypto.com>
Co-authored-by: davcrypto <88310031+davcrypto@users.noreply.github.com>
Closes: #616
Solution:
- flatten cache contexts before doing `GetTxLogsTransient`