Skip to content
This repository has been archived by the owner on Apr 4, 2024. It is now read-only.

Problem: iterator on deeply nested cache contexts is extremely slow #617

Merged
merged 6 commits into from
Oct 5, 2021

Conversation

yihuang
Copy link
Contributor

@yihuang yihuang commented Oct 3, 2021

Closes: #616

Solution:

  • flatten cache contexts before doing GetTxLogsTransient

Description


For contributor use:

  • Targeted PR against correct branch (see CONTRIBUTING.md)
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards.
  • Wrote unit and integration tests
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • Added relevant godoc comments.
  • Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer

For admin use:

  • Added appropriate labels to PR (ex. WIP, R4R, docs, etc)
  • Reviewers assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

@codecov
Copy link

codecov bot commented Oct 3, 2021

Codecov Report

Merging #617 (c96abc2) into main (cda968b) will decrease coverage by 0.04%.
The diff coverage is 37.50%.

❗ Current head c96abc2 differs from pull request most recent head d6e6d38. Consider uploading reports for the commit d6e6d38 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #617      +/-   ##
==========================================
- Coverage   51.76%   51.72%   -0.05%     
==========================================
  Files          65       65              
  Lines        5461     5477      +16     
==========================================
+ Hits         2827     2833       +6     
- Misses       2474     2482       +8     
- Partials      160      162       +2     
Impacted Files Coverage Δ
x/evm/keeper/context_stack.go 66.66% <28.57%> (-14.42%) ⬇️
x/evm/keeper/state_transition.go 68.60% <100.00%> (+0.20%) ⬆️

@yihuang yihuang marked this pull request as draft October 3, 2021 13:05
Closes: evmos#616

Solution:
- flatten cache contexts before doing `GetTxLogsTransient`
@yihuang yihuang marked this pull request as ready for review October 3, 2021 13:09
x/evm/keeper/context_stack.go Outdated Show resolved Hide resolved
Copy link
Contributor

@fedekunze fedekunze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide more context for this change (comments, docs, etc). This PR should also include a benchmark test to check the effect of this change with regard to performance.

@@ -60,6 +60,27 @@ func (cs *ContextStack) Commit() {
cs.cachedContexts = []cachedContext{}
}

// CommitToRevision commit the cache after the target revision,
// to improve efficiency of db operations.
func (cs *ContextStack) CommitToRevision(target int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a unit test for this function

Comment on lines +194 to +195
// flatten the cache contexts to improve efficiency of following db operations
k.ctxStack.CommitToRevision(revision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this relate to k.CommitCachedContexts()? why is this called here instead of inside the !res.Failed condition? I need more context to understand the rationale of this change

Copy link
Contributor

@thomas-nguy thomas-nguy Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more context, we discovered a bug in our testnet which make the network to halt.

Some uniswap call triggers a lot of internal call. And trying to replay the tx will end up in an infinite loop there
https://github.com/tharsis/ethermint/blob/main/x/evm/keeper/keeper.go#L181
https://github.com/cosmos/cosmos-sdk/blob/master/store/cachekv/mergeiterator.go#L206

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here doesn't provide context on the following questions:

  • How does flattening improve efficiency? Do you have benchmark results?
  • Is it just to commit changes and avoid the infinite loop?
  • What's the threshold (i.e max number) of internal calls that the EVM supports without crashing?
  • Can we add a test to check the max number of internal calls supported?
  • Do we still need to call the CommitCachedContexts() if the changes are being committed regardless if it fails or not? Can this logic commit an invalid result or affect the revert logic as well?

Since the underlying issue is from the SDK, you should open an issue there to get it fixed in the long term

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add relevant comments and tests based on the questions above

Copy link
Contributor

@leejw51crypto leejw51crypto Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot from 2021-10-03 19-34-25

Keeper, ApplyTransaction,
logs := k.GetTxLogsTransient(txHash)

iter := store.Iterator(txHash.Bytes(), end) <- exact point
defer iter.Close()

blocking point
it hangs forever here,

// Commit commits all the cached contexts from top to bottom in order and clears the stack by setting an empty slice of cache contexts.
func (cs *ContextStack) Commit() {
	// commit in order from top to bottom
	for i := len(cs.cachedContexts) - 1; i >= 0; i-- {
		// keep all the cosmos events
		cs.initialCtx.EventManager().EmitEvents(cs.cachedContexts[i].ctx.EventManager().Events())
		if cs.cachedContexts[i].commit == nil {
			panic(fmt.Sprintf("commit function at index %d should not be nil", i))
		} else {
			cs.cachedContexts[i].commit()
		}
	}
	cs.cachedContexts = []cachedContext{}
}

Commit is recursive function

Copy link
Contributor

@leejw51crypto leejw51crypto Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this pr, before calling Commit, calls cached contexts Commit first linearly.

@Piyushblog

This comment was marked as spam.

thomas-nguy referenced this pull request in crypto-org-chain/ethermint Oct 4, 2021
yihuang added a commit to crypto-org-chain/cronos that referenced this pull request Oct 4, 2021
cherry picked two bug fix:
- [iterator on deeply nested cache contexts is extremely slow ](evmos/ethermint#617)
- [tx log attribtue value not parsable by some client ](evmos/ethermint#615)
yihuang added a commit to crypto-org-chain/cronos that referenced this pull request Oct 4, 2021
Solution:
- cherry picked two bug fix:
 - [iterator on deeply nested cache contexts is extremely slow ](evmos/ethermint#617)
 - [tx log attribtue value not parsable by some client ](evmos/ethermint#615)
yihuang added a commit to crypto-org-chain/cronos that referenced this pull request Oct 4, 2021
Solution:
- cherry picked two bug fix:
 - [iterator on deeply nested cache contexts is extremely slow ](evmos/ethermint#617)
 - [tx log attribtue value not parsable by some client ](evmos/ethermint#615)
yihuang added a commit to crypto-org-chain/cronos that referenced this pull request Oct 4, 2021
Solution:
- cherry picked two bug fix:
 - [iterator on deeply nested cache contexts is extremely slow ](evmos/ethermint#617)
 - [tx log attribtue value not parsable by some client ](evmos/ethermint#615)
Copy link
Contributor

@fedekunze fedekunze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yihuang @thomas-nguy I think this code is not very clear on the fix and the bug itself and seems more like a short-term patch. The issue should ultimately be resolved on the SDK. As for this repo, we should consider using other longer-term alternatives that can address these issues.

I'm happy to merge it but you should address these on your comments or follow up issues and PRs

Comment on lines +194 to +195
// flatten the cache contexts to improve efficiency of following db operations
k.ctxStack.CommitToRevision(revision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here doesn't provide context on the following questions:

  • How does flattening improve efficiency? Do you have benchmark results?
  • Is it just to commit changes and avoid the infinite loop?
  • What's the threshold (i.e max number) of internal calls that the EVM supports without crashing?
  • Can we add a test to check the max number of internal calls supported?
  • Do we still need to call the CommitCachedContexts() if the changes are being committed regardless if it fails or not? Can this logic commit an invalid result or affect the revert logic as well?

Since the underlying issue is from the SDK, you should open an issue there to get it fixed in the long term

Comment on lines +194 to +195
// flatten the cache contexts to improve efficiency of following db operations
k.ctxStack.CommitToRevision(revision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add relevant comments and tests based on the questions above

@fedekunze fedekunze enabled auto-merge (squash) October 5, 2021 09:57
@fedekunze
Copy link
Contributor

@yihuang please open an issue and follow up on the questions from above

@fedekunze
Copy link
Contributor

@yihuang can you fix the tests?

@fedekunze fedekunze merged commit 202bc5f into evmos:main Oct 5, 2021
fedekunze added a commit that referenced this pull request Oct 7, 2021
* Problem: iterator on deeply nested cache contexts is extremely slow

Closes: #616

Solution:
- flatten cache contexts before doing `GetTxLogsTransient`

* Update x/evm/keeper/context_stack.go

* changelog

Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>
Co-authored-by: Federico Kunze <federico.kunze94@gmail.com>
fedekunze added a commit that referenced this pull request Oct 7, 2021
* docs: v0.6.0 changelog (#605) (#606)

* docs: v0.6.0 changelog

* update codeowners

* build(deps): bump github.com/cosmos/cosmos-sdk from 0.44.0 to 0.44.1 (#610)

* build(deps): bump github.com/cosmos/cosmos-sdk from 0.44.0 to 0.44.1

Bumps [github.com/cosmos/cosmos-sdk](https://github.com/cosmos/cosmos-sdk) from 0.44.0 to 0.44.1.
- [Release notes](https://github.com/cosmos/cosmos-sdk/releases)
- [Changelog](https://github.com/cosmos/cosmos-sdk/blob/v0.44.1/CHANGELOG.md)
- [Commits](cosmos/cosmos-sdk@v0.44.0...v0.44.1)

---
updated-dependencies:
- dependency-name: github.com/cosmos/cosmos-sdk
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* changelog

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Federico Kunze <federico.kunze94@gmail.com>

* cmd: use config on genaccounts (#483)

* cmd: use config on genaccounts

* update

* c++

* rpc: fix panic (#611)

* rpc: fix panic

* fix

* c++

* rpc: restructure JSON-RPC directory and rename server config (#612)

* Restructure ethermint/rpc repo structure and change import statements

* Add #400 to changelog

* fix filepath in util and json_rpc

* Move #400  to unreleased section

* evm: refactor `traceTx` (#613)

* DNM: debug traceTx

* c++

* deps: bump IBC-go (#621)

* deps: bump IBC-go

* changelog

* evm, rpc:  fix tx log attribute value is not parsable by some client (#615)

* Problem: tx log attribute value not parsable by some client

Closes: #614

Solution:
- encode the value to json string rather than bytes

Apply suggestions from code review

* rm cdc and changelog

Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>
Co-authored-by: Federico Kunze <federico.kunze94@gmail.com>

* evm: fix iterator on deeply nested cache contexts (#617)

* Problem: iterator on deeply nested cache contexts is extremely slow

Closes: #616

Solution:
- flatten cache contexts before doing `GetTxLogsTransient`

* Update x/evm/keeper/context_stack.go

* changelog

Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>
Co-authored-by: Federico Kunze <federico.kunze94@gmail.com>

* evm: add benchmark for deep context stack (#627)

* Problem: deep context stack efficienty is not benchmarked

Closes: #626

Solution:
- add a benchmark to demonstrate an extremely inefficiency in deep
  context stack

* Update x/evm/keeper/benchmark_test.go

* prefix storage is irrelevant

* add comment to state_transition.go

Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>

* rpc: support personal apis with different keyring backends (#591)

* UPDATE Unlock keyring on start

* ADD comment

* ADD validation

Co-authored-by: Federico Kunze Küllmer <31522760+fedekunze@users.noreply.github.com>

* conflicts

* changelog

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Burckhardt <daniel.m.burckhardt@gmail.com>
Co-authored-by: yihuang <huang@crypto.com>
Co-authored-by: davcrypto <88310031+davcrypto@users.noreply.github.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problem: iterator on deeply nested cache contexts is extremely slow
5 participants