Gaia, Under Load, crashes with a p2p error with Tendermint 0.34.13 #972

Closed
4 tasks
faddat opened this issue Sep 17, 2021 · 8 comments

Comments

@faddat
Contributor

faddat commented Sep 17, 2021

Summary of Bug

created by github.com/tendermint/tendermint/p2p.(*peer).OnStart
        github.com/tendermint/tendermint@v0.34.13/p2p/peer.go:186 +0x72

goroutine 16461137 [select]:
github.com/tendermint/tendermint/p2p.(*peer).metricsReporter(0xc063a59800)
        github.com/tendermint/tendermint@v0.34.13/p2p/peer.go:351 +0xd3
created by github.com/tendermint/tendermint/p2p.(*peer).OnStart
        github.com/tendermint/tendermint@v0.34.13/p2p/peer.go:186 +0x72

goroutine 13023215 [semacquire]:
sync.runtime_SemacquireMutex(0xc06d166bd0, 0x20, 0xc07a494410)
        runtime/sema.go:71 +0x25
sync.(*RWMutex).RLock(...)
        sync/rwmutex.go:63
github.com/tendermint/tendermint/consensus.(*State).GetRoundState(0xc0649d4000)
        github.com/tendermint/tendermint@v0.34.13/consensus/state.go:236 +0x45
github.com/tendermint/tendermint/consensus.(*Reactor).queryMaj23Routine(0xc0160e4580, {0x1e635d8, 0xc02878c300}, 0xc02431e9c0)
        github.com/tendermint/tendermint@v0.34.13/consensus/reactor.go:795 +0x306
created by github.com/tendermint/tendermint/consensus.(*Reactor).AddPeer
        github.com/tendermint/tendermint@v0.34.13/consensus/reactor.go:195 +0x1f3
[root@archive ~]#

Version

Current main branch

Steps to Reproduce

Run a gaia node.

Run the relayer scripts found in the rly folder of https://github.com/notional-labs/notional

The node crashes every few hours.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@faddat
Contributor Author

faddat commented Sep 17, 2021

Attempting to remedy by switching over to v5.0.6.

I had updated to Tendermint 0.34.13 before v5.0.6 was released.

However, it does seem that reverting to v5.0.5 solved this issue. I will post updates as more information comes in.

This issue appears to be caused by v5.0.6, and likely by Tendermint v0.34.13.

I have increased the rpc traffic on my node somewhat ludicrously, and it has not crashed since I reverted.

@cmwaters
Contributor

This log doesn't tell me much. Are you suggesting that consensus deadlocks and halts?

@faddat
Contributor Author

faddat commented Sep 21, 2021

If you'd like, I can do a screen recording.

This one is quite clear and reproducible.

Going back to v5.0.5 fixed it entirely.

@vae520283995

I upgraded to v5.0.6 and hit a similar error; switching back to v5.0.5 worked fine.

#976

@yaruwangway
Contributor

Is this related to the "concurrent map read and map write" error?

cosmos/iavl#427
cosmos/cosmos-sdk#10040

@tac0turtle
Member

Could you post a complete log dump, maybe on Pastebin? I have an idea as to why this is happening. Generally, applications like gaia should not update to the latest Tendermint unless the SDK and other tooling are updated as well.

@tac0turtle
Member

The change that caused this was reverted in Tendermint 0.34.14 and 0.34.15. Gaia has upgraded since.
