Store: Concurrent iterators cause panic with iavl 0.19 #13220
Comments
We've had to roll back the cachekv iterator and store code to the version from v0.44.5, as the vanilla SDK has always resulted in panics for us. The rolled-back code has worked fine for us: InjectiveLabs@7c184a9. Here's some more context from a HackerOne bug report we submitted many months ago.
The hacked code in CacheKV and tm-db continuously seems to bite us in the ass...
At first, I thought this issue was caused by the SDK update (#12626), which now iterates through the tree without checking whether the tree is mutable. This update looks wrong given that the MutableTree Iterator has this contract:
I was able to replicate this issue with these tests:

func TestIterateConcurrency(t *testing.T) {
	tree, err := getTestTree(0)
	require.NoError(t, err)
	for i := 0; i < 100; i++ {
		// Pass i as an argument so each writer goroutine gets its own copy.
		go func(i int) {
			for j := 0; j < 1000000; j++ {
				tree.Set([]byte(fmt.Sprintf("%d%d", i, j)), rand.Bytes(1))
			}
		}(i)
		// Iterate while the writers above are still mutating the tree.
		tree.Iterate(func(key []byte, value []byte) bool {
			return false
		})
	}
}

// TestIteratorConcurrency throws "fatal error: concurrent map iteration and map write" and
// also sometimes "fatal error: concurrent map writes"
func TestIteratorConcurrency(t *testing.T) {
	tree := setupMutableTree(t)
	for i := 0; i < 1000; i++ {
		// Pass i as an argument so each writer goroutine gets its own copy.
		go func(i int) {
			for j := 0; j < 1000000; j++ {
				tree.Set([]byte(fmt.Sprintf("%d%d", i, j)), rand.Bytes(1))
			}
		}(i)
		// Walk a full iterator while the writers above are still running.
		itr, _ := tree.Iterator(nil, nil, true)
		for ; itr.Valid(); itr.Next() {
		}
	}
}

These tests don't fail on v0.17.3 or v0.18.0, but they do fail on v0.19.0 (after the fast cache was added). There are also some new unanswered comments that we might want to take a look at: cosmos/iavl#468

cc: @p0mvn I would appreciate your thoughts on this one 🙏
When does this pattern of concurrently
From my understanding, all data is kept in cachekv until
On a quick look, this comment might be related to the issue:
From my memory, we did not synchronize these maps to avoid performance overhead, based on the knowledge that this read/write pattern should not happen. If the read/write patterns have changed, the maps in the IAVL discussion probably need to be synchronized.
In the SDK I'm not sure we have that happening, but it's certainly possible to do so, which is what we are seeing in the stack trace shared by Injective. @marbar3778 what do you think of "this read/write pattern should not happen"?
This is the main problem. The behaviour changed because the expected behaviour wasn't known, and it still isn't. I think we may want to add a mutex to the value to replicate the previous behaviour, then define the behaviour we should aim to move towards and refactor accordingly.
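A minimal sketch of the mutex idea, for illustration only. `guardedStore` and its methods are hypothetical stand-ins and not the IAVL or cachekv API; the sketch assumes that taking a snapshot under a read lock is an acceptable cost, which may not hold for large trees. As in the tests above, the callback returns `true` to stop iterating.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// guardedStore is a hypothetical stand-in for a tree whose internal maps
// are otherwise unsynchronized. It is NOT the IAVL or cachekv API.
type guardedStore struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func newGuardedStore() *guardedStore {
	return &guardedStore{data: make(map[string][]byte)}
}

// Set takes the write lock, so it never overlaps with a snapshot in progress.
func (s *guardedStore) Set(key string, value []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// Iterate copies keys and values under the read lock, then calls fn without
// holding the lock, so fn may call Set without deadlocking.
// Iteration is in ascending key order and stops early when fn returns true.
func (s *guardedStore) Iterate(fn func(key string, value []byte) bool) {
	s.mu.RLock()
	keys := make([]string, 0, len(s.data))
	for k := range s.data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	vals := make([][]byte, len(keys))
	for i, k := range keys {
		vals[i] = s.data[k]
	}
	s.mu.RUnlock()

	for i, k := range keys {
		if fn(k, vals[i]) {
			return
		}
	}
}

func main() {
	s := newGuardedStore()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				s.Set(fmt.Sprintf("%d-%d", i, j), []byte{byte(j)})
			}
		}(i)
	}
	// Iterate while the writers are still running.
	for i := 0; i < 8; i++ {
		s.Iterate(func(string, []byte) bool { return false })
	}
	wg.Wait()

	// 8 writers x 1000 distinct keys each.
	n := 0
	s.Iterate(func(string, []byte) bool { n++; return false })
	fmt.Println("keys:", n)
}
```

The snapshot trades memory and copy time for safety: readers see a consistent view as of the moment the read lock was held, and writes made during the callback are not visible to that iteration.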
Looking in the SDK is not the best approach here. Users define how they interact with the store because it's not well defined in the documentation. The SDK is a poor example of complex usage of the store package, imo.
Yeah, but also keep in mind that the IAVL implementation states various invariants/contracts that must be upheld. So either we improve the docs around this and state what is and is not allowed, or we look into supporting this behavior. Even in the latter case, we should improve the expected usage/contract docs.
@facundomedica could we document the expected behaviour of iavl in the SDK docs so users understand what is possible and what is not? Since you added the test in iavl, I think that's sufficient.
Here: #13386
Summary of Bug
Injective reported a concurrency issue with iterators and fast nodes.
https://gist.github.com/achilleas-kal/1751d83e22b313a7f759add9ff21998f
We should write a couple of tests in the store to test concurrent use of the trees used in the SDK.
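One possible shape for such a test is sketched below, written as a plain helper so it can be wrapped in a `testing.T` test and run with `go test -race`. The `kvStore` interface, `lockedMap`, and `stress` are hypothetical names introduced for illustration; a real test would plug in the SDK store types instead. Note that Go's "concurrent map iteration and map write" fatal error cannot be caught with `recover`, so an unsafe store would crash the test process rather than report a failed assertion.

```go
package main

import (
	"fmt"
	"sync"
)

// kvStore is a hypothetical minimal interface for the store under test.
type kvStore interface {
	Set(key, value []byte)
	Iterate(fn func(key, value []byte) bool)
}

// lockedMap is a trivial thread-safe implementation, present only so the
// sketch runs on its own. fn must not call Set, since the lock is held.
type lockedMap struct {
	mu sync.Mutex
	m  map[string][]byte
}

func (l *lockedMap) Set(key, value []byte) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.m[string(key)] = value
}

func (l *lockedMap) Iterate(fn func(key, value []byte) bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for k, v := range l.m {
		if fn([]byte(k), v) {
			return
		}
	}
}

// stress starts `writers` goroutines that each set `perWriter` distinct keys,
// iterates the store repeatedly while they run, and returns the number of
// keys visible once all writers have finished.
func stress(s kvStore, writers, perWriter int) int {
	var wg sync.WaitGroup
	done := make(chan struct{})
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for j := 0; j < perWriter; j++ {
				s.Set([]byte(fmt.Sprintf("%d/%d", w, j)), []byte{1})
			}
		}(w)
	}
	go func() {
		wg.Wait()
		close(done)
	}()

	for {
		select {
		case <-done:
			// All writers finished: count the keys in a final pass.
			n := 0
			s.Iterate(func(_, _ []byte) bool { n++; return false })
			return n
		default:
			// Writers still running: iterate concurrently with them.
			s.Iterate(func(_, _ []byte) bool { return false })
		}
	}
}

func main() {
	store := &lockedMap{m: make(map[string][]byte)}
	fmt.Println("keys after stress:", stress(store, 4, 500))
}
```

Checking the final key count catches lost writes, while the race detector flags unsynchronized access that happens not to crash on a given run.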