Reason for using Mutex instead of RWMutex in querier #6048
Comments
No reason. Do you want to make some benchmarks and show us the difference? |
We tested this with a large number of simultaneous valid queries (each returning a result) while the node was syncing blocks. With Mutex, one read query blocks all other queries. RWMutex mitigates this problem by allowing multiple reads at the same time. Using RWMutex also lets the program utilize more CPU resources, which is good for cost efficiency: you can serve more clients with the same machine. With Mutex, 16% (0.49/3.02) of total time was spent in Wait. |
cool! do you want to submit a PR? |
We will make a PR once it's stable for real |
@hanjukim Do you have any update on this? It looks like this is now causing significant slowness across the entire Cosmos Hub as well. If you don't have time, I can open a PR implementing your solution. |
It may be possible to completely remove |
Curious why |
Yes, we have some progress here.
|
One more thing: We have a special project called mantle-sdk, and mantle that wraps over the Tendermint and the CosmosSDK and serves data via GraphQL. We tried to touch the same blocker path
From this result, we are guessing that the problem is probably related to the IAVL tree. (Huge amount of rebalancing between blocks?) |
Yeah sounds like it! |
We found out that this could potentially cause a Cosmos application to panic. https://github.com/cosmos/cosmos-sdk/blob/v0.39.1/x/staking/keeper/validator.go#L45 In Technically this is not related to Tendermint itself, but given that most applications use the Tendermint + Cosmos stack, we should be careful before merging this. FYI, the map cache above has been removed in the current master of cosmos-sdk (stargate) -- maybe safe in the future? |
I can confirm that this has been removed with cosmos/cosmos-sdk#8546 as it was making gRPC queries crash and a lot slower anyway (cosmos/cosmos-sdk#8545). I don't think it will ever return as it was a tricky hack to avoid high deserialization costs due to Amino (which is now gone). For this reason, I think it shouldn't be considered an issue anymore. |
Has there been any progress on this issue since last month? |
See this commit: 1c4dbe3 I believe this is covered in the above change (sorry for not annotating the commit message appropriately!) I think the current view is that while changing the locking strategy will help relieve some contention, a lot of the performance bottlenecks are rooted deeper in the client implementations (e.g. IAVL-related), which changing locks won't help with. I think it makes sense to close this issue given that the change has landed, but let me know if I've missed something or if it makes sense to keep this open for another reason. |
Is there any reason for using just Mutex instead of RWMutex?
Queries would be much faster with RWMutex (read lock on queries and write lock on the other parts):
tendermint/abci/client/local_client.go
Line 18 in 9b6d6a3
If we can ensure these queries do not perform write operations, we can change them to take a read lock and support concurrent queries.
tendermint/abci/client/local_client.go
Lines 246 to 251 in 9b6d6a3
tendermint/abci/client/local_client.go
Lines 100 to 102 in 9b6d6a3
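The proposal can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `local_client.go`: the `app` type, its string-based `Query` and `DeliverTx` methods, and the simplified `QuerySync` and `DeliverTxSync` signatures are hypothetical stand-ins for the real ABCI types.

```go
package main

import (
	"fmt"
	"sync"
)

// app is a hypothetical stand-in for an ABCI application.
type app struct {
	state map[string]string
}

// Query only reads state. This is the property the proposal depends on.
func (a *app) Query(key string) string { return a.state[key] }

// DeliverTx mutates state.
func (a *app) DeliverTx(key, value string) { a.state[key] = value }

// localClient guards the application with an RWMutex instead of a Mutex.
type localClient struct {
	mtx sync.RWMutex
	app *app
}

// QuerySync takes the read lock, so several queries may run concurrently.
func (c *localClient) QuerySync(key string) string {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	return c.app.Query(key)
}

// DeliverTxSync takes the write lock and excludes readers and other writers.
func (c *localClient) DeliverTxSync(key, value string) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.app.DeliverTx(key, value)
}

func main() {
	c := &localClient{app: &app{state: map[string]string{}}}
	c.DeliverTxSync("height", "1")

	// Concurrent queries share the read lock and do not block each other.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.QuerySync("height")
		}()
	}
	wg.Wait()
	fmt.Println(c.QuerySync("height"))
}
```

This is only safe if the query path really is read-only. If the application's query handler writes to shared state, for example a cache held in a Go map, concurrent readers holding the read lock can race and the program may panic.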