-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
False & true sharing issue of RWMutex
in mutexForRowContainer
#37641
Comments
After Patch: 249.8s
|
Update: 190sWhen fixing this issue, I find there is a mistake in my prior description. The rlock is only one for all workers in the current implementation and it's a bug obviously. So the performance issue here is not only false sharing but also true sharing. After fixing this true sharing issue in #37627, the execution time can reduce to 190s. The flame graph shows the time of I also test the case without padding and with a lock per worker. ConclusionThe performance issue is caused by false sharing and true sharing of |
RWMutex
in mutexForRowContainer
RWMutex
in mutexForRowContainer
RWMutex
in mutexForRowContainer
RWMutex
in mutexForRowContainer
Enhancement
@Yui-Song found a complex query has a very unstable execution time.
Slow One: 517.1s
Fast one: 333.3s
The main difference between these two executions is the
probe
time of the second join.The first one is
The second one is
In the flame graph, I find the
rLock.RLock
andrLock.RULock
ofmutexForRowContainer
is extremely slow.tidb/util/chunk/row_container.go
Lines 38 to 50 in aab394f
As the comments said, each goroutine holds its own
rLock
. SorLock
should not have any contention if there is no spill disk operation. Actually, the spill disk operation doesn't exist in these executions.If there is no contention,
rLock.rlock
will just add one toreaderCount
, an atomic variable inRWMutex
.In consideration of this, I extrapolate it's due to false sharing.
I filed a PR(#37627) to verify this extrapolation.
The result is amazing. The time of this query is almost twice as fast as the slow one.
After Patch: 249.8s
See the result of explain analyze below due to the issue that this comment is too long.
Test 4 times and the results are 249.8s, 244.2s, 247.1s, and 250.1s.
Furthermore, the execution time is stable after patching this demo PR.
After discussing with @XuHuaiyu, it's better to use a more elegant method instead of just adding padding for this
RWMutex
to fix this issue because there may be more false sharing issues in this join implementation.The text was updated successfully, but these errors were encountered: