membuffer: compare the keys of ART by chunk #1482
Some minor grammar tweaks would make these comments better; an LLM could handle this. (I commented on the wrong PR 😆)
Approved by mistake. I haven't reviewed this one.
Would you like to compare it with something like this?
// Use unsafe for faster byte comparison
p1 := unsafe.Pointer(&l1Key[depth])
p2 := unsafe.Pointer(&l2Key[depth])
// Compare 8 bytes at a time
remaining := minLen - depth
for remaining >= 8 {
	if *(*uint64)(p1) != *(*uint64)(p2) {
		// Find first different byte using trailing zeros
		xor := *(*uint64)(p1) ^ *(*uint64)(p2)
		return depth + uint32(bits.TrailingZeros64(xor) >> 3)
	}
	p1 = unsafe.Pointer(uintptr(p1) + 8)
	p2 = unsafe.Pointer(uintptr(p2) + 8)
	depth += 8
	remaining -= 8
}
The unsafe implementation:
That's about a 15% speedup.
The common cases aren't that stable on my PC, and even the master code now runs much slower than the result in the description (there's not much difference anyway). The chunk code runs faster primarily due to better CPU pipeline utilization and more accurate prediction through chunk comparison. The unsafe implementation runs faster; the only concern is that it seems to work only on little-endian machines, and I'm not sure whether we need to worry about big-endian architectures...
We can check internal/goarch and fall back to other methods for big-endian architectures.
That's an internal package, so importing it fails with a compile error.
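Since internal/goarch cannot be imported from outside the standard library, one possible alternative is to detect the byte order once at run time and pick the matching bit-scan direction. The sketch below is only an illustration under that assumption: hostLittleEndian, firstDiffByte, and loadWord are hypothetical names, not code from this PR.

```go
package main

import (
	"math/bits"
	"unsafe"
)

// hostLittleEndian is computed once at package initialization.
// It inspects the byte stored at the lowest address of a uint16 holding 1.
// (Hypothetical helper, not part of client-go.)
var hostLittleEndian = func() bool {
	x := uint16(1)
	return *(*byte)(unsafe.Pointer(&x)) == 1
}()

// firstDiffByte returns the index (0..7), in memory order, of the first
// differing byte, given the XOR of two 8-byte words loaded in native byte
// order. xor must be non-zero.
func firstDiffByte(xor uint64) int {
	if hostLittleEndian {
		// The lowest address maps to the least significant byte.
		return bits.TrailingZeros64(xor) >> 3
	}
	// The lowest address maps to the most significant byte.
	return bits.LeadingZeros64(xor) >> 3
}

// loadWord copies 8 bytes into a uint64 in native byte order.
// Writing through a *[8]byte view of the word avoids any alignment concern.
func loadWord(b [8]byte) uint64 {
	var w uint64
	*(*[8]byte)(unsafe.Pointer(&w)) = b
	return w
}
```

With this shape, the little-endian branch matches the TrailingZeros64 trick in the suggestion above, while a big-endian host would scan from the most significant end instead.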
	return 0
}

p1 := unsafe.Pointer(&l1Key[depth])
On some architectures (e.g., older ARM architectures), unaligned memory access is significantly slower than aligned access.
Not handling it is generally safe and keeps the code simpler.
It is said that unaligned access may even crash. Shall we consider an alignment check like this?
isAligned := (uintptr(p1)&7) == 0 && (uintptr(p2)&7) == 0
Can we add a GOARCH check and only do this optimization for selected architectures?
We can use runtime.GOARCH directly. @you06
Signed-off-by: you06 <you1474600@gmail.com>
compare byte slices in chunk
Signed-off-by: you06 <you1474600@gmail.com>
remove unrelated test
Signed-off-by: you06 <you1474600@gmail.com>
compare by unsafe casting
Signed-off-by: you06 <you1474600@gmail.com>
remove unused chunk cast func
Signed-off-by: you06 <you1474600@gmail.com>
only compare by chunk for amd64 and arm64
Signed-off-by: you06 <you1474600@gmail.com>
eecef6b to 30733e7
Signed-off-by: you06 <you1474600@gmail.com>
internal/unionstore/art/art_node.go
Outdated
@@ -355,10 +349,49 @@ func (an *artNode) asNode256(a *artAllocator) *node256 {
// longestCommonPrefix returns the length of the longest common prefix of two keys.
// the LCP is calculated from the given depth, you need to guarantee l1Key[:depth] equals to l2Key[:depth] before calling this function.
func longestCommonPrefix(l1Key, l2Key artKey, depth uint32) uint32 {
	switch runtime.GOARCH {
We can introduce a global boolean and do the comparison only once
Signed-off-by: you06 <you1474600@gmail.com>
/hold
The client-go tests seem OK, but there is a memory violation when running with TiDB.
Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: you06 <you1474600@gmail.com>
#1477 introduced a bug in the The error The CI in TiDB can pass after fixing both bugs.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cfzjywxk, ekexium
ref pingcap/tidb#55287
To perform path compression in ART, we need to find the longest common prefix of the keys in many places. When the common prefix is long, this can be slow.
This PR compares the keys by chunk. Because a chunk is a uint64, which holds 8 bytes, this can improve performance significantly when the common prefix is long.
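To make the idea concrete, here is a portable sketch of chunk comparison that avoids unsafe. It is not the PR's implementation: lcpByChunk is a hypothetical function, it works on plain []byte rather than artKey, and it returns the number of matching bytes counted from depth. Decoding both chunks with binary.LittleEndian puts the first byte in the lowest bits on any host, so the trailing-zero count of the XOR locates the first mismatch regardless of the machine's byte order or alignment.

```go
package main

import (
	"encoding/binary"
	"math/bits"
)

// lcpByChunk returns how many bytes a and b have in common starting at
// depth. The caller is assumed to guarantee a[:depth] == b[:depth].
// (Hypothetical sketch, not the code merged in this PR.)
func lcpByChunk(a, b []byte, depth int) int {
	minLen := len(a)
	if len(b) < minLen {
		minLen = len(b)
	}
	i := depth
	// Compare 8 bytes at a time while a full chunk remains in both keys.
	for ; i+8 <= minLen; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		if x != 0 {
			// The lowest set bit belongs to the first differing byte.
			return i - depth + bits.TrailingZeros64(x)>>3
		}
	}
	// Compare the remaining tail (fewer than 8 bytes) byte by byte.
	for ; i < minLen && a[i] == b[i]; i++ {
	}
	return i - depth
}
```

For example, lcpByChunk([]byte("abcdefghijklmnop"), []byte("abcdefghijkXmnop"), 0) compares the first eight bytes in one step and finds the mismatch inside the second chunk.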