TxnKV Scan loses data when a table has more than one region #600
I changed the tikv-cdc code, then built and tested it.
(1) Before the change:

    if (currentCache.size() < limit) {
      startKey = curRegionEndKey;
      lastKey = Key.toRawKey(curRegionEndKey);
    } else if (currentCache.size() > limit) {
      throw new IndexOutOfBoundsException(
          "current cache size = "
              + currentCache.size()
              + ", larger than "
              + conf.getScanBatchSize());
    } else {
      // Start new scan from exact next key in current region
      lastKey = Key.toRawKey(currentCache.get(currentCache.size() - 1).getKey());
      startKey = lastKey.next().toByteString();
    }

After the change:

    int scanLimit = Math.min(limit, conf.getScanBatchSize());
    if (currentCache.size() < scanLimit) {
      startKey = curRegionEndKey;
      lastKey = Key.toRawKey(curRegionEndKey);
    } else if (currentCache.size() > scanLimit) {
      throw new IndexOutOfBoundsException(
          "current cache size = "
              + currentCache.size()
              + ", larger than "
              + scanLimit);
    } else {
      // Start new scan from exact next key in current region
      lastKey = Key.toRawKey(currentCache.get(currentCache.size() - 1).getKey());
      startKey = lastKey.next().toByteString();
    }

(2) Build 3.2.0-SNAPSHOT and test (same steps as "2. Minimal reproduce step"):

    mysql> select count(1) from tikv_client_test;
    +----------+
    | count(1) |
    +----------+
    |   200000 |
    +----------+
    1 row in set (0.08 sec)

Result: no data lost.
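To see why comparing against `Math.min(limit, conf.getScanBatchSize())` stops the loss, here is a minimal, self-contained model of the cursor-advance step. It is a sketch under stated assumptions, not the client's code: keys are integers, regions are contiguous key ranges, each simulated RPC returns at most `batchSize` keys from the current region, and the caller's `limit` is at least `batchSize` (the situation in this report). `ScanModel` and `scanAll` are hypothetical names; only the if/else chain mirrors the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the "where does the next batch start?" decision. */
public class ScanModel {

    /**
     * Scans keys [0, totalKeys), split into regions whose exclusive end keys are
     * listed in ascending order in regionEnds. Each simulated RPC returns at most
     * batchSize keys from the current region. fixed=false uses the original rule
     * (compare with limit); fixed=true uses the patched rule (compare with
     * min(limit, batchSize)). Assumes limit >= batchSize.
     */
    static List<Integer> scanAll(int totalKeys, int[] regionEnds, int batchSize,
                                 int limit, boolean fixed) {
        List<Integer> scanned = new ArrayList<>();
        int startKey = 0;
        while (startKey < totalKeys) {
            // Locate the region that contains startKey.
            int curRegionEndKey = totalKeys;
            for (int end : regionEnds) {
                if (startKey < end) {
                    curRegionEndKey = end;
                    break;
                }
            }
            // Simulated RPC: at most batchSize keys from [startKey, curRegionEndKey).
            List<Integer> currentCache = new ArrayList<>();
            for (int k = startKey; k < curRegionEndKey && currentCache.size() < batchSize; k++) {
                currentCache.add(k);
            }
            scanned.addAll(currentCache);

            int threshold = fixed ? Math.min(limit, batchSize) : limit;
            if (currentCache.size() < threshold) {
                // Short batch is read as "region exhausted": jump to the region end.
                startKey = curRegionEndKey;
            } else if (currentCache.size() > threshold) {
                throw new IndexOutOfBoundsException(
                    "current cache size = " + currentCache.size() + ", larger than " + threshold);
            } else {
                // Full batch: continue from the key right after the last cached key.
                startKey = currentCache.get(currentCache.size() - 1) + 1;
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        int total = 200;
        int[] regionEnds = {120, 200}; // two regions: [0, 120) and [120, 200)
        int batchSize = 32;            // stands in for tikv.grpc.scan_batch_size
        int limit = 1000;              // caller limit larger than the batch size
        System.out.println("original: " + scanAll(total, regionEnds, batchSize, limit, false).size());
        System.out.println("patched:  " + scanAll(total, regionEnds, batchSize, limit, true).size());
    }
}
```

In this model the original rule returns 64 keys (only the first batch of each region), while the patched rule returns all 200, because a full batch now means "keep paging in this region" and only a short batch means "move to the next region".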
Can anyone help me? Thanks so much!
It is a bug! Thanks for your contribution! The …
… wrong limit setting
Signed-off-by: Yi Xie <xieyi01@rd.netease.com>
Co-authored-by: Yi Xie <xieyi01@rd.netease.com>
When will this bug fix be released? The latest version is still 3.2.0. @iosmanthus
Bug Report
1. TxnKV Scan loses data when a table has more than one region
2. Minimal reproduce step (Required)
(1) Prepare a TiDB table that has two regions.
For example:
The table has 200000 rows in total:
Table regions:
(2) Use TxnKV to scan the table.
Code:
Note: if endKey in the scan code is set larger than 535012 (the END_KEY of REGION_ID 15097 in this table), the data loss reproduces every time.
Result:
scan total size: 138082
The table has 200000 rows in total, but the scan returns only 138082 rows (61918 rows missing).
3. What did you see instead (Required)
While debugging, I found the cause:
org.tikv.common.operation.iterator.ScanIterator
function: cacheLoadFails()
When scanning REGION_ID 15097, currentCache holds 10240 entries (the batch size, controlled by the config tikv.grpc.scan_batch_size). Because this is less than limit, the code takes the first branch and sets startKey to the END_KEY of REGION_ID 15097 (curRegionEndKey).
The next scan then starts from this new startKey, so every key between currentCache.get(currentCache.size() - 1) and the END_KEY of REGION_ID 15097 is skipped, which causes the data loss.
The relevant source is: https://github.com/tikv/client-java/blob/v3.2.0/src/main/java/org/tikv/common/operation/iterator/ScanIterator.java#L94
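The branch choice described above can be traced with the numbers from this report. `BranchTrace` and `decide` are hypothetical helpers that mirror the if/else chain in `cacheLoadFails()`; the caller's `limit` is not given in the report, so 200000 is an assumed value.

```java
/** Hypothetical trace of which branch cacheLoadFails() takes for a given cache size. */
public class BranchTrace {
    enum Next { JUMP_TO_REGION_END, CONTINUE_AFTER_LAST_KEY, ERROR }

    /** Mirrors the if / else-if / else chain, comparing cacheSize with threshold. */
    static Next decide(int cacheSize, int threshold) {
        if (cacheSize < threshold) {
            return Next.JUMP_TO_REGION_END;
        }
        if (cacheSize > threshold) {
            return Next.ERROR;
        }
        return Next.CONTINUE_AFTER_LAST_KEY;
    }

    public static void main(String[] args) {
        int scanBatchSize = 10240; // tikv.grpc.scan_batch_size, as observed in the report
        int cacheSize = 10240;     // currentCache.size() while scanning REGION_ID 15097
        int limit = 200000;        // assumed caller limit (not stated in the report)

        // v3.2.0 rule: compares with limit, so a full batch looks like an exhausted region.
        System.out.println("v3.2.0:  " + decide(cacheSize, limit));
        // Patched rule: compares with min(limit, scanBatchSize).
        System.out.println("patched: " + decide(cacheSize, Math.min(limit, scanBatchSize)));
    }
}
```

With a full batch of 10240 keys, the v3.2.0 comparison against `limit` picks the jump to the region end, while comparing against `min(limit, scan_batch_size)` continues from the key after the last cached one.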
4. What did you expect to see? (Required)
(1) Could you please tell me the intent of this design, or is it a bug?
(2) Maybe, in this situation, startKey should be set to:
(3) If it's not a bug, is there a good way to scan all the data when the table has more than one region?
5. What are your Java Client and TiKV versions? (Required)
I'm looking forward to your reply, thank you so much!