Stale read for rawkv with read ts #96

Open
wants to merge 7 commits into
base: master
from

Conversation

iosmanthus
Member

@iosmanthus iosmanthus commented Jun 7, 2022

Signed-off-by: iosmanthus myosmanthustree@gmail.com

This pull request is based on #80.

Rendered

Signed-off-by: iosmanthus <myosmanthustree@gmail.com>
Signed-off-by: iosmanthus <myosmanthustree@gmail.com>
Signed-off-by: iosmanthus <myosmanthustree@gmail.com>

TiKV currently supports **three** features to process read-only queries more efficiently.

1. Follower read.
Member

You mean replica read?

Member Author

Yes.

Member

Then correct name should be used.

}
```

2. While TiKV is handling radw read-related requests, construct a `SnapContext` with the `read_ts` before acquiring a snapshot from `storage`.
Member

Suggested change
2. While TiKV is handling radw read-related requests, construct a `SnapContext` with the `read_ts` before acquiring a snapshot from `storage`.
2. While TiKV is handling raw read-related requests, construct a `SnapContext` with the `read_ts` before acquiring a snapshot from `storage`.


```diff
class RawKVClient {
+ ByteString rawGet(ByteString key, readTs: Timestamp)
Member

I think it might be confusing for client to understand what is readTs in RawKV.

Member Author

@iosmanthus iosmanthus Jun 16, 2022

How about changing the time type to DateTime instead of Timestamp? That would be closer to the TiDB syntax: https://docs.pingcap.com/tidb/dev/as-of-timestamp#syntax
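
To make the DateTime idea concrete, here is a minimal sketch of how a client could accept a wall-clock time and convert it to a TSO-style value before attaching it as `read_ts`. It is written in Python for brevity and assumes the PD TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). `datetime_to_read_ts` and `read_ts_to_datetime` are hypothetical helpers, not part of any existing client.

```python
from datetime import datetime, timedelta, timezone

# Assumed TSO layout: physical milliseconds in the high bits,
# an 18-bit logical counter in the low bits.
LOGICAL_BITS = 18
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_read_ts(at: datetime) -> int:
    """Hypothetical helper: turn a wall-clock time into a TSO-style read_ts."""
    if at.tzinfo is None:
        raise ValueError("read time must be timezone-aware")
    # Integer arithmetic avoids float rounding on the millisecond value.
    physical_ms = (at - EPOCH) // timedelta(milliseconds=1)
    return physical_ms << LOGICAL_BITS


def read_ts_to_datetime(read_ts: int) -> datetime:
    """Inverse of the helper above; the logical counter is dropped."""
    return EPOCH + timedelta(milliseconds=read_ts >> LOGICAL_BITS)
```

With helpers like these, the public API could take a DateTime while the request header still carries an integer `read_ts`.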


### TiKV

When reading data, clients should specify a timestamp, attached to the request header as `read_ts`, typically a timestamp from a few seconds ago. The replica should read from local storage with the `read_ts`, reusing the stale read mechanism of TxnKV. This requires the replica to check the `read_ts` against the `safe_ts`, which is advanced by the `CheckLeader` message from the leader's store or by the `resolve-ts` worker. As long as the `safe_ts` is no less than the `read_ts`, the replica is allowed to read the key from local storage.
Member

How is safe_ts maintained in RawKV since there are no locks?

Member Author

If there are no locks, the resolve_ts worker advances the safe_ts by periodically requesting a timestamp from the TSO. The default interval of the resolve_ts worker is 1s.

Member Author

More details are supplemented.
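
The check discussed in this thread can be sketched as follows. This is a simplified Python model, not TiKV's Rust implementation: `Replica`, `resolve_ts_tick`, and `fetch_tso` are hypothetical names, and the error name `DataIsNotReady` is used for illustration only. The model assumes a region with no locks, so each tick of the worker simply adopts a fresh TSO timestamp as the new `safe_ts`.

```python
from dataclasses import dataclass
from typing import Callable


class DataIsNotReady(Exception):
    """Raised when the replica's safe_ts has not caught up with read_ts."""


@dataclass
class Replica:
    safe_ts: int = 0

    def advance_safe_ts(self, ts: int) -> None:
        # safe_ts only moves forward; older updates are ignored.
        self.safe_ts = max(self.safe_ts, ts)

    def check_stale_read(self, read_ts: int) -> None:
        # The read is allowed only if safe_ts is no less than read_ts.
        if self.safe_ts < read_ts:
            raise DataIsNotReady(
                f"safe_ts {self.safe_ts} is behind read_ts {read_ts}"
            )


def resolve_ts_tick(replica: Replica, fetch_tso: Callable[[], int]) -> None:
    """One tick of a resolve-ts-like worker for a region without locks."""
    replica.advance_safe_ts(fetch_tso())
```

A follower would receive the same `safe_ts` through a `CheckLeader`-style message instead of calling the TSO itself; the admission check stays identical.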

Signed-off-by: iosmanthus <myosmanthustree@gmail.com>
…thus/tikv-rfcs into stale-read-for-rawkv-with-read-ts

1. Follower read.

Follower read allows reading from the followers. Without breaking the linear consistency guarantee, the follower will send a read-index request to the leader. The leader will not respond with the actual value, instead, send a round of heartbeats to confirm its leadership and calculate the largest commit index (read index) across the cluster for the follower. After the follower advances its apply index to the read index, it is safe to get data from the local storage and respond to it to the client. This feature helps distribute the read stress on the leader but still increases the read latency.
Member

Suggested change
Follower read allows reading from the followers. Without breaking the linear consistency guarantee, the follower will send a read-index request to the leader. The leader will not respond with the actual value, instead, send a round of heartbeats to confirm its leadership and calculate the largest commit index (read index) across the cluster for the follower. After the follower advances its apply index to the read index, it is safe to get data from the local storage and respond to it to the client. This feature helps distribute the read stress on the leader but still increases the read latency.
Follower read allows reading from the followers. Without breaking the linear consistency guarantee, the follower will send a read-index request to the leader. The leader will not respond with the actual value, instead, send a round of heartbeats to confirm its leadership and calculate the largest commit index (read index) across the cluster for the follower. After the follower advances its apply index to the read index, it is safe to get data from the local storage and respond to the client. This feature helps distribute the read stress on the leader but still increases the read latency.
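
The read-index flow described in this paragraph can be sketched as below. This is a toy Python model under stated assumptions, not TiKV code: `Leader`, `Follower`, and `NotLeaderError` are hypothetical names, the heartbeat round is reduced to a count of acknowledgements, and "waiting for apply" is modeled as applying log entries that are already replicated locally.

```python
class NotLeaderError(Exception):
    """The node could not confirm its leadership with a majority."""


class Leader:
    def __init__(self, commit_index: int, cluster_size: int):
        self.commit_index = commit_index
        self.cluster_size = cluster_size

    def read_index(self, heartbeat_acks: int) -> int:
        # The leader counts itself; a majority must confirm leadership
        # before the commit index may be handed out as the read index.
        if heartbeat_acks + 1 <= self.cluster_size // 2:
            raise NotLeaderError("leadership not confirmed by a majority")
        return self.commit_index


class Follower:
    def __init__(self):
        self.log = {}        # index -> (key, value), replicated entries
        self.apply_index = 0
        self.storage = {}    # local state machine

    def apply_until(self, index: int) -> None:
        while self.apply_index < index:
            nxt = self.apply_index + 1
            if nxt not in self.log:
                raise LookupError(f"entry {nxt} not replicated yet")
            key, value = self.log[nxt]
            self.storage[key] = value
            self.apply_index = nxt

    def read(self, key, leader: Leader, heartbeat_acks: int):
        # 1. Ask the leader for a read index (no value is returned).
        read_index = leader.read_index(heartbeat_acks)
        # 2. Catch up locally until apply_index >= read_index.
        self.apply_until(read_index)
        # 3. Serve the read from local storage.
        return self.storage.get(key)
```

The extra round trip in step 1 is where the added read latency comes from, which is the cost that stale read tries to avoid.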


The `read_ts` specified by the client could be acquired in the following ways:

1. Calculate a timestamp from the local physical time. The `read_ts` might suffer from clock drift and exceed the max timestamp allocated by the TSO. The client will then fail to read any data, even if the target replica is the leader, since the `safe_ts` of the replica has not caught up with the `read_ts`. **Deploying NTP services** in the cluster might mitigate this issue.
Member

Can it be 0?

Member Author

We can reserve this value for unbounded stale reads: read the latest data without checking safe_ts.

Member

Then how to preserve compatibility?


The `read_ts` specified by the client could be acquired in the following ways:

1. Calculate a timestamp from the local physical time. The `read_ts` might suffer from clock drift and exceed the max timestamp allocated by the TSO. The client will then fail to read any data, even if the target replica is the leader, since the `safe_ts` of the replica has not caught up with the `read_ts`. **Deploying NTP services** in the cluster might mitigate this issue.
Member

If read_ts exceeds the max timestamp allocated from TSO, maybe we can just return the latest data instead of no data.

Member Author

Then the read_ts would lose its bound on data freshness, since some very stale replicas might be chosen.
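
The clock-drift failure discussed in this thread can be illustrated with a small sketch. It is plain Python with made-up numbers and assumes the PD TSO layout (physical milliseconds shifted left by 18 bits); `compose_ts`, `local_read_ts`, and `can_serve` are hypothetical helpers.

```python
LOGICAL_BITS = 18  # assumed TSO layout: physical ms << 18 | logical counter


def compose_ts(physical_ms: int, logical: int = 0) -> int:
    return (physical_ms << LOGICAL_BITS) | logical


def local_read_ts(local_clock_ms: int, staleness_ms: int) -> int:
    """read_ts derived from the client's own clock, not from the TSO."""
    return compose_ts(local_clock_ms - staleness_ms)


def can_serve(safe_ts: int, read_ts: int) -> bool:
    return safe_ts >= read_ts


# Illustrative values only.
tso_now_ms = 1_655_380_800_000
max_tso_ts = compose_ts(tso_now_ms)
leader_safe_ts = compose_ts(tso_now_ms - 1_000)  # about 1s behind the TSO

# Well-synced client asking for data that is 5s stale: the read is served.
synced = local_read_ts(tso_now_ms, staleness_ms=5_000)

# Client clock runs 10s ahead: read_ts is beyond anything the TSO has
# allocated, so even the leader cannot serve it yet.
drifted = local_read_ts(tso_now_ms + 10_000, staleness_ms=5_000)
```

In this model `can_serve(leader_safe_ts, synced)` holds while `can_serve(leader_safe_ts, drifted)` does not, and `drifted` is larger than `max_tso_ts`. Falling back to the latest data in the second case would make the request succeed, at the cost of the freshness bound mentioned in the reply above.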


### TiKV

When reading data, clients should specify a timestamp, attached to the request header as `read_ts`, typically a timestamp from a few seconds ago. The replica should read from local storage with the `read_ts`, reusing the stale read mechanism of TxnKV. This requires the replica to check the `read_ts` against the `safe_ts`, which is advanced by the `CheckLeader` message from the leader's store (for followers) or by the `resolve-ts` worker (for the leader). As long as the `safe_ts` is no less than the `read_ts`, the replica is allowed to read the key from local storage. Note that there are no locks in RawKV regions, so the `resolve-ts` worker advances the `safe_ts` by requesting the latest timestamp from the TSO.
Contributor

@pingyu pingyu Jun 28, 2022

  1. What does safe_ts mean for RawKV? I suggest giving a definition, e.g., "the minimum timestamp of the on-the-fly RawKV writes", or "all writes before safe_ts can be read".

  2. If the definition depends on the "timestamp" of RawKV writes, then this feature depends on the timestamp introduced by API V2. Is that right?

  3. The mechanisms to get the "minimal timestamp" of on-the-fly writes would be quite different between Txn and Raw. Although there are no locks, Raw writes would still be "on-the-fly" during the Raft procedure.

  4. RawKV CDC faces a very similar problem in tracking "on-the-fly" writes for resolved-ts. I think we can reuse it for stale read. Please refer to RawKV Change Data Capture #86.

@BusyJay
Member

BusyJay commented Aug 31, 2022

There is a special case where users may choose availability over consistency. The client is then OK to read with any ts, that is, to just get whatever the replica currently has. In this RFC, it seems keys with a larger ts may be skipped during reads.

@iosmanthus
Member Author

iosmanthus commented Aug 31, 2022

> There is a special case where users may choose availability over consistency. The client is then OK to read with any ts, that is, to just get whatever the replica currently has. In this RFC, it seems keys with a larger ts may be skipped during reads.

This RFC doesn't depend on the keys' timestamps; the underlying storage may have no timestamp information at all. The safe_ts is like the (approximate) timestamp of the leader that writes the key. The read_ts is checked against the safe_ts to guarantee that the replica has already synced the writes that happened around safe_ts. To read any data, we could set the read_ts to 0, and then any replica with safe_ts > 0 could handle the request.

@BusyJay
Member

BusyJay commented Sep 1, 2022

I'm OK with a 0 timestamp. Currently, txn stale read considers ts 0 an error, and a client (like TiDB) may actually send ts 0 by mistake. This RFC should state clearly what 0 means in RawKV, and the implementation should not break compatibility.
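
The two readings of `read_ts = 0` raised in this thread can be put side by side in a short sketch. It is a hypothetical Python model of the open design question, not a decided behavior: `ZeroTsPolicy` and `admit` are illustrative names, and which policy RawKV should adopt is exactly what the RFC is asked to spell out.

```python
from enum import Enum


class ZeroTsPolicy(Enum):
    REJECT = "reject"      # treat 0 as an error, as described for txn stale read
    READ_ANY = "read_any"  # treat 0 as "return whatever the replica has"


def admit(read_ts: int, safe_ts: int, zero_policy: ZeroTsPolicy) -> bool:
    """Decide whether a replica may serve a stale read (illustrative only)."""
    if read_ts == 0:
        if zero_policy is ZeroTsPolicy.REJECT:
            raise ValueError("read_ts 0 is not a valid stale read timestamp")
        # Availability over consistency: skip the safe_ts check entirely.
        return True
    return safe_ts >= read_ts
```

Keeping the policy explicit per API (TxnKV versus RawKV) is one way to let a mistaken 0 from a transactional client still fail while RawKV clients opt in to the relaxed read.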

5 participants