Proposal: add a new command for polling new updates by sequence #2469
Comments
Good idea! Some comments:
My initial thought is to support the raw batch (hex format) first, and then support the optional FORMAT argument in a following PR.
The sequence number is now available from the INFO command, so we should check whether a dedicated command for it is necessary.
Sure,
Maybe we can add a RAW flag now, to make it extensible?
I think that's hard to use, since users need to parse the output of INFO manually to get it. Maybe a separate command is better.
Sure, I have updated this.
What about adding a SEQUENCE command?
As proposed in issue apache#2469, we would like to add a new command for polling updates from Kvrocks. The main purpose is to allow implementing features like CDC (Change Data Capture) without running an agent alongside Kvrocks instances, which makes it easier to operate.

The following is the command format:

```shell
POLLUPDATES <Sequence Number> [MAX <N>] [STRICT] [FORMAT <RAW>]
```

- `Sequence Number` represents the start sequence of the polling operation. It is a required argument.
- `MAX` represents the maximum number of items that can be retrieved. It is an optional argument and defaults to `16` if omitted.
- `STRICT`, when set, means the update's sequence MUST be exactly equal to the sequence number. It is an optional argument. `GetUpdatesSince` returns the first available sequence if the requested sequence number does not exist, so this flag lets users require an exact match with the input sequence number.

The output contains the following sections:

- latest_sequence
- updates
- format
- next_sequence

For example, assuming the DB's latest sequence is 100 and we send the command `POLLUPDATES 90 MAX 3 FORMAT RAW`, it will return the following response:

- "latest_sequence"
- 100
- "format"
- RAW
- "updates"
  - batch-0
  - batch-1
  - batch-2
- "next_sequence"
- 93
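To illustrate how a client might consume this reply, here is a minimal Python sketch. It assumes the reply arrives as a flat list of alternating keys and values, as in the example above. The helper names `parse_poll_reply` and `build_poll_command` are hypothetical and not part of Kvrocks or any client library, and the real wire format may differ.

```python
from typing import Any, Dict, List


def parse_poll_reply(reply: List[Any]) -> Dict[str, Any]:
    """Turn a flat [key, value, key, value, ...] reply into a dict."""
    if len(reply) % 2 != 0:
        raise ValueError("expected an even number of elements")
    return {str(reply[i]): reply[i + 1] for i in range(0, len(reply), 2)}


def build_poll_command(sequence: int, max_items: int = 16, strict: bool = False) -> List[str]:
    """Build the arguments for POLLUPDATES <seq> [MAX <N>] [STRICT] [FORMAT <RAW>]."""
    args = ["POLLUPDATES", str(sequence), "MAX", str(max_items)]
    if strict:
        args.append("STRICT")
    args += ["FORMAT", "RAW"]
    return args


# The example reply from the description, written as a nested Python list
# (an assumed shape, based only on the example above).
example_reply = [
    "latest_sequence", 100,
    "format", "RAW",
    "updates", ["batch-0", "batch-1", "batch-2"],
    "next_sequence", 93,
]

parsed = parse_poll_reply(example_reply)
# Resume polling from the sequence the server told us to use next.
next_command = build_poll_command(parsed["next_sequence"], max_items=3)
print(next_command)
```

A consumer would presumably keep polling with `next_sequence` until it catches up with `latest_sequence`.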
Motivation
Currently, RocksDB provides an API, `GetUpdatesSince`, that allows us to poll write batches by sequence number. Kvrocks depends on this mechanism to implement partial sync (PSYNC). In addition, the official migration tool `kvrocks2redis` uses it to fetch new updates after parsing the entire DB, but it has to run alongside the DB directory. Some scenarios such as CDC (Change Data Capture) have the same requirement, but running an agent alongside each Kvrocks node would be too troublesome. As far as I know, some users have a similar requirement [1]. So I propose adding a new command for this purpose, taking the following arguments:
- `Sequence Number` represents the start sequence of the polling operation. It is a required argument.
- `MAX` represents the maximum number of items that can be retrieved. It is an optional argument and defaults to `16` if omitted.
- `STRICT`, when set, means the update's sequence MUST be exactly equal to the sequence number. It is an optional argument. `GetUpdatesSince` returns the first available sequence if the requested sequence number does not exist, so this flag lets users require an exact match with the input sequence number.

We can also add more arguments later, such as TIMEOUT or MIN.
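The interaction between `MAX` and `STRICT` can be sketched with a toy in-memory model in Python. This is not how Kvrocks or RocksDB implement it: `poll_updates`, the `(sequence, payload)` tuples, and the choice of `LookupError` are all assumptions made for illustration, following only the behaviour described above.

```python
from typing import List, Tuple

DEFAULT_MAX = 16  # default from the proposal when MAX is omitted


def poll_updates(
    batches: List[Tuple[int, str]],
    sequence: int,
    max_items: int = DEFAULT_MAX,
    strict: bool = False,
) -> List[Tuple[int, str]]:
    """Toy model: `batches` holds (start_sequence, payload), sorted by sequence."""
    # Mimic the described GetUpdatesSince behaviour: start from the first
    # available batch whose sequence is at or after the requested one.
    start = next(
        (i for i, (seq, _) in enumerate(batches) if seq >= sequence),
        len(batches),
    )
    # With STRICT, refuse to silently skip ahead to a later sequence.
    if strict and (start == len(batches) or batches[start][0] != sequence):
        raise LookupError(f"sequence {sequence} is not available")
    return batches[start:start + max_items]


# Hypothetical log with a gap between sequence 92 and 95.
log = [(90, "batch-0"), (91, "batch-1"), (92, "batch-2"), (95, "batch-3")]

print(poll_updates(log, 90, max_items=3))  # exact match, capped by MAX
print(poll_updates(log, 93))               # non-strict: skips ahead to 95
```

Without `STRICT`, a request for sequence 93 quietly starts at 95. With `strict=True`, the same call raises instead, which lets a CDC consumer detect that it has missed updates.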
[1] https://www.revenuecat.com/blog/engineering/how-we-replicate-kvrocks-dataset/