Proposal: add a new command for polling new updates by sequence #2469

Closed
2 tasks done
git-hulk opened this issue Aug 5, 2024 · 4 comments · Fixed by #2472
@git-hulk
Member

git-hulk commented Aug 5, 2024

Search before asking

  • I have searched the issues and found no similar issues.

Motivation

Currently, RocksDB provides the GetUpdatesSince API, which lets us poll write batches by sequence number. Kvrocks already depends on this mechanism to implement partial sync (PSYNC). In addition, the official migration tool kvrocks2redis uses it to fetch new updates after parsing the entire DB, but it has to run alongside the DB directory. Scenarios such as CDC (Change Data Capture) have the same requirement, but it would be too troublesome to run an agent alongside each Kvrocks node.

As far as I know, some users have a similar requirement [1]. So I propose adding a new command for this purpose:

POLLUPDATES <Sequence Number> [MAX <N>] [STRICT] [FORMAT <RAW>]
  • Sequence Number is the start sequence of the polling operation. It is a required argument.
  • MAX is the maximum number of items that can be retrieved. It is optional and defaults to 16 when omitted.
  • STRICT means the sequence of the first returned update MUST be exactly equal to the given sequence number. It is optional. GetUpdatesSince returns the first available sequence if the requested sequence number does not exist, so this flag lets users require an exact match with the input sequence number.

We can also add more arguments later, such as TIMEOUT or MIN.
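To make the argument rules above concrete, here is a minimal Python sketch of how the tokens after `POLLUPDATES` could be parsed. It is only an illustration, not the Kvrocks implementation: the names `PollUpdatesArgs` and `parse_pollupdates` are hypothetical, and the sketch assumes that RAW is the format used when FORMAT is omitted and that MAX must be positive, neither of which the proposal states.

```python
from dataclasses import dataclass


@dataclass
class PollUpdatesArgs:
    sequence: int          # required start sequence
    max_items: int = 16    # MAX defaults to 16 per the proposal
    strict: bool = False   # STRICT is off unless given
    fmt: str = "RAW"       # assumption: RAW when FORMAT is omitted


def parse_pollupdates(tokens):
    """Parse the tokens that follow the command name POLLUPDATES."""
    if not tokens:
        raise ValueError("missing sequence number")
    args = PollUpdatesArgs(sequence=int(tokens[0]))
    i = 1
    while i < len(tokens):
        opt = tokens[i].upper()
        if opt == "STRICT":
            args.strict = True
            i += 1
        elif opt in ("MAX", "FORMAT"):
            if i + 1 >= len(tokens):
                raise ValueError(f"{opt} requires a value")
            value = tokens[i + 1]
            if opt == "MAX":
                args.max_items = int(value)
                # Assumption: a non-positive MAX is rejected.
                if args.max_items <= 0:
                    raise ValueError("MAX must be positive")
            else:
                # Only RAW is proposed so far.
                if value.upper() != "RAW":
                    raise ValueError(f"unsupported format: {value}")
                args.fmt = "RAW"
            i += 2
        else:
            raise ValueError(f"unknown option: {tokens[i]}")
    return args
```

For instance, `parse_pollupdates(["90", "MAX", "3", "FORMAT", "RAW"])` yields a start sequence of 90 with at most 3 items, while `parse_pollupdates(["90"])` falls back to the default of 16.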

[1] https://www.revenuecat.com/blog/engineering/how-we-replicate-kvrocks-dataset/

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@git-hulk git-hulk added the enhancement type enhancement label Aug 5, 2024
@git-hulk git-hulk self-assigned this Aug 5, 2024
@PragmaTwice
Member

Good idea! Some comments:

  • What's the output (and output format) of this command?
  • How to get the sequence number (by commands)?
  • POLLING UPDATES seems a little weird, could it be something like POLLUPDATES?

@git-hulk
Member Author

git-hulk commented Aug 5, 2024

What's the output (and output format) of this command?

My initial thought is to support the raw batch (hex format) first, and then support the optional FORMAT argument in a follow-up PR.

How to get the sequence number (by commands)?

For now, we can get the sequence number from the INFO command. Let's see whether a dedicated command is necessary.

POLLING UPDATES seems a little weird, could it be something like POLLUPDATES?

Sure, POLLUPDATES is good since I cannot foresee any other behaviors except updates for now.

@PragmaTwice
Member

My initial thought is to support the raw batch (hex format) first, and then support the optional FORMAT argument in a follow-up PR.

maybe we can add a RAW flag now? to make it extensible.

For now, we can get the sequence number from the INFO command. Let's see whether a dedicated command is necessary.

I think it's hard to use since users need to parse the output of INFO manually to get it. Maybe a separate command is better.
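As a rough illustration of the manual parsing this refers to, a client would need something like the Python sketch below. The field name `sequence` and the sample text are assumptions made for the example, and `sequence_from_info` is a hypothetical helper, so check the real INFO output before relying on it.

```python
def sequence_from_info(info_text, field="sequence"):
    """Return the integer value of `field` from INFO-style `key:value` text."""
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Keyspace".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep and key == field:
            return int(value)
    raise KeyError(field)


# Hypothetical INFO excerpt; the real output has many more fields.
sample = "# Keyspace\r\nsequence:100\r\nused_db_size:0\r\n"
current_sequence = sequence_from_info(sample)
```

A dedicated command would let clients skip this text scanning and read the number directly.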

@git-hulk
Member Author

git-hulk commented Aug 5, 2024

maybe we can add a RAW flag now? to make it extensible.

Sure, I have updated this.

I think it's hard to use since users need to parse the output of INFO manually to get it. Maybe a separate command is better.

What about adding a SEQUENCE command?

git-hulk added a commit to git-hulk/kvrocks that referenced this issue Aug 7, 2024
As proposed in issue apache#2469, we would like to add a new command for
polling updates from Kvrocks. The main purpose is to allow implementing
features like CDC (Change Data Capture) without running an agent
alongside Kvrocks instances, which makes it easier to operate.

The following is the command format:

```shell
POLLUPDATES <Sequence Number> [MAX <N>] [STRICT] [FORMAT <RAW>]
```

- `Sequence Number` is the start sequence of the polling operation. It is a required argument.
- `MAX` is the maximum number of items that can be retrieved. It is optional and defaults to `16` when omitted.
- `STRICT` means the sequence of the first returned update MUST be exactly equal to the given sequence number. It is optional. `GetUpdatesSince` returns the first available sequence if the requested sequence number does not exist, so this flag lets users require an exact match with the input sequence number.
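The difference that `STRICT` makes can be sketched with a small Python model. This is a simplification under stated assumptions rather than the actual implementation: each write batch is represented only by its starting sequence in a sorted list, `first_available` stands in for the positioning behaviour of `GetUpdatesSince`, and raising an error on a mismatch is an assumed way of rejecting the request. Both function names are hypothetical.

```python
from bisect import bisect_left


def first_available(batch_seqs, requested):
    """Return the first batch sequence >= requested, or None if there is none."""
    i = bisect_left(batch_seqs, requested)
    return batch_seqs[i] if i < len(batch_seqs) else None


def check_start(batch_seqs, requested, strict=False):
    """Resolve the sequence polling would start from, honouring STRICT."""
    found = first_available(batch_seqs, requested)
    if found is None:
        return None
    # Assumption: a STRICT mismatch is reported as an error.
    if strict and found != requested:
        raise LookupError(
            f"sequence mismatch: requested {requested}, first available {found}"
        )
    return found
```

With batches starting at 95, 96 and 98, a non-strict request for 90 starts from 95, whereas the same request with `strict=True` is rejected because 90 itself is not available.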

And the output contains the following sections:

- latest_sequence
- updates
- format
- next_sequence

For example, assume the DB's latest sequence is 100 and we send the
command `POLLUPDATES 90 MAX 3 FORMAT RAW`. It will return the following
response:

- "latest_sequence"
- 100
- "format"
- RAW
- "updates"
  - batch-0
  - batch-1
  - batch-2
- "next_sequence"
- 93
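
A client consuming this response could pair up the flat key/value list and carry `next_sequence` into the following call. The Python sketch below is one possible shape under assumptions: `reply_to_dict` and `poll_once` are hypothetical helpers, and `send` is any callable that issues a command and returns the decoded reply (for example, the `execute_command` method of a redis-py client).

```python
def reply_to_dict(reply):
    """Pair up a flat [key, value, key, value, ...] reply into a dict."""
    if len(reply) % 2 != 0:
        raise ValueError("reply must hold key/value pairs")
    out = {}
    for key, value in zip(reply[0::2], reply[1::2]):
        # Keys may arrive as bytes depending on the client settings.
        if isinstance(key, bytes):
            key = key.decode()
        out[key] = value
    return out


def poll_once(send, sequence, max_items=16):
    """Issue one POLLUPDATES call and return (updates, next_sequence)."""
    reply = reply_to_dict(
        send("POLLUPDATES", sequence, "MAX", max_items, "FORMAT", "RAW")
    )
    return reply["updates"], int(reply["next_sequence"])
```

Calling `poll_once` repeatedly with the returned `next_sequence` gives a simple polling loop; how long to wait between calls when no updates come back is left to the caller.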
git-hulk added a commit to git-hulk/kvrocks that referenced this issue Aug 8, 2024