[cdc-connector][cdc-base] Add SNAPSHOT mode for Incremental CDC Source #2867

@loserwang1024

Description

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Currently, the Incremental CDC Source connector always produces an unbounded stream, regardless of the startup mode (INITIAL, EARLIEST_OFFSET, LATEST_OFFSET, SPECIFIC_OFFSETS, TIMESTAMP).

Sometimes users only want to replicate a bounded set of data and then release the Flink resources. Moreover, once the job has finished, some connectors can also remove their footprint from the database; for example, the PostgreSQL connector can drop its replication slot.

Solution

Add a SNAPSHOT mode to the Incremental CDC Source that reads the log only until it reaches the maximum high watermark.
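As a rough illustration, the new mode could sit alongside the existing startup modes and be the only one that maps to a bounded source. The sketch below is hypothetical: `StartupModeSketch`, `boundednessOf`, and `needsSnapshotSplits` are illustrative names, and the local `Boundedness` enum only mirrors the values of Flink's `org.apache.flink.api.connector.source.Boundedness` so the example stays self-contained.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical sketch (not the Flink CDC API): startup modes of an incremental
 * CDC source with the proposed SNAPSHOT mode added.
 */
class StartupModeSketch {

    enum StartupMode { INITIAL, EARLIEST_OFFSET, LATEST_OFFSET, SPECIFIC_OFFSETS, TIMESTAMP, SNAPSHOT }

    // Local stand-in mirroring the values of Flink's Boundedness enum.
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    // Only SNAPSHOT stops once the stream split reaches the max high watermark.
    private static final Set<StartupMode> BOUNDED_MODES = EnumSet.of(StartupMode.SNAPSHOT);

    static Boundedness boundednessOf(StartupMode mode) {
        return BOUNDED_MODES.contains(mode) ? Boundedness.BOUNDED : Boundedness.CONTINUOUS_UNBOUNDED;
    }

    // SNAPSHOT reads snapshot splits the same way INITIAL does, so both split tables into chunks.
    static boolean needsSnapshotSplits(StartupMode mode) {
        return mode == StartupMode.INITIAL || mode == StartupMode.SNAPSHOT;
    }

    public static void main(String[] args) {
        for (StartupMode m : StartupMode.values()) {
            System.out.println(m + " -> " + boundednessOf(m) + ", snapshot splits: " + needsSnapshotSplits(m));
        }
    }
}
```

Keeping boundedness as a property derived from the mode means every other mode keeps its current unbounded behavior unchanged.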

The overall process:

  1. Split the tables into multiple chunks (snapshot splits) and read them in the same way as in INITIAL mode.
  2. Read the stream split until the maximum high watermark is reached.
  3. Stop the job (by sending a NoMoreSplitsEvent).

This yields a consistent snapshot as of the maximum high watermark.
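A minimal sketch of steps 1 to 3, assuming integer primary keys, `long` log offsets, a log ordered by offset, and inclusive high watermarks (a chunk's snapshot already reflects every change up to its high watermark). `SnapshotSplit`, `ChangeEvent`, and `readBoundedStream` are hypothetical names, not the Flink CDC API. The per-split filter (emit a change only if it is past the high watermark of the chunk owning its key) is an assumption about how the existing stream-split reader would be reused; the only new part is the stop condition at the maximum high watermark.

```java
import java.util.ArrayList;
import java.util.List;

class BoundedSnapshotSketch {

    // Hypothetical snapshot split: key range [keyStart, keyEnd) and the log offset
    // recorded as its high watermark after the chunk was read.
    record SnapshotSplit(int keyStart, int keyEnd, long highWatermark) {
        boolean contains(int key) { return key >= keyStart && key < keyEnd; }
    }

    // Hypothetical change-log event: log offset and the primary key it touches.
    record ChangeEvent(long offset, int key) {}

    /** Reads the stream split in SNAPSHOT mode; assumes the splits cover all keys. */
    static List<ChangeEvent> readBoundedStream(List<SnapshotSplit> splits, List<ChangeEvent> log) {
        long minHigh = splits.stream().mapToLong(SnapshotSplit::highWatermark).min().orElse(0L);
        long maxHigh = splits.stream().mapToLong(SnapshotSplit::highWatermark).max().orElse(0L);
        List<ChangeEvent> emitted = new ArrayList<>();
        for (ChangeEvent e : log) {
            if (e.offset() <= minHigh) continue;   // already reflected in every chunk
            if (e.offset() > maxHigh) break;       // bounded: stop after the max high watermark
            for (SnapshotSplit s : splits) {
                // Emit only changes newer than the snapshot of the chunk that owns the key.
                if (s.contains(e.key()) && e.offset() > s.highWatermark()) {
                    emitted.add(e);
                    break;
                }
            }
        }
        // At this point the enumerator would signal that no more splits will be assigned.
        return emitted;
    }

    public static void main(String[] args) {
        List<SnapshotSplit> splits = List.of(
                new SnapshotSplit(0, 100, 10L),
                new SnapshotSplit(100, 200, 15L));
        List<ChangeEvent> log = List.of(
                new ChangeEvent(9L, 5),     // at or before every high watermark: skipped
                new ChangeEvent(12L, 50),   // chunk [0,100) snapshotted at 10: emitted
                new ChangeEvent(13L, 150),  // chunk [100,200) snapshotted at 15: skipped
                new ChangeEvent(15L, 60),   // chunk [0,100), 15 > 10: emitted
                new ChangeEvent(16L, 70));  // past the max high watermark: reading stops
        System.out.println(readBoundedStream(splits, log));
    }
}
```

In this toy run only the events at offsets 12 and 15 are emitted, so every chunk ends up reflecting the log up to offset 15, the maximum high watermark.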

Alternatives

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests