Binding Behavior Updates #1276

Closed
12 of 13 tasks
williamhbaker opened this issue Nov 3, 2023 · 1 comment

williamhbaker commented Nov 3, 2023

Discussed in #1219

These changes make bindings for captures, derivations, and materializations behave consistently when a binding's state is "reset" and its backfill is restarted because the binding was removed from a spec. They also add the ability to "reset" a binding without removing it from the spec, via a new backfillVersion property. Additionally, they allow materializations to drop and re-create the materialized table (or equivalent) when a materialization binding is reset.

  • New backfillVersion and bindingKey concepts:
    • Add backfillVersion to capture/derivation/materialization spec and bindingKey to protocol
    • Implement automatic pruning of driver & runtime checkpoints for bindingKeys that are removed from task specs
  • Captures: Use bindingKey for keying binding state (depends on spec updates, above)
  • Graceful shutdown of connectors:
  • Materializations drop & re-create tables (depends on graceful shutdown of connectors, above):
    • Update "activate" logic to disable materialization shards before applying changes (when needed), and re-enable them after changes have been applied
    • Make materializations drop tables that were previously being materialized when the target tables have a different bindingKey
  • Add temporary migration shim which populates state_key on behalf of connectors even for captures / materializations built without it.
  • Enforce that backfill counters are always monotonically increasing in the control plane
  • Use "asynchronous", runtime-driven strategy for Apply operations
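
To illustrate the monotonicity task in the list above, here is a minimal sketch of how a check might flag a proposed spec whose backfill counter went backwards. The function name and the dict-of-counters shape are hypothetical and are not taken from the actual control plane.

```python
def find_backfill_regressions(previous, proposed):
    """Return the bindings whose backfill counter decreased.

    `previous` and `proposed` map a binding name to its backfill counter.
    Both the names and this data shape are assumptions for illustration.
    A binding that is absent from `previous` is treated as starting at 0.
    """
    regressions = []
    for name, new_value in proposed.items():
        old_value = previous.get(name, 0)
        if new_value < old_value:
            regressions.append(name)
    return regressions


previous = {"public/users": 1, "public/orders": 0}
proposed = {"public/users": 0, "public/orders": 2, "public/events": 0}
regressions = find_backfill_regressions(previous, proposed)
```

In this sketch, equal values pass (no reset requested), larger values pass (a reset), and only a decrease is reported.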
@jgraettinger jgraettinger self-assigned this Nov 20, 2023
jgraettinger added a commit that referenced this issue Nov 21, 2023
Add `backfill` counter to models for capture & materialization bindings
as well as derivation transforms.

Propagate the `backfill` value through the corresponding Validate request
types and into built task specifications.

Refactor `journal_read_suffix` encoding logic for resource paths to be
aware of the `backfill` version. When `backfill` is zero, the encoded
resource path is unchanged. Larger values of `backfill` add a new `.v1`,
`.v2`, and so on suffix to the encoded resource path.

Attach encoded resource paths to built specifications as a new
`state_key` field, which connectors can use to key connector states.

Also re-define the current `journal_read_suffix` fields in terms of the
computed `state_key`.

We may eventually remove `journal_read_suffix`, but not yet.

Issue #1276
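
A rough sketch of the suffix scheme that the commit message above describes. Only the `.vN` suffix rule comes from the message. The percent-encoding of path components and joining them with `/` are assumptions made for illustration, and `encode_state_key` is a hypothetical name rather than the function in the codebase.

```python
from urllib.parse import quote


def encode_state_key(resource_path, backfill):
    """Encode a resource path and append a backfill suffix when non-zero.

    Assumption: each path component is percent-encoded and the components
    are joined with "/". The real encoding may differ.
    """
    if backfill < 0:
        raise ValueError("backfill must be non-negative")
    encoded = "/".join(quote(part, safe="") for part in resource_path)
    if backfill == 0:
        # A zero counter leaves the encoded path unchanged, so existing
        # state keyed by the old encoding remains addressable.
        return encoded
    return f"{encoded}.v{backfill}"


key_v0 = encode_state_key(["public", "users"], 0)
key_v2 = encode_state_key(["public", "users"], 2)
```

Because each increment of `backfill` yields a new key, a connector that keys its state by this value sees a reset binding as having no prior state.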

jgraettinger commented Nov 21, 2023

#1292 is up with implementation for backfill counters and threading through a persistent state_key for connector use.

> Implement automatic pruning of driver & runtime checkpoints for bindingKeys that are removed from task specs

I did not implement this part yet (though it doesn't seem strictly required to move this feature forward).

Since our initial design conversations, an improvement made as part of the runtime refactors is that handling of connector states is now fully incremental: we don't re-write the complete connector state with each transaction, we only write the delta change. I'd like to do the same thing with runtime checkpoints.

An implication is that there isn't a lot of downside that I can see in simply not pruning these states, which also means we can avoid explicit coordination between the runtime and the connector on precisely where state keys live within the overall connector state.
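
To make the incremental-update point concrete, here is a minimal sketch of applying a delta to connector state, loosely in the style of JSON Merge Patch (RFC 7386). The `bindings` field and the cursor layout are hypothetical. The source does not specify where state keys live in connector state or how deltas are merged.

```python
def apply_delta(state, delta):
    """Merge `delta` into `state` and return the result.

    Nested dicts merge recursively, and a None value deletes the key.
    This is an illustrative stand-in for whatever reduction the runtime
    actually applies to connector state updates.
    """
    merged = dict(state)
    for key, value in delta.items():
        if value is None:
            merged.pop(key, None)
        elif isinstance(value, dict):
            base = merged.get(key)
            merged[key] = apply_delta(base if isinstance(base, dict) else {}, value)
        else:
            merged[key] = value
    return merged


state = {"bindings": {"public/users": {"cursor": 10}}}
# After a reset, the connector only writes under the new state key.
delta = {"bindings": {"public/users.v1": {"cursor": 0}}}
new_state = apply_delta(state, delta)
```

The stale `public/users` entry is never touched by later deltas, so leaving it unpruned costs only the storage it already occupies.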

jgraettinger added a commit that referenced this issue Nov 28, 2023
Add `backfill` counter to models for capture & materialization bindings
as well as derivation transforms.
