Binding Behavior Updates #1276
Comments
Add `backfill` counter to models for capture & materialization bindings as well as derivation transforms. Propagate the `backfill` value through the corresponding Validate request types and into built task specifications. Refactor `journal_read_suffix` encoding logic for resource paths to be aware of the `backfill` version. When `backfill` is zero, the encoded resource path is unchanged. Larger values of `backfill` add a new `.v1`, `.v2`, and so on suffix to the encoded resource path. Attach encoded resource paths to built specifications as a new `state_key` field, which connectors can use to key connector states. Also re-define the current `journal_read_suffix` fields in terms of the computed `state_key`. We may eventually remove `journal_read_suffix`, but not yet. Issue #1276
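To make the suffix rule concrete, here is a minimal sketch of how a `state_key` could be derived from a resource path and a `backfill` counter. The function name `encode_state_key`, the percent-encoding of path components, and the `/` separator are assumptions for illustration only; the commit message states just that a zero counter leaves the encoded path unchanged and that larger counters append `.v1`, `.v2`, and so on.

```python
from urllib.parse import quote


def encode_state_key(resource_path, backfill=0):
    """Encode a resource path into a state key (hypothetical helper).

    Assumption: components are percent-encoded and joined with "/".
    Only the ".vN" suffix behavior comes from the commit message.
    """
    if backfill < 0:
        raise ValueError("backfill must be non-negative")
    key = "/".join(quote(part, safe="") for part in resource_path)
    # A zero counter leaves the encoded path unchanged, so bindings that
    # were never reset keep the key they had before the counter existed.
    if backfill == 0:
        return key
    return f"{key}.v{backfill}"


unchanged = encode_state_key(["public", "users"])
reset_once = encode_state_key(["public", "users"], backfill=1)
```

Because each increment yields a new key, a connector that keys its per-binding state by `state_key` would start a reset binding from empty state without having to delete the old entry first.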
#1292 is up with implementation for backfill counters and threading through a persistent
I did not implement this part yet (though it doesn't seem strictly required to move this feature forward). Since our initial design conversations, an improvement made as part of the runtime refactors is that handling of connector states is now fully incremental: we don't re-write the complete connector state with each transaction, we only write the delta change. I'd like to do the same thing with runtime checkpoints. An implication is that there isn't a lot of downside that I can see in simply not pruning these states, which also means we can avoid explicit coordination between the runtime and the connector on precisely where state keys live within the overall connector state.
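As a rough illustration of why unpruned states cost little under incremental updates, the sketch below applies a delta to an existing connector state. It assumes deltas are merged recursively in the style of a JSON merge patch, and the `bindings` field and the `apply_state_delta` name are hypothetical; the comment above says only that the delta, not the full state, is written with each transaction.

```python
def apply_state_delta(state, delta):
    """Merge a delta into a state without mutating either (assumed semantics).

    Dicts merge recursively, None removes a key, anything else replaces.
    """
    if not isinstance(delta, dict):
        return delta
    merged = dict(state) if isinstance(state, dict) else {}
    for key, value in delta.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = apply_state_delta(merged.get(key), value)
    return merged


# State keyed by state_key; "users" was reset, so new writes go to "users.v1".
state = {"bindings": {"users": {"cursor": 10}, "orders": {"cursor": 5}}}
delta = {"bindings": {"users.v1": {"cursor": 0}}}
updated = apply_state_delta(state, delta)
```

The stale `users` entry is never touched by later deltas, so leaving it in place adds no per-transaction write cost in this model.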
Discussed in #1219
These changes enable bindings for captures, derivations, and materializations to work in a consistent way with respect to "resetting" the state of a binding and re-starting its backfill when a binding is removed from a spec, as well as adding capabilities for a binding to be "reset" without removing it from the spec via a new `backfillVersion` property. Additionally, they will allow materializations to drop & re-create the respective materialized table (or equivalent) when a materialization binding is reset.

`backfillVersion` and `bindingKey` concepts:

- `backfillVersion` to capture/derivation/materialization spec and `bindingKey` to protocol
- Implement automatic pruning of driver & runtime checkpoints for `bindingKey`s that are removed from task specs
- `bindingKey` for keying binding state (depends on spec updates, above)

Graceful shutdown of connectors:

- Make `connector-init` propagate shutdown signals to connectors and give them a chance to exit gracefully
- Update materializations to clean up after themselves when signaled to exit (related: "SQL warehouse materializations don't do a very good job of cleaning up after themselves", connectors#984). For this scope of work, this is most relevant for those that track column-like type information and produce constraints for existing tables
- Update "activate" logic to disable materialization shards before applying changes (when needed), and re-enable them after changes have been applied
- `bindingKey` `state_key` on behalf of connectors even for captures / materializations built without it.
- `backfill` counters are always monotonically increasing in the control plane
- `Apply` operations
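The automatic pruning item in the task list could look roughly like the following sketch, which drops per-binding state whose key no longer appears in the task spec. The function name `prune_binding_states` and the flat mapping of keys to states are assumptions; the issue does not specify where binding keys live inside driver or runtime checkpoints.

```python
def prune_binding_states(states, active_keys):
    """Return only the states whose key is still present in the task spec.

    `states` maps a binding key to its state; `active_keys` lists the keys
    of bindings in the current spec. Both shapes are hypothetical.
    """
    active = set(active_keys)
    return {key: value for key, value in states.items() if key in active}


# "users" was reset (its key became "users.v1") and "orders" was removed.
states = {
    "users": {"cursor": 10},
    "users.v1": {"cursor": 0},
    "orders": {"cursor": 5},
}
pruned = prune_binding_states(states, ["users.v1"])
```

A binding that is removed and one that is reset are handled the same way here: in both cases the old key simply stops appearing among the active keys.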