Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent CDK: source-stripe: concurrent execution for incremental syncs #32057

Closed
3 of 4 tasks
maxi297 opened this issue Nov 1, 2023 · 2 comments
Closed
3 of 4 tasks

Comments

@maxi297
Copy link
Contributor

maxi297 commented Nov 1, 2023

What area the feature impact?

Connectors

Relevant Information

Goal: Reduce first incremental sync duration

Once the Concurrent CDK supports incremental syncs with state conversion, stripe can now leverage this.

Why only the first incremental sync?

  • There are a couple of issues with stripe at the moment. For example
  • full_refresh and first incremental syncs leverage mostly the same code path (i.e. stripe has this logic of consuming different endpoint if there is a state is empty example)
  • There is not much value to running things concurrent once the first incremental sync is done as it should be up-to-date and hence should only have one slice

There are other issues that were documented as part of this spreadsheet. So instead of doing all streams first incremental sync concurrent, we will work on a stream by stream basis as follow:

  • Events
  • CreatedCursorIncrementalStripeStream
  • UpdatedCursorIncrementalStripeStream
  • Persons
  • SetupAttempts

For now, we won't tackle:

[DONE] Missing CDK feature:
The cursor needs to know what is the lowest boundary expected. Let's assume the following partitions are generated:

  • start: 0, end: 1
  • start: 2: end: 3
    If the second is successful but not the first, no state should be emitted. This is different than the logic we implemented as we only take the upper boundary of the first continuous range which in that case would be 3.

Acceptance criteria

  • Implement the CDK missing feature described above
  • Have incremental syncs be concurrent for streams
  • Events
  • CreatedCursorIncrementalStripeStream
  • UpdatedCursorIncrementalStripeStream
  • Persons
  • SetupAttempts
  • We sync the same records as a full_refresh. As we don't have much visibility, manual test a couple of connections for that
  • Notify GL of this logic
@maxi297 maxi297 changed the title source-stripe: concurrent execution of first incremental sync Concurrent CDK: source-stripe: concurrent execution of first incremental sync Nov 1, 2023
@maxi297 maxi297 self-assigned this Nov 7, 2023
@maxi297
Copy link
Contributor Author

maxi297 commented Nov 9, 2023

Put in ice box until we can test this better

@maxi297
Copy link
Contributor Author

maxi297 commented Feb 1, 2024

We will only migrate the following streams for now:

  • balance_transactions
  • events
  • file_links
  • files
  • shipping_rates

The reasons are describes by this spreadsheet and this document

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants