-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛Source Klaviyo: add state_checkpoint_interval #32291
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
|
||
### Step 2: Set up the Klaviyo connector in Airbyte | ||
|
||
1. [Log into your Airbyte Cloud](https://cloud.airbyte.io/workspaces) account. | ||
2. Click **Sources** and then click **+ New source**. | ||
2. Click **Sources** and then click **+ new source**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me either, unfortunately. This is just my assumption based on what I've seen on other connectors which don't have such issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Keep me posted on the result of this change and we'll update the checklist based on that. Thanks!
|
||
cursor_field = "updated" | ||
api_revision = "2023-02-22" | ||
page_size = 100 | ||
state_checkpoint_interval = 100 # API can return maximum 100 records per page |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit afraid of this change as we don't use sorting in the API endpoints. How can we validate that the records are emitted in the order of the cursor field? If they are not, this is dangerous as we could have:
- emit record with cursor value
10
- emit a state message with cursor value
10
- emit record with cursor value
5
- the sync crash
In that case, we wouldn't lose the record with cursor value 5
but we can see that if the crash happens just before, we would have lost the record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is only applied to the streams which have sorting by cursor field at the API level - we add sorting param here: https://github.com/airbytehq/airbyte/pull/32291/files#diff-eb5f2206f720a1d96317e5a8b62e21de2eaa12b401666efccb798362b337eaf3R148
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given all the checkpoint_interval you added are for IncrementalKlaviyoStream, everything looks good then.
One non blocking question though: why the comment next to the checkpoint interval? How is the page size linked to the checkpoint_interval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The connector is emitting its state after every page it's read. If the page is not full, the state will be emitted at the end of the read, otherwise, since pages are sorted by cursor, we can checkpointing after each page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic has been changed somewhat recently : we are emitting a state message after X number of records not after X number of records in a slice. Does that change the way you look at this problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, since there are no overridden stream_slices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one non blocking comment so I'm good to approve this PR. Thanks Anton!
What
Updates requested during certification checklist review:
klaviyo.md
How
Set
state_checkpoint_interval
property to maximum items per page returned by API for streamsProfiles
,GlobalExclusions
,Events
,Flows
, andEmailTemplates
Recommended reading order
streams.py
klaviyo.md
🚨 User Impact 🚨
No breaking changes
Pre-merge Actions
Updating a connector
Community member or Airbyter
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.