Catch empty state in incremental SAT #22353
Affected Connector Report
Connector | Version | Changelog | Publish |
---|---|---|---|
source-airtable | 2.0.3 | ✅ | ✅ |
source-amazon-ads | 1.0.0 | ✅ | ✅ |
source-amazon-seller-partner | 0.2.31 | ✅ | ✅ |
source-amazon-sqs | 0.1.0 | ✅ | ✅ |
source-amplitude | 0.1.21 | ✅ | ✅ |
source-appsflyer | 0.1.0 | ✅ | ✅ |
source-asana | 0.1.5 | ✅ | ✅ |
source-azure-table | 0.1.3 | ✅ | ✅ |
source-braintree | 0.1.3 | ✅ | ✅ |
source-cart | 0.2.0 | ✅ | ✅ |
source-chargebee | 0.1.16 | ✅ | ✅ |
source-commercetools | 0.1.0 | ✅ | ✅ |
source-confluence | 0.1.1 | ✅ | ✅ |
source-datadog | 0.1.0 | ✅ | ✅ |
source-delighted | 0.2.0 | ✅ | ✅ |
source-drift | 0.2.5 | ✅ | ✅ |
source-facebook-marketing | 0.2.84 | ✅ | ✅ |
source-freshcaller | 0.1.0 | ✅ | ✅ |
source-freshsales | 0.1.2 | ✅ | ✅ |
source-freshservice | 0.1.1 | ✅ | ✅ |
source-github | 0.4.1 | ✅ | ✅ |
source-gitlab | 1.0.2 | ✅ | ✅ |
source-google-ads | 0.2.9 | ✅ | ✅ |
source-google-search-console | 0.1.20 | ✅ | ✅ |
source-greenhouse | 0.3.0 | ✅ | ✅ |
source-harvest | 0.1.15 | ✅ | ✅ |
source-instagram | 1.0.1 | ✅ | ✅ |
source-iterable | 0.1.23 | ✅ | ✅ |
source-klaviyo | 0.1.12 | ✅ | ✅ |
source-lemlist | 0.1.1 | ✅ | ✅ |
source-lever-hiring | 0.1.3 | ✅ | ✅ |
source-linnworks | 0.1.5 | ✅ | ✅ |
source-mailchimp | 0.3.4 | ✅ | ✅ |
source-mailgun | 0.1.0 | ✅ | ✅ |
source-monday | 0.2.2 | ✅ | ✅ |
source-notion | 1.0.1 | ✅ | ✅ |
source-okta | 0.1.14 | ✅ | ✅ |
source-onesignal | 0.1.2 | ✅ | ✅ |
source-openweather | 0.1.6 | ✅ | ✅ |
source-outreach | 0.1.2 | ✅ | ✅ |
source-pardot | 0.1.1 | ✅ | ✅ |
source-paystack | 0.1.1 | ✅ | ✅ |
source-pinterest | 0.2.2 | ✅ | ✅ |
source-pipedrive | 0.1.13 | ✅ | ✅ |
source-plaid | 0.3.2 | ✅ | ✅ |
source-posthog | 0.1.8 | ✅ | ✅ |
source-prestashop | 0.3.0 | ✅ | ✅ |
source-quickbooks-singer | 0.1.5 | ✅ | ✅ |
source-recharge | 0.2.6 | ✅ | ❌ (diff seed version) |
source-retently | 0.1.3 | ✅ | ✅ |
source-salesforce | 2.0.1 | ✅ | ✅ |
source-salesloft | 0.1.3 | ✅ | ✅ |
source-sendgrid | 0.3.1 | ✅ | ✅ |
source-sentry | 0.1.11 | ✅ | ✅ |
source-strava | 0.1.2 | ✅ | ✅ |
source-surveymonkey | 0.1.14 | ✅ | ✅ |
source-tplcentral | 0.1.1 | ✅ | ✅ |
source-twilio | 0.1.15 | ✅ | ✅ |
source-weatherstack | 0.1.0 | ✅ | ✅ |
source-youtube-analytics | 0.1.3 | ✅ | ✅ |
source-zendesk-sunshine | 0.1.1 | ✅ | ✅ |
source-zendesk-talk | 0.1.6 | ✅ | ✅ |
source-zenloop | 0.1.4 | ✅ | ✅ |
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Destinations (0)
Connector | Version | Changelog | Publish |
---|
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Other Modules (0)
Actionable Items
Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
Publish | ⚠ not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version. |
Caveat: I'm also unsure, I haven't done much on incremental tests yet. That being said, my interpretation:
I think that sounds right - if the cursor_path supplied in the
Reading brian's message, it sounds to me like ensuring we don't get data on the second read was intentional - basically saying if we do the sync and output the state message that we already processed all the data, if the state message was correct, on a second run, we shouldn't have more data to process. I think we would have to artificially inject new data after the first sync in order to have newly updated cursor values (e.g. |
Ok so that makes sense that we want to test the case Should we explicitly assert that? Currently in the code:
# Redacted above: Run first read and assert valid records and state
output = docker_runner.call_read_with_state(connector_config, configured_catalog_for_incremental, state=state_input)
records = filter_output(output, type_=Type.RECORD)
for record_value, state_value, stream_name in records_with_state(records, complete_state, stream_mapping, cursor_paths):
    assert compare_cursor_with_threshold(
        record_value, state_value, threshold_days
    ), f"Second incremental sync should produce records older or equal to cursor value from the state. Stream: {stream_name}"
But sometimes it does not. And only then we test for the case My worry here is by not having test cases explicit for
We end up only ever testing one case but not the other. 😅 But I'm also worried that I don't understand the system well enough here, and this might be a non-issue.
Just going to ping @evantahler and @pedroslopez to get your thoughts.
This is intentionally structured this way because the implementation is kind of left up to the connector in how it wants to use state. Specifically, in whether > or >= is used when retrieving records from the API. For example, say you save state with the date 2020-01-01. Since this date isn't very granular, the implementation would likely use a "greater than or equal to" to retrieve the records so that we wouldn't miss any new record that may have been added on that date. This means that we may also get repeated records, but that's ok since we have "at least once" delivery. If there wasn't any data at this point, or the connector implements the state using ">", then we would not get any records.
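To make the `>` versus `>=` point concrete, here is a minimal sketch. It is not Airbyte code: `RECORDS`, `read`, and the `inclusive` flag are hypothetical names, and the upstream API is simulated with an in-memory list whose cursor is a plain date.

```python
from datetime import date

# Hypothetical records held by an upstream API, each with a date cursor field.
RECORDS = [
    {"id": 1, "updated_at": date(2019, 12, 31)},
    {"id": 2, "updated_at": date(2020, 1, 1)},
    {"id": 3, "updated_at": date(2020, 1, 1)},
]


def read(records, state=None, inclusive=True):
    """Return every record when there is no state, otherwise filter by the cursor."""
    if state is None:
        return list(records)
    if inclusive:
        # ">=": records on the state date are re-sent (at-least-once delivery).
        return [r for r in records if r["updated_at"] >= state]
    # ">": nothing is returned unless strictly newer data has appeared.
    return [r for r in records if r["updated_at"] > state]


first = read(RECORDS)
state = max(r["updated_at"] for r in first)  # cursor value saved after the first read

second_inclusive = read(RECORDS, state, inclusive=True)   # repeats ids 2 and 3
second_exclusive = read(RECORDS, state, inclusive=False)  # empty
```

With no new data upstream, the inclusive connector repeats the two records dated 2020-01-01 while the exclusive one returns nothing, so a second read may legitimately come back with or without records.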
I'm having trouble parsing that python code (I think the test might be backwards? Is that the new state or starting state that's being compared?) but I think the I also think that Airbyte is a "deliver at least once" system, and we should always be choosing to re-send a record if it's the safer thing to do. So, I think that all sources probably should be using |
I realized I had a typo in the message you quoted. (Sorry!) Instead of The second read from (I updated the previous comment to correct this) Does that change your answer at all? |
@bechurch the filtering I'm referring to happens on the source side or by the API it calls, not something in our testing code. It could still come back with or without records for the reasons I mentioned (no new records have been emitted / using > in the source would lead to no records; or using >= in the source would lead to repeated records)
@pedroslopez Ok that makes sense. I'm still stuck on one thing though. Don't we want to have connectors give us enough data to be able to test in the context of "sequential reads, that both return data"? |
FYI from my wonderful call with Pedro
|
3d053b3 to 7eda185
@connector-operations This is ready for review again.
7eda185 to df81f30
9672dbc to 657ccf9
Nice, could you please run a couple of /test and the connectors that have this problem?
/test connector=connectors/source-sentry
Note: This test will fail... but not for the test I've added.
Build Failed
Test summary info:
/test connector=connectors/source-airtable
Build Passed
Test summary info:
/test connector=connectors/source-twilio
Build Passed
Test summary info:
* Catch state being empty
* Update test_two_sequential_reads to catch empty state on first read
* Add integration test of empty state
* Fix legacy state test
* Move state_name to variable
* Clean up
* Format
* Fix rogue test
What
Improves the incremental SAT test to catch the case in which the STATE message yields no cursor value
closes #21863
What's left to do
Concerns
@airbytehq/connector-operations I have a few areas I'm unsure about with these tests that I want to discuss before making any large changes
1. records_with_state can emit an empty list even if called with many records
One of the issues here is that the records_with_state function uses the continue statement if no state_value is found for a record.
If this happens for all records then BOOM, empty list.
In the previous context of the test_two_sequential_reads test, that is considered a pass.
But to me that seems like an error: if no state can be inferred from any record using the cursor, then they haven't given us a passing test config.
Is that right?
Impact if so: changing the SAT to account for it may cause previously passing connectors to fail
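A minimal sketch of the failure mode in concern 1 and one way to surface it. This is a hypothetical simplification, not the SAT implementation: `records_with_state_sketch`, `check_second_read`, the dict-shaped records, and the `>=` comparison are all assumptions made for illustration.

```python
def records_with_state_sketch(records, state_by_stream):
    """Yield (record_cursor, state_cursor, stream) triples.

    Hypothetical simplification: each record is a dict with "stream" and
    "cursor" keys, and state_by_stream maps a stream name to its cursor value.
    """
    for record in records:
        state_value = state_by_stream.get(record["stream"])
        if state_value is None:
            continue  # record is silently skipped when no state value is found
        yield record["cursor"], state_value, record["stream"]


def check_second_read(records, state_by_stream):
    """Compare records against state, failing if nothing could be compared."""
    pairs = list(records_with_state_sketch(records, state_by_stream))
    if records and not pairs:
        # Without this guard the loop below runs zero times and the check passes vacuously.
        raise AssertionError("No state value found for any record; check the cursor paths in the test config")
    for record_value, state_value, stream in pairs:
        assert record_value >= state_value, f"Record is older than the state cursor. Stream: {stream}"
    return len(pairs)
```

For example, `check_second_read([{"stream": "a", "cursor": 5}], {})` raises instead of passing with zero comparisons, while an empty second read (`check_second_read([], {})`) is still accepted.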
2. In test_two_sequential_reads we do not ensure that we get data on the second read
We should make sure that the test case can test for two reads that result in data.
Is that right?
Impact if so: changing the SAT to account for it WILL cause previously passing connectors to fail. Latest version of sentry for example