Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch empty state in incremental SAT #22353

Merged

Conversation

bnchrch
Copy link
Contributor

@bnchrch bnchrch commented Feb 3, 2023

What

Improves the incremental SAT test to catch the case in which the STATE message yields no cursor value

closes #21863

Whats left todo

  • Add unit tests
  • Break test into own test case
  • Discuss concerns with team

Concerns

@airbytehq/connector-operations I have a few areas im unsure on with these tests that I want to discuss before making any large changes

1. records_with_state can emit an empty list even if called with many records

One of the issues here is that the records_with_state function uses the continue function if no state_value is found for a record.

If this happens for all records then BOOM empty list.

In the previous context of the test_two_sequential_reads test that is considered a pass.

But to me that seems like an error, if no state can be infered from any record using the cursor, then they havent given us a passing test config.

Is that right?

Impact if so: changing the SAT to account for it may cause previously passing connectors to fail

2. In test_two_sequential_reads we do not ensure that we get data on the second read

We should make sure that the test case can test for two reads that result in data.

Is that right?

Impact if so: changing the SAT to account for it WILL cause previously passing connectors to fail. Latest version of sentry for example

@bnchrch bnchrch requested a review from a team February 3, 2023 02:55
@bnchrch bnchrch self-assigned this Feb 3, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

❌ Sources (63)

Connector Version Changelog Publish
source-airtable 2.0.3
source-amazon-ads 1.0.0
source-amazon-seller-partner 0.2.31
source-amazon-sqs 0.1.0
source-amplitude 0.1.21
source-appsflyer 0.1.0
source-asana 0.1.5
source-azure-table 0.1.3
source-braintree 0.1.3
source-cart 0.2.0
source-chargebee 0.1.16
source-commercetools 0.1.0
source-confluence 0.1.1
source-datadog 0.1.0
source-delighted 0.2.0
source-drift 0.2.5
source-facebook-marketing 0.2.84
source-freshcaller 0.1.0
source-freshsales 0.1.2
source-freshservice 0.1.1
source-github 0.4.1
source-gitlab 1.0.2
source-google-ads 0.2.9
source-google-search-console 0.1.20
source-greenhouse 0.3.0
source-harvest 0.1.15
source-instagram 1.0.1
source-iterable 0.1.23
source-klaviyo 0.1.12
source-lemlist 0.1.1
source-lever-hiring 0.1.3
source-linnworks 0.1.5
source-mailchimp 0.3.4
source-mailgun 0.1.0
source-monday 0.2.2
source-notion 1.0.1
source-okta 0.1.14
source-onesignal 0.1.2
source-openweather 0.1.6
source-outreach 0.1.2
source-pardot 0.1.1
source-paystack 0.1.1
source-pinterest 0.2.2
source-pipedrive 0.1.13
source-plaid 0.3.2
source-posthog 0.1.8
source-prestashop 0.3.0
source-quickbooks-singer 0.1.5
source-recharge 0.2.6
(diff seed version)
source-retently 0.1.3
source-salesforce 2.0.1
source-salesloft 0.1.3
source-sendgrid 0.3.1
source-sentry 0.1.11
source-strava 0.1.2
source-surveymonkey 0.1.14
source-tplcentral 0.1.1
source-twilio 0.1.15
source-weatherstack 0.1.0
source-youtube-analytics 0.1.3
source-zendesk-sunshine 0.1.1
source-zendesk-talk 0.1.6
source-zenloop 0.1.4
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items

(click to expand)

Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.

diff seed version
The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

@erohmensing
Copy link
Contributor

erohmensing commented Feb 3, 2023

Caveat: I'm also unsure, I haven't done much on incremental tests yet. That being said, my interpretation:

If this happens for all records then BOOM empty list.
In the previous context of the test_two_sequential_reads test that is considered a pass.
But to me that seems like an error, if no state can be infered from any record using the cursor, then they havent given us a passing test config.
Is that right?

I think that sounds right - if the cursor_path supplied in the acceptance-test-config.yml doesn't actually point to anything, that sounds like something that should fail and be updated 🤔

In test_two_sequential_reads we do not ensure that we get data on the second read
We should make sure that the test case can test for two reads that result in data.
Is that right?

Reading brian's message, it sounds to me like ensuring we don't get data on the second read was intentional - basically saying if we do the sync and output the state message that we already processed all the data, if the state message was correct, on a second run, we shouldn't have more data to process. I think we would have to artificially inject new data after the first sync in order to have newly updated cursor values (e.g. updated_at) to process in the second sync?

@bnchrch
Copy link
Contributor Author

bnchrch commented Feb 3, 2023

Reading brian's message, it sounds to me like ensuring we don't get data on the second read was intentional - basically saying if we do the sync and output the state message that we already processed all the data, if the state message was correct, on a second run, we shouldn't have more data to process. I think we would have to artificially inject new data after the first sync in order to have newly updated cursor values (e.g. updated_at) to process in the second sync?

Ok so that makes sense that we want to test the case second read of same data with new cursor yields no records

Should we explicitly assert that?

Currently in the code:

           # Redacted above: Run first read and assert valid records and state

            output = docker_runner.call_read_with_state(connector_config, configured_catalog_for_incremental, state=state_input)
            records = filter_output(output, type_=Type.RECORD)

            for record_value, state_value, stream_name in records_with_state(records, complete_state, stream_mapping, cursor_paths):
                assert compare_cursor_with_threshold(
                    record_value, state_value, threshold_days
                ), f"Second incremental sync should produce records older or equal to cursor value from the state. Stream: {stream_name}"

records = filter_output(output, type_=Type.RECORD) on second read can, and in the case of sentry:v0.1.9, does return None.

But sometimes it does not.

And only then we test for the case Second incremental sync should produce records older or equal to cursor value from the state

My worry here is by not having test cases explicit for

  • Cursor from records applied to same records yields None
  • Second read leads to records ahead of cursor from first read

We end up only ever testing one case but not the other.


😅 But I'm also worried that I dont understand the system well enough here, and this might be a non-issue

@bnchrch
Copy link
Contributor Author

bnchrch commented Feb 3, 2023

Just going to ping @evantahler and @pedroslopez to get your thoughts.

@pedroslopez
Copy link
Contributor

records_with_state on second read can, and in the case of sentry, does return None.

But sometimes it does not.

And only then we test for the case Second incremental sync should produce records older or equal to cursor value from the state

This is intentionally structured this way because the implementation is kind of left up to the connector in how it wants to use state. Specifically, in whether > or >= is used when retrieving records from the API.

An example is, say you save state with the date 2020-01-01. Since this date isn't very granular, the implementation would likely use a "greater than or equal to" to retrieve the records so that we wouldn't miss any new record that may have been added on that date. This means that we may also get repeated records, but that's ok since we have "at least once" deliverability. If there wasn't any data at this point, or the connector implements the state using ">", then we would not get any records.

@evantahler
Copy link
Contributor

evantahler commented Feb 3, 2023

I'm having trouble parsing that python code (I think the test might be backwards? Is that the new state or starting state that's being compared?) but I think the None case is weird. If the CDK were stronger-typed, I think READ should always produce at least [] rather than None, perhaps?

I also think that Airbyte is a "deliver at least once" system, and we should always be choosing to re-send a record if it's the safer thing to do. So, I think that all sources probably should be using >= cursor comparisons ant not >, and the test should always also produce at least one record from the previous read? I guess that's not a hard rule if the source can be sure that cursors are high-resolution and un-changing. There's another test with "abnormally large state" that should produce no records, where the cursor is far higher than what would be reasonable (e.g. updated_at=9999-01-01)

@bnchrch
Copy link
Contributor Author

bnchrch commented Feb 3, 2023

@pedroslopez

I realized I had a typo in the message you quoted. (Sorry!)

Instead of records_with_state I meant records = filter_output(output, type_=Type.RECORD)

The second read from test_incremental_two_sequential_reads for sentry currently returns no records. Even before we filter for state.

(I updated the previous comment to correct this)

Does that change your answer at all?

@pedroslopez
Copy link
Contributor

pedroslopez commented Feb 3, 2023

@bechurch the filtering I'm referring to happens on the source side or by the API it calls, not something in our testing code. It could still come back with or without records for the reasons I mentioned (no new records have been emitted / using > in the source would lead to no records; or using >= in the source would lead to repeated records)

@bnchrch
Copy link
Contributor Author

bnchrch commented Feb 3, 2023

@pedroslopez Ok that makes sense.

I'm still stuck on one thing though.

Don't we want to have connectors give us enough data to be able to test in the context of "sequential reads, that both return data"?

@bnchrch
Copy link
Contributor Author

bnchrch commented Feb 3, 2023

FYI from my wonderful call with Pedro

  1. I'm going to keep this PR focused around empty state during first pass
  2. Im going to add an issue to grooming around testing the return the of second read for data as its likely a useful test case we aren't currently supporting
  3. We likely want to split these cases out into separate named tests

@bnchrch bnchrch force-pushed the bechurch/21863-add-incremental-sync-state-empty-test branch from 3d053b3 to 7eda185 Compare February 8, 2023 17:43
@bnchrch bnchrch marked this pull request as ready for review February 8, 2023 17:44
@bnchrch bnchrch changed the title Draft: Catch empty state in incremental SAT Catch empty state in incremental SAT Feb 8, 2023
@bnchrch
Copy link
Contributor Author

bnchrch commented Feb 8, 2023

@connector-operations This is ready for review again.

@bnchrch bnchrch force-pushed the bechurch/21863-add-incremental-sync-state-empty-test branch from 7eda185 to df81f30 Compare February 8, 2023 22:04
@bnchrch bnchrch removed the request for review from a team February 9, 2023 16:37
@bnchrch bnchrch force-pushed the bechurch/21863-add-incremental-sync-state-empty-test branch from 9672dbc to 657ccf9 Compare February 9, 2023 16:45
@bnchrch bnchrch requested review from a team and brianjlai February 9, 2023 16:52
@bnchrch bnchrch requested a review from pedroslopez February 9, 2023 18:25
Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, could you please run a couple of /test and the connectors that have this problem?

@bnchrch
Copy link
Contributor Author

bnchrch commented Feb 9, 2023

/test connector=connectors/source-sentry

Note: This test will fail.... but not for the test ive added

🕑 connectors/source-sentry https://github.com/airbytehq/airbyte/actions/runs/4138900260
❌ connectors/source-sentry https://github.com/airbytehq/airbyte/actions/runs/4138900260
🐛 https://gradle.com/s/ajeiok3rxxvnu

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestBasicRead::test_read[inputs0] - Failed: Stream proje...
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:98: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:507: The previous and actual discovered catalogs are identical.
============== 1 failed, 35 passed, 2 skipped in 60.31s (0:01:00) ==============

@bnchrch
Copy link
Contributor Author

bnchrch commented Feb 9, 2023

/test connector=connectors/source-airtable

🕑 connectors/source-airtable https://github.com/airbytehq/airbyte/actions/runs/4138906771
✅ connectors/source-airtable https://github.com/airbytehq/airbyte/actions/runs/4138906771
Python tests coverage:

Name                                Stmts   Miss  Cover
-------------------------------------------------------
source_airtable/__init__.py             2      0   100%
source_airtable/streams.py             92      1    99%
source_airtable/source.py              38      1    97%
source_airtable/schema_helpers.py      48      9    81%
source_airtable/auth.py                21     10    52%
-------------------------------------------------------
TOTAL                                 201     21    90%
	 Name                                                    Stmts   Miss  Cover   Missing
	 -------------------------------------------------------------------------------------
	 connector_acceptance_test/base.py                          12      4    67%   16-19
	 connector_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 connector_acceptance_test/conftest.py                     217    101    53%   37, 43-45, 50, 55, 78, 84, 90-92, 111, 116-118, 124-126, 132-133, 138-139, 144, 150, 159-168, 174-179, 194, 218, 249, 255, 263-271, 279-292, 300-313, 318-324, 331-342, 349-365
	 connector_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 connector_acceptance_test/tests/test_core.py              476    117    75%   53, 58, 97-108, 113-120, 124-125, 129-130, 380, 400, 438, 476-493, 506-517, 521-526, 532, 565-570, 608-615, 658-660, 663, 728-736, 748-751, 756, 812-813, 819, 822, 858-868, 881-906
	 connector_acceptance_test/tests/test_incremental.py       162     14    91%   58-65, 70-83, 252
	 connector_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 connector_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 connector_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 connector_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 connector_acceptance_test/utils/json_schema_helper.py     114     13    89%   31-32, 39, 42, 66-69, 97, 121, 203-205
	 -------------------------------------------------------------------------------------
	 TOTAL                                                    1711    347    80%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: Incremental syncs are not supported on this connector.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:98: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:507: The previous and actual discovered catalogs are identical.
================== 44 passed, 4 skipped in 142.99s (0:02:22) ===================

@bnchrch
Copy link
Contributor Author

bnchrch commented Feb 9, 2023

/test connector=connectors/source-twilio

🕑 connectors/source-twilio https://github.com/airbytehq/airbyte/actions/runs/4138914837
✅ connectors/source-twilio https://github.com/airbytehq/airbyte/actions/runs/4138914837
Python tests coverage:

Name                        Stmts   Miss  Cover
-----------------------------------------------
source_twilio/auth.py           8      0   100%
source_twilio/__init__.py       2      0   100%
source_twilio/source.py        28      1    96%
source_twilio/streams.py      283     22    92%
-----------------------------------------------
TOTAL                         321     23    93%
	 Name                                                    Stmts   Miss  Cover   Missing
	 -------------------------------------------------------------------------------------
	 connector_acceptance_test/base.py                          12      4    67%   16-19
	 connector_acceptance_test/config.py                       141      5    96%   87, 93, 239, 243-244
	 connector_acceptance_test/conftest.py                     217    101    53%   37, 43-45, 50, 55, 78, 84, 90-92, 111, 116-118, 124-126, 132-133, 138-139, 144, 150, 159-168, 174-179, 194, 218, 249, 255, 263-271, 279-292, 300-313, 318-324, 331-342, 349-365
	 connector_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 connector_acceptance_test/tests/test_core.py              476    117    75%   53, 58, 97-108, 113-120, 124-125, 129-130, 380, 400, 438, 476-493, 506-517, 521-526, 532, 565-570, 608-615, 658-660, 663, 728-736, 748-751, 756, 812-813, 819, 822, 858-868, 881-906
	 connector_acceptance_test/tests/test_incremental.py       162     14    91%   58-65, 70-83, 252
	 connector_acceptance_test/utils/asserts.py                 39      2    95%   62-63
	 connector_acceptance_test/utils/common.py                  94     10    89%   16-17, 32-38, 72, 75
	 connector_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 connector_acceptance_test/utils/connector_runner.py       133     33    75%   24-27, 46-47, 50-54, 57-58, 73-75, 78-80, 83-85, 88-90, 93-95, 124-125, 159-161, 208
	 connector_acceptance_test/utils/json_schema_helper.py     114     13    89%   31-32, 39, 42, 66-69, 97, 121, 203-205
	 -------------------------------------------------------------------------------------
	 TOTAL                                                    1711    347    80%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:98: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:507: The previous and actual discovered catalogs are identical.
================== 38 passed, 2 skipped in 1991.53s (0:33:11) ==================

@bnchrch bnchrch merged commit ce770d3 into master Feb 9, 2023
@bnchrch bnchrch deleted the bechurch/21863-add-incremental-sync-state-empty-test branch February 9, 2023 23:19
sh4sh pushed a commit that referenced this pull request Feb 10, 2023
* Catch state being empty

* Update test_two_sequential_reads to catch empty state on first read

* Add integration test of empty state

* Fix legacy state test

* Move state_name to variable

* Clean up

* Format

* Fix rogue test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SAT] - Test that STATE emitted during run matches provided sample states
5 participants