SATs: allow new records in a sequential read for full refresh test #17660
/test connector=bases/source-acceptance-test
Build Passed
@Phlair please take a look.
I will publish and merge this one once I get approval from Airbyte.
/test connector=connectors/source-slack
Build Passed
Let's make sure to test this on a handful more connectors, since it's a change to the core validation logic. Out of curiosity, do you know how often these flaky tests were occurring and which connectors they were most prominent for?
output_diff = set(map(serializer, stream_records_1)).symmetric_difference(set(map(serializer, stream_records_2)))
if output_diff:
if not set(map(serializer, stream_records_1)).issubset(set(map(serializer, stream_records_2))):
    output_diff = set(map(serializer, stream_records_1)).symmetric_difference(set(map(serializer, stream_records_2)))
Do we still need to check the symmetric difference if we're effectively doing the validation above by verifying that records_1 is not a subset of records_2?
Thinking about all the permutations, records_2 should at minimum have the same records as records_1 but could have more. So I think your condition above is sufficient: set(map(serializer, stream_records_1)).issubset(set(map(serializer, stream_records_2)))
If we've gotten into this block, where we know that records_1 has records that records_2 is missing, maybe we should be more descriptive and use records_1 - records_2 to show which records records_2 was missing that should have been there. The symmetric difference combines it all together, which might be confusing.
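To illustrate the difference between the two set operations discussed above, here is a small sketch on made-up data. The record values and the `serializer` helper are assumptions for illustration only, not the actual SAT code or any connector's output.

```python
import json


def serializer(record):
    # Hypothetical serializer: turns a dict record into a stable,
    # hashable string so it can be stored in a set.
    return json.dumps(record, sort_keys=True)


# Made-up records from two sequential full refresh reads.
stream_records_1 = [{"id": 1}, {"id": 2}, {"id": 3}]
stream_records_2 = [{"id": 1}, {"id": 3}, {"id": 4}]  # id=2 vanished, id=4 is new

records_1 = set(map(serializer, stream_records_1))
records_2 = set(map(serializer, stream_records_2))

# The proposed validation: every record of the first read must be in the second.
is_subset = records_1.issubset(records_2)

# Plain difference: only the records the second read lost.
missing_from_second = records_1 - records_2

# Symmetric difference: lost records and new records mixed together.
mixed_diff = records_1.symmetric_difference(records_2)
```

In this example the plain difference contains only the record with `id` 2, while the symmetric difference also includes the new record with `id` 4, which is expected and not an error.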
+1, asserting that records_1 is a subset of records_2 is good enough for me. And feel free to show the difference for debugging.
Like @brianjlai, I'd be interested in knowing which connectors you encountered this problem with. Since SATs run on our sandbox accounts, most of them do not get new data very often. But for our GitHub account, that might indeed be the case.
output_diff = set(map(serializer, stream_records_1)).symmetric_difference(set(map(serializer, stream_records_2)))
if output_diff:
if not set(map(serializer, stream_records_1)).issubset(set(map(serializer, stream_records_2))):
    output_diff = set(map(serializer, stream_records_1)).symmetric_difference(set(map(serializer, stream_records_2)))
    msg = f"{stream}: the two sequential reads should produce either equal set of records or one of them is a strict subset of the other"
This message makes me realize that the changes you are making implement what we actually expected to check:
- We want to make sure the records of the first full refresh sync are a subset of the records of the second full refresh sync
/test connector=bases/source-acceptance-test
Build Passed
@brianjlai @alafanechere thanks for the review. I made the appropriate change in the code.
/test connector=connectors/source-slack
Build Passed
/test connector=connectors/source-instagram
Build Passed
/test connector=connectors/source-facebook-marketing
Build Passed
/test connector=connectors/source-google-ads
Build Passed
/test connector=connectors/source-google-analytics
Build Failed
/test connector=connectors/source-google-analytics-v4
Build Passed
…al-read-for-full-refresh-test
…al-read-for-full-refresh-test
I'm approving to unblock, but please implement the suggestions I left on the unit tests.
airbyte-integrations/bases/source-acceptance-test/unit_tests/test_test_full_refresh.py
…al-read-for-full-refresh-test
/test connector=bases/source-acceptance-test
Build Passed
Thanks, done!
…al-read-for-full-refresh-test
/publish connector=bases/source-acceptance-test auto-bump-version=false
If you have connectors that successfully published but failed definition generation, follow step 4 here
…17660) * SATs: allow new records in a sequential read for full refresh test * SATs: upd changelog * SATs: change the output when failing full refresh test * SATs: upd according to code review
* Implement ColumnSortButton component * Updates component name; Moves component to ui/Table folder; Refactors formattedMessageId property into using render content as children directly; Removes minor SortIcon component * Update airbyte-webapp/src/App.tsx Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> * Updates next properties: wasActive -> isActive, lowToLarge -> isAscending * Skip psql stop in acceptance test for gke (#18023) * Checks for iterator hasNext element (#18041) * Checks for iterator hasNext element * Fix linter with newline * Add Message Migration to Destination Connection Checks (#17954) * Add Message Migration to Destination Connection Checks * Fix test setup * Update helm release workflow (#18048) * Update workflow * Update trigger rules * fix: Update release workflow with abillity to add tags * Update workflow * Remove unused `airbyte-cli` (#18009) * 🐛 [low-code] $options shouldn't overwrite values that are already defined (#18060) * fix * Add missing test * remove prints * extract to method * rename * Add missing test * rename * bump * Update helm chart comments (#18072) * Update helm charts (#18073) * add test * fix chart.yaml * 16250 Destination Redis: Add SSH support (#17951) * 16250 Destination Redis: Add SSH support * 16250 Resolve port issue * 11679 Bump version * auto-bump connector version Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> * Bump helm chart version reference to 0.40.20 (#18074) * Bump helm chart version reference to 0.40.20 * remove binary Co-authored-by: xpuska513 <xpuska513@users.noreply.github.com> Co-authored-by: Kyryl Skobylko <xpuska513@gmail.com> * Helm Chart: Create service annotations for airbyte-server (#17932) * Support annotations for airbyte-server as well, update version and update docs. * Fix auto-indent. 
Co-authored-by: Kyryl Skobylko <xpuska513@gmail.com> * Bmoric/remove dep server worker (#17894) * test [ci skip] * Autogenerated files * Add missing annotation * Remove unused json2Schema block from worker * Move tess * Missing deps and format * Fix test build * TMP * Add missing dependencies * PR comments * Tmp * [ci skip] Tmp * Fix acceptance test and add the seed dependency * Fix build * For diff * tmp * Build pass * make the worker to be on the platform only * fix setting.yaml * Fix pmd * Fix Cron * Add chart * Fix cron * Fix server build.gradle * Fix jar conflict * PR comments * Add cron micronaut environemnt * Updated connector catalog page (#18076) * Move the port forward outside of the main docker-compose (#17864) * Bump Airbyte version from 0.40.14 to 0.40.15 (#17970) Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com> * 🎉 Source Shopify: Add metafield streams (#17962) * 🎉 Source Shopify: Add metafield streams * Source Shopify: fix unittest * Source Shopify: docs update * Source Shopify: fix backward compatibility test * Source Shopify: fix schemas * Source Shopify: fix state filter * Source Shopify: refactor & optimize * Source Shopify: fix test privileges * Source Shopify: fix stream filter * Source Shopify: fix streams * Source Shopify: update abnormal state * Source Shopify: fix abnormal state streams * Source Shopify: fix streams * updated methods, formated code * Source Shopify: typo fix * auto-bump connector version Co-authored-by: Oleksandr Bazarnov <oleksandr.bazarnov@globallogic.com> Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> * fix check for streams that do not use a stream slicer (#18080) * fix check for streams that do not use a stream slicer * increment version and changelog before publish * tolerate database nulls in webhook operation configs (#18084) * Implement webhook operation in the sync workflow (#18022) Implements the webhook operation as part of the sync workflow. 
- Introduces the new activity implementation - Updates the various interfaces that pass input to get the relevant configs to the sync workflow - Hooks the new activity into the sync workflow - Passes the webhook configs along into the sync workflow job * Bump helm chart version reference to 0.40.22 (#18077) * Added new "filters" python file, along with a "hash" filter. This can… (#18000) * Added new "filters" python file, along with a "hash" filter. This can be extended to include other custom filters in the future. * Added additional comments * Moved usage of the hash_obj inside the conditional that confirms it exists * Moved the hash function call inside a condition to ensure that it exists * Fixed the application of the salt , so that it does not modify the hash unless it is actually passed in. * Added unit tests to validate new jinja hash functionality * Updated unit test to pass numeric value as a float instead of string * Removed unreferenced import to pytest * Updated version * format * format * format * format * format Co-authored-by: Alexandre Girard <alexandre@airbyte.io> * Bump helm chart version reference to 0.40.24 (#18081) * Bump helm chart version reference to 0.40.24 * Update .gitignore Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com> Co-authored-by: Kyryl Skobylko <xpuska513@gmail.com> * SATs: allow new records in a sequential read for full refresh test (#17660) * SATs: allow new records in a sequential read for full refresh test * SATs: upd changelog * SATs: change the output when failing full refresh test * SATs: upd according to code review * Source facebook-marketing: remove `pixel` from custom conversions stream (#18045) * #744 source facebook-marketing: rm pixel from custom conversions stream * #744 source fb marketing: upd changelog * #744 source facebook-marketing - add custom_conversions to the test catalog * auto-bump connector version Co-authored-by: Octavia Squidington III 
<octavia-squidington-iii@users.noreply.github.com> * #17506 fix klaviyo & marketo expected_records (#18101) Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> Co-authored-by: terencecho <3916587+terencecho@users.noreply.github.com> Co-authored-by: Ryan Fu <ryan.fu@airbyte.io> Co-authored-by: Jimmy Ma <gosusnp@users.noreply.github.com> Co-authored-by: Kyryl Skobylko <xpuska513@gmail.com> Co-authored-by: Evan Tahler <evan@airbyte.io> Co-authored-by: Alexandre Girard <alexandre@airbyte.io> Co-authored-by: Yevhen Sukhomud <suhomud@gmail.com> Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: xpuska513 <xpuska513@users.noreply.github.com> Co-authored-by: Prasanth <72515998+sfc-gh-pkommini@users.noreply.github.com> Co-authored-by: Benoit Moriceau <benoit@airbyte.io> Co-authored-by: Amruta Ranade <11484018+Amruta-Ranade@users.noreply.github.com> Co-authored-by: Octavia Squidington III <90398440+octavia-squidington-iii@users.noreply.github.com> Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com> Co-authored-by: Artem Inzhyyants <36314070+artem1205@users.noreply.github.com> Co-authored-by: Oleksandr Bazarnov <oleksandr.bazarnov@globallogic.com> Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> Co-authored-by: Michael Siega <109092231+mfsiega-airbyte@users.noreply.github.com> Co-authored-by: Alexander Marquardt <alexander.marquardt@gmail.com> Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
What
From time to time, source connector builds fail because two subsequent reads produce different output. A common case is a new record appearing in the second read, which causes the test to fail. Since a full refresh sync takes a while for many connectors, a new record is quite likely to appear in the meantime.
How
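Allow new records in the second read of the full refresh test: the test now only requires that the records from the first read are a subset of the records from the second read.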
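As a rough sketch of the relaxed check, assuming records are plain JSON-like dicts, the comparison could look like the following. The names `serialize` and `assert_second_read_contains_first` are hypothetical and are not taken from the SAT code base; the real test may serialize and report records differently.

```python
import json
from typing import Any, Dict, List


def serialize(record: Dict[str, Any]) -> str:
    # Hypothetical helper: a stable string form so records can be set members.
    return json.dumps(record, sort_keys=True, default=str)


def assert_second_read_contains_first(
    stream: str,
    first_read: List[Dict[str, Any]],
    second_read: List[Dict[str, Any]],
) -> None:
    # New records in the second read are tolerated;
    # records that disappear between reads are not.
    first = set(map(serialize, first_read))
    second = set(map(serialize, second_read))
    missing = first - second
    if missing:
        raise AssertionError(
            f"{stream}: records from the first read are missing "
            f"in the second read: {sorted(missing)}"
        )
```

For example, calling it with a second read that has one extra record passes silently, while a second read that lacks a record from the first read raises an `AssertionError` listing only the missing records.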