CDK: `AbstractSource.read()` skips syncing stream if its unavailable (add `AvailabilityStrategy` concept) #19977
…make default no AvailabilityStrategy
Nice work here and great readability 👏
…tyStrategy unit tests passing for per-stream strategies
…t mocking doesn't support this
…andled HTTPErrors
Nice work 👏 I left minor comments. LGTM, but please address @girarda's questions before publishing the CDK and merging 😄
…hanges (#20523)
* Revert "source-github: move known error handling to GithubAvailabilityStrategy (#19978)"
  This reverts commit f97db17.
* Revert "🐛 Python CDK: fix `StopIteration` error for `check_availability` (#20429)"
  This reverts commit 4e9b014.
* Revert "CDK: `AbstractSource.read()` skips syncing stream if its unavailable (add `AvailabilityStrategy` concept) (#19977)"
  This reverts commit 55a3288.
* Restore changelog entries
* bump CDK version
* Bump Github version
* Re-add removed dependencies
* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
…(add `AvailabilityStrategy` concept) (#19977)
* Rough first implementation of AvailabilityStrategy s
* Basic unit tests for AvailabilityStrategy and ScopedAvailabilityStrategy
* Make availability_strategy a property, separate out tests
* Remove from DeclarativeSource, remove Source parameter from methods, make default no AvailabilityStrategy
* Add skip stream if not available to read()
* Changes to CDK to get source-github working using AvailabilityStrategy, flakecheck
* reorganize cdk class, add HTTPAvailabilityStrategy test
* cleanup, docstrings
* pull out error handling into separate method
* Pass source and logger to check_connection method
* Add documentation links, handle 403 specifically
* Fix circular import
* Add AvailabilityStrategy to Stream and HTTPStream classes
* Remove AS from abstract_source, add to Stream, HTTPStream, AvailabilityStrategy unit tests passing for per-stream strategies
* Modify MockHttpStream to set no AvailabilityStrategy since source test mocking doesn't support this
* Move AvailabilityStrategy class to sources.streams
* Move HTTPAvailabilityStrategy to http module
* Use pascal case for HttpAvailabilityStrategy
* Remove docs message method :( and default to True availability on unhandled HTTPErrors
* add check_availability method to stream class
* Add optional source parameter
* Add test for connector-specific documentation, small tests refactor
* Add test that performs the read() function for stream with default availability strategy
* Add test for read function behavior when stream is unavailable
* Add 403 info in logger message
* Don't return error for other HTTPErrors
* Split up error handling into methods 'unavailable_error_codes' and 'get_reason_for_error'
* rework overrideable list of status codes to be a dict with reasons, to enforce that users provide reasons for all listed errors
* Fix incorrect typing
* Move HttpAvailability to its own module, fix flake errors
* Fix ScopedAvailabilityStrategy, docstrings and types for streams/availability_strategy.py
* Docstrings and types for core.py and http/availability_strategy.py
* Move _get_stream_slices to a StreamHelper class
* Docstrings + types for stream_helpers.py, cleanup test_availability.py
* Clean up test_source.py
* Move logic of getting the initial record from a stream to StreamHelper class
* Add changelog and bump minor version
* change 'is True' and 'is False' behavior
* use mocker.MagicMock
* Remove ScopedAvailabilityStrategy
* Don't except non-403 errors, check_stream uses availability_strategy if possible
* CDK: pass error to reasons_for_error_codes
* make get_stream_slice public
* Add tests for raising unhandled errors and retries are handled
* Add tests for CheckStream via AvailabilityStrategy
* Add documentation for stream availability of http streams
* Move availability unit tests to correct modules, report error message if possible
* Add test for reporting specific error if available
What
How
* Adds a `check_availability` method to the `Stream` class to check availability
* Adds an `AvailabilityStrategy` class to the `Stream` class to perform this check
* Adds an `HttpAvailabilityStrategy` class to the `HttpStream` class, and sets it as the default `AvailabilityStrategy` for this class
* Updates `AbstractSource.read()` to skip syncing the stream if the stream is unavailable
* Updates the `CheckStream` `ConnectionChecker` class to first try using the availability strategy, if there is one

Recommended reading order
Implementation:
airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py
airbyte-cdk/python/airbyte_cdk/sources/streams/core.py
airbyte-cdk/python/airbyte_cdk/sources/availability_strategy.py
airbyte-cdk/python/airbyte_cdk/sources/streams/http/http.py
airbyte-cdk/python/airbyte_cdk/sources/streams/http/availability_strategy.py
Testing:
airbyte-cdk/python/unit_tests/sources/streams/test_availability_strategy.py
airbyte-cdk/python/unit_tests/sources/streams/http/test_availability_strategy.py
airbyte-cdk/python/unit_tests/sources/test_source.py
Refactoring/small improvements:
airbyte-cdk/python/airbyte_cdk/sources/utils/stream_helpers.py
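
For readers skimming the files above, here is a minimal, self-contained sketch of the mechanism this PR describes: a stream exposes an optional availability strategy, and the read loop consults it before syncing. It does not import the CDK. The class bodies, constructor arguments, the `AlwaysUnavailable` and `RestrictedStream` classes, and the free-standing `read()` function are all illustrative assumptions, and the real signatures in `airbyte_cdk` may differ.

```python
import logging
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

logger = logging.getLogger("availability_sketch")


class AvailabilityStrategy(ABC):
    """Simplified stand-in for the concept in this PR (not the CDK class)."""

    @abstractmethod
    def check_availability(self, stream: "Stream") -> Tuple[bool, Optional[str]]:
        """Return (True, None) if available, otherwise (False, reason)."""


class Stream:
    """Hypothetical minimal stream holding its records in memory."""

    def __init__(self, name: str, records: List[Dict]):
        self.name = name
        self._records = records

    @property
    def availability_strategy(self) -> Optional[AvailabilityStrategy]:
        # Assumed default for a plain stream: no strategy.
        return None

    def check_availability(self) -> Tuple[bool, Optional[str]]:
        strategy = self.availability_strategy
        if strategy is None:
            # Without a strategy the stream is treated as available.
            return True, None
        return strategy.check_availability(self)

    def read_records(self) -> Iterable[Dict]:
        return iter(self._records)


class AlwaysUnavailable(AvailabilityStrategy):
    """Hypothetical strategy used only to exercise the skip path."""

    def check_availability(self, stream: Stream) -> Tuple[bool, Optional[str]]:
        return False, f"no access to '{stream.name}'"


class RestrictedStream(Stream):
    @property
    def availability_strategy(self) -> Optional[AvailabilityStrategy]:
        return AlwaysUnavailable()


def read(streams: List[Stream]) -> Iterator[Tuple[str, Dict]]:
    """Yield (stream name, record) pairs, skipping unavailable streams."""
    for stream in streams:
        available, reason = stream.check_availability()
        if not available:
            logger.warning("Skipped syncing stream '%s': %s", stream.name, reason)
            continue
        for record in stream.read_records():
            yield stream.name, record
```

In this sketch, calling `read()` on a plain `Stream` and a `RestrictedStream` yields records only from the former, and the reason for the skip goes to the log instead of failing the whole sync.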
🚨 User Impact 🚨
End users:
* If a stream inherits from `HttpStream`, the `HttpAvailabilityStrategy` will now be in effect for that stream, which means `AbstractSource.read()` skips syncing the stream when the strategy reports it as unavailable.

Source developers:
* If a stream inherits from `Stream`, no change.
* If a stream inherits from `HttpStream`, the `HttpAvailabilityStrategy` will now be in effect for that stream.

Note: Sources that already handle 403 errors in their `read()` methods should move that handling to their own `AvailabilityStrategy` (if handling is more specific than the default implementation). That will be addressed in #17853.
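
To illustrate what moving error handling into a connector's own strategy could look like, the sketch below follows the "dict of status codes with reasons" idea mentioned in the commit list. It is self-contained and does not use the CDK. `HttpError`, `HttpAvailabilitySketch`, `RepoAvailabilitySketch`, the `fetch_first_record` callable, and the reason strings are hypothetical names chosen for this example, and the method name and signature are assumptions, not the CDK's API.

```python
from typing import Callable, Dict, Optional, Tuple


class HttpError(Exception):
    """Hypothetical stand-in for an HTTP client error carrying a status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class HttpAvailabilitySketch:
    """Assumed default behavior: only a 403 marks the stream as unavailable."""

    def reasons_for_unavailable_status_codes(self) -> Dict[int, str]:
        # Every listed status code must come with a human-readable reason.
        return {403: "the account lacks permission to read this stream"}

    def check_availability(
        self, fetch_first_record: Callable[[], object]
    ) -> Tuple[bool, Optional[str]]:
        try:
            # Probe the stream by trying to fetch a single record.
            fetch_first_record()
        except HttpError as error:
            reasons = self.reasons_for_unavailable_status_codes()
            if error.status_code in reasons:
                return False, reasons[error.status_code]
            # Status codes without a listed reason are not swallowed.
            raise
        return True, None


class RepoAvailabilitySketch(HttpAvailabilitySketch):
    """Connector-specific strategy that extends the default reasons."""

    def reasons_for_unavailable_status_codes(self) -> Dict[int, str]:
        reasons = super().reasons_for_unavailable_status_codes()
        reasons[404] = "the repository does not exist or is private"
        return reasons
```

The design choice illustrated here is that a connector adds or overrides entries in the mapping and leaves the probing logic alone, so any status code without a listed reason still surfaces as an error.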