Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
CDK: AbstractSource.read() skips syncing stream if it's unavailable (add `AvailabilityStrategy` concept) (#19977)

* Rough first implementation of `AvailabilityStrategy`s
* Basic unit tests for AvailabilityStrategy and ScopedAvailabilityStrategy
* Make availability_strategy a property, separate out tests
* Remove from DeclarativeSource, remove Source parameter from methods, make default no AvailabilityStrategy
* Add skip stream if not available to read()
* Changes to CDK to get source-github working using AvailabilityStrategy, flakecheck
* reorganize cdk class, add HTTPAvailabilityStrategy test
* cleanup, docstrings
* pull out error handling into separate method
* Pass source and logger to check_connection method
* Add documentation links, handle 403 specifically
* Fix circular import
* Add AvailabilityStrategy to Stream and HTTPStream classes
* Remove AS from abstract_source, add to Stream, HTTPStream, AvailabilityStrategy unit tests passing for per-stream strategies
* Modify MockHttpStream to set no AvailabilityStrategy since source test mocking doesn't support this
* Move AvailabilityStrategy class to sources.streams
* Move HTTPAvailabilityStrategy to http module
* Use pascal case for HttpAvailabilityStrategy
* Remove docs message method :( and default to True availability on unhandled HTTPErrors
* add check_availability method to stream class
* Add optional source parameter
* Add test for connector-specific documentation, small tests refactor
* Add test that performs the read() function for stream with default availability strategy
* Add test for read function behavior when stream is unavailable
* Add 403 info in logger message
* Don't return error for other HTTPErrors
* Split up error handling into methods 'unavailable_error_codes' and 'get_reason_for_error'
* rework overridable list of status codes to be a dict with reasons, to enforce that users provide reasons for all listed errors
* Fix incorrect typing
* Move HttpAvailability to its own module, fix flake errors
* Fix ScopedAvailabilityStrategy, docstrings and types for streams/availability_strategy.py
* Docstrings and types for core.py and http/availability_strategy.py
* Move _get_stream_slices to a StreamHelper class
* Docstrings + types for stream_helpers.py, cleanup test_availability.py
* Clean up test_source.py
* Move logic of getting the initial record from a stream to StreamHelper class
* Add changelog and bump minor version
* change 'is True' and 'is False' behavior
* use mocker.MagicMock
* Remove ScopedAvailabilityStrategy
* Don't except non-403 errors, check_stream uses availability_strategy if possible
* CDK: pass error to reasons_for_error_codes
* make get_stream_slice public
* Add tests for raising unhandled errors and retries are handled
* Add tests for CheckStream via AvailabilityStrategy
* Add documentation for stream availability of http streams
* Move availability unit tests to correct modules, report error message if possible
* Add test for reporting specific error if available
1 parent 3c8bb42; commit 55a3288
Showing 16 changed files with 681 additions and 50 deletions.
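The changes to `AbstractSource.read()` are not part of the files shown here, so the following is only a minimal sketch of the behaviour the commit title describes: each stream is asked whether it is available, and unavailable streams are logged and skipped rather than failing the sync. `SketchStream` and this `read` function are hypothetical stand-ins, not the CDK's classes, and the log wording is an assumption.

```python
import logging
from typing import Any, Iterator, List, Optional, Tuple


class SketchStream:
    """Hypothetical stand-in for a CDK stream (not the real Stream class)."""

    def __init__(self, name: str, records: List[Any], available: bool = True, reason: Optional[str] = None):
        self.name = name
        self._records = records
        self._available = available
        self._reason = reason

    def check_availability(self, logger: logging.Logger) -> Tuple[bool, Optional[str]]:
        # Mirrors the (is_available, reason) tuple used by the strategies in this commit.
        return self._available, self._reason

    def read_records(self) -> Iterator[Any]:
        return iter(self._records)


def read(streams: List[SketchStream], logger: logging.Logger) -> Iterator[Tuple[str, Any]]:
    """Yield (stream name, record) pairs, skipping any stream that reports itself unavailable."""
    for stream in streams:
        is_available, reason = stream.check_availability(logger)
        if not is_available:
            # Assumed wording; the point is that the sync continues with the next stream.
            logger.warning("Skipped syncing stream '%s' because it was unavailable. %s", stream.name, reason)
            continue
        for record in stream.read_records():
            yield stream.name, record
```

In this sketch a stream with no reason to be unavailable simply returns `(True, None)`, which matches the commit message's note that the default is to have no availability strategy.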
33 changes: 33 additions & 0 deletions
airbyte-cdk/python/airbyte_cdk/sources/streams/availability_strategy.py
@@ -0,0 +1,33 @@
#
# Copyright (c) 2022 Airbyte, Inc., all rights reserved.
#

import logging
import typing
from abc import ABC, abstractmethod
from typing import Optional, Tuple

from airbyte_cdk.sources.streams import Stream

if typing.TYPE_CHECKING:
    from airbyte_cdk.sources import Source


class AvailabilityStrategy(ABC):
    """
    Abstract base class for checking stream availability.
    """

    @abstractmethod
    def check_availability(self, stream: Stream, logger: logging.Logger, source: Optional["Source"]) -> Tuple[bool, Optional[str]]:
        """
        Checks stream availability.
        :param stream: stream
        :param logger: source logger
        :param source: (optional) source
        :return: A tuple of (boolean, str). If boolean is true, then the stream
          is available, and no str is required. Otherwise, the stream is unavailable
          for some reason and the str should describe what went wrong and how to
          resolve the unavailability, if possible.
        """
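As an illustration of the `check_availability` contract defined in `availability_strategy.py`, here is a minimal sketch of a concrete strategy. To stay runnable without the CDK installed it uses a stand-in base class, `AvailabilityStrategySketch`, that mirrors the abstract interface; `AllowListAvailabilityStrategy` and its allow-list rule are hypothetical and do not appear in this commit.

```python
import logging
from abc import ABC, abstractmethod
from types import SimpleNamespace
from typing import Any, Optional, Set, Tuple


class AvailabilityStrategySketch(ABC):
    """Stand-in mirroring the AvailabilityStrategy interface (an assumption, not the CDK class)."""

    @abstractmethod
    def check_availability(self, stream: Any, logger: logging.Logger, source: Optional[Any] = None) -> Tuple[bool, Optional[str]]:
        ...


class AllowListAvailabilityStrategy(AvailabilityStrategySketch):
    """Hypothetical strategy: a stream is available only if its name is on an allow list."""

    def __init__(self, allowed_names: Set[str]):
        self._allowed_names = allowed_names

    def check_availability(self, stream: Any, logger: logging.Logger, source: Optional[Any] = None) -> Tuple[bool, Optional[str]]:
        if stream.name in self._allowed_names:
            # Available: no reason string is required.
            return True, None
        # Unavailable: say what went wrong and how to resolve it.
        reason = f"Stream '{stream.name}' is not enabled for this account. Enable it and retry."
        return False, reason


strategy = AllowListAvailabilityStrategy({"users"})
logger = logging.getLogger("sketch")
users_result = strategy.check_availability(SimpleNamespace(name="users"), logger)
orders_result = strategy.check_availability(SimpleNamespace(name="orders"), logger)
```

The two results show both halves of the contract: `users_result` carries no reason, while `orders_result` pairs `False` with an actionable message.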
120 changes: 120 additions & 0 deletions
airbyte-cdk/python/airbyte_cdk/sources/streams/http/availability_strategy.py
@@ -0,0 +1,120 @@
#
# Copyright (c) 2022 Airbyte, Inc., all rights reserved.
#

import logging
import typing
from typing import Dict, Optional, Tuple

import requests
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.availability_strategy import AvailabilityStrategy
from airbyte_cdk.sources.utils.stream_helpers import StreamHelper
from requests import HTTPError

if typing.TYPE_CHECKING:
    from airbyte_cdk.sources import Source


class HttpAvailabilityStrategy(AvailabilityStrategy):
    def check_availability(self, stream: Stream, logger: logging.Logger, source: Optional["Source"]) -> Tuple[bool, Optional[str]]:
        """
        Check stream availability by attempting to read the first record of the
        stream.
        :param stream: stream
        :param logger: source logger
        :param source: (optional) source
        :return: A tuple of (boolean, str). If boolean is true, then the stream
          is available, and no str is required. Otherwise, the stream is unavailable
          for some reason and the str should describe what went wrong and how to
          resolve the unavailability, if possible.
        """
        try:
            stream_helper = StreamHelper()
            stream_helper.get_first_record(stream)
        except HTTPError as error:
            return self.handle_http_error(stream, logger, source, error)
        return True, None

    def handle_http_error(
        self, stream: Stream, logger: logging.Logger, source: Optional["Source"], error: HTTPError
    ) -> Tuple[bool, Optional[str]]:
        """
        Override this method to define error handling for various `HTTPError`s
        that are raised while attempting to check a stream's availability.
        Checks whether the error's status code is a key in the dictionary returned by
        `reasons_for_unavailable_status_codes`, and gets the associated reason for that error.
        :param stream: stream
        :param logger: source logger
        :param source: (optional) source
        :param error: HTTPError raised while checking stream's availability.
        :return: A tuple of (boolean, str). If boolean is true, then the stream
          is available, and no str is required. Otherwise, the stream is unavailable
          for some reason and the str should describe what went wrong and how to
          resolve the unavailability, if possible.
        """
        try:
            status_code = error.response.status_code
            reason = self.reasons_for_unavailable_status_codes(stream, logger, source, error)[status_code]
            response_error_message = stream.parse_response_error_message(error.response)
            if response_error_message:
                reason += response_error_message
            return False, reason
        except KeyError:
            # If the HTTPError's status code is not in the dictionary of errors we know how to handle, re-raise it
            raise error

    def reasons_for_unavailable_status_codes(
        self, stream: Stream, logger: logging.Logger, source: Optional["Source"], error: HTTPError
    ) -> Dict[int, str]:
        """
        Returns a dictionary of HTTP status codes that indicate stream
        unavailability and reasons explaining why a given status code may
        have occurred and how the user can resolve that error, if applicable.
        :param stream: stream
        :param logger: source logger
        :param source: (optional) source
        :return: A dictionary of (status code, reason) where the 'reason' explains
          why 'status code' may have occurred and how the user can resolve that
          error, if applicable.
        """
        forbidden_error_message = f"The endpoint to access stream '{stream.name}' returned 403: Forbidden. "
        forbidden_error_message += "This is most likely due to insufficient permissions on the credentials in use. "
        forbidden_error_message += self._visit_docs_message(logger, source)

        reasons_for_codes: Dict[int, str] = {requests.codes.FORBIDDEN: forbidden_error_message}
        return reasons_for_codes

    @staticmethod
    def _visit_docs_message(logger: logging.Logger, source: Optional["Source"]) -> str:
        """
        Creates a message indicating where to look in the documentation for
        more information on a given source by checking the spec of that source
        (if provided) for a 'documentationUrl'.
        :param logger: source logger
        :param source: (optional) source
        :return: A message telling the user where to go to learn more about the source.
        """
        if not source:
            return "Please visit the connector's documentation to learn more. "

        try:
            connector_spec = source.spec(logger)
            docs_url = connector_spec.documentationUrl
            if docs_url:
                return f"Please visit {docs_url} to learn more. "
            else:
                return "Please visit the connector's documentation to learn more. "

        except FileNotFoundError:  # If we are unit testing without implementing spec() method in source
            if source:
                docs_url = f"https://docs.airbyte.com/integrations/sources/{source.name}"
            else:
                docs_url = "https://docs.airbyte.com/integrations/sources/test"

            return f"Please visit {docs_url} to learn more."
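The docstrings in `HttpAvailabilityStrategy` describe `reasons_for_unavailable_status_codes` as the extension point: a subclass adds status codes to the dictionary, and any code that is not listed is re-raised. The sketch below mimics that lookup-or-re-raise flow without importing `requests` or the CDK. `FakeHTTPError`, `BaseStrategySketch`, `NotFoundAwareStrategy`, the simplified signatures, and the 404 wording are all assumptions made for illustration.

```python
from types import SimpleNamespace
from typing import Dict, Optional, Tuple


class FakeHTTPError(Exception):
    """Hypothetical stand-in for requests.HTTPError: carries a response with a status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.response = SimpleNamespace(status_code=status_code)


class BaseStrategySketch:
    """Simplified stand-in for the lookup logic in HttpAvailabilityStrategy."""

    def reasons_for_unavailable_status_codes(self, stream_name: str) -> Dict[int, str]:
        return {403: f"The endpoint to access stream '{stream_name}' returned 403: Forbidden. "}

    def handle_http_error(self, stream_name: str, error: FakeHTTPError) -> Tuple[bool, Optional[str]]:
        reasons = self.reasons_for_unavailable_status_codes(stream_name)
        status_code = error.response.status_code
        if status_code not in reasons:
            # Unknown status codes are not treated as "unavailable"; the error propagates.
            raise error
        return False, reasons[status_code]


class NotFoundAwareStrategy(BaseStrategySketch):
    """Hypothetical subclass that also treats 404 as 'stream unavailable'."""

    def reasons_for_unavailable_status_codes(self, stream_name: str) -> Dict[int, str]:
        reasons = super().reasons_for_unavailable_status_codes(stream_name)
        reasons[404] = f"The endpoint to access stream '{stream_name}' returned 404: Not Found. "
        return reasons
```

Because the reasons are stored in a dictionary keyed by status code, a subclass cannot list a code without also supplying a reason, which is the constraint the commit message mentions.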
44 changes: 44 additions & 0 deletions
airbyte-cdk/python/airbyte_cdk/sources/utils/stream_helpers.py
@@ -0,0 +1,44 @@
#
# Copyright (c) 2022 Airbyte, Inc., all rights reserved.
#

from typing import Any, Mapping, Optional

from airbyte_cdk.models import SyncMode
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.core import StreamData


class StreamHelper:
    def get_first_record(self, stream: Stream) -> StreamData:
        """
        Gets the first record for a stream.
        :param stream: stream
        :return: StreamData containing the first record in the stream
        """
        # Some streams need a stream slice to read records (e.g. if they have a SubstreamSlicer)
        stream_slice = self.get_stream_slice(stream)
        records = stream.read_records(sync_mode=SyncMode.full_refresh, stream_slice=stream_slice)
        return next(iter(records))  # iter(): read_records() may return a list rather than an iterator

    @staticmethod
    def get_stream_slice(stream: Stream) -> Optional[Mapping[str, Any]]:
        """
        Gets the first stream_slice from a given stream's stream_slices.
        :param stream: stream
        :return: First stream slice from 'stream_slices' generator
        """
        # We wrap the return output of stream_slices() because some implementations return types that are
        # iterable but not iterators, such as lists or tuples
        slices = iter(
            stream.stream_slices(
                cursor_field=stream.cursor_field,
                sync_mode=SyncMode.full_refresh,
            )
        )
        try:
            return next(slices)
        except StopIteration:
            return {}
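The comment in `get_stream_slice` about wrapping the output in `iter()` can be checked in isolation. The sketch below uses a hypothetical helper, `first_slice`, that takes any iterable of slices, since lists and tuples support `iter()` but not `next()` directly. It falls back to `{}` when there are no slices, as the method above does.

```python
from typing import Any, Iterable, Mapping, Optional


def first_slice(slices: Iterable[Optional[Mapping[str, Any]]]) -> Optional[Mapping[str, Any]]:
    """Return the first slice of any iterable, or an empty dict if it yields nothing."""
    try:
        # iter() turns lists and tuples into iterators and leaves generators unchanged.
        return next(iter(slices))
    except StopIteration:
        return {}


from_list = first_slice([{"id": 1}, {"id": 2}])
from_generator = first_slice(s for s in ({"id": 3},))
from_empty = first_slice([])
```

A stream whose `stream_slices()` yields a single `None` slice is passed through unchanged, so callers have to accept both `None` and `{}` as "no particular slice".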