Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

airbyte-cdk: Improve Error Handling in Legacy CDK #37576

Merged
merged 51 commits into from
May 17, 2024

Conversation

pnilan
Copy link
Contributor

@pnilan pnilan commented Apr 25, 2024

What

Part of initiative to improve error handling and debugging by providing new interfaces responsible for managing requests and error handling which include sensible error handling defaults. At this time these interfaces are not integrated into existing HttpStream and are only available for ad-hoc use for connectors.

How

  • HttpClient - Prepares and sends requests, executes follow-up actions (retrying, raising AirbyteTracedExceptions) depending on error mapping
  • ErrorHandler abstract class and HttpStatusErrorHandler - enables response error/exception mapping to response actions, failure types, and error messages and error mapping lookup. Defaults to HttpStatusErrorHandler.
  • ErrorResolution dataclass to encapsulate response actions, failure types, and error messages for defined exceptions or request status codes.
  • BackoffStrategy abstract class and DefaultBackoffStrategy - manages max_retries, max_time, and backoff_time. Defaults to DefaultBackoffStrategy.
  • ErrorMessageParser abstract class and JsonErrorMessageParser - into be instantiated in in stream and passed to HttpClient to parse error messages in responses. Defaults to JsonErrorMessageParser.

Recommended Reading Order

  1. http_client.py, start w/ _init__ and then send_request
  2. error_handler.py and http_status_error_handler.py
  3. response_models.py
  4. backoff_strategy.py and default_backoff_strategy.py
  5. rate_limiting.py
  6. exceptions.py
  7. json_error_message_parser.py and error_message_parser.py

Copy link

vercel bot commented Apr 25, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview May 16, 2024 6:55pm

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Apr 25, 2024
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, out-of-scope for now but interesting to keep in mind is the parsing of the response. For example:

# We've had a couple of customers with ProtocolErrors, namely:
# * A self-managed instance during `BulkSalesforceStream.download_data`. This customer had an abnormally high number of ConnectionError
# which seems to indicate problems with his network infrastructure in general. The exact error was: `urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(905 bytes read, 119 more expected)', IncompleteRead(905 bytes read, 119 more expected))`
# * A cloud customer with very long syncs. All those syncs would end up with the following error: `urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))`
# Without much more information, we will make it retryable hoping that performing the same request will work.
exceptions.ChunkedEncodingError,
# We've had examples where the response from Salesforce was not a JSON response. Those cases where error cases though. For example:
# https://github.com/airbytehq/airbyte-internal-issues/issues/6855. We will assume that this is an edge issue and that retry should help
exceptions.JSONDecodeError,

@girarda @brianjlai, I also had the following question: should we remove the dependency to requests from the interfaces? Right now, HttpRequestSender. send_request returns objects that are specific to the requests library and if we change that in the CDK (I think we discussed about moving to aiohttp at some point maybe), we would have to update all the sources using the HttpRequestSender. This is a much bigger lift though hence why I would like your take on this

pnilan added 9 commits April 26, 2024 12:57
…efaultRetryStrategy`, `HttpStatusErrorHandler` classes. Adds new `ErrorMapping` class. Updates `HttpRequestSender` to `HttpClient`.
…MessageParser` classes and default subclasses. Update `HttpClient` for better exception handling and to encapsulate session/caching
@pnilan pnilan changed the title airbyte-cdk: new HttpRequestSender & HttpErrorHandler to improve errors airbyte-cdk: Improve Error Handling in Legacy CDK Apr 30, 2024
@brianjlai
Copy link
Contributor

I also had the following question: should we remove the dependency to requests from the interfaces? Right now, HttpRequestSender. send_request returns objects that are specific to the requests library and if we change that in the CDK (I think we discussed about moving to aiohttp at some point maybe), we would have to update all the sources using the HttpRequestSender. This is a much bigger lift though hence why I would like your take on this

@maxi297 This is a good question! I do like the overall idea that the HttpClient does not bleed an implementation detail like which requester library is being used under the hood. I think our ideal set up would be to return our own Request and Response interfaces that we control instead of the library PreparedRequest/PreparedResponse.

What does concern me is as you mention the increase in lift. I think to avoid scope creep we can say that we can leave the interface as it is today and return the request/response from the requests library. A lot of our existing code like the existing HttpStream still reference prepared request/response classes too which also might add to the complexity of a migration. I feel like we may need to just punt on this in the immediate even though I agree that it's probably the ideal interface

@pnilan pnilan marked this pull request as ready for review May 4, 2024 00:16
@pnilan pnilan requested a review from a team as a code owner May 4, 2024 00:16
@pnilan
Copy link
Contributor Author

pnilan commented May 4, 2024

As we discussed, out-of-scope for now but interesting to keep in mind is the parsing of the response. For example:

# We've had a couple of customers with ProtocolErrors, namely:
# * A self-managed instance during `BulkSalesforceStream.download_data`. This customer had an abnormally high number of ConnectionError
# which seems to indicate problems with his network infrastructure in general. The exact error was: `urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(905 bytes read, 119 more expected)', IncompleteRead(905 bytes read, 119 more expected))`
# * A cloud customer with very long syncs. All those syncs would end up with the following error: `urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))`
# Without much more information, we will make it retryable hoping that performing the same request will work.
exceptions.ChunkedEncodingError,
# We've had examples where the response from Salesforce was not a JSON response. Those cases where error cases though. For example:
# https://github.com/airbytehq/airbyte-internal-issues/issues/6855. We will assume that this is an edge issue and that retry should help
exceptions.JSONDecodeError,

@girarda @brianjlai, I also had the following question: should we remove the dependency to requests from the interfaces? Right now, HttpRequestSender. send_request returns objects that are specific to the requests library and if we change that in the CDK (I think we discussed about moving to aiohttp at some point maybe), we would have to update all the sources using the HttpRequestSender. This is a much bigger lift though hence why I would like your take on this

Re: aiohttp -- This is a valid point. I'm not going to include this in this PR but will incorporate down the line.

@maxi297
Copy link
Contributor

maxi297 commented May 6, 2024

I feel like we may need to just punt on this in the immediate even though I agree that it's probably the ideal interface

I'm fine with this, yes. We can do the interface pretty similar with the request one anyway to smooth the transition

@pnilan pnilan force-pushed the pnilan/airbyte-cdk-improved-errors-and-sends branch from 26047ef to 94bd7a4 Compare May 14, 2024 14:23
@pnilan pnilan requested review from brianjlai and maxi297 May 14, 2024 21:26
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll confirm everything once I have time to integrate this properly in salesforce but looking at the code, I don't see any blocker. Thanks for addressing all those comments!

Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one last comment to discuss if you don't mind checking before ✅

@pnilan pnilan requested review from brianjlai, maxi297 and girarda May 16, 2024 20:28
Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing work patrick. this looks great and thanks for and the discussion points

@pnilan pnilan merged commit 8396fd2 into master May 17, 2024
36 checks passed
@pnilan pnilan deleted the pnilan/airbyte-cdk-improved-errors-and-sends branch May 17, 2024 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants