🐛 Source Harvest: Improve HTTP Availability #35541

Merged: maxi297 merged 8 commits into master from maxi297/harvest-postmortem-availability-strategy on Feb 26, 2024

Contributor

@maxi297 maxi297 commented Feb 22, 2024

What

Following the lack of page alerts for Harvest, here is the fix.

How

Use the HTTP availability strategy.

🚨 User Impact 🚨

Before

400

{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "\"Unable to connect to Harvest API with the provided credentials - HTTPError('401 Client Error: Unauthorized for url: https://api.harvestapp.com/v2/users?per_page=50&updated_since=2022-12-13+00%3A00%3A00%2B00%3A00')\""}}

401

{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "\"Unable to connect to Harvest API with the provided credentials - HTTPError('401 Client Error: Unauthorized for url: https://api.harvestapp.com/v2/users?per_page=50&updated_since=2022-12-13T00%3A00%3A00%2B00%3A00')\""}}

After

400

{"type": "TRACE", "trace": {"type": "ERROR", "emitted_at": 1708612772195.748, "error": {"message": "Something went wrong in the connector. See the logs for more details.", "internal_message": "400 Client Error: Bad Request for url: https://api.harvestapp.com/v2/users?per_page=50&updated_since=2022-12-13+00%3A00%3A00%2B00%3A00", "stack_trace": "Traceback (most recent call last):\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/main.py\", line 8, in <module>\n    run()\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/source_harvest/run.py\", line 14, in run\n    launch(source, sys.argv[1:])\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/entrypoint.py\", line 214, in launch\n    for message in source_entrypoint.run(parsed_args):\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/entrypoint.py\", line 114, in run\n    yield from map(AirbyteEntrypoint.airbyte_message_to_string, self.check(source_spec, config))\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/entrypoint.py\", line 138, in check\n    check_result = self.source.check(self.logger, config)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/abstract_source.py\", line 84, in check\n    check_succeeded, error = self.check_connection(logger, config)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/source_harvest/source.py\", line 73, in check_connection\n    return HarvestAvailabilityStrategy().check_availability(users_stream, logger, self)\n  File 
\"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/availability_strategy.py\", line 56, in check_availability\n    is_available, reason = self.handle_http_error(stream, logger, source, error)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/availability_strategy.py\", line 85, in handle_http_error\n    raise error\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/availability_strategy.py\", line 50, in check_availability\n    get_first_record_for_slice(stream, stream_slice)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/utils/stream_helper.py\", line 40, in get_first_record_for_slice\n    return next(records_for_slice)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 482, in read_records\n    yield from self._read_pages(\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 498, in _read_pages\n    request, response = self._fetch_next_page(stream_slice, stream_state, next_page_token)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 524, in _fetch_next_page\n    response = self._send_request(request, request_kwargs)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 
422, in _send_request\n    return backoff_handler(user_backoff_handler)(request, request_kwargs)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/backoff/_sync.py\", line 105, in retry\n    ret = target(*args, **kwargs)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/backoff/_sync.py\", line 105, in retry\n    ret = target(*args, **kwargs)\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 381, in _send\n    raise exc\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 378, in _send\n    response.raise_for_status()\n  File \"/Users/maxime/devel/code/airbyte/airbyte-integrations/connectors/source-harvest/.venv/lib/python3.10/site-packages/requests/models.py\", line 1021, in raise_for_status\n    raise HTTPError(http_error_msg, response=self)\nrequests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api.harvestapp.com/v2/users?per_page=50&updated_since=2022-12-13+00%3A00%3A00%2B00%3A00\n", "failure_type": "system_error"}}}

401

{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "'Unable to read users stream. The endpoint https://api.harvestapp.com/v2/users?per_page=50&updated_since=2022-12-13T00%3A00%3A00%2B00%3A00 returned 401: Unauthorized. Please ensure your credentials as valid.. Please visit https://docs.airbyte.com/integrations/sources/harvest to learn more.  invalid_token'"}}

I assume the before/after for the 403 and 404 HTTP statuses is similar to that for 401.
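To make the difference between the two kinds of output concrete, here is a minimal sketch, using only the standard library, that classifies an emitted message as a connection status or as an error trace. The `classify` helper and the shortened sample payloads are illustrative assumptions, not part of the connector or the CDK.

```python
import json


def classify(raw: str) -> str:
    """Return a short label describing one emitted message (hypothetical helper)."""
    message = json.loads(raw)
    if message.get("type") == "CONNECTION_STATUS":
        # A check result the platform can show to the user directly.
        status = message["connectionStatus"]["status"]
        return f"connection_status:{status}"
    if message.get("type") == "TRACE" and message["trace"].get("type") == "ERROR":
        # An error trace; failure_type tells system errors from config errors.
        failure_type = message["trace"]["error"].get("failure_type", "unknown")
        return f"trace_error:{failure_type}"
    return "other"


# Shortened stand-ins for the "After" payloads shown above.
after_400 = json.dumps({
    "type": "TRACE",
    "trace": {"type": "ERROR", "error": {"message": "...", "failure_type": "system_error"}},
})
after_401 = json.dumps({
    "type": "CONNECTION_STATUS",
    "connectionStatus": {"status": "FAILED", "message": "..."},
})

print(classify(after_400))
print(classify(after_401))
```

With this change, the 400 case is reported as a system error trace, while the 401 case stays a failed connection status with a user-facing message.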

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Feb 22, 2024
@maxi297 maxi297 changed the title from "Handle availability properly" to "🐛 Source Harvest: Improve HTTP Availability" Feb 22, 2024
Contributor

@erohmensing erohmensing left a comment

I think this makes sense. To summarize the impact:

  • We've removed generalized exception swallowing, so we expect more errors to come through.
  • We explicitly except some invalid config setups (missing required properties) and, through the HTTP availability strategy, some known user-impacting HTTP codes (401, 403, maybe 404).

We might see more exceptions we didn't know about that the user could actually fix, and if we do, we should explicitly except them. We might also see more exceptions we didn't know about that are errors in our code and weren't getting surfaced before; if that happens, we should fix them.
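The two points above can be sketched as follows. This is a simplified stand-in, not the CDK's implementation: `FakeHTTPError`, `KNOWN_REASONS`, and `check_stream` are hypothetical names, and the real code goes through the HTTP availability strategy. Known, user-fixable status codes become a failed check with a reason; anything else propagates instead of being swallowed.

```python
from http import HTTPStatus
from typing import Callable, Dict, Optional, Tuple


class FakeHTTPError(Exception):
    """Hypothetical stand-in for an HTTP client error carrying a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


# Reason texts are taken from the diff under review; the dict name is an assumption.
KNOWN_REASONS: Dict[int, str] = {
    HTTPStatus.UNAUTHORIZED.value: "Please ensure your credentials are valid.",
    HTTPStatus.FORBIDDEN.value: "This is most likely due to insufficient permissions on the credentials in use.",
    HTTPStatus.NOT_FOUND.value: "Please ensure that your account ID is properly set.",
}


def check_stream(read_first_record: Callable[[], object]) -> Tuple[bool, Optional[str]]:
    """Return (available, reason); re-raise errors that are not known to be user-fixable."""
    try:
        read_first_record()
    except FakeHTTPError as error:
        reason = KNOWN_REASONS.get(error.status_code)
        if reason is None:
            raise  # unknown failure: surface it rather than swallow it
        phrase = HTTPStatus(error.status_code).phrase
        return False, f"{error.status_code}: {phrase}. {reason}"
    return True, None


def unauthorized() -> None:
    raise FakeHTTPError(401)


def bad_request() -> None:
    raise FakeHTTPError(400)


print(check_stream(unauthorized))
```

Calling `check_stream(bad_request)` raises instead of returning a tuple, which mirrors the 400 case in the PR description turning into a system error.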

reasons_for_codes: Dict[int, str] = {
    requests.codes.UNAUTHORIZED: "Please ensure your credentials are valid.",
    requests.codes.FORBIDDEN: "This is most likely due to insufficient permissions on the credentials in use.",
    requests.codes.NOT_FOUND: "Please ensure that your account ID is properly set. If it is the case and you are still seeing this error, please contact Airbyte support.",
Contributor

Is this an error we've encountered before, in that we know if we provide the wrong ID, it will give us a 404?

If not I'd take this one out of the list in case we get 404 because the stream slice was messed up or something like that

Contributor Author

@maxi297 maxi297 Feb 23, 2024

While messing with the config, yes. As mentioned in this error message, if the account ID is not set properly, this is the HTTP status code that is returned. Hence, it made sense to treat this as a config error.

Contributor

Ok! If I were only thinking about the platform workflow, I would wonder if we should put this exclusively in check and not in the availability strategy, because the value of the account ID in the config is not something you can change without going through check again. That is unlike an account's permissions to access a certain resource, which could be revoked on the user's end. I guess a whole account could be revoked...

But given that for other use cases (API etc.) it could be the case that the account ID has changed in between, I'm fine leaving it. This is a wider problem, though, which made me try to start the conversation here.

Comment on lines +57 to +61
if "account_id" not in config:
    raise AirbyteTracedException(
        "Config validation error: 'account_id' is a required property",
        failure_type=FailureType.config_error,
    )
Contributor

General step-back question: Do we have a generalized exception we raise if the input is not valid against the JSON Schema? (Could that handle this case if it did? I'm not actually sure, re: conditional requirements, but might work if it's a oneOf)

Obviously we enforce this on the frontend in the platform, but with Terraform, PyAirbyte, etc. we should catch these things more generally if possible.

Contributor Author

I think this makes sense. I'm not sure if this was possible in the whole workflow of the platform, but it is definitely possible from the source's perspective. I'm not sure if we should have something in the CDK to validate the config in the entrypoint before passing it to the source.

Contributor

It looks like we do! We validate the config against the spec in check, and then optionally at the beginning of discover and read: https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/entrypoint.py#L171

Contributor

So if the spec is correct re: what is required and what isn't, a Config Error should be raised by this validation.
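As a rough illustration of that idea, the sketch below checks a config against the `required` list of a spec before any stream is touched. It uses only the standard library and handles top-level required properties only (no `oneOf` or conditional requirements). `ConfigError`, `validate_required`, and the sample spec are hypothetical; the CDK's own validation linked above is more complete.

```python
from typing import Any, Dict, Mapping


class ConfigError(Exception):
    """Hypothetical error type standing in for a config_error failure."""


def validate_required(config: Mapping[str, Any], spec: Mapping[str, Any]) -> None:
    """Raise ConfigError for the first required top-level property missing from config."""
    for name in spec.get("required", []):
        if name not in config:
            raise ConfigError(f"Config validation error: '{name}' is a required property")


# Illustrative spec only; the real Harvest spec may differ.
sample_spec: Dict[str, Any] = {"type": "object", "required": ["account_id"]}

try:
    validate_required({}, sample_spec)
except ConfigError as error:
    print(error)
```

If validation like this always ran before the source was invoked, the explicit `account_id` check in the connector would only be a second line of defense.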

@maxi297 maxi297 merged commit 5ff133f into master Feb 26, 2024
37 checks passed
@maxi297 maxi297 deleted the maxi297/harvest-postmortem-availability-strategy branch February 26, 2024 16:44
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
Labels: area/connectors, area/documentation, connectors/source/harvest