-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Source Harvest: Improve HTTP Availability #35541
🐛 Source Harvest: Improve HTTP Availability #35541
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
airbyte-integrations/connectors/source-harvest/source_harvest/availability_strategy.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-harvest/source_harvest/availability_strategy.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense. To summarize the impact:
- We've removed generalized exception swallowing. We expect more errors to come through
- We explicitly except some invalid config setups (missing required properties) and through the http availability strategy, some known user-impacted http codes (401, 403, maybe 404)
We might see more exceptions we didn't know about that the user could actually fix, and if we do we should explicilty except them. We might also see more exceptions we didn't know about that are errors in our code that weren't getting surfaced before and we should fix if that happens.
reasons_for_codes: Dict[int, str] = { | ||
requests.codes.UNAUTHORIZED: "Please ensure your credentials are valid.", | ||
requests.codes.FORBIDDEN: "This is most likely due to insufficient permissions on the credentials in use.", | ||
requests.codes.NOT_FOUND: "Please ensure that your account ID is properly set. If it is the case and you are still seeing this error, please contact Airbyte support.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an error we've encountered before, in that we know if we provide the wrong ID, it will give us a 404?
If not I'd take this one out of the list in case we get 404 because the stream slice was messed up or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While messing with the config, yes. Like mentioned in this error message, if the account ID is not set properly, this is the HTTP status code that will be provided. Hence, it felt like it made sense to put this as a config error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! If I were only thinking about the platform workflow, I would wonder if we should put this exclusively in check
and not in the availability strategy, because the value of the account ID in the config is not something you can change without going through check again. Compared to an account's permissions to access a certain resource, which could be revoked on the user's end. I guess a whole account could be revoked...
But given that for other use cases (API etc) this could be the case that the account ID has changed in between, I'm fine leaving it. This is a wider problem though that made me try to start the conversation here.
if "account_id" not in config: | ||
raise AirbyteTracedException( | ||
"Config validation error: 'account_id' is a required property", | ||
failure_type=FailureType.config_error, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General step-back question: Do we have a generalized exception we raise if the input is not valid against the JSON Schema? (Could that handle this case if it did? I'm not actually sure, re: conditional requirements, but might work if it's a oneOf)
Obviously we enforce this on the frontend in the platform but with terraform, pyairbyte etc. we should catch these things more generally if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this make sense. I'm not sure if this was a possible in the whole workflow of the platform, but it is definitely possible from the source's perpective. Not sure if we should have something in the CDK to validate the config in the entrypoint before passing this to the source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we do! We validate the config against the spec in check
and then optionally at the beginning of discover and read https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/entrypoint.py#L171
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if the spec is correct re: what is required and what isn't, a Config Error should be raised by this validation
What
Following the lack of page alerts for harvest, here is the fix.
How
Use the HTTP Availability
🚨 User Impact 🚨
Before
400
401
After
400
401
I assume the before/after for 403 and 404 HTTP status is similar to 401