Auto-disable failing connections #9715
Comments
We should notify users when this occurs. |
See this metabase question: |
The parameters should probably be tweaked... This workspace: https://cloud.airbyte.io/workspaces/b264bf09-60e0-4a81-9cec-2b64cd613792/connections/5b3f66da-8739-405c-9407-cb2a39b03806 is not showing up based on Charles thresholds since it has:
|
Are those failure limits too high? They might still result in high warehouse costs for folks with normalisation turned on. To clarify, can we define 14 days of straight failures? Does this refer to a job failing after running for 14 days? |
I interpreted I have a draft PR up that only does this check after a job failure, so in that implementation, there shouldn't be a job in a currently running state. |
@cgardens I have this set for only jobs of configType |
That makes sense to me to include it. |
When a new connection is set up and fails its first job, this will trigger the connection to be auto-disabled, since all the jobs in the last 14 days (so only that first job) will be failures. This behavior does not seem ideal for users: when they fix the connection, they will also need to remember to re-enable it. One option was to make sure that the connection is at least 14 days old before doing that check, but this could lead to many failures in that 14-day period. We can still do the 100 consecutive failures check in this case though. @cgardens do you have any opinions on what behavior might be better for our users? |
hahaha. yeah the way i was thinking about that is there is condition 1: disable due to 14 day of failures OR condition 2: disable due to 100 consecutive failures. condition 1 can't trigger until there are at least 14 days of jobs. |
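For reference, the two conditions discussed above could be sketched roughly as follows. This is only an illustration of the rule as described in this thread, not the code from the draft PR; `JobResult`, `should_auto_disable`, and the constant names are hypothetical, and "14 days of jobs" is interpreted here as "the oldest known job finished at least 14 days ago".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence

# Thresholds taken from the rule in this issue; the names are hypothetical.
MAX_FAILURE_DAYS = 14
MAX_CONSECUTIVE_FAILURES = 100


@dataclass(frozen=True)
class JobResult:
    """Hypothetical record of one completed sync job."""
    finished_at: datetime
    succeeded: bool


def should_auto_disable(jobs: Sequence[JobResult], now: datetime) -> bool:
    """Decide whether to disable a connection; `jobs` is ordered oldest first."""
    # Count the failures at the end of the history (the current failure streak).
    streak = 0
    for job in reversed(jobs):
        if job.succeeded:
            break
        streak += 1
    if streak == 0:
        return False  # no jobs yet, or the most recent job succeeded

    # Condition 2: 100 consecutive failures.
    if streak >= MAX_CONSECUTIVE_FAILURES:
        return True

    # Condition 1: no success in the last 14 days, and only once the
    # connection has at least 14 days of job history.
    window_start = now - timedelta(days=MAX_FAILURE_DAYS)
    has_full_history = jobs[0].finished_at <= window_start
    recent_success = any(
        j.succeeded and j.finished_at >= window_start for j in jobs
    )
    return has_full_history and not recent_success
```

With this guard, a brand-new connection whose first job fails is not disabled, while a connection with older history and nothing but failures in the last 14 days is.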
Update for this issue. This feature can be turned on by setting the |
Tell us about the problem you're trying to solve
If a connection is failing, Airbyte should automatically stop retrying at some point to avoid wasting resources. We will start with the following rule: Disable a sync if it has 14 days of straight failures OR 100 failures in a row, whichever comes first. The goal of the 14-day condition is to handle syncs with a long period.
So a constantly failing 5-minute sync will disable itself after roughly 8 hours (100 runs × 5 minutes), while a constantly failing daily sync will disable itself after 14 days. Syncs with a period longer than 14 days will get disabled after 1 bad run.
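As a rough worked example of how the two thresholds interact with the sync period, the arithmetic can be sketched like this. It assumes every run fails, runs start exactly one period apart, and job run time and retries are ignored, so real figures may differ; `failed_runs_before_disable` and the constant names are hypothetical.

```python
import math
from datetime import timedelta

# Thresholds from the rule above; the names are hypothetical.
MAX_FAILURE_WINDOW = timedelta(days=14)
MAX_CONSECUTIVE_FAILURES = 100


def failed_runs_before_disable(period: timedelta) -> int:
    """Estimate how many back-to-back failed runs occur before auto-disable."""
    # Number of runs needed to span the 14-day window (at least one run).
    runs_to_fill_window = max(1, math.ceil(MAX_FAILURE_WINDOW / period))
    # Whichever threshold is reached first wins.
    return min(MAX_CONSECUTIVE_FAILURES, runs_to_fill_window)


for period in (timedelta(minutes=5), timedelta(days=1), timedelta(days=30)):
    runs = failed_runs_before_disable(period)
    print(f"period={period}: ~{runs} failed run(s) before disable")
```

Under these assumptions the 100-failure limit is what stops short-period syncs, and the 14-day window is what stops daily or slower syncs.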
This is something that we will eventually want to make configurable. However, we are still not sure what the right thresholds are, or whether we have even chosen the right metrics to use as triggers, so we should hold off on making the thresholds configurable via environment variable until we are more certain. For now, if we want, we can put the feature behind a flag and allow OSS users to simply turn off auto-disable altogether.
Acceptance Criteria