query to find out jobs running unusually long #17978
Thanks @xiaohansong
Cool query! Some thoughts on the main query and readability. Nice tests!
airbyte-metrics/reporter/src/main/java/io/airbyte/metrics/reporter/MetricRepository.java
jobs.scope as connection_id,
extract(epoch
from
age(NOW(), attempts.created_at)) as running_time
When I started working on this, I remember the attempts table not being in a great state. Unfortunately, I never found the time to come back to this.
Want to confirm this pulls the latest running attempt for the connection.
Yeah, the problem still exists, but coupling the attempt status with the job status could eliminate it:
select count(*) from attempts inner join jobs on jobs.id = attempts.job_id where attempts.status = 'running' and jobs.status = 'running';
This yields 62 results, which is much more reasonable.
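To make the filtering idea in this reply concrete, here is a minimal in-memory sketch in Java. It is not code from the PR: the `LatestRunningAttempt` class, the `Attempt` type, and its fields are hypothetical stand-ins for rows of the joined `jobs` and `attempts` tables, and picking the "latest" attempt by `created_at` is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; the real data lives in the jobs and attempts tables.
final class LatestRunningAttempt {

  // One row of "attempts inner join jobs on jobs.id = attempts.job_id".
  static final class Attempt {
    final String connectionId;    // jobs.scope
    final String jobStatus;       // jobs.status
    final String attemptStatus;   // attempts.status
    final long createdAtEpochSec; // attempts.created_at, as epoch seconds

    Attempt(String connectionId, String jobStatus, String attemptStatus, long createdAtEpochSec) {
      this.connectionId = connectionId;
      this.jobStatus = jobStatus;
      this.attemptStatus = attemptStatus;
      this.createdAtEpochSec = createdAtEpochSec;
    }
  }

  // Keep only rows where both the attempt and its job are running, then keep
  // the most recently created attempt per connection (assumption: latest = max created_at).
  static Map<String, Attempt> latestRunningByConnection(List<Attempt> rows) {
    Map<String, Attempt> latest = new HashMap<>();
    for (Attempt row : rows) {
      boolean bothRunning = "running".equals(row.attemptStatus) && "running".equals(row.jobStatus);
      if (!bothRunning) {
        continue; // drops stale attempts left in 'running' under a finished job
      }
      latest.merge(row.connectionId, row,
          (current, candidate) ->
              current.createdAtEpochSec >= candidate.createdAtEpochSec ? current : candidate);
    }
    return latest;
  }
}
```

Requiring both statuses to be `running` is what discards the inconsistent attempt rows mentioned above; the per-connection maximum then answers the reviewer's question about which attempt the running time is measured from.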
airbyte-metrics/reporter/src/main/java/io/airbyte/metrics/reporter/MetricRepository.java
@@ -140,6 +140,66 @@ having count(*) < 1440 / cast(c.schedule::jsonb->'units' as integer)
+ ctx.fetchOne(queryForAbnormalSyncInMinutesInLastDay).get("cnt", long.class);
}

long numberOfJobsRunningUnusuallyLong() {
// Definition of unusually long means runtime is more than 2x historic avg run time or 15
// minutes more than avg run time, whichever is greater.
Update the comment to specify that we ignore jobs with fewer than 4 runs, in order to:
- not count jobs that are just starting.
- give jobs time to build up run history.
(
select
jobs.scope as connection_id,
extract(epoch
I think this extract should be collapsed into one row similar to line 176
Looks good @xiaohansong!
Main feedback is to confirm the running query picks up the latest running attempt since I saw inconsistent attempt state when working on this.
Rest of the comments are cosmetic and non-blocking.
…rter/MetricRepository.java Co-authored-by: Davin Chia <davinchia@gmail.com>
* query to find out jobs running unusually long
* comments fix
* add more indents for better readability
* Update airbyte-metrics/reporter/src/main/java/io/airbyte/metrics/reporter/MetricRepository.java
Co-authored-by: Davin Chia <davinchia@gmail.com>
* formatting
Co-authored-by: Davin Chia <davinchia@gmail.com>
What
#15930
Add a query to find out how many jobs are running unusually long.
For now, I define 'unusually long' as a runtime of more than 2x the average run time over the last 7 days. To minimize noise, a job is also not reported if its runtime exceeds that average by less than 15 minutes.
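The rule described above can be sketched as a small Java predicate. This is only an illustration, not the PR's implementation (which is a SQL query in `MetricRepository.java`): the `UnusuallyLongRule` class, its method, and its constants are hypothetical names, and the minimum of 4 runs is taken from the review thread rather than from this description.

```java
import java.time.Duration;

// Hypothetical helper; names and signature are illustrative, not from the PR.
final class UnusuallyLongRule {

  // Assumption from the review thread: jobs with fewer than 4 runs are ignored.
  static final int MIN_RUNS = 4;
  // A job must exceed its average run time by at least this much to be reported.
  static final Duration MIN_EXCESS = Duration.ofMinutes(15);

  // True when the current runtime exceeds max(2 * avg, avg + 15 minutes)
  // and the job has enough run history for the average to be meaningful.
  static boolean isUnusuallyLong(Duration running, Duration avg, int runCount) {
    if (runCount < MIN_RUNS) {
      return false;
    }
    Duration twiceAvg = avg.multipliedBy(2);
    Duration avgPlusExcess = avg.plus(MIN_EXCESS);
    Duration threshold = twiceAvg.compareTo(avgPlusExcess) > 0 ? twiceAvg : avgPlusExcess;
    return running.compareTo(threshold) > 0;
  }

  public static void main(String[] args) {
    // avg 5 min: threshold is max(10, 20) = 20 min, so 12 min is not flagged.
    System.out.println(isUnusuallyLong(Duration.ofMinutes(12), Duration.ofMinutes(5), 10));
    // avg 60 min: threshold is max(120, 75) = 120 min, so 130 min is flagged.
    System.out.println(isUnusuallyLong(Duration.ofMinutes(130), Duration.ofMinutes(60), 10));
  }
}
```

Taking the larger of the two thresholds means the 15-minute floor dominates for short jobs (average under 15 minutes), while the 2x factor dominates for longer ones.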