Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query to find out jobs running unusally long #17978

Merged
merged 6 commits into from
Oct 19, 2022
Merged

Conversation

xiaohansong
Copy link
Contributor

What

#15930

Add a query to find out how many jobs running unusally long.

At this moment, I define this 'long' to be 2x of avg run time in last 7 days. To minimize the noise, if the runtime is less than 15 minutes of the average runtime we choose to not report either.

@xiaohansong xiaohansong requested a review from davinchia October 13, 2022 22:16
@xiaohansong xiaohansong temporarily deployed to more-secrets October 13, 2022 22:16 Inactive
Copy link
Contributor

@davinchia davinchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xiaohansong

Cool query! Some thoughts on the main query and readabilty. Nice tests!

@xiaohansong xiaohansong requested a review from davinchia October 17, 2022 19:40
@xiaohansong xiaohansong temporarily deployed to more-secrets October 17, 2022 19:41 Inactive
@xiaohansong xiaohansong requested a review from davinchia October 17, 2022 23:09
@xiaohansong xiaohansong temporarily deployed to more-secrets October 17, 2022 23:12 Inactive
jobs.scope as connection_id,
extract(epoch
from
age(NOW(), attempts.created_at)) as running_time
Copy link
Contributor

@davinchia davinchia Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I started working on this, I remember the attempts table not being in a great state. Unfortunately, I never found the time to come back to this.

Want to confirm this pulls the latest running attempt for the connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the problem still exists, but coupling attempts status with jobs status could eliminate the problem:

select count(*) from attempts inner join jobs on jobs.id = attempts.job_id where attempts.status = 'running' and jobs.status = 'running'; yields 62 result which is much more reasonable

@@ -140,6 +140,66 @@ having count(*) < 1440 / cast(c.schedule::jsonb->'units' as integer)
+ ctx.fetchOne(queryForAbnormalSyncInMinutesInLastDay).get("cnt", long.class);
}

long numberOfJobsRunningUnusuallyLong() {
// Definition of unusually long means runtime is more than 2x historic avg run time or 15
// minutes more than avg run time, whichever is greater.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment to specify that we ignore jobs with less than 4 runs to:

  1. not count starting job.
  2. give jobs time to build up run history.

(
select
jobs.scope as connection_id,
extract(epoch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this extract should be collapsed into one row similar to line 176

Copy link
Contributor

@davinchia davinchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @xiaohansong !

Main feedback is to confirm the running query picks up the latest running attempt since I saw inconsistent attempt state when working on this.

Rest of the comments are cosmetic and non-blocking.

…rter/MetricRepository.java

Co-authored-by: Davin Chia <davinchia@gmail.com>
@xiaohansong xiaohansong temporarily deployed to more-secrets October 19, 2022 18:43 Inactive
@xiaohansong xiaohansong temporarily deployed to more-secrets October 19, 2022 20:33 Inactive
@xiaohansong xiaohansong temporarily deployed to more-secrets October 19, 2022 21:00 Inactive
@xiaohansong xiaohansong merged commit 6f88bdb into master Oct 19, 2022
@xiaohansong xiaohansong deleted the xiaohan/metrics branch October 19, 2022 23:21
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
* query to find out jobs running unusally long

* comments fix

* add more indents for better readability

* Update airbyte-metrics/reporter/src/main/java/io/airbyte/metrics/reporter/MetricRepository.java

Co-authored-by: Davin Chia <davinchia@gmail.com>

* formatting

Co-authored-by: Davin Chia <davinchia@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants