Incorrect metric at the dashboard overview #621

ghobaty · 2019-06-22T12:37:22Z

Horizon Version: 3.2.3
Laravel Version: 5.7.28
PHP Version: 7.2.19
Redis Driver & Version: predis/predis 1.1.1
Database Driver & Version: MySQL 5.7

Description:

I believe the dashboard displays an incorrect label or number for the failed jobs metric.

In the screenshot, 19 is the number of jobs that failed within the last hour, not the last 7 days.
Not sure if it is related, but my Horizon trim configuration is:

'trim' => [
    'recent' => 60,
    'failed' => 10080,
],


shawnhind · 2019-06-24T15:58:19Z

I've also seen this value go down between reloads, as if the number of failed jobs in the last 7 days had decreased, while new failed jobs were still occurring. Would this be related? I guess it's because the count actually covers the last hour: it can decrease despite more jobs failing, since failures older than an hour are no longer counted.

JayBizzle · 2019-06-25T21:09:04Z

We are also seeing the same issue...

These screenshots were taken on the 13th June...

Says there is 1 failed job in the last 7 days...

...but surely it should say 6?

derekmd · 2019-07-07T21:40:41Z

The problem is that (by default) only the past hour of failed jobs is included in the dashboard count. The "FAILED JOBS PAST 7 DAYS" label takes its period from config('horizon.trim.failed'), but the metric is actually filtered by config('horizon.trim.recent').
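To make the mismatch concrete, here is a minimal sketch in plain PHP. It is not Horizon's actual code: the helper `countFailedWithin()`, the fixed `$now` value, and the failure timestamps are all made up for illustration. Only the two trim values come from the configuration quoted earlier in this issue.

```php
<?php

// Hypothetical helper (not part of Horizon): count failure timestamps
// that fall within the last $windowMinutes minutes before $now.
function countFailedWithin(array $failedAt, int $windowMinutes, int $now): int
{
    $cutoff = $now - $windowMinutes * 60;

    return count(array_filter($failedAt, function (int $timestamp) use ($cutoff): bool {
        return $timestamp >= $cutoff;
    }));
}

// Trim values (in minutes) as reported in this issue.
$trim = ['recent' => 60, 'failed' => 10080];

// Arbitrary fixed "now" so the example is deterministic.
$now = 1561200000;

// Made-up failures: two within the last hour, three earlier in the week.
$failedAt = [
    $now - 10 * 60,
    $now - 45 * 60,
    $now - 5 * 3600,
    $now - 2 * 86400,
    $now - 6 * 86400,
];

// Window the dashboard metric was filtered by.
$shown = countFailedWithin($failedAt, $trim['recent'], $now);

// Window the "PAST 7 DAYS" label implies.
$labelled = countFailedWithin($failedAt, $trim['failed'], $now);
```

With this sample data the two windows give different counts, which is the kind of discrepancy described in the reports above.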

I put some branches together; one of them can be put up as a pull request.

Option 1 - make count behave as already labelled

Use config('horizon.trim.failed') (default 7 days) as the number of minutes to look back when counting. This will make the dashboard count exactly match the number of rows under the "Failed Jobs" tab for URI /horizon/failed.

Proposed branch: 3.0...derekmd:dashboard-7-day-failed-job-count

Option 2 - fix label to match metric count being shown

Keep the current config('horizon.trim.recent') count, correcting the label to show "FAILED JOBS PAST HOUR". I personally don't find such a metric useful since not many devs view this page every hour. I would assume a queue status check-in every day or two would be the typical use case.

Branch: 3.0...derekmd:fix-dashboard-failed-jobs-duration-label

Option 3 - add a new configuration variable

Add a new config('horizon.trim.recent_failed') that can be a value between config('horizon.trim.recent') and config('horizon.trim.failed'), by default set to 24 hours. Reason being:

config('horizon.trim.recent') ("JOBS PAST HOUR" on the dashboard) gives an idea of queue throughput and it doesn't grow Redis memory use too much.
config('horizon.trim.failed') ("Failed Jobs" page results) has a longer time period that indicates trends and gives historical context for application exceptions.
A dashboard failed jobs count gives an idea of something that should be looked at today.
- If a 7 day count is shown, devs may be bothered by a failing count metric not resetting back to 0 shortly after the issue is fixed.
- Instead you have to remember, "what was the failure count yesterday? Has it lowered?"
- A 12hr or 24hr config allows the above two metrics to continue as normal while allowing devs to choose how they want this counter to behave.
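As a rough sketch of what Option 3 could look like in config/horizon.php, assuming the new key sits alongside the existing trim values and is expressed in minutes like them. The 'recent_failed' key and its 24-hour default are the proposal from this comment; the other two values are the ones quoted earlier in this issue.

```php
'trim' => [
    'recent' => 60,          // minutes; "JOBS PAST HOUR" on the dashboard
    'recent_failed' => 1440, // minutes; proposed dashboard failed jobs window (24 hours)
    'failed' => 10080,       // minutes; "Failed Jobs" page results (7 days)
],
```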

Branch: 3.0...derekmd:dashboard-failed-job-count-config-filter

Cast a Vote

If there isn't a consensus on what to do by Tuesday, I'll open a pull request for option 1 since it has the least friction.

JayBizzle · 2019-07-08T09:14:24Z

Option 1 for me 👍

themsaid · 2019-07-08T13:08:25Z

I wonder what @halaei thinks here.

shawnhind · 2019-07-08T13:11:57Z

Option 1 for me

halaei · 2019-07-09T09:52:58Z

Option 1 for me too

mfn · 2019-07-09T10:19:49Z

1️⃣

driesvints added the bug label Jun 24, 2019

derekmd mentioned this issue Jul 9, 2019

[3.0] Correct dashboard "Failed Jobs Past 7 Days" metric #633

Merged

taylorotwell closed this as completed in #633 Jul 9, 2019

ghobaty mentioned this issue Jul 20, 2019

Make the recent failed jobs metric period configurable #637

Closed
