Add id and status labels to pipeline and job metrics #455

ErezArbell · 2022-05-11T06:56:19Z

See details in issue 453.
This can be helpful to better filter the queries and also to present more than the last pipeline/job in the dashboard.

maciej-gol · 2022-05-26T21:21:33Z

Why is joining on gitlab_ci_pipeline_id not sufficient? You can look up how it works in the example dashboards.

ErezArbell · 2022-05-28T11:23:53Z

> Why is joining on gitlab_ci_pipeline_id not sufficient? You can look up how it works in the example dashboards.

@maciej-gol
In the example dashboards you can see only the latest pipeline/job. You cannot see historic data.
An example of something I would like to have: show all runs of a specific job name during the last week, so you can see when it started to fail.
Such things cannot be done without the extra labels this PR adds.

If you have a way to create a dashboard with historical data using the current implementation, I would like to hear it. We need such a dashboard, and I did not find any way to get a list of historic pipelines/jobs with an option to filter.

maciej-gol · 2022-05-28T11:40:37Z

I understand your issue, as I'm facing it, too. Having said that, I don't believe adding labels will solve it on its own. Why? You can already figure out which pipeline the relevant metrics refer to by looking up the pipeline_id metric. Adding labels will only duplicate the exported data while exposing you to the problem of growing metrics.

First of all, you need to tweak the exporter to crawl all the pipelines, not only the most recent ones. I might be mistaken, but your MR only tackles the labels, not the crawling.

Secondly, the growing metrics issue. The problem with the prometheus library is that it doesn't forget metrics' labels once observed. This is important, because in infinity, the exporter will present Prometheus with ALL the jobs ever seen, on every scrape. That's the same as just querying your Gitlab DB directly. You could restart the exporter, but things get messy when you use redis for HA.

Having said all of this, I believe this exporter is not suitable to monitor the health of your GCI system when you allow more than one pipeline per ref. As such, I'm currently opting to build the state of ALL of the pipelines from webhook data (although that's not all).

I share your need of tracking ALL running pipelines, but I'm worried this exporter would need architectural changes to address this need.
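
The growth concern described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the exporter's code: `NeverForgetRegistry`, `simulate`, and the label values `group/app` and `main` are hypothetical names, and the registry is a toy stand-in that keeps every label set it has ever seen.

```python
from typing import Dict, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


class NeverForgetRegistry:
    """Toy stand-in for a metrics registry that never drops a label set."""

    def __init__(self) -> None:
        self.series: Dict[LabelSet, float] = {}

    def set_gauge(self, value: float, **labels: str) -> None:
        # Sorting makes the key independent of keyword order.
        self.series[tuple(sorted(labels.items()))] = value

    def scrape_size(self) -> int:
        # Every stored series would be emitted on every scrape.
        return len(self.series)


def simulate(pipelines: int, with_id_label: bool) -> int:
    """Observe `pipelines` consecutive pipelines on one ref and count the series."""
    registry = NeverForgetRegistry()
    for pipeline_id in range(1, pipelines + 1):
        labels = {"project": "group/app", "ref": "main"}  # hypothetical values
        if with_id_label:
            labels["pipeline_id"] = str(pipeline_id)
        registry.set_gauge(1.0, **labels)
    return registry.scrape_size()


without_id = simulate(1000, with_id_label=False)
with_id = simulate(1000, with_id_label=True)
print(without_id, with_id)
```

Without the id label the same series is overwritten each time, so the scrape stays at one series; with it, the scrape grows by one series per pipeline and never shrinks unless the process restarts.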

ErezArbell · 2022-05-28T12:24:31Z

Thank you @maciej-gol for the insightful comments.

> I might be mistaken, but your MR only tackles the labels, not the crawling.

You are correct. However, this MR does improve data collection. The exporter always publishes only the latest job (for example) for any given set of label values. So in the current implementation, if a new pipeline starts on the same ref before the old one ends, only the job from the new pipeline is published. This MR adds the pipeline_id and job_id labels, which are unique, so the jobs from the older pipelines will still be published.
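
The effect described here can be sketched as a dictionary keyed by label values. This is an illustration only: the `publish` helper, the job records, and the ids `101`, `102`, `9001`, `9002` are hypothetical and do not come from the exporter.

```python
from typing import Dict, List, Tuple

# Two pipelines running on the same ref at the same time (hypothetical data).
jobs: List[Dict[str, str]] = [
    {"pipeline_id": "101", "job_id": "9001", "ref": "main",
     "job_name": "build", "status": "running"},
    {"pipeline_id": "102", "job_id": "9002", "ref": "main",
     "job_name": "build", "status": "pending"},
]


def publish(records: List[Dict[str, str]],
            label_names: List[str]) -> Dict[Tuple[str, ...], str]:
    """Keep one entry per distinct combination of label values."""
    published: Dict[Tuple[str, ...], str] = {}
    for record in records:
        key = tuple(record[name] for name in label_names)
        # A later record with the same label values replaces the earlier one.
        published[key] = record["status"]
    return published


current = publish(jobs, ["ref", "job_name"])
proposed = publish(jobs, ["ref", "job_name", "pipeline_id", "job_id"])
print(len(current), len(proposed))
```

With only `ref` and `job_name` as labels, the job from pipeline 102 replaces the one from pipeline 101; once the unique ids are part of the label set, both entries survive.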

> in infinity, the exporter will present Prometheus with ALL the jobs ever seen, on every scrape

You have a good point here. Now that I think about it, it is indeed what is expected to happen, but it is not what I see when I look at the '/metrics' endpoint. Anyway, it is a good point.

> Having said all of this, I believe this exporter is not suitable to monitor the health of your GCI system when you allow more than one pipeline per ref
> ...
> I share your need of tracking ALL running pipelines, but I'm worried this exporter would need architectural changes to address this need.

I agree. This is not the right tool. It was the closest I found, so I thought I would use it.
I understand that this PR will not be pulled. I will, however, leave this PR open since I would like to get a response from the repo owner; maybe he will have a suggestion.

It is strange that no such tool is available for GitLab, which is a popular commercial product.

BTW, what is "GCI system"?

maciej-gol · 2022-05-28T12:28:37Z

Since I've been working quite a lot with Gitlab here at Codility, I've started to use GCI in place of Gitlab CI, as it gets tiresome writing the full name over and over :D

maciej-gol · 2022-05-28T20:11:17Z

@ErezArbell since your use-case is monitoring the general success ratio of your jobs (per ref, perhaps), I believe implementing job hooks that simply store success/failure counters would be enough, without opening yourself to the growing metrics problem.

You could export job status counters, and just expose them via gitlab_ci_pipeline_job_status_counter{job_name, ref, project, status}. The fail rate would be increase(_counter{status='failed'}) / (increase(_counter{status='failed'}) + increase(_counter{status='success'})). Tracking should also be easy if we start with hooks only.
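
The counter idea can be sketched as follows. This is a rough illustration under assumptions, not a proposed implementation: the event dictionaries use simplified, hypothetical field names rather than the real GitLab webhook payload, and `on_job_hook` and `fail_rate` are made-up helpers. The `fail_rate` function mimics the increase-based ratio by subtracting an earlier snapshot of the counters from a later one.

```python
from collections import Counter
from typing import Optional

FINISHED = ("success", "failed")

# One counter per (job_name, ref, project, status). No job or pipeline id is
# used, so the number of series stays bounded.
counters: Counter = Counter()


def on_job_hook(event: dict) -> None:
    """Count a job once it reaches a final status (hypothetical event shape)."""
    if event["status"] in FINISHED:
        key = (event["job_name"], event["ref"], event["project"], event["status"])
        counters[key] += 1


def fail_rate(before: Counter, after: Counter,
              job_name: str, ref: str, project: str) -> Optional[float]:
    """failed / (failed + success) over the interval between two snapshots."""
    def increase(status: str) -> int:
        key = (job_name, ref, project, status)
        return after[key] - before[key]

    failed = increase("failed")
    total = failed + increase("success")
    return failed / total if total else None


def event(status: str) -> dict:
    return {"job_name": "build", "ref": "main", "project": "group/app",
            "status": status}


for status in ("success", "failed", "running"):
    on_job_hook(event(status))
snapshot = Counter(counters)  # start of the observation window

for status in ("failed", "success", "success", "success"):
    on_job_hook(event(status))

rate = fail_rate(snapshot, counters, "build", "main", "group/app")
print(rate)
```

Because the label set contains no ids, a long-running exporter would hold a fixed number of series per job name, ref, and project, which is the property the suggestion relies on.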

It might solve my problem (tracking all pending jobs), but I would need to give it more thought.

What do you think?

tinchoram · 2022-05-31T15:03:18Z

👋 Hi everyone! This issue is very interesting. We have the same problem: we need to track the final status of all the jobs and their evolution, but, as @ErezArbell comments, the exporter only reports the status of the last job.

I'm going to try running the app with the changes incorporated by @ErezArbell and see if it fixes our problem.

I look forward to the resolution of this issue 🦊

ErezArbell · 2022-06-01T07:01:16Z

@tinchoram, I added two dashboards to the "quickstart" example that use those changes:

Pipelines History
Jobs History
Those dashboards present the full history and let you filter what is shown by various parameters.
As @maciej-gol wrote, this is not production ready. But those dashboards will let you use those changes and see both their benefits and the problems we have, like this issue.

ErezArbell · 2022-06-01T07:02:31Z

@maciej-gol, I do not need the ratios. I need to see the history of pipelines and jobs in a table, with options to filter.

arbell added 2 commits May 11, 2022 09:52

add id label to pipeline and job

16adabb

add status label to pipeline and job

fc291a8

ErezArbell mentioned this pull request May 11, 2022

Allow the dashboard to show historic data of pipelines and jobs (maybe by add "id" label to the metrics) #453

Open

Add dashboards with history

719436c

mvisonneau force-pushed the main branch 3 times, most recently from f1d1bc5 to 718e730 Compare May 23, 2023 06:51

mvisonneau force-pushed the main branch from ca6ba6c to 87f505e Compare March 4, 2024 15:36
