Feature (What you would like to be added):
Because Druid runs in a different namespace than the shoot control plane, while the compaction jobs it triggers run in the shoot control plane, it is not straightforward to collect the compaction job metrics and build a dashboard from them. Several Prometheus instances are involved in the process, and they have to collect the metrics and forward them to one another. The compaction metrics need to be routed so that they ultimately reach the Prometheus running in the shoot control plane. Only then are the metrics available to the dashboards running in the shoot control plane.
Since Druid runs in the Garden namespace, the cache Prometheus can scrape the Druid controller metrics, i.e. the compaction metrics. The control plane Prometheus can then federate those metrics, together with the cAdvisor metrics of the compaction jobs. From the metrics scraped by the control plane Prometheus, we can filter out the compaction job metrics of a specific shoot and show them on that shoot's dashboard.
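The per-shoot filtering step can be sketched in Python. Prometheus federation works by scraping another Prometheus's /federate endpoint with one or more match[] series selectors, so restricting those selectors to a single shoot namespace is what keeps other shoots' series out of the control plane Prometheus. This is a minimal sketch under stated assumptions: the metric names (etcddruid_compaction_*), the etcd_namespace label, the cache Prometheus address and the compaction pod name pattern are illustrative assumptions, not confirmed details of the setup. In practice the same selectors would go into the control plane Prometheus scrape config (metrics_path: /federate, params: match[]) rather than being assembled by hand.

```python
from urllib.parse import urlencode

# ASSUMPTION: compaction metric names as exposed by the Druid controller.
DRUID_COMPACTION_METRICS = [
    "etcddruid_compaction_jobs_total",
    "etcddruid_compaction_job_duration_seconds_bucket",
    "etcddruid_compaction_num_delta_events",
]

# cAdvisor container metrics scraped by the cache Prometheus.
CADVISOR_METRICS = [
    "container_cpu_usage_seconds_total",
    "container_memory_working_set_bytes",
]


def federation_selectors(shoot_namespace: str, job_pod_regex: str) -> list[str]:
    """Build match[] selectors that keep only one shoot's compaction series."""
    # ASSUMPTION: Druid labels compaction metrics with the etcd namespace
    # via an `etcd_namespace` label.
    selectors = [
        f'{{__name__="{name}",etcd_namespace="{shoot_namespace}"}}'
        for name in DRUID_COMPACTION_METRICS
    ]
    # cAdvisor series are narrowed to the shoot namespace and to the
    # (hypothetical) naming pattern of the compaction job pods.
    selectors += [
        f'{{__name__="{name}",namespace="{shoot_namespace}",pod=~"{job_pod_regex}"}}'
        for name in CADVISOR_METRICS
    ]
    return selectors


def federate_url(cache_prometheus: str, shoot_namespace: str, job_pod_regex: str) -> str:
    """Return the /federate URL the control plane Prometheus would scrape."""
    params = [("match[]", s) for s in federation_selectors(shoot_namespace, job_pod_regex)]
    return f"{cache_prometheus.rstrip('/')}/federate?{urlencode(params)}"
```

For example, federate_url("http://prometheus-cache:9090", "shoot--dev--demo", "compact-.*") yields a single URL carrying five URL-encoded match[] parameters; the service address and pod pattern here are hypothetical placeholders.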
To further improve the visualization of the compaction metrics, we can also create a dashboard in the seed that shows aggregated compaction job performance.
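The seed dashboard differs from the per-shoot one mainly in that its queries aggregate across namespaces instead of filtering to one. The sketch below generates a few such PromQL expressions in Python. It is illustrative only: the metric names, the `succeeded` and `etcd_namespace` labels, and the panel names are assumptions, not a confirmed dashboard definition.

```python
def seed_dashboard_queries(window: str = "1h") -> dict[str, str]:
    """Return PromQL expressions for hypothetical seed-level dashboard panels."""
    return {
        # Compaction job rate across all shoots, split by outcome.
        # ASSUMPTION: jobs_total carries a `succeeded` label.
        "job_rate_by_result": (
            f"sum by (succeeded) (rate(etcddruid_compaction_jobs_total[{window}]))"
        ),
        # 90th percentile job duration over all shoots, from a histogram.
        "p90_duration_seconds": (
            "histogram_quantile(0.9, sum by (le) "
            f"(rate(etcddruid_compaction_job_duration_seconds_bucket[{window}])))"
        ),
        # The five shoot namespaces that ran the most compaction jobs.
        "top5_namespaces_by_jobs": (
            "topk(5, sum by (etcd_namespace) "
            f"(increase(etcddruid_compaction_jobs_total[{window}])))"
        ),
    }
```

Keeping the range window as a parameter lets the same panel definitions be reused with a dashboard-level time variable.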
In my first comment, I attached an image, shared by @istvanballok and @rickardsjp, that illustrates the flow.
Motivation (Why is this needed?):
Druid triggers compaction jobs once the number of delta events in the control plane ETCD crosses a certain threshold. A compaction job compacts the delta events accumulated in object storage and creates a full snapshot from them. These jobs can be heavy at times, so we need proper monitoring for the jobs running in each shoot control plane.
Approach/Hint to implement the solution (optional):