Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete stale metrics on object delete #1362

Merged
merged 3 commits into from
Aug 13, 2024
Merged

Conversation

TarasLykhenko
Copy link
Contributor

Change:
Moved all the metrics recording to be performed at the very end of the reconciliation. Realtime metrics for readiness are no longer recorded to match the Flux metrics recording.

Before this change, the following metrics continued to be exported even after the associated object was deleted:

gotk_reconcile_condition{kind="Terraform",name="helloworld",namespace="flux-system",status="False",type="Ready"} 0
gotk_reconcile_condition{kind="Terraform",name="helloworld",namespace="flux-system",status="True",type="Ready"} 0
gotk_reconcile_condition{kind="Terraform",name="helloworld",namespace="flux-system",status="Unknown",type="Ready"} 1
# HELP gotk_reconcile_duration_seconds The duration in seconds of a GitOps Toolkit resource reconciliation.
# TYPE gotk_reconcile_duration_seconds histogram
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="0.01"} 0
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="0.038363583488692544"} 15
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="0.1471764538093883"} 17
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="0.5646216173286169"} 17
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="2.166090855590701"} 18
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="8.309900738254731"} 18
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="31.879757075478317"} 18
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="122.30217221643493"} 18
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="469.19495946736544"} 18
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="1799.9999999999986"} 18
gotk_reconcile_duration_seconds_bucket{kind="Terraform",name="helloworld",namespace="flux-system",le="+Inf"} 18
gotk_reconcile_duration_seconds_sum{kind="Terraform",name="helloworld",namespace="flux-system"} 1.2276592080000002
gotk_reconcile_duration_seconds_count{kind="Terraform",name="helloworld",namespace="flux-system"} 18
# HELP gotk_suspend_status The current suspend status of a GitOps Toolkit resource.
# TYPE gotk_suspend_status gauge
gotk_suspend_status{kind="Terraform",name="helloworld",namespace="flux-system"} 0

With this change, they get deleted once the associated object is deleted.

@ilithanos
Copy link
Collaborator

First of @TarasLykhenko thanks for contributing to the project.

I did a quick look through the changes within the PR, and it seems like some of the changes aren't really needed for this PR to do what it needs to do. Is there a specific reason for those changes?

I'll see if i can do a more detailed review over the weekend.

@TarasLykhenko
Copy link
Contributor Author

Hi @ilithanos,
I redid the PR and added only the necessary changes.
bumped the LIBCRYPTO_VERSION, as the image was falling to build.

@ilithanos
Copy link
Collaborator

@TarasLykhenko Thanks for updating the code.

I'm very sorry for the delay, i ended up in the middle of a job change, and havn't had much time on my hands.

Could i get you to resolve the merge conflicts and rebase on master so that we can rerun the tests and ensure this still doesn't break anything else :)

@TarasLykhenko
Copy link
Contributor Author

I resolved the merge conflicts, @ilithanos can you pls check again

@TiagoJVO
Copy link

Hi there @ilithanos! Any ETA to merge this PR? This would be really valuable for a monitoring use case of ours 😄

@ermirry
Copy link

ermirry commented Aug 12, 2024

@ilithanos Also waiting for this to get merged. We have a bug that this would fix 🙏🏻

@ilithanos ilithanos merged commit cb032d4 into flux-iac:main Aug 13, 2024
15 checks passed
@ilithanos
Copy link
Collaborator

@TiagoJVO @ermirry just merged it now.

@TiagoJVO
Copy link

TiagoJVO commented Aug 13, 2024

Is there a scheduled Pre-Release nearby?
We're using the latest one that dates 5 months ago (v0.16.0-rc.4), so I guess it is close right?

@greenu
Copy link

greenu commented Aug 14, 2024

@TarasLykhenko This PR has fixed issue with existing Terraform objects (they were in Unknown status in metrics instead of True), but deleted Terraform objects still remain in metrics in Unknown status

Using latest tofu-controller main-cb032d47.

Deleting TF resource:

kubectl delete tf -n my-dev-ns my-tf
<..wait for runner..>
kubectl get tf -n my-dev-ns my-tf
Error from server (NotFound): terraforms.infra.contrib.fluxcd.io "my-tf" not found

And checking metrics: curl http://tf-controller:8080/metrics

...skipped...
...
gotk_reconcile_condition{kind="Terraform",name="my-tf",namespace="my-dev-ns",status="Unknown",type="Ready"} 1
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants