Skip to content

Comments

security: bugfix, ensure cert expiry metrics reflect reloaded certs#135596

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
angles-n-daemons:fix-certificate-metrics-reload
Nov 26, 2024
Merged

security: bugfix, ensure cert expiry metrics reflect reloaded certs#135596
craig[bot] merged 1 commit intocockroachdb:masterfrom
angles-n-daemons:fix-certificate-metrics-reload

Conversation

@angles-n-daemons
Copy link
Contributor

@angles-n-daemons angles-n-daemons commented Nov 18, 2024

security: bugfix, ensure cert expiry metrics reflect reloaded certs

The PR #130110 added certificate TTL metrics alongside our existing expiration metrics. Prior to that change, the certificate metrics values were updated on each metrics load. Afterwards, new metrics objects were created for each load of certificates.

This created a bug in that the new expiration values would not be found in any of the system exhaust (metrics scrape or tsdb) because the registered metrics objects were the ones created on startup.

This new change instead allows the metrics to close the whole CertificateManager object, so that they only need to be created once, and therefore the initial registration of metrics reflects persistently valid values.

Release note (bug fix): security.certificate.* metrics will now be updated if a node loads new certificates while running.

Epic: none
Fixes: #135093

@angles-n-daemons angles-n-daemons requested review from a team and kyle-a-wong and removed request for a team November 18, 2024 17:11
@angles-n-daemons angles-n-daemons requested a review from a team as a code owner November 18, 2024 17:11
@angles-n-daemons angles-n-daemons requested a review from a team November 18, 2024 17:11
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@angles-n-daemons angles-n-daemons force-pushed the fix-certificate-metrics-reload branch from 5b5a0d2 to f609790 Compare November 19, 2024 14:50
Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, just some minor notes. Please add backport labels as well.

Reviewed 1 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @kyle-a-wong)


pkg/security/certificate_metrics.go line 168 at r1 (raw file):

)

type certClosure func() *CertInfo

I think if you're going to name this type, might as well add a docstring explaining why it's needed.


pkg/security/certificate_metrics.go line 212 at r1 (raw file):

	b := aggmetric.MakeBuilder(SQLUserLabel)
	return &Metrics{
		CAExpiration:         expirationGauge(metaCAExpiration, cm.CACert),

I don't see the diff that makes this exported. Am I missing something?


pkg/security/certificate_metrics.go line 213 at r1 (raw file):

	return &Metrics{
		CAExpiration:         expirationGauge(metaCAExpiration, cm.CACert),
		TenantExpiration:     expirationGauge(metaTenantExpiration, func() *CertInfo { return cm.tenantCert }),

should we use certClosure here and below?

Copy link
Contributor Author

@angles-n-daemons angles-n-daemons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @kyle-a-wong)


pkg/security/certificate_metrics.go line 168 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I think if you're going to name this type, might as well add a docstring explaining why it's needed.

that's a good idea, adding it now.


pkg/security/certificate_metrics.go line 212 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I don't see the diff that makes this exported. Am I missing something?

cm.caCert is a reference, cm.CACert is a function. It's not just that the latter is public, it's that it always returns the right certificate.


pkg/security/certificate_metrics.go line 213 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

should we use certClosure here and below?

I'm not sure I understand how, do you mean to cast the whole thing explicitly?

Copy link
Collaborator

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No further comments, thanks.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @kyle-a-wong)


pkg/security/certificate_metrics.go line 212 at r1 (raw file):

Previously, angles-n-daemons (Brian Dillmann) wrote…

cm.caCert is a reference, cm.CACert is a function. It's not just that the latter is public, it's that it always returns the right certificate.

Ah gotcha. Thanks for clarifying.


pkg/security/certificate_metrics.go line 213 at r1 (raw file):

Previously, angles-n-daemons (Brian Dillmann) wrote…

I'm not sure I understand how, do you mean to cast the whole thing explicitly?

My brain wasn't working. Disregard :) I think I was just visually pattern matching and forgot that the custom type does not magically become a new function constructor.

The PR cockroachdb#130110 added certificate TTL metrics alongside our existing
expiration metrics. Prior to that change, the certificate metrics values
were updated on each metrics load. Afterwards, new metrics objects were
created for each load of certificates.

This created a bug in that the new expiration values would not be
found in any of the system exhaust (metrics scrape or tsdb) because the
registered metrics objects were the ones created on startup.

This new change instead allows the metrics to close the whole
CertificateManager object, so that they only need to be created once, and
therefore the initial registration of metrics reflects persistently
valid values.

Release note (bug fix): security.certificate.* metrics will now be
updated if a node loads new certificates while running.
@angles-n-daemons angles-n-daemons force-pushed the fix-certificate-metrics-reload branch from f609790 to 0e19000 Compare November 22, 2024 19:05
@angles-n-daemons
Copy link
Contributor Author

bors r+

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

security: certificate ttl metric is not correctly registered after in-process load

3 participants