Skip to content

KAFKA-15129;[2/N] Remove metrics in GroupMetadataManager when shutdown#13926

Closed
hudeqi wants to merge 1 commit intoapache:trunkfrom
hudeqi:clean_metric_groupMetadataManager
Closed

KAFKA-15129;[2/N] Remove metrics in GroupMetadataManager when shutdown#13926
hudeqi wants to merge 1 commit intoapache:trunkfrom
hudeqi:clean_metric_groupMetadataManager

Conversation

@hudeqi
Copy link
Contributor

@hudeqi hudeqi commented Jun 28, 2023

This pr is used to remove the metrics in GroupMetadataManager when shutdown.
This pr has passed the corresponding unit test, and it is part of KAFKA-15129.

A special advantage is that since the metric registered through metricsGroup is removed during shutdown, the original "recreateGauge" method (mentioned in #3506 (comment)) is no longer needed.

@clolov
Copy link
Contributor

clolov commented Jun 30, 2023

In theory this change makes sense to me. As with the [1/N] one I would prefer if variable names start with a lowercase unless there is a good reason for them not to. I have reached out to @guozhangwang who was also wondering what the purpose of the recreateGauge method is (#3506 (comment)). I thought that there is only ever one GroupMetadataManager so I do not understand the purpose of this method in the first place and I would like to get a confirmation before we remove it.

@dajac
Copy link
Member

dajac commented Jun 30, 2023

@clolov Constants (in the companion object) in Scala start with a capital letter in our code base.

@clolov
Copy link
Contributor

clolov commented Jun 30, 2023

Okay, that makes sense @dajac, do you happen to know (or are able to deduce) the answer to the other question about why we needed the recreateGauge method in the first place and is it safe to get rid of it now?

@dajac
Copy link
Member

dajac commented Jun 30, 2023

I was wondering if it is because we run multiple brokers in the same JVM in tests but I am not sure.

@clolov
Copy link
Contributor

clolov commented Jun 30, 2023

But if this is the case won't we run into the same problem for at least one other subset of metrics? Also if this is the case, that would mean that if we remove the method the tests running as part of the auto-build will start failing, no?

@dajac
Copy link
Member

dajac commented Jun 30, 2023

Found this: #3506 (comment).

@clolov
Copy link
Contributor

clolov commented Jun 30, 2023

Hudeqi mentioned this comment as well, but I still do not understand how this could happen - is it that we keep the same metric registry between individual unit tests? If this is the case, don't we have this problem for other groups of metrics as well?

@dajac
Copy link
Member

dajac commented Jun 30, 2023

Yeah, that seems to be a general issue with Yammer based metrics. It is not about individual unit tests. It is about integration tests that create multiple KafkaServers. In this case, the metric registry is shared by all instances:

    public final <T> Gauge<T> newGauge(String name, Gauge<T> metric, Map<String, String> tags) {
        return KafkaYammerMetrics.defaultRegistry().newGauge(metricName(name, tags), metric);
    }

That being said, it seem that newGauge here uses getOrAdd internally in the registry so it may be fine in the recent version of Yammer. It may be worth checking if this is always true (may depend on the version).

@hudeqi
Copy link
Contributor Author

hudeqi commented Jun 30, 2023

Combined with the explanation of #3506 (comment) and the results of my actual test, I think this is the case: All unit tests in GroupMetadataManagerTest involve many times of "new GroupMetadataManager" behavior , since "calls newGauge" is a global registry, it will affect the subsequent metric verification (testMetrics() method) in multiple unit tests.
I remember that I replaced "recreateGauge" directly before mentioning this PR. The unit test failed. Now because I added "removeMetrics()" to "groupMetadataManager.shutdown()"(also in@AfterEach), it will be the same as "recreateGauge" effect. @clolov @dajac

@hudeqi
Copy link
Contributor Author

hudeqi commented Jun 30, 2023

And this please @divijvaidya

@hudeqi
Copy link
Contributor Author

hudeqi commented Jul 3, 2023

Hello, please help to review this PR, we have doubts about removing the "recreateGauge" method, and the conversation here(#3506 (comment)) is related to you. @cmccabe

@github-actions
Copy link

github-actions bot commented Oct 2, 2023

This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch)

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions github-actions bot added the stale Stale PRs label Oct 2, 2023
@hudeqi hudeqi closed this Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Kafka Broker stale Stale PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants