Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky-test: PrometheusMetricsTest.testDuplicateMetricTypeDefinitions #17921

Closed
1 of 2 tasks
tisonkun opened this issue Oct 4, 2022 · 3 comments · Fixed by #18077 or #17905
Closed
1 of 2 tasks

Flaky-test: PrometheusMetricsTest.testDuplicateMetricTypeDefinitions #17921

tisonkun opened this issue Oct 4, 2022 · 3 comments · Fixed by #18077 or #17905

Comments

@tisonkun
Copy link
Member

tisonkun commented Oct 4, 2022

Search before asking

  • I searched in the issues and found nothing similar.

Example failure

https://github.com/tisonkun/pulsar/actions/runs/3179774186/jobs/5182608943

Exception stacktrace

  java.lang.AssertionError: Metric pulsar_txn_pending_ack_store_bufferedwriter_batch_record_count_count does not have a corresponding summary type definition
  	at org.testng.Assert.fail(Assert.java:99)
  	at org.apache.pulsar.broker.stats.PrometheusMetricsTest.testDuplicateMetricTypeDefinitions(PrometheusMetricsTest.java:842)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
  	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
  	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
  	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
  	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  	at java.base/java.lang.Thread.run(Thread.java:833)

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@tisonkun
Copy link
Member Author

tisonkun commented Oct 5, 2022

@poorbarcode
Copy link
Contributor

This error was caused by the non-standard definition of the metrics name pulsar_txn_pending_ack_store_bufferedwriter_batch_record_count_count , and I submitted a PR #17905 to solve this problem before

@tisonkun
Copy link
Member Author

tisonkun commented Nov 4, 2022

@poorbarcode @nicoloboschi this seems another race condition for fixing bugs.

And #17905 seems fixed the root cause so that we should reopen this issue?

congbobo184 pushed a commit that referenced this issue Nov 16, 2022
Fixes: #17921

<strong>Note</strong>: 

This patch will change metrics names `s_bufferedwriter_batch_record_count` and `s_bufferedwriter_batch_oldest_record_delay_time_second`. These two names were first used in this PR #17701, and PR #17701 hasn't cherry-picked any branches yet, so this change will not cause any breaking changes.

### Motivation

https://github.com/poorbarcode/pulsar/actions/runs/3156649582/jobs/5136584463
https://github.com/apache/pulsar/actions/runs/3156649597/jobs/5136596447

#### Problem-1

If the `Prometheus-Colloctor` which typed `Counter` is named 'xxx_count',  then the output `metrics-api` will be named 'xxx_count_count'.

`TxnLogBufferedWriterMetricsStats` hits this error.

https://github.com/apache/pulsar/blob/fb7307d8f4998e42b18df3a4599fd7ec34cb04a9/pulsar-transaction/coordinator/src/main/java/org/apache/pulsar/transaction/coordinator/impl/TxnLogBufferedWriterMetricsStats.java#L105-L106


----

#### Problem-2

`PrometheusMetricsTest` defines the standard metrics name(see code below): 

```
["_sum", "_bucket", "_count", "_total", "_created"]
```

But the standard Prometheus name has three others( see: https://github.com/prometheus/client_java/blob/c28b901225e35e7c1df0eacae8b58fdfbb390162/simpleclient/src/main/java/io/prometheus/client/Collector.java#L152-L186 ):

```
["_info", "_gsum", "_gcount"]
```


https://github.com/apache/pulsar/blob/fb7307d8f4998e42b18df3a4599fd7ec34cb04a9/pulsar-broker/src/test/java/org/apache/pulsar/broker/stats/PrometheusMetricsTest.java#L834-L861

----

### Modifications

- Make `PrometheusMetricsTest` run with transaction feature
- Make txn metrics name conforms to the rule. see: https://prometheus.io/docs/practices/naming/
- Make `PrometheusMetricsTest` support all suffix of prometheus metrics name

### Documentation

- [x] `doc-not-needed` 
(Please explain why)

### Matching PR in forked repository

PR in forked repository:

- poorbarcode#19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants