[fix][monitor] fix metrics string encoding #18138

tjiuming · 2022-10-20T10:55:30Z

Motivation

For the method SimpleTextOutputStream#write(String), it's implementation as below:

    public SimpleTextOutputStream write(String s) {
        if (s == null) {
            return this;
        }
        int len = s.length();
        for (int i = 0; i < len; i++) {
            buffer.writeByte((byte) s.charAt(i));
        }

        return this;
    }

Generally, it works fine. But if there is a character has different length in different encoding, it would lead to an issue, say ¬.

In latin1, it's bytes is [-84], by calling "¬".getBytes(StandardCharset.ISO_8859_1)
but in UTF8, it's bytes is [-62, -84], by calling "¬".getBytes(StandardCharset.UTF_8)

if we replace str.getBytes(charset) with (byte) s.charAt(i), ¬ will display as �.

Documentation

doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository

PR in forked repository: tjiuming#6

coderzc

LGTM

asafm · 2022-10-20T12:38:48Z

Thanks, @tjiuming for that prompt fix!
I'll just expand a bit on the motivation:

SimpleTextOutputFormat is the class used when writing Prometheus Text Format lines - we print those lines in this class, which inside writes as bytes to a ByteBuf (Netty's). That buffer is then flushed to the HTTP response upon /metrics endpoint (when Prometheus scrapes metrics from Pulsar).

The Prometheus Text Format requires label values to be UTF-8 encoded, as written here:

label_value can be any sequence of UTF-8 characters

Today, as @tjiuming mentioned, the current implementation iterates over the String and each character is written in an undefined encoding - i.e. manipulating the char and then casting it to byte

buffer.writeByte((byte) s.charAt(i));

This causes Prometheus to fail the scraping as it tries to validate that the label value byte array in the response is a valid UTF-8 encoded string (which is not).

The fix as @tjiuming mentioned is encoding the char using UTF-8 as provided by the ByteBuf class.

lhotari · 2022-10-20T18:17:02Z

@tjiuming Please run the tests in your fork as instructed in https://github.com/apache/pulsar/actions/runs/3288972604/attempts/1#summary-9005069188 .
The PR title needs revisiting, please check https://github.com/apache/pulsar/actions/runs/3288972601/jobs/5419952411 .
monitoring -> monitor .

lhotari · 2022-10-20T18:19:13Z

I wonder if it would also make sense to fix this method:

pulsar/pulsar-common/src/main/java/org/apache/pulsar/common/util/SimpleTextOutputStream.java

Lines 46 to 49 in b59d0e9

    
           public SimpleTextOutputStream write(char c) { 
        
               buffer.writeByte((byte) c); 
        
               return this; 
        
           }

additional changes required

tjiuming · 2022-10-21T03:35:07Z

I wonder if it would also make sense to fix this method:

pulsar/pulsar-common/src/main/java/org/apache/pulsar/common/util/SimpleTextOutputStream.java

Lines 46 to 49 in b59d0e9

public SimpleTextOutputStream write(char c) {

buffer.writeByte((byte) c);

return this;

}

write(char) is used to add some symbols like , \n ", generally, it won't lead to an issue. But, sure, fix it is better.
@lhotari I've fixed it, PTAL

tjiuming · 2022-10-21T03:44:27Z

pulsar-common/src/main/java/org/apache/pulsar/common/util/SimpleTextOutputStream.java

@@ -43,19 +44,16 @@ public SimpleTextOutputStream write(byte[] a, int offset, int len) {
    }

    public SimpleTextOutputStream write(char c) {
-        buffer.writeByte((byte) c);
+        write(String.valueOf(c));


I've tried to use buffer.writeChar(), but it doesn't work

codecov-commenter · 2022-10-24T02:58:49Z

Codecov Report

Merging #18138 (7cb9d17) into master (6c65ca0) will increase coverage by 10.90%.
The diff coverage is 42.68%.

@@              Coverage Diff              @@
##             master   #18138       +/-   ##
=============================================
+ Coverage     34.91%   45.82%   +10.90%     
- Complexity     5707    17600    +11893     
=============================================
  Files           607     1574      +967     
  Lines         53396   128637    +75241     
  Branches       5712    14150     +8438     
=============================================
+ Hits          18644    58943    +40299     
- Misses        32119    63630    +31511     
- Partials       2633     6064     +3431

Flag	Coverage Δ
unittests	`45.82% <42.68%> (+10.90%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...ker/stats/prometheus/PrometheusMetricsServlet.java	`0.00% <0.00%> (ø)`
.../main/java/org/apache/pulsar/PulsarStandalone.java	`0.00% <0.00%> (ø)`
.../apache/pulsar/broker/admin/impl/ClustersBase.java	`76.66% <0.00%> (+67.62%)`	⬆️
...pache/pulsar/broker/admin/impl/NamespacesBase.java	`63.25% <ø> (+48.72%)`	⬆️
.../pulsar/broker/service/AbstractBaseDispatcher.java	`50.30% <0.00%> (+4.44%)`	⬆️
...che/pulsar/broker/service/BacklogQuotaManager.java	`12.39% <0.00%> (+2.91%)`	⬆️
.../pulsar/broker/service/BrokerServiceException.java	`44.44% <0.00%> (+19.44%)`	⬆️
...ava/org/apache/pulsar/broker/service/Consumer.java	`67.90% <0.00%> (+5.90%)`	⬆️
...ava/org/apache/pulsar/broker/service/Producer.java	`62.72% <0.00%> (+3.08%)`	⬆️
...pulsar/broker/service/PulsarCommandSenderImpl.java	`66.66% <0.00%> (ø)`
... and 1185 more

(cherry picked from commit 031e37c)

(cherry picked from commit 031e37c) (cherry picked from commit b7def56)

dao-jun added 3 commits October 20, 2022 14:20

fix metrics str encoding

c703874

fix metrics str encoding

9a5b79a

fix metrics str encoding

b59d0e9

coderzc approved these changes Oct 20, 2022

View reviewed changes

asafm approved these changes Oct 20, 2022

View reviewed changes

github-actions bot added the doc-not-needed Your PR changes do not impact docs label Oct 20, 2022

lhotari previously approved these changes Oct 20, 2022

View reviewed changes

tjiuming changed the title ~~[fix][monitoring] fix metrics string encoding~~ [fix][monitor] fix metrics string encoding Oct 21, 2022

tjiuming mentioned this pull request Oct 21, 2022

[run-tests] [fix][monitor] fix metrics string encoding tjiuming/pulsar#6

Closed

dao-jun added 2 commits October 21, 2022 11:15

Merge branch 'master' into dev/metrics_encoding

4c0153b

review fix

e27fbf4

tjiuming commented Oct 21, 2022

View reviewed changes

coderzc approved these changes Oct 21, 2022

View reviewed changes

tjiuming requested a review from lhotari October 21, 2022 13:23

dao-jun added 2 commits October 21, 2022 21:29

Merge branch 'master' into dev/metrics_encoding

cb4488a

Merge branch 'master' into dev/metrics_encoding

c4707dc

codelipenghui approved these changes Oct 24, 2022

View reviewed changes

codelipenghui assigned tjiuming Oct 24, 2022

codelipenghui added this to the 2.12.0 milestone Oct 24, 2022

codelipenghui added the ready-to-test label Oct 24, 2022

codelipenghui closed this Oct 24, 2022

codelipenghui reopened this Oct 24, 2022

Merge branch 'master' into dev/metrics_encoding

7cb9d17

codelipenghui merged commit 031e37c into apache:master Oct 24, 2022

codelipenghui added release/2.9.4 release/2.11.1 release/2.10.3 labels Oct 26, 2022

codelipenghui pushed a commit that referenced this pull request Oct 27, 2022

[fix][monitor] fix metrics string encoding (#18138)

899ce51

(cherry picked from commit 031e37c)

codelipenghui pushed a commit that referenced this pull request Oct 27, 2022

[fix][monitor] fix metrics string encoding (#18138)

b7def56

(cherry picked from commit 031e37c)

codelipenghui pushed a commit that referenced this pull request Oct 27, 2022

[fix][monitor] fix metrics string encoding (#18138)

a280fd2

(cherry picked from commit 031e37c)

codelipenghui added cherry-picked/branch-2.9 Archived: 2.9 is end of life cherry-picked/branch-2.10 cherry-picked/branch-2.11 labels Oct 27, 2022

codelipenghui modified the milestones: 2.12.0, 2.11.0 Oct 31, 2022

codelipenghui removed the release/2.11.1 label Oct 31, 2022

tjiuming deleted the dev/metrics_encoding branch November 1, 2022 17:38

nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Nov 9, 2022

[fix][monitor] fix metrics string encoding (apache#18138)

558cce2

(cherry picked from commit 031e37c) (cherry picked from commit b7def56)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix][monitor] fix metrics string encoding #18138

[fix][monitor] fix metrics string encoding #18138

tjiuming commented Oct 20, 2022 •

edited

Loading

coderzc left a comment

asafm commented Oct 20, 2022

lhotari commented Oct 20, 2022

lhotari commented Oct 20, 2022

tjiuming commented Oct 21, 2022

tjiuming Oct 21, 2022 •

edited

Loading

codecov-commenter commented Oct 24, 2022 •

edited

Loading

[fix][monitor] fix metrics string encoding #18138

[fix][monitor] fix metrics string encoding #18138

Conversation

tjiuming commented Oct 20, 2022 • edited Loading

Motivation

Documentation

Matching PR in forked repository

coderzc left a comment

Choose a reason for hiding this comment

asafm commented Oct 20, 2022

lhotari commented Oct 20, 2022

lhotari commented Oct 20, 2022

tjiuming commented Oct 21, 2022

tjiuming Oct 21, 2022 • edited Loading

Choose a reason for hiding this comment

codecov-commenter commented Oct 24, 2022 • edited Loading

Codecov Report

tjiuming commented Oct 20, 2022 •

edited

Loading

tjiuming Oct 21, 2022 •

edited

Loading

codecov-commenter commented Oct 24, 2022 •

edited

Loading