Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][monitor] fix metrics string encoding #18138

Merged
merged 8 commits into from
Oct 24, 2022

Conversation

tjiuming
Copy link
Contributor

@tjiuming tjiuming commented Oct 20, 2022

Motivation

For the method SimpleTextOutputStream#write(String), it's implementation as below:

    public SimpleTextOutputStream write(String s) {
        if (s == null) {
            return this;
        }
        int len = s.length();
        for (int i = 0; i < len; i++) {
            buffer.writeByte((byte) s.charAt(i));
        }

        return this;
    }

Generally, it works fine. But if there is a character has different length in different encoding, it would lead to an issue, say ¬.

In latin1, it's bytes is [-84], by calling "¬".getBytes(StandardCharset.ISO_8859_1)
but in UTF8, it's bytes is [-62, -84], by calling "¬".getBytes(StandardCharset.UTF_8)

if we replace str.getBytes(charset) with (byte) s.charAt(i), ¬ will display as .

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: tjiuming#6

Copy link
Member

@coderzc coderzc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@asafm
Copy link
Contributor

asafm commented Oct 20, 2022

Thanks, @tjiuming for that prompt fix!
I'll just expand a bit on the motivation:

SimpleTextOutputFormat is the class used when writing Prometheus Text Format lines - we print those lines in this class, which inside writes as bytes to a ByteBuf (Netty's). That buffer is then flushed to the HTTP response upon /metrics endpoint (when Prometheus scrapes metrics from Pulsar).

The Prometheus Text Format requires label values to be UTF-8 encoded, as written here:

label_value can be any sequence of UTF-8 characters

Today, as @tjiuming mentioned, the current implementation iterates over the String and each character is written in an undefined encoding - i.e. manipulating the char and then casting it to byte

buffer.writeByte((byte) s.charAt(i));

This causes Prometheus to fail the scraping as it tries to validate that the label value byte array in the response is a valid UTF-8 encoded string (which is not).

The fix as @tjiuming mentioned is encoding the char using UTF-8 as provided by the ByteBuf class.

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Oct 20, 2022
lhotari
lhotari previously approved these changes Oct 20, 2022
@lhotari
Copy link
Member

lhotari commented Oct 20, 2022

@tjiuming Please run the tests in your fork as instructed in https://github.com/apache/pulsar/actions/runs/3288972604/attempts/1#summary-9005069188 .
The PR title needs revisiting, please check https://github.com/apache/pulsar/actions/runs/3288972601/jobs/5419952411 .
monitoring -> monitor .

@lhotari
Copy link
Member

lhotari commented Oct 20, 2022

I wonder if it would also make sense to fix this method:

public SimpleTextOutputStream write(char c) {
buffer.writeByte((byte) c);
return this;
}

@lhotari lhotari dismissed their stale review October 20, 2022 18:19

additional changes required

@tjiuming tjiuming changed the title [fix][monitoring] fix metrics string encoding [fix][monitor] fix metrics string encoding Oct 21, 2022
@tjiuming
Copy link
Contributor Author

I wonder if it would also make sense to fix this method:

public SimpleTextOutputStream write(char c) {
buffer.writeByte((byte) c);
return this;
}

write(char) is used to add some symbols like , \n ", generally, it won't lead to an issue. But, sure, fix it is better.
@lhotari I've fixed it, PTAL

@@ -43,19 +44,16 @@ public SimpleTextOutputStream write(byte[] a, int offset, int len) {
}

public SimpleTextOutputStream write(char c) {
buffer.writeByte((byte) c);
write(String.valueOf(c));
Copy link
Contributor Author

@tjiuming tjiuming Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to use buffer.writeChar(), but it doesn't work

@tjiuming tjiuming requested a review from lhotari October 21, 2022 13:23
@codecov-commenter
Copy link

codecov-commenter commented Oct 24, 2022

Codecov Report

Merging #18138 (7cb9d17) into master (6c65ca0) will increase coverage by 10.90%.
The diff coverage is 42.68%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #18138       +/-   ##
=============================================
+ Coverage     34.91%   45.82%   +10.90%     
- Complexity     5707    17600    +11893     
=============================================
  Files           607     1574      +967     
  Lines         53396   128637    +75241     
  Branches       5712    14150     +8438     
=============================================
+ Hits          18644    58943    +40299     
- Misses        32119    63630    +31511     
- Partials       2633     6064     +3431     
Flag Coverage Δ
unittests 45.82% <42.68%> (+10.90%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ker/stats/prometheus/PrometheusMetricsServlet.java 0.00% <0.00%> (ø)
.../main/java/org/apache/pulsar/PulsarStandalone.java 0.00% <0.00%> (ø)
.../apache/pulsar/broker/admin/impl/ClustersBase.java 76.66% <0.00%> (+67.62%) ⬆️
...pache/pulsar/broker/admin/impl/NamespacesBase.java 63.25% <ø> (+48.72%) ⬆️
.../pulsar/broker/service/AbstractBaseDispatcher.java 50.30% <0.00%> (+4.44%) ⬆️
...che/pulsar/broker/service/BacklogQuotaManager.java 12.39% <0.00%> (+2.91%) ⬆️
.../pulsar/broker/service/BrokerServiceException.java 44.44% <0.00%> (+19.44%) ⬆️
...ava/org/apache/pulsar/broker/service/Consumer.java 67.90% <0.00%> (+5.90%) ⬆️
...ava/org/apache/pulsar/broker/service/Producer.java 62.72% <0.00%> (+3.08%) ⬆️
...pulsar/broker/service/PulsarCommandSenderImpl.java 66.66% <0.00%> (ø)
... and 1185 more

@codelipenghui codelipenghui merged commit 031e37c into apache:master Oct 24, 2022
codelipenghui pushed a commit that referenced this pull request Oct 27, 2022
codelipenghui pushed a commit that referenced this pull request Oct 27, 2022
codelipenghui pushed a commit that referenced this pull request Oct 27, 2022
@codelipenghui codelipenghui modified the milestones: 2.12.0, 2.11.0 Oct 31, 2022
@tjiuming tjiuming deleted the dev/metrics_encoding branch November 1, 2022 17:38
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Nov 9, 2022
(cherry picked from commit 031e37c)
(cherry picked from commit b7def56)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants