[fix][monitor] fix metrics string encoding #18138
LGTM
Thanks, @tjiuming for that prompt fix!
The Prometheus Text Format requires label values to be UTF-8 encoded, as written here:
Today, as @tjiuming mentioned, the current implementation iterates over the String and each character is written in an undefined encoding - i.e. manipulating the
This causes Prometheus to fail the scrape, because it validates that the label value byte array in the response is a valid UTF-8 encoded string (which it is not). The fix, as @tjiuming mentioned, is to encode the char using UTF-8 as provided by the
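The failure mode described above can be reproduced with the JDK alone. This is a minimal sketch, not Pulsar code: `Utf8LabelCheck`, `truncateChars`, and `isValidUtf8` are hypothetical names, and the strict decoder only approximates the kind of validation a scraper might perform.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8LabelCheck {

    // Mimics the per-character cast: only the low 8 bits of each char survive.
    static byte[] truncateChars(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Strict UTF-8 check: malformed input is reported instead of replaced.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String label = "a\u00ACb"; // "a¬b", a hypothetical label value

        // Cast-per-char output contains a lone 0xAC byte, which is not valid UTF-8.
        System.out.println(isValidUtf8(truncateChars(label)));
        // Encoding through the charset produces a valid UTF-8 sequence.
        System.out.println(isValidUtf8(label.getBytes(StandardCharsets.UTF_8)));
    }
}
```

For pure ASCII input both paths produce identical bytes, which is consistent with the bug only showing up for non-ASCII label values.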
@tjiuming Please run the tests in your fork as instructed in https://github.com/apache/pulsar/actions/runs/3288972604/attempts/1#summary-9005069188 .
I wonder if it would also make sense to fix this method: pulsar/pulsar-common/src/main/java/org/apache/pulsar/common/util/SimpleTextOutputStream.java Lines 46 to 49 in b59d0e9
@@ -43,19 +44,16 @@ public SimpleTextOutputStream write(byte[] a, int offset, int len) {
     }

     public SimpleTextOutputStream write(char c) {
-        buffer.writeByte((byte) c);
+        write(String.valueOf(c));
I've tried to use buffer.writeChar(), but it doesn't work
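One plausible explanation, assuming `buffer` is a Netty `ByteBuf`: its `writeChar` writes a fixed 2-byte UTF-16 code unit, not a UTF-8 sequence. The sketch below uses `java.nio.ByteBuffer.putChar` as a JDK stand-in for that behaviour; `WriteCharSketch` and its methods are hypothetical names for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WriteCharSketch {

    // Stand-in for a fixed-width char write: putChar stores the UTF-16
    // code unit as two bytes (big-endian by default).
    static byte[] twoByteChar(char c) {
        return ByteBuffer.allocate(2).putChar(c).array();
    }

    // What the text format needs: the UTF-8 encoding of the character.
    static byte[] utf8(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Even plain ASCII gains a leading zero byte with the 2-byte write.
        System.out.println(Arrays.toString(twoByteChar('a')));      // [0, 97]
        System.out.println(Arrays.toString(utf8('a')));             // [97]
        // For U+00AC the two encodings also differ.
        System.out.println(Arrays.toString(twoByteChar('\u00AC'))); // [0, -84]
        System.out.println(Arrays.toString(utf8('\u00AC')));        // [-62, -84]
    }
}
```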
Codecov Report
@@ Coverage Diff @@
## master #18138 +/- ##
=============================================
+ Coverage 34.91% 45.82% +10.90%
- Complexity 5707 17600 +11893
=============================================
Files 607 1574 +967
Lines 53396 128637 +75241
Branches 5712 14150 +8438
=============================================
+ Hits 18644 58943 +40299
- Misses 32119 63630 +31511
- Partials 2633 6064 +3431
(cherry picked from commit 031e37c)
Motivation
The method SimpleTextOutputStream#write(String) currently iterates over the string and writes each character as a single byte, (byte) s.charAt(i).
Generally, this works fine. But a character whose byte length differs between encodings causes a problem, for example ¬.
In Latin-1 its bytes are [-84], from "¬".getBytes(StandardCharsets.ISO_8859_1), but in UTF-8 its bytes are [-62, -84], from "¬".getBytes(StandardCharsets.UTF_8).
If str.getBytes(charset) is replaced with (byte) s.charAt(i), ¬ will display as �.
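The byte values quoted above can be checked with a short JDK-only sketch. `NotSignBytes` and its methods are hypothetical names for illustration, and `\u00AC` is used for ¬ so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class NotSignBytes {

    static final String NOT_SIGN = "\u00AC"; // the ¬ character

    static byte[] latin1() {
        return NOT_SIGN.getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[] utf8() {
        return NOT_SIGN.getBytes(StandardCharsets.UTF_8);
    }

    // The per-character cast keeps only the low 8 bits of the char.
    static byte[] castToByte() {
        return new byte[] {(byte) NOT_SIGN.charAt(0)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(latin1()));     // [-84]
        System.out.println(Arrays.toString(utf8()));       // [-62, -84]
        System.out.println(Arrays.toString(castToByte())); // [-84]

        // Read back as UTF-8, the lone byte is malformed and becomes U+FFFD.
        String decoded = new String(castToByte(), StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\uFFFD"));      // true
    }
}
```

The cast happens to produce the Latin-1 byte for this character, which is why a reader expecting UTF-8 sees the replacement character.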
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: tjiuming#6