-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve CharSequences#split to support trim and min copies #1335
Conversation
7e416db
to
436d5c6
Compare
test failure attributed to #1224 |
servicetalk-buffer-api/src/main/java/io/servicetalk/buffer/api/CharSequences.java
Outdated
Show resolved
Hide resolved
servicetalk-buffer-api/src/main/java/io/servicetalk/buffer/api/CharSequences.java
Outdated
Show resolved
Hide resolved
servicetalk-buffer-api/src/main/java/io/servicetalk/buffer/api/CharSequences.java
Outdated
Show resolved
Hide resolved
servicetalk-buffer-api/src/main/java/io/servicetalk/buffer/api/CharSequences.java
Outdated
Show resolved
Hide resolved
servicetalk-buffer-api/src/test/java/io/servicetalk/buffer/api/CharSequencesTest.java
Outdated
Show resolved
Hide resolved
if (!expectedRespAcceptedEncodings.isEmpty() && !actualRespAcceptedEncodings.isEmpty()) { | ||
assertEquals(expectedRespAcceptedEncodings, actualRespAcceptedEncodings); | ||
final String respEncName = response.headers() | ||
.get(MESSAGE_ENCODING, "identity").toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps a followup PR, but can you enhance these tests to verify cases where the header isn't expected to be present (e.g. don't force identity via the default method, and explicitly verify if identity is an expected value)? there have been followup work related to identity
being present when not necessary, and headers (grpc-accept-encoding
) with empty values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will follow up on that with the related change on the gRPC headers.
servicetalk-grpc-netty/src/test/java/io/servicetalk/grpc/netty/GrpcMessageEncodingTest.java
Outdated
Show resolved
Hide resolved
servicetalk-grpc-netty/src/test/java/io/servicetalk/grpc/netty/GrpcMessageEncodingTest.java
Outdated
Show resolved
Hide resolved
another test failure attributed to #1224 |
servicetalk-buffer-api/src/main/java/io/servicetalk/buffer/api/CharSequences.java
Outdated
Show resolved
Hide resolved
} else { | ||
for (int i = 0; i < input.length(); i++) { | ||
char c = input.charAt(i); | ||
endIndex = i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have an issue if there are consecutive delimiters. Here is a test case to add:
assertThat(split(",,,", ',', false), empty());
IIUC we should only need a the following state:
i
-> current indexstartIndex
-> last non-delimiterchar
(reset after each subsequence)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same issue applies for the trim
path too:
assertThat(split(",,,", ',', true), empty());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing. Even String#split
has different outcomes which I find surprising.
" ,,,, ".split(",") => [ , , , , ]
",,,, ".split(",") => [, , , , ]
" ,,,,".split(",") => [ ]
" ,,, ,".split(",") => [ , , , ]
" , , , , ".split(",") => [ , , , , ]
Python on the other hand, is more predictable ie. WYSIWYG
eg.
" ,,, ,".split(",") => [' ', '', '', ' ', '']
My expectation is the same; that if you have N delimiters in a string, you want N+1 elements in the result set, regardless if they are empty. Otherwise its surprising to the user for this split(",,,", ',', _)
to be equal to split("", ',', _)
.
With the trim=true & del=' ', that is a special case, where I chose to give priority to the del, thus these two should behave similarly
",,,,".split(",") => [ , , , , ]
"\s\s\s\s".split(" ") => [ , , , , ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯 interesting discussion.
Ideally, we should be consistent with JDK behavior of String#split
, but I'm agreed that it behaves with lots of confusion.
Let's merge the current version, because it fixes the bug. And we can follow-up here later. Anyway, usage of the split
method is limited atm to compression headers only.
if (endIndex > startIndex) { | ||
result.add(subsequence(isAscii, input, startIndex, endIndex)); | ||
} else { | ||
result.add(isAscii ? newAsciiString("") : ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment if the following unit test is correct? If so this code needs to be adjusted.
// there are no non-delimiter chars present -> content should be empty
assertThat(split(" ", ' ', true), empty());
we may need more state (e.g. differentiate between last index of non-space and delimiter)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the case anymore behavior is as described in comment above.
split(" ", ' ', true) -> ["", "", ""]
similar to
split(",,", ' ', true) -> ["", "", ""]
servicetalk-buffer-api/src/main/java/io/servicetalk/buffer/api/CharSequences.java
Outdated
Show resolved
Hide resolved
servicetalk-buffer-api/src/test/java/io/servicetalk/buffer/api/CharSequencesTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG!
Motivation:
CharSequences#split needed some improvements to avoid extra allocations for trimming (externally) and conversion to String due to calls to subsequence.
Modifications:
This PR improves the helper to support trimming individual tokens without additional allocations, and also better utilizes AsciiBuffer internals to avoid unnecessary conversions.
Result:
API can now handle representations that contain whitespaces in between delimiters.
Fixes: #1216