-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NRE with UnicodeEncoding when target is an empty span #97950
Conversation
Tagging subscribers to this area: @dotnet/area-system-text-encoding Issue DetailsFixes #89931
|
@@ -1535,7 +1535,7 @@ internal sealed override unsafe int GetCharCount(byte* bytes, int count, Decoder | |||
} | |||
|
|||
// Valid surrogate pair, add our lastChar (will need 2 chars) | |||
if (chars >= charEnd - 1) | |||
if (chars >= charEnd - 1 || chars == charEnd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am uncertain about the rationale behind this modification. when chars == charEnd
, this implies that the condition chars >= charEnd - 1
remains true. The added part is not changing anything. am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
charEnd
can be null. charEnd - 1
will underflow in that case, and the first condition won't be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the case, should we have a simple check in the top of the method to check that and just throw if we have any bytes to decode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am seeing charEnd - chars
is used in some other place in the method too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the case, should we have a simple check in the top of the method to check that and just throw if we have any bytes to decode?
It would not be correct, and it would be a breaking change. Just because we have some input bytes does not mean that we end up outputting any chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks clear but it is tricky if anyone change it in the future and not aware about the case char* chars = (char*)1;
or possible underflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so what would you like to comment to say? "If you are changing this code, be careful about integer overflows and underflows."?
FWIW, this is a general concern with any unmanaged pointer arithmetic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is not worth the comment, that is ok. It was not obvious to me we can get char* is null or 1
value as I am not expecting this to happen in this code base. The comment in my mind is something like Exercise caution when manipulating pointers to prevent potential underflow issues. In certain situations, the char* can be equal to 1 and may also have a length of zero.
I'll leave it to you to decide. I am ok either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@manandre Do you have a preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This is a generic concern in unsafe world. We should not repeat it on each occurrence.
src/libraries/System.Private.CoreLib/src/System/Text/UnicodeEncoding.cs
Outdated
Show resolved
Hide resolved
I have added the same fix for |
@@ -23,5 +23,18 @@ public void GetDecoder() | |||
Assert.Equal(sourceChars, destinationChars); | |||
} | |||
} | |||
|
|||
[Fact] | |||
public void GetDecoder_NegativeTests() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These negative tests should be added to NegativeEncodingTests
Encoder_Convert_Invalid
instead. NegativeEncodingTests theories will run it over all encodings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test moved to the NegativeEncodingTests.Decoder_Convert_Invalid
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test was failing for the UTF7 encoding, so I have fixed the corresponding code.
@@ -1297,7 +1297,7 @@ internal unsafe bool AddChar(char ch, int numBytes) | |||
{ | |||
// Throw maybe | |||
_bytes -= numBytes; // Didn't encode these bytes | |||
_enc.ThrowCharsOverflow(_decoder, _bytes <= _byteStart); // Throw? | |||
_enc.ThrowCharsOverflow(_decoder, _chars == _charStart); // Throw? | |||
return false; // No throw, but no store either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the same pattern here:
runtime/src/libraries/System.Text.Encoding.CodePages/src/System/Text/EncodingCharBuffer.cs
Lines 52 to 56 in 89bba37
if (_chars >= _charEnd) | |
{ | |
// Throw maybe | |
_bytes -= numBytes; // Didn't encode these bytes | |
_enc.ThrowCharsOverflow(_decoder, _bytes <= _byteStart); // Throw? |
Does it have the same problem - are we missing coverage for this path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we have the same issue here. There is no test coverage for this path, but, despite multiple attempts, I do not manage to make the test failing before applying the fix :/
These encodings are quite complex and unknown to me. I am not sure to be able to prove the fix is required but I would recommend to keep the EncodingCharBuffers
implementations aligned.
@tarekgh Could you re-review when you have a chance to ensure your feedback has been addressed and this is ready for merge? |
Fixes #89931