Fix NRE with UnicodeEncoding when target is an empty span #97950

Merged: 6 commits into dotnet:main from fix-unicode-nre on Mar 8, 2024

Conversation

manandre
Contributor

@manandre manandre commented Feb 4, 2024

Fixes #89931

@ghost added the community-contribution label Feb 4, 2024

ghost commented Feb 4, 2024

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #89931

Author: manandre
Assignees: -
Labels:

area-System.Text.Encoding

Milestone: -

@@ -1535,7 +1535,7 @@ internal sealed override unsafe int GetCharCount(byte* bytes, int count, Decoder
}

// Valid surrogate pair, add our lastChar (will need 2 chars)
-if (chars >= charEnd - 1)
+if (chars >= charEnd - 1 || chars == charEnd)
Member

I am uncertain about the rationale behind this modification. When chars == charEnd, the condition chars >= charEnd - 1 is already true, so the added part does not change anything. Am I missing something?

Member

charEnd can be null. charEnd - 1 will underflow in that case, and the first condition won't be true.
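
A minimal sketch of the underflow described here, assuming pointer arithmetic wraps around and pointer comparisons are unsigned. It models addresses as plain `ulong` values rather than real `char*` pointers so that it runs without `unsafe`; the class and method names are hypothetical and are not part of the runtime.

```csharp
using System;

// Hypothetical model: addresses are plain ulong values, not real char* pointers.
public static class PointerUnderflowSketch
{
    // Subtracting 1 from a char* moves the address back by sizeof(char) bytes.
    private const ulong CharSize = sizeof(char);

    // Models the original condition: chars >= charEnd - 1
    public static bool OriginalCheck(ulong chars, ulong charEnd)
    {
        // When charEnd is 0 or 1, the subtraction wraps around to a huge address.
        ulong charEndMinusOne = unchecked(charEnd - CharSize);
        return chars >= charEndMinusOne;
    }

    // Models the updated condition: chars >= charEnd - 1 || chars == charEnd
    public static bool UpdatedCheck(ulong chars, ulong charEnd)
    {
        return OriginalCheck(chars, charEnd) || chars == charEnd;
    }
}
```

With an empty destination, `chars` and `charEnd` are both 0 or 1 in this model, so `OriginalCheck(1, 1)` is false while `UpdatedCheck(1, 1)` is true. For ordinary buffers the two checks agree.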

Member

@tarekgh tarekgh Feb 4, 2024

If this is the case, should we have a simple check at the top of the method to check that and just throw if we have any bytes to decode?

Member

I see that charEnd - chars is used elsewhere in the method too.

Member

> If this is the case, should we have a simple check at the top of the method to check that and just throw if we have any bytes to decode?

It would not be correct, and it would be a breaking change. Just because we have some input bytes does not mean that we end up outputting any chars.
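
As a small illustration of that last point, here is a sketch that feeds a single byte to a UTF-16 decoder. It uses the array overload of `Decoder.Convert` with a deliberately non-empty destination; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class PartialInputSketch
{
    // Feeds one byte to a UTF-16 (little-endian) decoder and reports
    // how many bytes were consumed and how many chars were produced.
    public static (int BytesUsed, int CharsUsed) DecodeSingleByte()
    {
        Decoder decoder = Encoding.Unicode.GetDecoder();
        byte[] input = { 0x41 };      // first half of 'A' (0x41 0x00) in UTF-16LE
        char[] output = new char[4];  // deliberately non-empty destination

        decoder.Convert(input, 0, input.Length, output, 0, output.Length,
                        false, // flush: keep the pending byte in the decoder state
                        out int bytesUsed, out int charsUsed, out _);

        return (bytesUsed, charsUsed);
    }
}
```

The decoder consumes the byte and holds it as state until the second half of the code unit arrives, so one input byte yields zero output chars.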

Member

The code looks clear, but it could trip up anyone who changes it in the future without being aware of the char* chars = (char*)1; case or the possible underflow.

Member

OK, so what would you like the comment to say? "If you are changing this code, be careful about integer overflows and underflows."?

FWIW, this is a general concern with any unmanaged pointer arithmetic.

Member

If it is not worth a comment, that is OK. It was not obvious to me that a char* here can be null or have the value 1, as I was not expecting that in this code base. The comment I have in mind is something like: "Exercise caution when manipulating pointers to prevent potential underflow issues. In certain situations, the char* can be equal to 1 and may also have a length of zero."

I'll leave it to you to decide. I am ok either way.

Member

@manandre Do you have a preference?

Contributor Author

I agree. This is a generic concern in the unsafe world. We should not repeat it at each occurrence.

Contributor Author

manandre commented Feb 4, 2024

I have added the same fix for UTF32Encoding.

@@ -23,5 +23,18 @@ public void GetDecoder()
Assert.Equal(sourceChars, destinationChars);
}
}

[Fact]
public void GetDecoder_NegativeTests()
Member

These negative tests should be added to NegativeEncodingTests Encoder_Convert_Invalid instead. The NegativeEncodingTests theories will run them over all encodings.

Contributor Author

Test moved to the NegativeEncodingTests.Decoder_Convert_Invalid function.
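
For readers who want to try the scenario outside the test suite, here is a rough probe that converts one character's worth of bytes into an empty destination span for several encodings and records what happens. It is only a sketch and not the test added in this PR; the class name is hypothetical. On a runtime that includes this fix I would expect an ArgumentException for each encoding rather than a NullReferenceException, but that is an assumption the sketch does not assert.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EmptyDestinationProbe
{
    // Returns, per encoding name, "no exception" or the thrown exception's type name.
    public static Dictionary<string, string> Run()
    {
        var encodings = new Encoding[]
        {
            Encoding.Unicode, Encoding.BigEndianUnicode, Encoding.UTF32, Encoding.UTF8
        };
        var outcomes = new Dictionary<string, string>();

        foreach (Encoding encoding in encodings)
        {
            byte[] input = encoding.GetBytes("A");
            Decoder decoder = encoding.GetDecoder();
            try
            {
                // Empty destination: there is no room for even one decoded char.
                decoder.Convert(input, Span<char>.Empty, true,
                                out _, out _, out _);
                outcomes[encoding.WebName] = "no exception";
            }
            catch (Exception ex)
            {
                outcomes[encoding.WebName] = ex.GetType().Name;
            }
        }

        return outcomes;
    }
}
```

Printing the returned dictionary on a given runtime shows which encodings are affected there.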

Contributor Author

The test was failing for the UTF7 encoding, so I have fixed the corresponding code.

@@ -1297,7 +1297,7 @@ internal unsafe bool AddChar(char ch, int numBytes)
{
// Throw maybe
_bytes -= numBytes; // Didn't encode these bytes
-_enc.ThrowCharsOverflow(_decoder, _bytes <= _byteStart); // Throw?
+_enc.ThrowCharsOverflow(_decoder, _chars == _charStart); // Throw?
return false; // No throw, but no store either
Member

We have the same pattern here:

if (_chars >= _charEnd)
{
// Throw maybe
_bytes -= numBytes; // Didn't encode these bytes
_enc.ThrowCharsOverflow(_decoder, _bytes <= _byteStart); // Throw?

Does it have the same problem - are we missing coverage for this path?

Contributor Author

Yes, we have the same issue here. There is no test coverage for this path and, despite multiple attempts, I have not managed to make a test fail before applying the fix :/
These encodings are quite complex and unfamiliar to me. I am not sure I can prove the fix is required, but I would recommend keeping the EncodingCharBuffers implementations aligned.

@jeffhandley
Member

@tarekgh Could you re-review when you have a chance to ensure your feedback has been addressed and this is ready for merge?

@tarekgh tarekgh merged commit 861164c into dotnet:main Mar 8, 2024
178 checks passed
@manandre manandre deleted the fix-unicode-nre branch March 8, 2024 19:35
@github-actions github-actions bot locked and limited conversation to collaborators Apr 9, 2024
Labels
area-System.Text.Encoding, community-contribution (Indicates that the PR has been added by a community member)
Development

Successfully merging this pull request may close these issues.

NRE in encoders
5 participants