Conversation

StuartMosquera
Contributor

Summary

  • Replace all instances of the word 'characters' with 'sbytes' in the description of the length parameter in the String constructor that uses sbyte*.

Fixes #11136

@StuartMosquera requested a review from a team as a code owner on September 2, 2025, 01:23
@dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) on Sep 2, 2025
@tannergooding
Member

This likely needs a weigh-in and a comparison with other docs.

While the term characters can be confusing, it is not incorrect and matches the Unicode spec. That is, for sbyte the length represents the number of 8-bit characters, much as for System.Char it is the number of 16-bit characters.

It is not the number of code units or the number of code points, which are likely what @rindlespot meant in the issue.

This particularly comes into play for UTF-16 when it comes to surrogate pairs, which are 2x UTF-16 characters, each representing 1x Unicode Code Unit, which combine to form a single Code Point.

For UTF-8, you have multiple characters which combine to form code units, which are then also valid code points since surrogates are illegal in UTF-8.
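
To make the counting question concrete, here is a minimal sketch of how the same text yields different lengths depending on whether you count 8-bit units (UTF-8), 16-bit units (UTF-16), or code points. It is written in Python only because these three counts are easy to read off there; it does not call the .NET String constructors, and the helper name `unit_counts` is hypothetical.

```python
def unit_counts(text: str) -> dict:
    """Count `text` three ways: code points, 16-bit units, and 8-bit units."""
    return {
        # Python's len() on a str counts Unicode code points.
        "code_points": len(text),
        # UTF-16 (little-endian, no BOM) uses 2 bytes per 16-bit unit.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 uses 1 to 4 bytes per code point.
        "utf8_bytes": len(text.encode("utf-8")),
    }


if __name__ == "__main__":
    # "A", U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji).
    for sample in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        print(repr(sample), unit_counts(sample))
```

The emoji is the case described above: one code point that takes two 16-bit units (a surrogate pair) in UTF-16 and four 8-bit units in UTF-8. A length parameter counted in 8-bit units can therefore differ from one counted in 16-bit units for the same text.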

@tannergooding
Member

In general, it's a space where terms get thrown around and interchanged, and while improving the wording can be good, we should ensure we do so consistently throughout the docs.

@StuartMosquera
Contributor Author

Could I clarify the bit width instead?

  • For char*: length: The number of characters (16-bit) within value to use.
  • For sbyte*: length: The number of characters (8-bit) within value to use.

The constructor with sbyte* already specifies its '8-bit' width in the value parameter description, while the one with char* does not explicitly mention its bit width.

@tannergooding
Member

Looks like in other places, such as System.Text.Encoding.Utf8 and Utf8Formatter, we're simply using "the number of characters" when referring to UTF-16 and "the number of bytes" when referring to UTF-8. So "the number of sbytes" here is probably fine.

CC. @gewarren for secondary input and wording suggestions

Contributor

@gewarren gewarren left a comment

If it's Tanner approved, then I'm good with it :)

@gewarren merged commit a397890 into dotnet:main on Sep 5, 2025
6 checks passed
@StuartMosquera deleted the fix-length-parameter-description branch on September 5, 2025, 16:04
@tannergooding
Member

Thanks all!

Labels
area-System.Runtime, community-contribution (Indicates that the PR has been added by a community member)
Development

Successfully merging this pull request may close these issues.

Length parameter description confusing for constructors with sbyte*
3 participants