Update code unit definition in Overview.md #61

@dcodeIO dcodeIO commented Mar 28, 2023

The overview defines a code unit as "an indivisible unit of an encoded unicode scalar value". While individual 8-bit, 16-bit, or 32-bit code units are indeed indivisible bit combinations, an individual code unit does not in general represent a scalar value. In 16-bit strings, for example, a surrogate pair is a divisible combination of two indivisible code units, both surrogates, neither of which is a scalar value. In 8-bit strings, the individual code units of a multi-byte sequence map to neither scalar values nor code points.
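To make the distinction concrete, here is a minimal sketch in Python, not taken from Overview.md. The helper names (`utf16_code_units`, `utf8_code_units`, `is_surrogate`) and the sample character U+1F600 are assumptions chosen for illustration. It shows one scalar value producing several code units, none of which is itself a scalar value.

```python
import struct

def utf16_code_units(s: str) -> list[int]:
    """Return the 16-bit code units of s in the UTF-16 encoding form."""
    data = s.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

def utf8_code_units(s: str) -> list[int]:
    """Return the 8-bit code units of s in the UTF-8 encoding form."""
    return list(s.encode("utf-8"))

def is_surrogate(unit: int) -> bool:
    """Surrogate code points (U+D800..U+DFFF) are not Unicode scalar values."""
    return 0xD800 <= unit <= 0xDFFF

ch = "\U0001F600"  # a single scalar value outside the Basic Multilingual Plane

units16 = utf16_code_units(ch)
units8 = utf8_code_units(ch)

# Two 16-bit code units, both surrogates, so neither is a scalar value.
print([hex(u) for u in units16], [is_surrogate(u) for u in units16])

# Four 8-bit code units: one lead byte and three continuation bytes.
print([hex(u) for u in units8])
```

Each code unit here is indivisible on its own, but only the full sequence encodes the scalar value, which is why the definition should not tie a single code unit to a scalar value.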

From the Unicode Glossary:

Code Unit. The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.)

I suggest essentially copying Unicode's definition and dropping the reference to scalar values.
