Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct code point format in Base/Char/show function #33291

Merged
merged 6 commits into from
Sep 18, 2019

Conversation

srutzky
Copy link
Contributor

@srutzky srutzky commented Sep 17, 2019

Two minor changes (both on line 307) to conform to the Unicode Standard.

Unicode code points currently display with:

  1. Lowercase letters, a - f, when present
  2. A leading 0 for 5-digit code point values (i.e. 10000 - 9ffff)

However, the Unicode Standard specifies that when using the "U+" notation, you should use:

  1. Uppercase letters
  2. Leading zeros only when the code point would have fewer than four digits (i.e. 0000 - 0FFF)

For reference, the Unicode Standard (two versions to show consistency over time)

states:

In running text, an individual Unicode code point is expressed as U+n, where n is four to six hexadecimal digits, using the digits 0–9 and uppercase letters A–F (for 10 through 15, respectively). Leading zeros are omitted, unless the code point would have fewer than four
hexadecimal digits—for example, U+0001, U+0012, U+0123, U+1234, U+12345, U+102345.

Two minor changes (both on line 307) to conform to the Unicode Standard.

Unicode code points currently display with:

1. Lowercase letters, a - f, when present
2. A leading 0 for 5-digit code point values (i.e. 10000 - 9ffff)

However, the Unicode Standard specifies that when using the "U+" notation, you should use:

1. Uppercase letters
2. Leading zeros only when the code point would have fewer than four digits (i.e. 0000 - 0FFF)

For reference, the Unicode Standard (two versions to show consistency over time)

* [(Version 12.1, 2019) Appendix A: Notational Conventions ⇒ Code Points](http://www.unicode.org/versions/Unicode12.0.0/appA.pdf)
* [(Version 4.0.0, 2003) Preface: Notational Conventions ⇒ Code Points](http://www.unicode.org/versions/Unicode4.0.0/Preface.pdf)

states:

> In running text, an individual Unicode code point is expressed as U+n, where n is four to six hexadecimal digits, using the digits 0–9 and uppercase letters A–F (for 10 through 15, respectively). Leading zeros are omitted, unless the code point would have fewer than four
hexadecimal digits—for example, U+0001, U+0012, U+0123, U+1234, U+12345, U+102345.
@JeffBezanson JeffBezanson added display and printing Aesthetics and correctness of printed representations of objects. unicode Related to unicode characters and encodings labels Sep 17, 2019
@stevengj
Copy link
Member

Looks good to me, but could use a test.

@stevengj stevengj added the needs tests Unit tests are required for this change label Sep 17, 2019
@srutzky
Copy link
Contributor Author

srutzky commented Sep 17, 2019

Hi @stevengj . Fair enough, though I am not sure how formal of a test you are requesting. I do not currently have the ability to compile Julia. However, I did verify the syntax using the Julia command-line (i.e. julia.exe) as shown below. Does this suffice for a trivial change such as this one?

Old Syntax

julia> u=0x1b3
0x01b3

julia> string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
"01b3"

julia> u=0x1b3d
0x1b3d

julia> string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
"1b3d"

julia> u=0x1b3d5
0x0001b3d5

julia> string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
"01b3d5"

julia> u=0x1b3d5f
0x001b3d5f

julia> string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
"1b3d5f"

New Syntax

julia> u=0x1b3
0x01b3

julia> uppercase(string(u, base = 16, pad = 4))
"01B3"

julia> u=0x1b3d
0x1b3d

julia> uppercase(string(u, base = 16, pad = 4))
"1B3D"

julia> u=0x1b3d5
0x0001b3d5

julia> uppercase(string(u, base = 16, pad = 4))
"1B3D5"

julia> u=0x1b3d5f
0x001b3d5f

julia> uppercase(string(u, base = 16, pad = 4))
"1B3D5F"

@stevengj
Copy link
Member

Basically we would want something like

@test repr("text/plain", 'α') == "'α': Unicode U+03B1 (category Ll: Letter, lowercase)"
@test repr("text/plain", '🐨') == "'🐨': Unicode U+1F428 (category So: Symbol, other)"

in test/char.jl.

You can just edit test/char.jl to add these tests to the PR, and then the CI scripts will run them automatically — no need to compile Julia locally.

@srutzky
Copy link
Contributor Author

srutzky commented Sep 17, 2019

Hi @stevengj . Test file has been updated and added to this PR as requested. Please let me know if there is anything I need to change regarding the tests. I just added a new testset to the end of that file.

@stevengj stevengj removed the needs tests Unit tests are required for this change label Sep 17, 2019
@stevengj
Copy link
Member

stevengj commented Sep 18, 2019

Windows failure is #33311. Linux failure is #33312. Both seem unrelated.

@stevengj stevengj merged commit 493c797 into JuliaLang:master Sep 18, 2019
@StefanKarpinski StefanKarpinski added the needs news A NEWS entry is required for this change label Sep 18, 2019
@StefanKarpinski
Copy link
Member

This could use a NEWS entry since it's an observable behavior change.

stevengj added a commit that referenced this pull request Sep 18, 2019
@stevengj
Copy link
Member

Fixed in 64d8ca4

@stevengj stevengj removed the needs news A NEWS entry is required for this change label Sep 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
display and printing Aesthetics and correctness of printed representations of objects. unicode Related to unicode characters and encodings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants