Correct code point format in Base/Char/show function #33291

srutzky · 2019-09-17T03:20:49Z

Two minor changes (both on line 307) to conform to the Unicode Standard.

Unicode code points currently display with:

1. Lowercase letters, a - f, when present
2. A leading 0 for 5-digit code point values (i.e. 10000 - fffff)

However, the Unicode Standard specifies that when using the "U+" notation, you should use:

1. Uppercase letters
2. Leading zeros only when the code point would have fewer than four digits (i.e. 0000 - 0FFF)

For reference, the Unicode Standard (two versions to show consistency over time)

* [(Version 12.1, 2019) Appendix A: Notational Conventions ⇒ Code Points](http://www.unicode.org/versions/Unicode12.0.0/appA.pdf)
* [(Version 4.0.0, 2003) Preface: Notational Conventions ⇒ Code Points](http://www.unicode.org/versions/Unicode4.0.0/Preface.pdf)

states:

> In running text, an individual Unicode code point is expressed as U+n, where n is four to six hexadecimal digits, using the digits 0–9 and uppercase letters A–F (for 10 through 15, respectively). Leading zeros are omitted, unless the code point would have fewer than four hexadecimal digits—for example, U+0001, U+0012, U+0123, U+1234, U+12345, U+102345.
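As a rough illustration of the rule quoted above, the following sketch formats a character's code point in "U+" notation. The helper name `format_codepoint` is hypothetical and is not part of Base; it only mirrors the formatting expression proposed in this PR.

```julia
# Hypothetical helper (not part of Base): format a code point in "U+" notation.
function format_codepoint(c::AbstractChar)
    u = codepoint(c)  # numeric code point as an unsigned integer
    # pad = 4 enforces a minimum of four hex digits; longer values get no extra padding.
    return "U+" * uppercase(string(u, base = 16, pad = 4))
end

format_codepoint('\x01')      # "U+0001"
format_codepoint('α')         # "U+03B1"
format_codepoint('🐨')        # "U+1F428"
format_codepoint('\U102345')  # "U+102345"
```

These results match the examples listed in the Standard: short values are zero-padded to four digits, while five- and six-digit values are printed without a leading zero.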

stevengj · 2019-09-17T15:10:08Z

Looks good to me, but could use a test.

srutzky · 2019-09-17T15:30:35Z

Hi @stevengj. Fair enough, though I am not sure how formal a test you are requesting. I do not currently have the ability to compile Julia. However, I did verify the syntax in the Julia command-line REPL (i.e. julia.exe), as shown below. Does this suffice for a change as trivial as this one?

Old Syntax

julia> u=0x1b3
0x01b3

julia> string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
"01b3"

julia> u=0x1b3d
0x1b3d

julia> string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
"1b3d"

julia> u=0x1b3d5
0x0001b3d5

julia> string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
"01b3d5"

julia> u=0x1b3d5f
0x001b3d5f

julia> string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)
"1b3d5f"

New Syntax

julia> u=0x1b3
0x01b3

julia> uppercase(string(u, base = 16, pad = 4))
"01B3"

julia> u=0x1b3d
0x1b3d

julia> uppercase(string(u, base = 16, pad = 4))
"1B3D"

julia> u=0x1b3d5
0x0001b3d5

julia> uppercase(string(u, base = 16, pad = 4))
"1B3D5"

julia> u=0x1b3d5f
0x001b3d5f

julia> uppercase(string(u, base = 16, pad = 4))
"1B3D5F"
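The two REPL sessions above can be condensed into one side-by-side comparison. This is only a sketch: the names `old_format` and `new_format` are hypothetical wrappers around the two expressions already shown, not functions in Base.

```julia
# Expression used before this PR: pad to 4 or 6 digits, lowercase letters.
old_format(u) = string(u, base = 16, pad = u ≤ 0xffff ? 4 : 6)

# Expression proposed in this PR: pad to at least 4 digits, uppercase letters.
new_format(u) = uppercase(string(u, base = 16, pad = 4))

# Print the old and new renderings for the same four sample values.
for u in (0x1b3, 0x1b3d, 0x1b3d5, 0x1b3d5f)
    println(rpad(old_format(u), 8), new_format(u))
end
```

The only value whose digit count changes is the five-digit one (`0x1b3d5`), which loses its leading zero; the others differ only in letter case.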

stevengj · 2019-09-17T15:54:49Z

Basically we would want something like

@test repr("text/plain", 'α') == "'α': Unicode U+03B1 (category Ll: Letter, lowercase)"
@test repr("text/plain", '🐨') == "'🐨': Unicode U+1F428 (category So: Symbol, other)"

in test/char.jl.

You can just edit test/char.jl to add these tests to the PR, and then the CI scripts will run them automatically — no need to compile Julia locally.
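For illustration, tests along these lines could be grouped into a testset roughly like the one below. The testset name is hypothetical, and the sketch assumes a Julia build that already includes this change. It checks only the "U+" substring, so it does not depend on the exact category wording in the output.

```julia
using Test

# Hypothetical testset name; assumes a Julia version that includes this change.
@testset "U+ code point notation in show" begin
    # Three hex digits (0x3b1): zero-padded to four, with uppercase letters.
    @test occursin("U+03B1", repr("text/plain", 'α'))
    # Five hex digits (0x1f428): printed without a leading zero.
    @test occursin("U+1F428", repr("text/plain", '🐨'))
    @test !occursin("U+01F428", repr("text/plain", '🐨'))
end
```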

srutzky · 2019-09-17T21:43:29Z

Hi @stevengj. The test file has been updated and added to this PR as requested. Please let me know if there is anything I need to change regarding the tests. I just added a new testset to the end of that file.

stevengj · 2019-09-18T15:22:54Z

Windows failure is #33311. Linux failure is #33312. Both seem unrelated.

StefanKarpinski · 2019-09-18T19:02:53Z

This could use a NEWS entry since it's an observable behavior change.

stevengj · 2019-09-18T19:41:44Z

Fixed in 64d8ca4

JeffBezanson added the "display and printing" and "unicode" labels Sep 17, 2019

stevengj added the "needs tests" label Sep 17, 2019

Add tests for U+ syntax formatting

96d668d

stevengj approved these changes Sep 17, 2019

stevengj removed the "needs tests" label Sep 17, 2019

srutzky added 4 commits September 17, 2019 23:44

Update code point format to match change in show() function

8a6280e

Update code point format to match change in show() function

f364e43

Update code point format to match change in show() function

0452e63

Update code point format to match change in show() function

2492028

stevengj mentioned this pull request Sep 18, 2019

watch_folder timeout failure on Windows #33311

Closed

stevengj mentioned this pull request Sep 18, 2019

OutOfMemoryError in testset Profile on 32-bit Linux #33312

Closed

stevengj merged commit 493c797 into JuliaLang:master Sep 18, 2019

StefanKarpinski added the "needs news" label Sep 18, 2019

stevengj added a commit that referenced this pull request Sep 18, 2019

news for #33291

64d8ca4

stevengj removed the "needs news" label Sep 18, 2019
