Unicode categories breaking change (#20605)
gewarren authored Sep 12, 2020
1 parent ea49f32 commit 3f722a9
Showing 3 changed files with 66 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/core/compatibility/3.1-5.0.md
@@ -271,9 +271,14 @@ If you're migrating from version 3.1 of .NET Core, ASP.NET Core, or EF Core to v

## Globalization

- [Unicode category changed for some Latin-1 characters](#unicode-category-changed-for-some-latin-1-characters)
- [StringInfo and TextElementEnumerator are now UAX29-compliant](#stringinfo-and-textelementenumerator-are-now-uax29-compliant)
- [Globalization APIs use ICU libraries on Windows](#globalization-apis-use-icu-libraries-on-windows)

[!INCLUDE [unicode-categories-for-latin1-chars](../../../includes/core-changes/globalization/5.0/unicode-categories-for-latin1-chars.md)]

***

[!INCLUDE [uax29-compliant-grapheme-enumeration](../../../includes/core-changes/globalization/5.0/uax29-compliant-grapheme-enumeration.md)]

***
5 changes: 5 additions & 0 deletions docs/core/compatibility/globalization.md
@@ -9,12 +9,17 @@ The following breaking changes are documented on this page:

| Breaking change | Version introduced |
| - | :-: |
| [Unicode category changed for some Latin-1 characters](#unicode-category-changed-for-some-latin-1-characters) | 5.0 |
| [Globalization APIs use ICU libraries on Windows](#globalization-apis-use-icu-libraries-on-windows) | 5.0 |
| [StringInfo and TextElementEnumerator are now UAX29-compliant](#stringinfo-and-textelementenumerator-are-now-uax29-compliant) | 5.0 |
| ["C" locale maps to the invariant locale](#c-locale-maps-to-the-invariant-locale) | 3.0 |

## .NET 5.0

[!INCLUDE [unicode-categories-for-latin1-chars](../../../includes/core-changes/globalization/5.0/unicode-categories-for-latin1-chars.md)]

***

[!INCLUDE [icu-globalization-api](../../../includes/core-changes/globalization/5.0/icu-globalization-api.md)]

***
@@ -0,0 +1,56 @@
### Unicode category changed for some Latin-1 characters

<xref:System.Char> methods now return the correct Unicode category for characters in the Latin-1 range. The category matches that of the Unicode standard.

#### Change description

In previous .NET versions, <xref:System.Char> methods used a fixed list of Unicode categories for characters in the Latin-1 range. However, the Unicode standard has since changed the categories of some of these characters, so the values that <xref:System.Char> returned no longer matched the standard. They also differed from the values returned by <xref:System.Globalization.CharUnicodeInfo> APIs, which follow the Unicode standard. In .NET 5.0 and later versions, <xref:System.Char> methods return the category that the Unicode standard defines for all characters.

The following table shows the characters whose Unicode categories have changed in .NET 5.0:

| Character | Unicode category<br>in previous .NET versions | Unicode category<br>in .NET 5.0 and later versions |
|:------------:|:---------------------------------------------:|:--------------------------------------------------:|
| § (\u00a7) | `OtherSymbol` | `OtherPunctuation` |
| ª (\u00aa) | `LowercaseLetter` | `OtherLetter` |
| SHY (\u00ad) | `DashPunctuation` | `Format` |
| ¶ (\u00b6) | `OtherSymbol` | `OtherPunctuation` |
| º (\u00ba) | `LowercaseLetter` | `OtherLetter` |
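
To see which categories a given runtime reports for the characters in the table, a small console program along the following lines may help. This is a minimal sketch: the class name `Latin1CategoryCheck` and the helper `Describe` are hypothetical names chosen for illustration, and the program only prints what the runtime returns rather than assuming a particular result.

```csharp
using System;
using System.Globalization;

public static class Latin1CategoryCheck
{
    // The five Latin-1 characters listed in the table above.
    public static readonly char[] ChangedChars =
        { '\u00a7', '\u00aa', '\u00ad', '\u00b6', '\u00ba' };

    // Hypothetical helper: formats the category reported by both
    // System.Char and System.Globalization.CharUnicodeInfo.
    public static string Describe(char c)
    {
        UnicodeCategory fromChar = char.GetUnicodeCategory(c);
        UnicodeCategory fromInfo = CharUnicodeInfo.GetUnicodeCategory(c);
        return $"U+{(int)c:X4}: Char={fromChar}, CharUnicodeInfo={fromInfo}";
    }

    public static void Main()
    {
        foreach (char c in ChangedChars)
        {
            Console.WriteLine(Describe(c));
        }
    }
}
```

On .NET 5.0 and later versions, the two values on each line are expected to agree; on earlier versions, they are expected to differ for these characters.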

#### Version introduced

.NET 5.0 RC1

#### Recommended action

If your code uses the <xref:System.Char> class to get a character's Unicode category and assumes that the category never changes, review it against the table of changed characters and update it if it depends on the previous values.
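
If existing logic genuinely depends on the previous categories, one possible approach is to pin those five values explicitly instead of relying on the runtime. The following is only a sketch of that idea: `LegacyLatin1` and `GetCategory` are hypothetical names, not part of .NET, and the dictionary simply encodes the "previous .NET versions" column of the table.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LegacyLatin1
{
    // Categories that System.Char reported before .NET 5.0,
    // taken from the table of changed characters.
    private static readonly Dictionary<char, UnicodeCategory> PreNet5 =
        new Dictionary<char, UnicodeCategory>
        {
            ['\u00a7'] = UnicodeCategory.OtherSymbol,
            ['\u00aa'] = UnicodeCategory.LowercaseLetter,
            ['\u00ad'] = UnicodeCategory.DashPunctuation,
            ['\u00b6'] = UnicodeCategory.OtherSymbol,
            ['\u00ba'] = UnicodeCategory.LowercaseLetter,
        };

    // Hypothetical helper: returns the pinned legacy category for the five
    // changed characters and defers to the runtime for everything else.
    public static UnicodeCategory GetCategory(char c) =>
        PreNet5.TryGetValue(c, out UnicodeCategory legacy)
            ? legacy
            : char.GetUnicodeCategory(c);
}
```

For example, `LegacyLatin1.GetCategory('\u00a7')` returns `UnicodeCategory.OtherSymbol` regardless of the runtime version. In most cases, adopting the new, standard-conforming categories is likely the better long-term choice; a shim like this is only worth considering where stored data or external contracts depend on the old values.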

#### Reason for change

This change was made so that the categories returned by the <xref:System.Char> type are consistent with both the Unicode standard and the <xref:System.Globalization.CharUnicodeInfo> type.

#### Category

- Core .NET libraries
- Globalization

#### Affected APIs

- <xref:System.Char.GetUnicodeCategory%2A?displayProperty=fullName>
- <xref:System.Char.IsLetter%2A?displayProperty=fullName>
- <xref:System.Char.IsPunctuation%2A?displayProperty=fullName>
- <xref:System.Char.IsSymbol%2A?displayProperty=fullName>
- <xref:System.Char.IsLower%2A?displayProperty=fullName>

Additionally, any class that depends on <xref:System.Char> to obtain the Unicode character category, for example, <xref:System.Text.RegularExpressions.Regex>, is affected by this change.
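
As an illustration of the indirect effect on <xref:System.Text.RegularExpressions.Regex>, the following sketch checks whether a character matches the `\p{Ll}` (lowercase letter) category escape. The class name `RegexCategoryCheck` and the helper `MatchesLowercase` are hypothetical; the program prints the results instead of assuming them, because they depend on the runtime version.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCategoryCheck
{
    // Hypothetical helper: true if the character matches the
    // Unicode "Ll" (LowercaseLetter) category in a regex.
    public static bool MatchesLowercase(char c) =>
        Regex.IsMatch(c.ToString(), @"\p{Ll}");

    public static void Main()
    {
        // The two characters that moved from LowercaseLetter to OtherLetter.
        foreach (char c in new[] { '\u00aa', '\u00ba' })
        {
            Console.WriteLine(
                $"U+{(int)c:X4}: \\p{{Ll}} match={MatchesLowercase(c)}, " +
                $"Char category={char.GetUnicodeCategory(c)}");
        }
    }
}
```

Patterns that rely on `\p{Ll}`, `\p{S}`, or `\p{P}` to match any of the five changed characters are the ones most worth rechecking after upgrading.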

<!--
#### Affected APIs
- `Overload:System.Char.GetUnicodeCategory`
- `Overload:System.Char.IsLetter`
- `Overload:System.Char.IsPunctuation`
- `Overload:System.Char.IsSymbol`
- `Overload:System.Char.IsLower`
-->
