Updates for RFC: Locale-independent case conversion #1934

Merged: 3 commits, Nov 7, 2022

26 changes: 21 additions & 5 deletions language-snippets.ent
@@ -28,11 +28,6 @@ cryptographically secure value, consider using <function>random_int</function>,
<!ENTITY note.bin-safe '<note xmlns="http://docbook.org/ns/docbook"><simpara>This function is
binary-safe.</simpara></note>'>

-<!ENTITY note.locale-single-byte '<note xmlns="http://docbook.org/ns/docbook"><simpara>This function is locale-aware
-and will handle input according to the currently set locale. However, it only works on single-byte character sets.
-If you need to use multibyte characters (most non-western-European languages) look at the
-<link linkend="book.mbstring">multibyte</link> or <link linkend="book.intl">intl</link> extensions instead.</simpara></note>'>
-
<!ENTITY note.clearstatcache '<note xmlns="http://docbook.org/ns/docbook"><simpara>The results of this
function are cached. See <function>clearstatcache</function> for
more details.</simpara></note>'>
@@ -3696,6 +3691,27 @@ local: {
</row>
'>

+<!ENTITY strings.changelog.ascii-case-conversion '
+<row xmlns="http://docbook.org/ns/docbook">
+<entry>8.2.0</entry>
+<entry>
+Case conversion no longer depends on the locale set with
+<function>setlocale</function>. Only ASCII characters will be converted.
+</entry>
+</row>
+'>
+
+<!ENTITY strings.changelog.ascii-case-folding '
+<row xmlns="http://docbook.org/ns/docbook">
+<entry>8.2.0</entry>
+<entry>
+Case folding no longer depends on the locale set with
+<function>setlocale</function>. Only ASCII case folding will be done.
+Non-ASCII bytes will be compared by their byte value.
+</entry>
+</row>
+'>
+
<!-- filter snippets -->
<!ENTITY filter.param.filter '
<varlistentry xmlns="http://docbook.org/ns/docbook">
14 changes: 8 additions & 6 deletions reference/array/constants.xml
@@ -15,7 +15,8 @@
<constant>CASE_LOWER</constant> is used with
<function>array_change_key_case</function> and is used to convert array
keys to lower case. This is also the default case for
-<function>array_change_key_case</function>.
+<function>array_change_key_case</function>. As of PHP 8.2.0, only ASCII
+characters will be converted.
</simpara>
</listitem>
</varlistentry>
@@ -28,7 +29,8 @@
<simpara>
<constant>CASE_UPPER</constant> is used with
<function>array_change_key_case</function> and is used to convert array
-keys to upper case.
+keys to upper case. As of PHP 8.2.0, only ASCII characters will be
+converted.
</simpara>
</listitem>
</varlistentry>
@@ -130,10 +132,10 @@
</term>
<listitem>
<simpara>
-<constant>SORT_FLAG_CASE</constant> can be combined
-(bitwise OR) with
-<constant>SORT_STRING</constant> or
-<constant>SORT_NATURAL</constant> to sort strings case-insensitively.
+<constant>SORT_FLAG_CASE</constant> can be combined (bitwise OR) with
+<constant>SORT_STRING</constant> or <constant>SORT_NATURAL</constant> to
+sort strings case-insensitively. As of PHP 8.2.0, only ASCII case folding
+will be done.
</simpara>
</listitem>
</varlistentry>
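
Reviewer note: a minimal sketch of what the ASCII-only wording added for CASE_LOWER and CASE_UPPER means for array keys. It assumes PHP 8.2 or later; the array contents are made up for the example, and the non-ASCII key is written as explicit UTF-8 bytes so the result does not depend on the file's encoding.

```php
<?php
// Sample data only; "\xC3\x84" is the UTF-8 encoding of "Ä".
$stock = ["\xC3\x84PFEL" => 3, "Birne" => 5];

// Only the bytes "A".."Z" inside each key are lowercased.
$lower = array_change_key_case($stock, CASE_LOWER);

// The two bytes of "Ä" are left untouched, so the first key keeps its capital umlaut.
var_dump(array_keys($lower) === ["\xC3\x84pfel", "birne"]); // bool(true) on PHP 8.2+
```

Code that relied on a single-byte locale to fold such keys would need a multibyte-aware step of its own after this change.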
25 changes: 19 additions & 6 deletions reference/strings/functions/lcfirst.xml
@@ -15,12 +15,8 @@
<para>
Returns a string with the first character of
<parameter>string</parameter> lowercased if that character is
-alphabetic.
-</para>
-<para>
-Note that 'alphabetic' is determined by the current locale. For
-instance, in the default "C" locale characters such as umlaut-a
-(ä) will not be converted.
+an ASCII character in the range <literal>"A"</literal> (0x41) to
+<literal>"Z"</literal> (0x5a).
</para>
</refsect1>

@@ -47,6 +43,23 @@
</para>
</refsect1>

+<refsect1 role="changelog">
+&reftitle.changelog;
+<informaltable>
+<tgroup cols="2">
+<thead>
+<row>
+<entry>&Version;</entry>
+<entry>&Description;</entry>
+</row>
+</thead>
+<tbody>
+&strings.changelog.ascii-case-conversion;
+</tbody>
+</tgroup>
+</informaltable>
+</refsect1>
+
<refsect1 role="examples">
&reftitle.examples;
<para>
2 changes: 1 addition & 1 deletion reference/strings/functions/setlocale.xml
@@ -62,7 +62,7 @@
<listitem>
<simpara>
<constant>LC_CTYPE</constant> for character classification and conversion, for
-example <function>strtoupper</function>
+example <function>ctype_alpha</function>
</simpara>
</listitem>
<listitem>
17 changes: 17 additions & 0 deletions reference/strings/functions/str-ireplace.xml
@@ -97,6 +97,23 @@
</para>
</refsect1>
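
Reviewer note: the ascii-case-folding changelog entry used in this file and in the stripos, stristr and strripos changes can be illustrated with a minimal sketch. It assumes PHP 8.2 or later; the strings are sample data, with the umlauts written as explicit UTF-8 bytes.

```php
<?php
// "\xC3\x84" is UTF-8 "Ä" and "\xC3\xA4" is UTF-8 "ä"; the text is sample data.
$haystack = "\xC3\x84PFEL und Birne";

// ASCII letters are folded, so "BIRNE" matches "Birne".
$asciiPos = stripos($haystack, "BIRNE");

// "ä" and "Ä" differ in their second byte (0xA4 vs 0x84), so nothing matches.
$umlautPos = stripos($haystack, "\xC3\xA4pfel");

var_dump($asciiPos);  // int(11): a byte offset, because "Ä" occupies two bytes
var_dump($umlautPos); // bool(false)

// str_ireplace follows the same rule and leaves the string unchanged here.
var_dump(str_ireplace("\xC3\xA4pfel", "x", $haystack) === $haystack); // bool(true)
```

When Unicode-aware matching is needed, mb_stripos from the mbstring extension is the usual alternative.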

+<refsect1 role="changelog">
+&reftitle.changelog;
+<informaltable>
+<tgroup cols="2">
+<thead>
+<row>
+<entry>&Version;</entry>
+<entry>&Description;</entry>
+</row>
+</thead>
+<tbody>
+&strings.changelog.ascii-case-folding;
+</tbody>
+</tgroup>
+</informaltable>
+</refsect1>
+
<refsect1 role="examples">
&reftitle.examples;
<para>
1 change: 1 addition & 0 deletions reference/strings/functions/stripos.xml
@@ -84,6 +84,7 @@
</row>
</thead>
<tbody>
+&strings.changelog.ascii-case-folding;
<row>
<entry>8.0.0</entry>
<entry>
1 change: 1 addition & 0 deletions reference/strings/functions/stristr.xml
@@ -76,6 +76,7 @@
</row>
</thead>
<tbody>
+&strings.changelog.ascii-case-folding;
<row>
<entry>8.0.0</entry>
<entry>
1 change: 1 addition & 0 deletions reference/strings/functions/strripos.xml
@@ -98,6 +98,7 @@
</row>
</thead>
<tbody>
+&strings.changelog.ascii-case-folding;
<row>
<entry>8.0.0</entry>
<entry>
30 changes: 26 additions & 4 deletions reference/strings/functions/strtolower.xml
@@ -13,13 +13,18 @@
<methodparam><type>string</type><parameter>string</parameter></methodparam>
</methodsynopsis>
<para>
-Returns <parameter>string</parameter> with all alphabetic characters
+Returns <parameter>string</parameter> with all ASCII alphabetic characters
converted to lowercase.
</para>
<para>
-Note that 'alphabetic' is determined by the current locale. This means
-that e.g. in the default "C" locale, characters such as umlaut-A
-(Ä) will not be converted.
+Bytes in the range <literal>"A"</literal> (0x41) to <literal>"Z"</literal>
+(0x5a) will be converted to the corresponding lowercase letter by adding 32
+to each byte value.
+</para>
+<para>
+This can be used to convert ASCII characters within strings encoded with
+UTF-8, since multibyte UTF-8 characters will be ignored. To convert multibyte
+non-ASCII characters, use <function>mb_strtolower</function>.
</para>
</refsect1>
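
Reviewer note: the byte rule in the new description (add 32 to every byte from 0x41 to 0x5a, copy everything else) can be written out as a minimal sketch. The helper name `ascii_lower_sketch` is hypothetical and exists only for this illustration; the comparison with strtolower assumes PHP 8.2 or later.

```php
<?php
// Hypothetical helper for illustration; it is not part of PHP.
function ascii_lower_sketch(string $input): string
{
    $out = '';
    for ($i = 0, $len = strlen($input); $i < $len; $i++) {
        $byte = ord($input[$i]);
        // "A" (0x41) through "Z" (0x5a) move up by 32; every other byte is copied as-is.
        $out .= ($byte >= 0x41 && $byte <= 0x5a) ? chr($byte + 32) : $input[$i];
    }
    return $out;
}

// "\xC3\x84" is UTF-8 "Ä"; neither byte falls in 0x41..0x5a, so it survives unchanged.
$sample = "\xC3\x84PFEL Und Birne";

var_dump(ascii_lower_sketch($sample) === "\xC3\x84pfel und birne"); // bool(true)
var_dump(ascii_lower_sketch($sample) === strtolower($sample));      // bool(true) on PHP 8.2+
```

Every byte of a multibyte UTF-8 sequence is 0x80 or higher, which is why such characters pass through unchanged. The strtoupper change mirrors this rule by subtracting 32 from bytes 0x61 to 0x7a.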

@@ -46,6 +51,23 @@
</para>
</refsect1>

+<refsect1 role="changelog">
+&reftitle.changelog;
+<informaltable>
+<tgroup cols="2">
+<thead>
+<row>
+<entry>&Version;</entry>
+<entry>&Description;</entry>
+</row>
+</thead>
+<tbody>
+&strings.changelog.ascii-case-conversion;
+</tbody>
+</tgroup>
+</informaltable>
+</refsect1>
+
<refsect1 role="examples">
&reftitle.examples;
<para>
30 changes: 26 additions & 4 deletions reference/strings/functions/strtoupper.xml
@@ -13,13 +13,18 @@
<methodparam><type>string</type><parameter>string</parameter></methodparam>
</methodsynopsis>
<para>
-Returns <parameter>string</parameter> with all alphabetic characters
+Returns <parameter>string</parameter> with all ASCII alphabetic characters
converted to uppercase.
</para>
<para>
-Note that 'alphabetic' is determined by the current locale. For instance,
-in the default "C" locale characters such as umlaut-a (ä) will not be
-converted.
+Bytes in the range <literal>"a"</literal> (0x61) to <literal>"z"</literal>
+(0x7a) will be converted to the corresponding uppercase letter by subtracting
+32 from each byte value.
+</para>
+<para>
+This can be used to convert ASCII characters within strings encoded with
+UTF-8, since multibyte UTF-8 characters will be ignored. To convert multibyte
+non-ASCII characters, use <function>mb_strtoupper</function>.
</para>
</refsect1>

@@ -46,6 +51,23 @@
</para>
</refsect1>

+<refsect1 role="changelog">
+&reftitle.changelog;
+<informaltable>
+<tgroup cols="2">
+<thead>
+<row>
+<entry>&Version;</entry>
+<entry>&Description;</entry>
+</row>
+</thead>
+<tbody>
+&strings.changelog.ascii-case-conversion;
+</tbody>
+</tgroup>
+</informaltable>
+</refsect1>
+
<refsect1 role="examples">
&reftitle.examples;
<para>
26 changes: 20 additions & 6 deletions reference/strings/functions/ucfirst.xml
@@ -15,12 +15,8 @@
<para>
Returns a string with the first character of
<parameter>string</parameter> capitalized, if that character is
-alphabetic.
-</para>
-<para>
-Note that 'alphabetic' is determined by the current locale. For
-instance, in the default "C" locale characters such as umlaut-a
-(ä) will not be converted.
+an ASCII character in the range from <literal>"a"</literal> (0x61) to
+<literal>"z"</literal> (0x7a).
</para>
</refsect1>

@@ -47,6 +43,23 @@
</para>
</refsect1>

+<refsect1 role="changelog">
+&reftitle.changelog;
+<informaltable>
+<tgroup cols="2">
+<thead>
+<row>
+<entry>&Version;</entry>
+<entry>&Description;</entry>
+</row>
+</thead>
+<tbody>
+&strings.changelog.ascii-case-conversion;
+</tbody>
+</tgroup>
+</informaltable>
+</refsect1>
+
<refsect1 role="examples">
&reftitle.examples;
<para>
@@ -76,6 +89,7 @@ $bar = ucfirst(strtolower($bar)); // Hello world!
<member><function>strtolower</function></member>
<member><function>strtoupper</function></member>
<member><function>ucwords</function></member>
+<member><function>mb_convert_case</function></member>
</simplelist>
</para>
</refsect1>
27 changes: 25 additions & 2 deletions reference/strings/functions/ucwords.xml
@@ -15,13 +15,20 @@
</methodsynopsis>
<para>
Returns a string with the first character of each word in
-<parameter>string</parameter> capitalized, if that character is alphabetic.
+<parameter>string</parameter> capitalized, if that character is an ASCII
+character between <literal>"a"</literal> (0x61) and <literal>"z"</literal>
+(0x7a).
</para>
<para>
For this function, a word is a string of characters that are not listed in
the <parameter>separators</parameter> parameter. By default, these are:
space, horizontal tab, carriage return, newline, form-feed and vertical tab.
</para>
+<para>
+To do a similar conversion on multibyte strings, use
+<function>mb_convert_case</function> with the <constant>MB_CASE_TITLE</constant>
+mode.
+</para>
</refsect1>

<refsect1 role="parameters">
@@ -55,6 +62,23 @@
</para>
</refsect1>

+<refsect1 role="changelog">
+&reftitle.changelog;
+<informaltable>
+<tgroup cols="2">
+<thead>
+<row>
+<entry>&Version;</entry>
+<entry>&Description;</entry>
+</row>
+</thead>
+<tbody>
+&strings.changelog.ascii-case-conversion;
+</tbody>
+</tgroup>
+</informaltable>
+</refsect1>
+
<refsect1 role="examples">
&reftitle.examples;
<para>
@@ -110,7 +134,6 @@ $baz = ucwords($foo, " \t\r\n\f\v'"); // Mike O'Hara

<refsect1 role="notes">
&reftitle.notes;
-&note.locale-single-byte;
&note.bin-safe;
</refsect1>
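
Reviewer note: the pointer to mb_convert_case with MB_CASE_TITLE added to the ucwords description can be illustrated with a minimal sketch. It assumes PHP 8.2 or later and, for the second call, that the mbstring extension is loaded; the phrase is sample data written as explicit UTF-8 bytes.

```php
<?php
// "\xC3\xA9" is UTF-8 "é" and "\xC3\x89" is UTF-8 "É"; the phrase is sample data.
$phrase = "\xC3\xA9lan vital";

// The first byte of "élan" is 0xC3, outside "a".."z", so only "vital" is capitalized.
var_dump(ucwords($phrase) === "\xC3\xA9lan Vital"); // bool(true)

// Assumes the mbstring extension is available; skipped otherwise.
if (function_exists('mb_convert_case')) {
    var_dump(mb_convert_case($phrase, MB_CASE_TITLE, 'UTF-8') === "\xC3\x89lan Vital"); // bool(true)
}
```

The two functions are not drop-in replacements for each other: MB_CASE_TITLE also lowercases the rest of each word, and ucwords accepts a custom separators argument.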
