@@ -543,8 +543,10 @@ scalar value, even when it is encoded using multiple bytes. When Unicode mode
 is disabled (e.g., `(?-u:.)`), then `.` will match a single byte in all cases.
 * The character classes `\w`, `\d` and `\s` are all Unicode-aware by default.
 Use `(?-u:\w)`, `(?-u:\d)` and `(?-u:\s)` to get their ASCII-only definitions.
-* Similarly, `\b` and `\B` use a Unicode definition of a "word" character. To
-get ASCII-only word boundaries, use `(?-u:\b)` and `(?-u:\B)`.
+* Similarly, `\b` and `\B` use a Unicode definition of a "word" character.
+To get ASCII-only word boundaries, use `(?-u:\b)` and `(?-u:\B)`. This also
+applies to the special word boundary assertions. (That is, `\b{start}`,
+`\b{end}`, `\b{start-half}`, `\b{end-half}`.)
 * `^` and `$` are **not** Unicode-aware in multi-line mode. Namely, they only
 recognize `\n` (assuming CRLF mode is not enabled) and not any of the other
 forms of line terminators defined by Unicode.
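As a rough illustration of the Unicode versus ASCII distinction for `\b` described above, the sketch below computes word-boundary offsets by hand using only the standard library. The helper names (`is_word_ascii`, `is_word_unicode`, `word_boundaries`) are hypothetical, and `char::is_alphanumeric` is only an approximation of the crate's Unicode `\w` definition. It is not how the regex crate implements the assertion.

```rust
/// ASCII-only word character, mirroring `(?-u:\w)`: [0-9A-Za-z_].
fn is_word_ascii(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Approximate stand-in for Unicode `\w` (assumption: std's Unicode-aware
/// alphanumeric test plus `_`; the real `\w` class covers more code points).
fn is_word_unicode(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Returns the byte offsets at which a `\b`-style assertion would hold,
/// given a definition of "word" character.
fn word_boundaries(haystack: &str, is_word: fn(char) -> bool) -> Vec<usize> {
    let mut offsets = Vec::new();
    // The start of the haystack (\A) behaves like a non-word character.
    let mut prev_is_word = false;
    for (i, c) in haystack.char_indices() {
        let cur_is_word = is_word(c);
        if cur_is_word != prev_is_word {
            offsets.push(i);
        }
        prev_is_word = cur_is_word;
    }
    // The end of the haystack (\z) also behaves like a non-word character.
    if prev_is_word {
        offsets.push(haystack.len());
    }
    offsets
}
```

For the haystack `"aδ b"`, `word_boundaries` with `is_word_unicode` yields `[0, 3, 4, 5]`, while `is_word_ascii` yields `[0, 1, 4, 5]`, because `δ` no longer counts as a word character.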
@@ -723,12 +725,16 @@ x{n}? exactly n x
 ### Empty matches

 <pre class="rust">
-^     the beginning of a haystack (or start-of-line with multi-line mode)
-$     the end of a haystack (or end-of-line with multi-line mode)
-\A    only the beginning of a haystack (even with multi-line mode enabled)
-\z    only the end of a haystack (even with multi-line mode enabled)
-\b    a Unicode word boundary (\w on one side and \W, \A, or \z on other)
-\B    not a Unicode word boundary
+^               the beginning of a haystack (or start-of-line with multi-line mode)
+$               the end of a haystack (or end-of-line with multi-line mode)
+\A              only the beginning of a haystack (even with multi-line mode enabled)
+\z              only the end of a haystack (even with multi-line mode enabled)
+\b              a Unicode word boundary (\w on one side and \W, \A, or \z on other)
+\B              not a Unicode word boundary
+\b{start}, \<   a Unicode start-of-word boundary (\W|\A on the left, \w on the right)
+\b{end}, \>     a Unicode end-of-word boundary (\w on the left, \W|\z on the right)
+\b{start-half}  half of a Unicode start-of-word boundary (\W|\A on the left)
+\b{end-half}    half of a Unicode end-of-word boundary (\W|\z on the right)
 </pre>
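The four special word boundary assertions in the table can be read as simple predicates on the characters to the left and right of a position. The sketch below spells that out with the standard library only. The `Assertion` enum and the `holds` and `is_word` helpers are hypothetical names, and `is_word` only approximates Unicode `\w`. It illustrates the definitions and is not the crate's implementation.

```rust
/// Approximate stand-in for Unicode `\w` (assumption: the real class is broader).
fn is_word(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Assertion {
    Start,     // \b{start} or \<
    End,       // \b{end} or \>
    StartHalf, // \b{start-half}
    EndHalf,   // \b{end-half}
}

/// Checks whether `assertion` holds at byte offset `at` in `haystack`.
/// `at` must lie on a char boundary, otherwise the slicing below panics.
fn holds(haystack: &str, at: usize, assertion: Assertion) -> bool {
    // `None` on the left models \A; `None` on the right models \z.
    let left = haystack[..at].chars().next_back().map(is_word);
    let right = haystack[at..].chars().next().map(is_word);
    let non_word_left = left != Some(true); // \W|\A on the left
    let non_word_right = right != Some(true); // \W|\z on the right
    match assertion {
        Assertion::Start => non_word_left && right == Some(true),
        Assertion::End => left == Some(true) && non_word_right,
        Assertion::StartHalf => non_word_left,
        Assertion::EndHalf => non_word_right,
    }
}
```

Note that the half assertions only look at one side, so `holds("  ", 1, Assertion::StartHalf)` is true even though no word starts there, whereas `holds("  ", 1, Assertion::Start)` is false.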

 The empty regex is valid and matches the empty string. For example, the
@@ -856,28 +862,32 @@ Note that this includes all possible escape sequences, even ones that are
 documented elsewhere.

 <pre class="rust">
-\*          literal *, applies to all ASCII except [0-9A-Za-z<>]
-\a          bell (\x07)
-\f          form feed (\x0C)
-\t          horizontal tab
-\n          new line
-\r          carriage return
-\v          vertical tab (\x0B)
-\A          matches at the beginning of a haystack
-\z          matches at the end of a haystack
-\b          word boundary assertion
-\B          negated word boundary assertion
-\123        octal character code, up to three digits (when enabled)
-\x7F        hex character code (exactly two digits)
-\x{10FFFF}  any hex character code corresponding to a Unicode code point
-\u007F      hex character code (exactly four digits)
-\u{7F}      any hex character code corresponding to a Unicode code point
-\U0000007F  hex character code (exactly eight digits)
-\U{7F}      any hex character code corresponding to a Unicode code point
-\p{Letter}  Unicode character class
-\P{Letter}  negated Unicode character class
-\d, \s, \w  Perl character class
-\D, \S, \W  negated Perl character class
+\*              literal *, applies to all ASCII except [0-9A-Za-z<>]
+\a              bell (\x07)
+\f              form feed (\x0C)
+\t              horizontal tab
+\n              new line
+\r              carriage return
+\v              vertical tab (\x0B)
+\A              matches at the beginning of a haystack
+\z              matches at the end of a haystack
+\b              word boundary assertion
+\B              negated word boundary assertion
+\b{start}, \<   start-of-word boundary assertion
+\b{end}, \>     end-of-word boundary assertion
+\b{start-half}  half of a start-of-word boundary assertion
+\b{end-half}    half of an end-of-word boundary assertion
+\123            octal character code, up to three digits (when enabled)
+\x7F            hex character code (exactly two digits)
+\x{10FFFF}      any hex character code corresponding to a Unicode code point
+\u007F          hex character code (exactly four digits)
+\u{7F}          any hex character code corresponding to a Unicode code point
+\U0000007F      hex character code (exactly eight digits)
+\U{7F}          any hex character code corresponding to a Unicode code point
+\p{Letter}      Unicode character class
+\P{Letter}      negated Unicode character class
+\d, \s, \w      Perl character class
+\D, \S, \W      negated Perl character class
 </pre>

 ### Perl character classes (Unicode friendly)