Regex reference #22210

Merged
merged 29 commits into from
May 5, 2023
1 change: 1 addition & 0 deletions files/en-us/_redirects.txt
Original file line number Diff line number Diff line change
Expand Up @@ -12274,6 +12274,7 @@
/en-US/docs/Web/JavaScript/Guide/Predefined_Core_Objects /en-US/docs/Web/JavaScript/Guide
/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Boundaries /en-US/docs/Web/JavaScript/Guide/Regular_expressions/Assertions
/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges /en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences
/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape
/en-US/docs/Web/JavaScript/Guide/Sameness /en-US/docs/Web/JavaScript/Equality_comparisons_and_sameness
/en-US/docs/Web/JavaScript/Guide/Statements /en-US/docs/Web/JavaScript/Guide/Control_flow_and_error_handling
/en-US/docs/Web/JavaScript/Guide/The_Iterator_protocol /en-US/docs/Web/JavaScript/Reference/Iteration_protocols
Expand Down
15 changes: 0 additions & 15 deletions files/en-us/_wikihistory.json
Original file line number Diff line number Diff line change
Expand Up @@ -104778,21 +104778,6 @@
"jpmedley"
]
},
"Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes": {
"modified": "2020-07-12T19:11:35.411Z",
"contributors": [
"JNa0",
"hinell",
"wbamberg",
"fscholz",
"SphinxKnight",
"dennisja",
"chrisdavidmills",
"Windrill",
"Artoria2e5",
"jpmedley"
]
},
"Web/JavaScript/Guide/Text_formatting": {
"modified": "2020-05-25T10:48:56.137Z",
"contributors": [
Expand Down
2 changes: 1 addition & 1 deletion files/en-us/mozilla/firefox/releases/78/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ See also [New in Firefox 78: DevTools improvements, new regex engine, and abunda

- [Lookbehind assertions](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Assertions) ([Firefox bug 1225665](https://bugzil.la/1225665))
- {{JSxRef("RegExp.prototype.dotAll")}} ([Firefox bug 1361856](https://bugzil.la/1361856))
- [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes) ([Firefox bug 1361876](https://bugzil.la/1361876))
- [Unicode property escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape) ([Firefox bug 1361876](https://bugzil.la/1361876))
- [Named capture groups](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences) ([Firefox bug 1362154](https://bugzil.la/1362154))

- Due to a [WebIDL spec change](https://github.com/whatwg/webidl/pull/357) in mid-2020, we've [added a `Symbol.toStringTag` property to all DOM prototype objects](/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/toStringTag#tostringtag_available_on_all_dom_prototype_objects) ([Firefox bug 1277799](https://bugzil.la/1277799)).
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -297,7 +297,6 @@ console.log(ripeOranges); // [ 'ripe orange A', 'ripe orange C' ]

- [Character classes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes)
- [Quantifiers](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Quantifiers)
- [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes)
- [Groups and backreferences](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences)

- [The `RegExp()` constructor](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -253,7 +253,7 @@ Character classes distinguish kinds of characters such as, for example, distingu
<td>
Matches a character based on its
<a
href="/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes"
href="/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape"
>Unicode character properties</a
>
(to match just, for example, emoji characters, or Japanese
Expand Down Expand Up @@ -374,7 +374,6 @@ console.log("Number of vowels:", aliceExcerpt.match(regexpVowels).length);

- [Assertions](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Assertions)
- [Quantifiers](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Quantifiers)
- [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes)
- [Groups and backreferences](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences)

- [The `RegExp()` constructor](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -609,7 +609,7 @@ This page provides an overall cheat sheet of all the capabilities of `RegExp` sy

[Quantifiers](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Quantifiers) indicate numbers of characters or expressions to match.

> **Note:** In the following, _item_ refers not only to singular characters, but also includes [character classes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes), [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes), [groups and backreferences](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences).
> **Note:** In the following, _item_ refers not only to singular characters, but also includes [character classes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes) and [groups and backreferences](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences).

<table class="standard-table">
<thead>
Expand Down Expand Up @@ -734,37 +734,3 @@ This page provides an overall cheat sheet of all the capabilities of `RegExp` sy
</tr>
</tbody>
</table>

## Unicode property escapes

[Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes) allow for matching characters based on their Unicode properties.

```js
// Non-binary values
/\p{UnicodePropertyValue}/
/\p{UnicodePropertyName=UnicodePropertyValue}/

// Binary and non-binary values
/\p{UnicodeBinaryPropertyName}/

// Negation: \P is negated \p
/\P{UnicodePropertyValue}/
/\P{UnicodeBinaryPropertyName}/
```

- `UnicodeBinaryPropertyName`
- : The name of a [binary property](https://tc39.es/ecma262/multipage/text-processing.html#table-binary-unicode-properties). E.g.: [`ASCII`](https://unicode.org/reports/tr18/#General_Category_Property), [`Alpha`](https://unicode.org/reports/tr44/#Alphabetic), `Math`, [`Diacritic`](https://unicode.org/reports/tr44/#Diacritic), [`Emoji`](https://unicode.org/reports/tr51/#Emoji_Properties), [`Hex_Digit`](https://unicode.org/reports/tr44/#Hex_Digit), [`White_Space`](https://unicode.org/reports/tr44/#White_Space), etc. Property names are case-sensitive. See [Unicode Data PropList.txt](https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt) for more info.
- `UnicodePropertyName`

- : The name of a [non-binary](https://tc39.es/ecma262/multipage/text-processing.html#table-nonbinary-unicode-properties) property:

- [General_Category](https://unicode.org/reports/tr18/#General_Category_Property) (`gc`)
- [Script](https://unicode.org/reports/tr24/#Script) (`sc`)
- [Script_Extensions](https://unicode.org/reports/tr24/#Script_Extensions) (`scx`)

See also [PropertyValueAliases.txt](https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt)

- `UnicodePropertyValue`
- : One of the tokens listed in the Values section, below. Many values have aliases or shorthand (e.g. the value `Decimal_Number` for the `General_Category` property may be written `Nd`, `digit`, or `Decimal_Number`). For most values, the `UnicodePropertyName` part and equals sign may be omitted. If a `UnicodePropertyName` is specified, the value must correspond to the property type given.

> **Note:** As there are many properties and values available, we will not describe them exhaustively here but rather provide various examples.
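
The syntax forms listed above can be tried out with a few concrete properties. The following is a minimal sketch, not an exhaustive reference: the sample strings and variable names are illustrative assumptions, and every pattern needs the `u` flag for `\p` and `\P` to be recognized.

```js
// General_Category value on its own (short alias) and with the property name spelled out.
const upperShort = /\p{Lu}/u.test("A");
const upperLong = /\p{General_Category=Uppercase_Letter}/u.test("a");

// Non-binary property with an explicit name: Script (alias `sc`).
const greekRun = "αβγ abc".match(/\p{Script=Greek}+/u)[0];
const latinIsGreek = /\p{sc=Greek}/u.test("a");

// Binary property: the name alone is enough.
const isHexDigit = /\p{ASCII_Hex_Digit}/u.test("f");

// Negation: \P{Nd} matches every character that is NOT a decimal digit.
const digitsOnly = "a1b2".replace(/\P{Nd}/gu, "");

console.log(upperShort, upperLong); // true false
console.log(greekRun, latinIsGreek); // "αβγ" false
console.log(isHexDigit); // true
console.log(digitsOnly); // "12"
```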
Original file line number Diff line number Diff line change
Expand Up @@ -196,7 +196,6 @@ console.log(lines.join("\n"));
- [Character classes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes)
- [Assertions](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Assertions)
- [Quantifiers](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Quantifiers)
- [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes)

- [The `RegExp()` constructor](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp)
- [ClassRanges in the ECMAScript specification](https://tc39.es/ecma262/multipage/text-processing.html#sec-classranges)
26 changes: 2 additions & 24 deletions files/en-us/web/javascript/guide/regular_expressions/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,8 +63,6 @@ The following pages provide lists of the different special characters that fit i
- : Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string. Backreferences refer to a previously captured group in the same regular expression.
- [Quantifiers](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Quantifiers)
- : Indicate numbers of characters or expressions to match.
- [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes)
- : Distinguish based on unicode character properties, for example, upper- and lower-case letters, math symbols, and punctuation.

If you want to look at all the special characters that can be used in regular expressions in a single table, see the following:

Expand Down Expand Up @@ -142,18 +140,6 @@ If you want to look at all the special characters that can be used in regular ex
</p>
</td>
</tr>
<tr>
<td>
<code>\p{<em>UnicodeProperty</em>}</code>,
<code>\P{<em>UnicodeProperty</em>}</code>
</td>
<td>
<a
href="/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes"
>Unicode property escapes</a
>
</td>
</tr>
</tbody>
</table>

Expand Down Expand Up @@ -394,28 +380,20 @@ console.log(str.match(re)); // ["fee ", "fi ", "fo "]

#### Using unicode regular expressions

The "u" flag is used to create "unicode" regular expressions; that is, regular expressions which support matching against unicode text. This is mainly accomplished through the use of [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes), which are supported only within "unicode" regular expressions.

For example, the following regular expression might be used to match against an arbitrary unicode "word":
The `u` flag is used to create "unicode" regular expressions; that is, regular expressions which support matching against unicode text. An important feature that's enabled in unicode mode is [Unicode property escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape). For example, the following regular expression might be used to match against an arbitrary unicode "word":

```js
/\p{L}*/u;
```

There are a number of other differences between unicode and non-unicode regular expressions that one should be aware of:

- Unicode regular expressions do not support so-called "identity escapes"; that is, patterns where an escaping backslash is not needed and effectively ignored. For example, `/\a/` is a valid regular expression matching the letter 'a', but `/\a/u` is not.
- Curly brackets need to be escaped when not used as [quantifiers](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Quantifiers). For example, `/{/` is a valid regular expression matching the curly bracket '{', but `/{/u` is not; in unicode mode the bracket must be escaped, as in `/\{/u`.
- The `-` character is interpreted differently within character classes. In particular, for Unicode regular expressions, `-` is interpreted as a literal `-` (and not as part of a range) only if it appears at the start or end of the character class. For example, `/[\w-:]/` is a valid regular expression matching a word character, a `-`, or `:`, but `/[\w-:]/u` is an invalid regular expression, as `\w` to `:` is not a well-defined range of characters.

Unicode regular expressions have different execution behavior as well. [`RegExp.prototype.unicode`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) contains more explanation about this.
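
The differences listed above can be checked directly. This is a small sketch under stated assumptions: `compiles` is a hypothetical helper written for this illustration (it is not a built-in), and the patterns are passed as strings to `RegExp()` so that an invalid one throws at runtime rather than failing when the script is parsed.

```js
// Hypothetical helper: reports whether a pattern source compiles with the given flags.
function compiles(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Identity escapes: \a is tolerated without "u" and rejected with it.
console.log(compiles("\\a")); // true
console.log(compiles("\\a", "u")); // false

// A lone curly bracket must be escaped in unicode mode.
console.log(compiles("{")); // true
console.log(compiles("{", "u")); // false
console.log(compiles("\\{", "u")); // true

// A class escape such as \w cannot be a range endpoint in unicode mode.
console.log(compiles("[\\w-:]")); // true
console.log(compiles("[\\w-:]", "u")); // false

// Execution also differs: "." matches a whole code point only in unicode mode.
const dotWithoutU = /^.$/.test("\u{1F600}");
const dotWithU = /^.$/u.test("\u{1F600}");
console.log(dotWithoutU, dotWithU); // false true
```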

## Examples

> **Note:** Several examples are also available in:
>
> - The reference pages for {{jsxref("RegExp/exec", "exec()")}}, {{jsxref("RegExp/test", "test()")}}, {{jsxref("String/match", "match()")}}, {{jsxref("String/matchAll", "matchAll()")}}, {{jsxref("String/search", "search()")}}, {{jsxref("String/replace", "replace()")}}, {{jsxref("String/split", "split()")}}
> - The guide articles: [character classes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes), [assertions](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Assertions), [groups and backreferences](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences), [quantifiers](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Quantifiers), [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes)
> - The guide articles: [character classes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes), [assertions](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Assertions), [groups and backreferences](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences), [quantifiers](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Quantifiers)

### Using special characters to verify input

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ Quantifiers indicate numbers of characters or expressions to match.

## Types

> **Note:** In the following, _item_ refers not only to singular characters, but also includes [character classes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes), [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes), [groups and backreferences](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences).
> **Note:** In the following, _item_ refers not only to singular characters, but also includes [character classes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes) and [groups and backreferences](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences).

<table class="standard-table">
<thead>
Expand Down Expand Up @@ -206,7 +206,6 @@ console.log(text.match(nonGreedyRegexp));

- [Character classes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes)
- [Assertions](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Assertions)
- [Unicode property escapes](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Unicode_property_escapes)
- [Groups and backreferences](/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Groups_and_backreferences)

- [The `RegExp()` constructor](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp)
Expand Down