From 557085a85a695b074a187fda98179b39f3756d95 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Fri, 22 Nov 2024 14:04:57 -0500 Subject: [PATCH 01/11] Reference for stage 3 regex-escaping --- .../guide/regular_expressions/index.md | 13 +-- .../global_objects/regexp/escape/index.md | 95 +++++++++++++++++++ .../reference/global_objects/regexp/index.md | 5 + .../global_objects/string/replaceall/index.md | 2 +- 4 files changed, 102 insertions(+), 13 deletions(-) create mode 100644 files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md diff --git a/files/en-us/web/javascript/guide/regular_expressions/index.md b/files/en-us/web/javascript/guide/regular_expressions/index.md index 251f3b65844efcc..07ab0d0913ab478 100644 --- a/files/en-us/web/javascript/guide/regular_expressions/index.md +++ b/files/en-us/web/javascript/guide/regular_expressions/index.md @@ -157,18 +157,7 @@ For instance, to match the string "C:\\" where "C" can be any letter, you'd use If using the `RegExp` constructor with a string literal, remember that the backslash is an escape in string literals, so to use it in the regular expression, you need to escape it at the string literal level. `/a\*b/` and `new RegExp("a\\*b")` create the same expression, which searches for "a" followed by a literal "\*" followed by "b". -If escape strings are not already part of your pattern you can add them using {{jsxref("String.prototype.replace()")}}: - -```js -function escapeRegExp(string) { - return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string -} -``` - -The "g" after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches. -It is explained in detail below in [Advanced Searching With Flags](#advanced_searching_with_flags). - -_Why isn't this built into JavaScript?_ There is a [proposal](https://github.com/tc39/proposal-regex-escaping) to add such a function to RegExp. 
+The {{jsxref("RegExp.escape()")}} function returns a new string where all special characters in regex syntax are escaped. This allows you to do `new RegExp(RegExp.escape("a*b"))` to create a regular expression that matches only the string `"a*b"`. ### Using parentheses diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md new file mode 100644 index 000000000000000..46f82b058963c89 --- /dev/null +++ b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md @@ -0,0 +1,95 @@ +--- +title: RegExp.escape() +slug: Web/JavaScript/Reference/Global_Objects/RegExp/escape +page-type: javascript-static-method +browser-compat: javascript.builtins.RegExp.escape +--- + +{{JSRef}} + +The **`RegExp.escape()`** static method [escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions#escape_sequences) any potential regex syntax characters in a string, and returns a new string that can be safely used as a [literal](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character) pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. + +Always prefer this function to a manual search & replace using, for example, {{jsxref("String.prototype.replaceAll()")}}, because `RegExp.escape()` can handle more edge cases and use the right escape sequence that doesn't cause syntax errors in certain contexts. + +## Syntax + +```js-nolint +RegExp.escape(string) +``` + +### Parameters + +- `string` + - : The string to escape. + +### Return value + +A new string that can be safely used as a literal pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. 
Namely, the following things in the input string are replaced: + +- The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and after, the two backslashes in a string literal denote a single backslash character). This step ensures that if this escaped string is embedded into a bigger pattern where it's immediately preceded by `\0`, `\c`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of the escape sequence. +- Regex [syntax characters](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character#description), including `^`, `$`, `\`, `.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`, `{`, `}`, and `|`, as well as the `/` delimiter, are escaped by inserting a `\` character before them. For example, `RegExp.escape("foo.bar")` returns `"\\x66oo\\.bar"`. +- Other punctuators, including `,`, `-`, `=`, `<`, `>`, `#`, `&`, `!`, `%`, `:`, `;`, `@`, `~`, `'`, `` ` ``, and `"`, are escaped using the `\x` syntax. For example, `RegExp.escape("foo-bar")` returns `"\\x66oo\\x2dbar"`. These characters cannot be escaped by prefixing with `\` because, for example, `/foo\-bar/u` is a syntax error. +- The characters with their own [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) sequences: `\f` (U+000C FORM FEED), `\n` (U+000A LINE FEED), `\r` (U+000D CARRIAGE RETURN), `\t` (U+0009 CHARACTER TABULATION), and `\v` (U+000B LINE TABULATION), are replaced with their escape sequences. For example, `RegExp.escape("foo\nbar")` returns `"\\x66oo\\nbar"`. +- The space character is escaped as `"\\x20"`. 
+- Other non-ASCII [line break and white space characters](/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#white_space) are replaced with one or two `\uXXXX` escape sequences representing their UTF-16 code units. For example, `RegExp.escape("foo\u2028bar")` returns `"\\x66oo\\u2028bar"`. +- [Lone surrogates](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters) are replaced with their `\uXXXX` escape sequences. For example, `RegExp.escape("foo\uD800bar")` returns `"\\x66oo\\ud800bar"`. + +### Exceptions + +- {{jsxref("TypeError")}} + - : Thrown if `string` is not a string. + +## Examples + +### Using RegExp.escape() + +The following examples demonstrate various inputs and outputs for the `RegExp.escape()` method. + +```js +RegExp.escape("foo.bar"); // "\\x66oo\\.bar" +RegExp.escape("foo-bar"); // "\\x66oo\\x2dbar" +RegExp.escape("foo\nbar"); // "\\x66oo\\nbar" +RegExp.escape("foo\uD800bar"); // "\\x66oo\\ud800bar" +RegExp.escape("foo\u2028bar"); // "\\x66oo\\u2028bar" +``` + +### Using RegExp.escape() with the RegExp constructor + +The primary use case of `RegExp.escape()` is when you want to embed a string into a bigger regex pattern, and you want to ensure that the string is treated as a literal pattern, not as a regex syntax. Consider the following naïve example that replaces URLs: + +```js +function removeDomain(text, domain) { + return text.replace(new RegExp(`https?://${domain}(?=/)`, "g"), ""); +} + +const input = + "Considering using [RegExp.escape()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string."; +const domain = "developer.mozilla.org"; +console.log(removeDomain(input, domain)); +// Considering using [RegExp.escape()](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string. +``` + +The above inadvertently converts the literal "." 
character to the regex [wildcard](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Wildcard) character. This means it will match anything, including `developer-mozilla.org`, which is not what we intended. To fix this, we can use `RegExp.escape()` to ensure that any user input is treated as a literal pattern: + +```js +function removeDomain(text, domain) { + return text.replace( + new RegExp(`https?://${RegExp.escape(domain)}(?=/)`, "g"), + "", + ); +} +``` + +Now this function will do exactly what we intend to, and will not transform `developer-mozilla.org` URLs. + +## Specifications + +{{Specifications}} + +## Browser compatibility + +{{Compat}} + +## See also + +- {{jsxref("RegExp")}} diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/index.md index cfc700508d36262..1866385e0c892cb 100644 --- a/files/en-us/web/javascript/reference/global_objects/regexp/index.md +++ b/files/en-us/web/javascript/reference/global_objects/regexp/index.md @@ -115,6 +115,11 @@ Note that several of the `RegExp` properties have both long and short (Perl-like - [`RegExp[Symbol.species]`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/Symbol.species) - : The constructor function that is used to create derived objects. +## Static methods + +- {{jsxref("RegExp.escape()")}} + - : [Escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions#escape_sequences) any potential regex syntax characters in a string, and returns a new string that can be safely used as a [literal](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character) pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. + ## Instance properties These properties are defined on `RegExp.prototype` and shared by all `RegExp` instances. 
diff --git a/files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md b/files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md index 92406649d173bfc..1ec40fa29af46a7 100644 --- a/files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md +++ b/files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md @@ -41,7 +41,7 @@ A new string, with all matches of a pattern replaced by a replacement. This method does not mutate the string value it's called on. It returns a new string. -Unlike [`replace()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), this method would replace all occurrences of a string, not just the first one. This is especially useful if the string is not statically known, as calling the [`RegExp()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/RegExp) constructor without escaping special characters may unintentionally change its semantics. +Unlike [`replace()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), this method would replace all occurrences of a string, not just the first one. This is especially useful if the string is not statically known, as calling the [`RegExp()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/RegExp) constructor without escaping special characters may unintentionally change its semantics. (You can also use {{jsxref("RegExp.escape()")}} to make the replacement string a literal pattern, but that is more indirection than just calling `replaceAll()`.) 
```js function unsafeRedactName(text, name) { From a1724ca665621cc23b47f45bc22fd9b74c996f50 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Sat, 23 Nov 2024 02:03:23 -0500 Subject: [PATCH 02/11] Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md --- .../javascript/reference/global_objects/regexp/escape/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md index 46f82b058963c89..d402ec842db1dd1 100644 --- a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md +++ b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md @@ -92,4 +92,5 @@ Now this function will do exactly what we intend to, and will not transform `dev ## See also +- [Polyfill of `RegExp.escape` in `core-js`](https://github.com/zloirock/core-js#regexp-escaping) - {{jsxref("RegExp")}} From f65789063e7dbfc92739779dcd5682e32be0e0c4 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Mon, 25 Nov 2024 14:14:35 -0500 Subject: [PATCH 03/11] Apply suggestions from code review Co-authored-by: Hamish Willee --- .../reference/global_objects/regexp/escape/index.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md index d402ec842db1dd1..4396a32d2dfe0a9 100644 --- a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md +++ b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md @@ -27,7 +27,7 @@ RegExp.escape(string) A new string that can be safely used as a literal pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. 
Namely, the following things in the input string are replaced: - The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and after, the two backslashes in a string literal denote a single backslash character). This step ensures that if this escaped string is embedded into a bigger pattern where it's immediately preceded by `\0`, `\c`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of the escape sequence. -- Regex [syntax characters](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character#description), including `^`, `$`, `\`, `.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`, `{`, `}`, and `|`, as well as the `/` delimiter, are escaped by inserting a `\` character before them. For example, `RegExp.escape("foo.bar")` returns `"\\x66oo\\.bar"`. +- Regex [syntax characters](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character#description), including `^`, `$`, `\`, `.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`, `{`, `}`, and `|`, as well as the `/` delimiter, are escaped by inserting a `\` character before them. For example, `RegExp.escape("foo.bar")` returns `"\\x66oo\\.bar"`, and `RegExp.escape("(foo)")` returns `"\\(foo\\)"`. - Other punctuators, including `,`, `-`, `=`, `<`, `>`, `#`, `&`, `!`, `%`, `:`, `;`, `@`, `~`, `'`, `` ` ``, and `"`, are escaped using the `\x` syntax. For example, `RegExp.escape("foo-bar")` returns `"\\x66oo\\x2dbar"`. These characters cannot be escaped by prefixing with `\` because, for example, `/foo\-bar/u` is a syntax error. 
- The characters with their own [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) sequences: `\f` (U+000C FORM FEED), `\n` (U+000A LINE FEED), `\r` (U+000D CARRIAGE RETURN), `\t` (U+0009 CHARACTER TABULATION), and `\v` (U+000B LINE TABULATION), are replaced with their escape sequences. For example, `RegExp.escape("foo\nbar")` returns `"\\x66oo\\nbar"`. - The space character is escaped as `"\\x20"`. @@ -46,6 +46,8 @@ A new string that can be safely used as a literal pattern for the {{jsxref("RegE The following examples demonstrate various inputs and outputs for the `RegExp.escape()` method. ```js +RegExp.escape("Buy it. use it. break it. fix it."); +// "Buy\\ it\\.\\ use\\ it\\.\\ break\\ it\\.\\ fix\\ it\\." RegExp.escape("foo.bar"); // "\\x66oo\\.bar" RegExp.escape("foo-bar"); // "\\x66oo\\x2dbar" RegExp.escape("foo\nbar"); // "\\x66oo\\nbar" @@ -69,7 +71,10 @@ console.log(removeDomain(input, domain)); // Considering using [RegExp.escape()](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string. ``` -The above inadvertently converts the literal "." character to the regex [wildcard](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Wildcard) character. This means it will match anything, including `developer-mozilla.org`, which is not what we intended. To fix this, we can use `RegExp.escape()` to ensure that any user input is treated as a literal pattern: +Inserting the `domain` above results in the regular expression literal `https?://developer.mozilla.org(?=/)`, where the "." character is a regex [wildcard](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Wildcard) character. This means the string will match the string with any character in place of the ".", such as `developer-mozilla-org`. +While we have no problematic strings in the particular input shown here, this is not robust code. 
+ +To fix this, we can use `RegExp.escape()` to ensure that any user input is treated as a literal pattern: ```js function removeDomain(text, domain) { From e0e5e03e6a3ecafcc682c5cdfa479bb41ecfd769 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Tue, 26 Nov 2024 00:30:47 -0500 Subject: [PATCH 04/11] Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md --- .../javascript/reference/global_objects/regexp/escape/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md index 4396a32d2dfe0a9..10e4f33700cb744 100644 --- a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md +++ b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md @@ -47,7 +47,7 @@ The following examples demonstrate various inputs and outputs for the `RegExp.es ```js RegExp.escape("Buy it. use it. break it. fix it."); -// "Buy\\ it\\.\\ use\\ it\\.\\ break\\ it\\.\\ fix\\ it\\." +// "Buy\\x20it\\.\\x20use\\x20it\\.\\x20break\\x20it\\.\\x20fix\\x20it\\." 
RegExp.escape("foo.bar"); // "\\x66oo\\.bar" RegExp.escape("foo-bar"); // "\\x66oo\\x2dbar" RegExp.escape("foo\nbar"); // "\\x66oo\\nbar" From b4162870eec1ec5af962277db81764ef547a7da0 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Tue, 26 Nov 2024 19:54:51 -0500 Subject: [PATCH 05/11] Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md Co-authored-by: Kevin Gibbons --- .../javascript/reference/global_objects/regexp/escape/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md index 10e4f33700cb744..7ec8594b3552040 100644 --- a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md +++ b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md @@ -47,7 +47,7 @@ The following examples demonstrate various inputs and outputs for the `RegExp.es ```js RegExp.escape("Buy it. use it. break it. fix it."); -// "Buy\\x20it\\.\\x20use\\x20it\\.\\x20break\\x20it\\.\\x20fix\\x20it\\." +// "\\x42uy\\x20it\\.\\x20use\\x20it\\.\\x20break\\x20it\\.\\x20fix\\x20it\\." 
RegExp.escape("foo.bar"); // "\\x66oo\\.bar" RegExp.escape("foo-bar"); // "\\x66oo\\x2dbar" RegExp.escape("foo\nbar"); // "\\x66oo\\nbar" From 39e2cfa30436684d40e46a8e477b94cc9155ea69 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Tue, 26 Nov 2024 19:56:54 -0500 Subject: [PATCH 06/11] Update index.md --- .../global_objects/regexp/escape/index.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md index 7ec8594b3552040..81ec0ed1868018c 100644 --- a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md +++ b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md @@ -65,14 +65,21 @@ function removeDomain(text, domain) { } const input = - "Considering using [RegExp.escape()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string."; + "Consider using [RegExp.escape()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string."; const domain = "developer.mozilla.org"; console.log(removeDomain(input, domain)); -// Considering using [RegExp.escape()](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string. +// Consider using [RegExp.escape()](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string. ``` -Inserting the `domain` above results in the regular expression literal `https?://developer.mozilla.org(?=/)`, where the "." character is a regex [wildcard](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Wildcard) character. This means the string will match the string with any character in place of the ".", such as `developer-mozilla-org`. 
-While we have no problematic strings in the particular input shown here, this is not robust code. +Inserting the `domain` above results in the regular expression literal `https?://developer.mozilla.org(?=/)`, where the "." character is a regex [wildcard](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Wildcard) character. This means the string will match the string with any character in place of the ".", such as `developer-mozilla-org`. Therefore, it would incorrectly also change the following text: + +```js +const input = + "This is not an MDN link: https://developer-mozilla.org/, be careful!"; +const domain = "developer.mozilla.org"; +console.log(removeDomain(input, domain)); +// This is not an MDN link: /, be careful! +``` To fix this, we can use `RegExp.escape()` to ensure that any user input is treated as a literal pattern: From 09f6a7382bf202503ce144f2404ef2fce858b598 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Tue, 26 Nov 2024 22:53:09 -0500 Subject: [PATCH 07/11] Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md --- .../javascript/reference/global_objects/regexp/escape/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md index 81ec0ed1868018c..5f6816e1eeaca64 100644 --- a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md +++ b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md @@ -9,7 +9,7 @@ browser-compat: javascript.builtins.RegExp.escape The **`RegExp.escape()`** static method [escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions#escape_sequences) any potential regex syntax characters in a string, and returns a new string that can be safely used as a [literal](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character) pattern for the {{jsxref("RegExp/RegExp", 
"RegExp()")}} constructor. -Always prefer this function to a manual search & replace using, for example, {{jsxref("String.prototype.replaceAll()")}}, because `RegExp.escape()` can handle more edge cases and use the right escape sequence that doesn't cause syntax errors in certain contexts. +When dynamically creating a {{jsxref("RegExp")}} with user-provided content, always consider using this function to sanitize the input, unless the input is actually intended to contain regex syntax. In addition, don't try to re-implement its functionality by, for example, using {{jsxref("String.prototype.replaceAll()")}} to insert a `\` before all syntax characters, because `RegExp.escape()` can handle more edge cases and use the right escape sequence that doesn't cause syntax errors in certain contexts. ## Syntax From dcd50d5f04022a94513b2b19dccb090dd931bcf9 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Wed, 27 Nov 2024 01:17:37 -0500 Subject: [PATCH 08/11] Fix wording --- .../en-us/web/javascript/reference/regular_expressions/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/files/en-us/web/javascript/reference/regular_expressions/index.md b/files/en-us/web/javascript/reference/regular_expressions/index.md index f57391d0cef144c..fddd05c6ea4cf6e 100644 --- a/files/en-us/web/javascript/reference/regular_expressions/index.md +++ b/files/en-us/web/javascript/reference/regular_expressions/index.md @@ -147,7 +147,7 @@ _Escape sequences_ in regexes refer to any kind of syntax formed by `\` followed [VCC]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class#v-mode_character_class [WBA]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Word_boundary_assertion -`\` followed by any other digit character becomes a [legacy octal escape sequence](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#escape_sequences), which is forbidden in [Unicode-aware 
mode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode). +`\` followed by `0` and another digit becomes a [legacy octal escape sequence](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#escape_sequences), which is forbidden in [Unicode-aware mode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode). `\` followed by any other digit sequence becomes a [backreference](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Backreference). In addition, `\` can be followed by some non-letter-or-digit characters, in which case the escape sequence is always a [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) representing the escaped character itself: From b5b5eb1db6bad5aa009beb7319bb49a7af016021 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Thu, 28 Nov 2024 19:57:46 -0500 Subject: [PATCH 09/11] Update files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md Co-authored-by: Hamish Willee --- .../reference/global_objects/string/replaceall/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md b/files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md index 1ec40fa29af46a7..4b9aa7795b4e49e 100644 --- a/files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md +++ b/files/en-us/web/javascript/reference/global_objects/string/replaceall/index.md @@ -41,7 +41,7 @@ A new string, with all matches of a pattern replaced by a replacement. This method does not mutate the string value it's called on. It returns a new string. -Unlike [`replace()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), this method would replace all occurrences of a string, not just the first one. 
This is especially useful if the string is not statically known, as calling the [`RegExp()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/RegExp) constructor without escaping special characters may unintentionally change its semantics. (You can also use {{jsxref("RegExp.escape()")}} to make the replacement string a literal pattern, but that is more indirection than just calling `replaceAll()`.) +Unlike [`replace()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), this method replaces all occurrences of a string, not just the first one. While it is also possible to use `replace()` with a global regex dynamically constructed with [`RegExp()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/RegExp) to replace all instances of a string, this can have unintended consequences if the string contains special characters that have meaning in regular expressions (which might happen if the replacement string comes from user input). While you can mitigate this case using {{jsxref("RegExp.escape()")}} to make the regular expression string into a literal pattern, it is better to just use `replaceAll()` and pass the string without converting it to a regex. 
```js function unsafeRedactName(text, name) { From 53d173f21603ec19d203095add7abf56baa69c8a Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Thu, 28 Nov 2024 21:37:37 -0500 Subject: [PATCH 10/11] Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md Co-authored-by: Hamish Willee --- .../javascript/reference/global_objects/regexp/escape/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md index 5f6816e1eeaca64..7c8d21b3e6b086d 100644 --- a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md +++ b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md @@ -9,7 +9,7 @@ browser-compat: javascript.builtins.RegExp.escape The **`RegExp.escape()`** static method [escapes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions#escape_sequences) any potential regex syntax characters in a string, and returns a new string that can be safely used as a [literal](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character) pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. -When dynamically creating a {{jsxref("RegExp")}} with user-provided content, always consider using this function to sanitize the input, unless the input is actually intended to contain regex syntax. In addition, don't try to re-implement its functionality by, for example, using {{jsxref("String.prototype.replaceAll()")}} to insert a `\` before all syntax characters, because `RegExp.escape()` can handle more edge cases and use the right escape sequence that doesn't cause syntax errors in certain contexts. +When dynamically creating a {{jsxref("RegExp")}} with user-provided content, consider using this function to sanitize the input (unless the input is actually intended to contain regex syntax). 
In addition, don't try to re-implement its functionality by, for example, using {{jsxref("String.prototype.replaceAll()")}} to insert a `\` before all syntax characters. `RegExp.escape()` is designed to use escape sequences that work in many more edge cases/contexts than hand-crafted code is likely to achieve. ## Syntax From ee3396889aa598d9d05972966e7249f4fe4d3b35 Mon Sep 17 00:00:00 2001 From: Joshua Chen Date: Thu, 28 Nov 2024 21:57:25 -0500 Subject: [PATCH 11/11] Update files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md Co-authored-by: Kevin Gibbons --- .../javascript/reference/global_objects/regexp/escape/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md index 7c8d21b3e6b086d..f7a3b0a04c8022b 100644 --- a/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md +++ b/files/en-us/web/javascript/reference/global_objects/regexp/escape/index.md @@ -26,7 +26,7 @@ RegExp.escape(string) A new string that can be safely used as a literal pattern for the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor. Namely, the following things in the input string are replaced: -- The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and after, the two backslashes in a string literal denote a single backslash character). This step ensures that if this escaped string is embedded into a bigger pattern where it's immediately preceded by `\0`, `\c`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of the escape sequence. 
+- The first character of the string, if it's either a decimal digit (0–9) or ASCII letter (a–z, A–Z), is escaped using the `\x` [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) syntax. For example, `RegExp.escape("foo")` returns `"\\x66oo"` (here and after, the two backslashes in a string literal denote a single backslash character). This step ensures that if this escaped string is embedded into a bigger pattern where it's immediately preceded by `\1`, `\x0`, `\u000`, etc., the leading character doesn't get interpreted as part of the escape sequence. - Regex [syntax characters](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character#description), including `^`, `$`, `\`, `.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`, `{`, `}`, and `|`, as well as the `/` delimiter, are escaped by inserting a `\` character before them. For example, `RegExp.escape("foo.bar")` returns `"\\x66oo\\.bar"`, and `RegExp.escape("(foo)")` returns `"\\(foo\\)"`. - Other punctuators, including `,`, `-`, `=`, `<`, `>`, `#`, `&`, `!`, `%`, `:`, `;`, `@`, `~`, `'`, `` ` ``, and `"`, are escaped using the `\x` syntax. For example, `RegExp.escape("foo-bar")` returns `"\\x66oo\\x2dbar"`. These characters cannot be escaped by prefixing with `\` because, for example, `/foo\-bar/u` is a syntax error. - The characters with their own [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) sequences: `\f` (U+000C FORM FEED), `\n` (U+000A LINE FEED), `\r` (U+000D CARRIAGE RETURN), `\t` (U+0009 CHARACTER TABULATION), and `\v` (U+000B LINE TABULATION), are replaced with their escape sequences. For example, `RegExp.escape("foo\nbar")` returns `"\\x66oo\\nbar"`.
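Reviewer note (not part of the patch): the escaping rules in the hunk above can be checked with a small round trip. This is a minimal sketch, not the reference implementation. `escapeForRegExp` and `matchesLiterally` are hypothetical names introduced here, and the fallback branch is only the simple `replace()`-based escaper that this series removes from the guide; it is kept so the sketch runs on engines that do not ship `RegExp.escape()` yet, and it does not cover the edge cases the new page describes.

```js
// Prefer the native method. The fallback is the old guide snippet and is an
// assumption made for illustration only; it is NOT equivalent to RegExp.escape().
const escapeForRegExp =
  typeof RegExp.escape === "function"
    ? (s) => RegExp.escape(s)
    : (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: builds a regex that should match exactly `text`
// and tests `candidate` against it.
function matchesLiterally(text, candidate) {
  return new RegExp(`^${escapeForRegExp(text)}$`).test(candidate);
}

console.log(matchesLiterally("foo.bar", "foo.bar")); // true
console.log(matchesLiterally("foo.bar", "fooxbar")); // false: "." is no longer a wildcard
console.log(matchesLiterally("(foo)", "(foo)")); // true: parentheses are literal
console.log(matchesLiterally("a*b", "aab")); // false: "*" is no longer a quantifier
```

The anchors `^` and `$` are added outside the escaped text on purpose: only the embedded string is escaped, while the surrounding pattern keeps its regex meaning, which is the use case the page describes.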