Supported Modifier Flags #1

rbuckton · 2021-11-22T20:20:14Z

In the October 2021 plenary, @michaelficarra asked that we outline and provide motivating examples for each flag we are considering as a supported modifier.

The flags currently under consideration are:

i — ignore-case
- Rationale — Toggling ignore-case is especially useful when matching patterns with varying case sensitivity, or when parsing patterns provided via JSON configuration. It also helps when working with complex Unicode character ranges.
- Example — Match an uppercase ASCII letter followed by one or more uppercase or lowercase ASCII letters or apostrophes (')
```
const re = /^[A-Z](?i)[a-z']+$/;
re.test("O'Neill"); // true
re.test("o'neill"); // false

// alternatively (defaulting to ignore-case):
const re2 = /^(?-i:[A-Z])[a-z']+$/i;
```
- Example — Match word starting with D followed by word starting with D or d (from .NET documentation, see ¹)
```
const re = /\b(D\w+)(?ix)\s(d\w+)\b/g;
const input = "double dare double Double a Drooling dog The Dreaded Deep";
re.exec(input); // ["Drooling dog", "Drooling", "dog"]
re.exec(input); // ["Dreaded Deep", "Dreaded", "Deep"]
```
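For comparison, both ignore-case examples can be written without modifiers today, because only a small ASCII character class needs to vary in case. A minimal sketch (the variable names are illustrative and not part of the proposal):

```javascript
// An uppercase ASCII letter, then letters of either case or apostrophes.
const nameRe = /^[A-Z][A-Za-z']+$/;
const nameResults = [nameRe.test("O'Neill"), nameRe.test("o'neill")];

// A word starting with "D", then a word starting with "D" or "d".
const wordsRe = /\b(D\w+)\s([Dd]\w+)\b/g;
const wordsInput = "double dare double Double a Drooling dog The Dreaded Deep";
const wordPairs = [...wordsInput.matchAll(wordsRe)].map((m) => m[0]);

console.log(nameResults); // [ true, false ]
console.log(wordPairs);   // [ 'Drooling dog', 'Dreaded Deep' ]
```

This only stays manageable while the case-insensitive part is a short ASCII class; longer literals or Unicode ranges are where a scoped `i` modifier pays off.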
m — multiline
- Rationale — Flexibility in matching beginning-of-buffer vs. beginning-of-line or end-of-buffer vs. end-of-line in a complex pattern.
- Example — Match a frontmatter block at the start of a file
```
const re = /^---(?m)$((?:^(?!---$).*$)*)^---$/;
re.test("---a"); // false
re.test("---\n---"); // true
re.test("---\na: b\n---"); // true
```

s — dot-all (i.e., "single line")

Rationale — Control over . matching semantics within a pattern.

Example

const re = /a.c(?s:.)*x.z/;
re.test("a\ncx\nz"); // false
re.test("abcdxyz"); // true
re.test("aBc\nxYz"); // true

x — Extended Mode. This flag is proposed by https://github.com/tc39/proposal-regexp-x-mode

Rationale — Would allow control over significant whitespace handling in a pattern.

Example — Disabling x mode when composing a complex pattern:

const idPattern = String.raw`[a-z]{2} \d{4}`; // space required; String.raw keeps the backslash in \d
const re = new RegExp(String.raw`
  # match the id
  (?<id>(?-x:${idPattern}))
  
  # match a separator
  :\s
  
  # match the value
  (?<value>\w+)
`, "x");

re.exec("aa0123: foo")?.groups; // undefined
re.exec("aa 0123: foo")?.groups; // { id: "aa 0123", value: "foo" }
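
Since the `x` flag is not available in JavaScript, the same composition can be approximated with a template tag. The sketch below is a rough emulation under stated assumptions: `extended` is a hypothetical helper, it strips `#` comments and whitespace from the literal parts of the template only, and it does not handle an escaped `#` or whitespace inside character classes. Interpolated fragments are kept verbatim, which plays the role of `(?-x:...)`.

```javascript
// Hypothetical tag: removes "# ..." comments and whitespace from the literal
// template text, but leaves interpolated fragments untouched.
function extended(strings, ...fragments) {
  const clean = (raw) => raw.replace(/#.*$/gm, "").replace(/\s+/g, "");
  let source = clean(strings.raw[0]);
  fragments.forEach((fragment, i) => {
    source += fragment + clean(strings.raw[i + 1]);
  });
  return new RegExp(source);
}

const idFragment = String.raw`[a-z]{2} \d{4}`; // space required
const idValueRe = extended`
  # match the id
  (?<id>${idFragment})

  # match a separator
  :\s

  # match the value
  (?<value>\w+)
`;

console.log(idValueRe.source);
console.log(idValueRe.exec("aa0123: foo"));  // null
console.log(idValueRe.exec("aa 0123: foo").groups.id); // "aa 0123"
```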

Flags likely too complex to support:

u — Unicode. This flag affects how a pattern is parsed, not how it is matched. Supporting it would likely require a cover grammar and additional static semantics.
v — Extended Unicode. This flag is proposed by https://github.com/tc39/proposal-regexp-set-notation as an extension of the u flag and would have the same difficulties.

Flags that will never be supported:

g — Global. This flag affects the index at which matching starts and not the matching behavior itself. Changing it mid pattern would have no effect.
y — Sticky. This flag affects the index at which matching starts and not the matching behavior itself. Changing it mid pattern would have no effect.
d — Indices. This flag affects the match result. Changing it mid pattern would have no effect.

¹ https://docs.microsoft.com/en-us/dotnet/standard/base-types/miscellaneous-constructs-in-regular-expressions#inline-options


ljharb · 2021-11-22T20:43:57Z

For the examples, can you share how you'd do it without the relevant proposal?

rbuckton · 2021-11-23T01:28:31Z

`i`

Simple cases like /[A-Z][A-Za-z]/ are trivial:

// match an uppercase ASCII letter followed by a mixed-case ASCII letter

// with 'i' modifier:
/[A-Z](?i)[A-Z]/

// without 'i' modifier:
/[A-Z][A-Za-z]/

However, more complex cases are far from trivial:

// match a mixed case "hello" followed by the exact characters "World"

// with 'i' modifier:
/(?i:hello) World/

// without 'i' modifier:
/[Hh][Ee][Ll][Ll][Oo] World/
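
The manual expansion can be automated, which also shows how mechanical (and noisy) the workaround is. A minimal sketch, assuming ASCII-only input; `ignoreCaseLiteral` is a hypothetical helper, not something from the proposal:

```javascript
// Hypothetical helper: expands each ASCII letter into an [Xx] class and
// escapes regex syntax characters in everything else.
function ignoreCaseLiteral(text) {
  return Array.from(text, (ch) => {
    if (/[A-Za-z]/.test(ch)) {
      return `[${ch.toUpperCase()}${ch.toLowerCase()}]`;
    }
    return ch.replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&");
  }).join("");
}

const helloSource = ignoreCaseLiteral("hello");
const helloWorldRe = new RegExp(`${helloSource} World`);

console.log(helloSource);                       // [Hh][Ee][Ll][Ll][Oo]
console.log(helloWorldRe.test("HeLLo World"));  // true
console.log(helloWorldRe.test("hello world"));  // false
```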

`m`

If you are in u mode, you could emulate non-m mode when in m mode using the proposed \A and \z buffer boundaries. However, if you are not in u mode, there's no way to match the buffer boundaries when in m mode:

// with 'm' modifier:
/^---(?m)$((?:^(?!---$).*$)*)^---$/

// without the 'm' modifier, in 'u' mode:
/\A---$((?:^(?!---$).*$)*)^---$/mu

// without the 'm' modifier, not in 'u' mode: not possible to invert when in 'm' mode

`s`

It's fairly complicated to invert the s flag in a RegExp without modifiers, and easy to get wrong:

// match /a.b/ outside of 's' mode, then /.+/ in 's' mode, then /c.d/ outside of 's' mode
// with 's' modifier
/a.b(?s:.)+c.d/

// without 's' modifier
/a.b(?:.|[\r\n\u2028\u2029])+c.d/

// match /a.b/ inside of 's' mode, then /.+/ outside of 's' mode, then /c.d/ inside of 's' mode
// with 's' modifier
/a.b(?-s:.+)c.d/s

// without 's' modifier
/a.b(?:(?![\r\n\u2028\u2029]).)+c.d/s
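
The two modifier-free patterns can be checked against a few inputs in any current engine. This is only a quick sanity check of the workarounds quoted here; the variable names and sample strings are illustrative:

```javascript
// '.' excludes line terminators, except in the middle run, which may span lines.
const middleSpansLines = /a.b(?:.|[\r\n\u2028\u2029])+c.d/;
// '.' matches line terminators, except in the middle run, which must stay on one line.
const middleSingleLine = /a.b(?:(?![\r\n\u2028\u2029]).)+c.d/s;

console.log(middleSpansLines.test("a-b\n\nc-d"));     // true
console.log(middleSpansLines.test("a\nb--c-d"));      // false
console.log(middleSingleLine.test("a\nb--c\nd"));     // true
console.log(middleSingleLine.test("a\nb\n\nc\nd"));   // false
```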

ljharb · 2021-11-23T03:30:51Z

There's nothing with [^\s\S] for the dotAll case?

rbuckton · 2021-11-23T05:27:35Z

I'm not sure I understand what you mean. Can you clarify?

rbuckton · 2021-11-23T05:29:05Z

If you mean using [\s\S] to match everything, that's feasible for the first s example, sure. I don't see how it helps with the second example though.

RunDevelopment · 2022-03-16T13:07:25Z

I just want to share a little trick to emulate m and non-m mode without using ^ and $. This might be relevant for transpilers.

- /^ $/ == /(?<![\s\S]) (?![\s\S])/
- /^ $/m == /(?<!.) (?!.)/ // no `s` flag!

This works for both u and non-u mode.
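
A short check of the equivalences above, run against a three-line input (the names are illustrative):

```javascript
const threeLines = "a\nb\nc";

// Non-m anchors: only the very start and end of the input.
const bufferAnchors = /(?<![\s\S])b(?![\s\S])/;
// m anchors: start and end of any line. '.' must not match line
// terminators here, so the pattern must not carry the 's' flag.
const lineAnchors = /(?<!.)b(?!.)/;

console.log(/^b$/.test(threeLines), bufferAnchors.test(threeLines)); // false false
console.log(/^b$/m.test(threeLines), lineAnchors.test(threeLines));  // true true
console.log(bufferAnchors.test("b"));                                // true
```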

rbuckton · 2022-06-07T20:13:52Z

The modifiers supported by this proposal will be limited to i, m, and s. Future proposals (such as the x-mode proposal) may extend this set, but doing so is out of scope here.
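
For code that has to run on engines with and without modifier support, one possible approach is a runtime check. This is a sketch only; `supportsModifiers` is a hypothetical helper, and the pattern is built from a string so that an engine without support throws at construction time instead of failing to parse the whole file:

```javascript
// Hypothetical helper: true when the engine parses and applies inline modifiers.
function supportsModifiers() {
  try {
    const probe = new RegExp("^(?i:a)b$");
    // "a" is case-insensitive inside the group, "b" stays case-sensitive.
    return probe.test("Ab") && !probe.test("AB");
  } catch {
    return false;
  }
}

console.log(supportsModifiers());
```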

slevithan · 2024-05-30T15:47:14Z

@rbuckton, I know this is already closed (and implemented in V8, yay!), but for interest's sake, note that it's very possible to emulate presence or lack of s and m. I do it in regex-make to locally apply the presence or absence of local flags for RegExp instances interpolated into a template.

m

If you are in u mode, you could emulate non-m mode when in m mode using the proposed \A and \z buffer boundaries. However, if you are not in u mode, there's no way to match the buffer boundaries when in m mode

Emulating is possible without u mode or buffer boundaries.

Emulate an m mode ^: (?<=^|[\n\r\u2028\u2029])
Emulate a non-m mode ^: (?<![^])
Emulate an m mode $: (?=$|[\n\r\u2028\u2029])
Emulate a non-m mode $: (?![^])
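
These four assertions can be tried directly. A small sketch with illustrative names, using a made-up two-line input:

```javascript
const fmDoc = "x\n---\ntitle";

// Under the 'm' flag, ^ matches at any line start...
const anyLine = /^---$/m;
// ...while (?<![^]) still only matches at the start of the buffer.
const bufferStartOnly = /(?<![^])---$/m;

// Without the 'm' flag, line-aware anchors can be rebuilt from lookarounds.
const lineStart = String.raw`(?<=^|[\n\r\u2028\u2029])`;
const lineEnd = String.raw`(?=$|[\n\r\u2028\u2029])`;
const dashesLine = new RegExp(`${lineStart}---${lineEnd}`);

console.log(anyLine.test(fmDoc));                // true
console.log(bufferStartOnly.test(fmDoc));        // false
console.log(bufferStartOnly.test("---\ntitle")); // true
console.log(/^---$/.test(fmDoc));                // false
console.log(dashesLine.test(fmDoc));             // true
```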

s

It's fairly complicated to invert the s flag in a RegExp without modifiers, and easy to get wrong:
[...]
// without 's' modifier
/a.b(?:.|[\r\n\u2028\u2029])+c.d/

[...]
// without 's' modifier
/a.b(?:(?![\r\n\u2028\u2029]).)+c.d/s

It's easier than that:

Emulate an s mode .: [^]
Emulate a non-s mode .: [[^]--[\n\r\u2028\u2029]] (with v) or (?:(?![\n\r\u2028\u2029]).) for a less efficient version without v (same as you showed in the quote).

Note that, like your (?:.|[\r\n\u2028\u2029]) example, [^] either matches full code points or doesn't based on the presence of flag u/v.
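
A brief sketch of `[^]` standing in for an `s`-mode dot, including the code point behavior mentioned above (names are illustrative):

```javascript
// Only the middle part may cross line terminators; the outer dots may not.
const anyCharMiddle = /a.c[^]*x.z/;
console.log(anyCharMiddle.test("a\ncx\nz"));  // false
console.log(anyCharMiddle.test("abcdxyz"));   // true
console.log(anyCharMiddle.test("aBc\nxYz"));  // true

// Without 'u', [^] consumes one UTF-16 code unit; with 'u', one code point.
const astral = "\u{1F600}"; // a single code point stored as two code units
console.log(/^[^]$/.test(astral));  // false
console.log(/^[^]$/u.test(astral)); // true
```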

RunDevelopment · 2024-05-30T19:41:54Z

Emulate a non-s mode .: [[^]--[\n\r\u2028\u2029]] (with v) or (?:(?![\n\r\u2028\u2029]).) for a less efficient version without v (same as you showed in the quote).

Or [^\n\r\u2028\u2029].
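
A quick comparison of the negated class against the lookahead form, both under the `s` flag. The sample strings are made up for illustration:

```javascript
const withLookahead = /a.b(?:(?![\n\r\u2028\u2029]).)+c.d/s;
const withNegatedClass = /a.b[^\n\r\u2028\u2029]+c.d/s;

const dotSamples = ["a\nb--c\nd", "a\nb\n\nc\nd", "a-b-\u2028-c-d", "a-b-c-d"];
const lookaheadResults = dotSamples.map((s) => withLookahead.test(s));
const negatedResults = dotSamples.map((s) => withNegatedClass.test(s));

console.log(lookaheadResults); // [ true, false, false, true ]
console.log(negatedResults);   // [ true, false, false, true ]
```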

rbuckton closed this as completed Jun 7, 2022
