Regular expression commands have to be distinguishable from ordinary literals. To allow both in the same text, the escape character \ switches a character to its other form (literal or command). For example, \| results in a literal | instead of the alternative command |. To enter an arbitrary character/codepoint, it is possible to use \xXX, \uXXXX and \UXXXXXXXX, where each X is a hexadecimal digit of the codepoint.
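The escaping rules can be tried out with a short sketch. Python's re module is used here only as a stand-in; it is an assumption that it treats \|, \xXX, \uXXXX and \UXXXXXXXX the same way as the engine described here, and other details may differ.

```python
import re

# Python's re module is only a stand-in for the documented engine.
# An escaped pipe is a literal character, an unescaped pipe is the
# alternative command.
literal_pipe = re.findall(r"a\|b", "a|b a b")   # ['a|b']
alternation = re.findall(r"a|b", "a|b")         # ['a', 'b']

# The same codepoint (U+00E9) written in the three escape forms.
target = "\u00e9"
codepoint_hits = [
    re.fullmatch(pattern, target) is not None
    for pattern in (r"\xe9", r"\u00e9", r"\U000000e9")
]

print(literal_pipe)
print(alternation)
print(codepoint_hits)
```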
Alternatives are written in the form abc|def, which matches abc or def.
Literals and/or other groups can be grouped together, either to repeat them with quantization modifiers or to use them as backreferences in replacements. Example: (abc)+
If more than one literal is allowed at the same position, it is possible to use a literal set in the form [abc]. It is possible to add ranges like [A-Z], to invert the final set with [^A-Z] and/or to use a character class as in [^\w_]. Please note that, unlike in other implementations, the - (minus sign) has to be escaped and is not allowed to stand alone at the end of the set.
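A small sketch of the set forms, again using Python's re module as a stand-in. Two differences are assumptions to keep in mind: Python also accepts an unescaped - at the end of a set, and its \w covers digits and the underscore as well as letters, so the [^\w_] result below may differ from the engine described here.

```python
import re

text = "a-B_9!"

in_set = re.findall(r"[abc]", text)        # ['a']
upper_range = re.findall(r"[A-Z]", text)   # ['B']
not_upper = re.findall(r"[^A-Z]", text)    # ['a', '-', '_', '9', '!']

# Inverted set containing a character class. In Python \w also covers
# digits, so '9' is excluded here; an engine whose \w means letters only
# would keep it.
non_word = re.findall(r"[^\w_]", text)     # ['-', '!']

# The minus sign is escaped, as the documented engine requires.
with_minus = re.findall(r"[a\-]", text)    # ['a', '-']
```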
Predefined sets are:
\d  digits (all languages)
\D  everything else than digits
\w  letters (all languages)
\W  everything else than letters
\s  whitespace (all languages)
\S  everything else than whitespace
.   any character (including newline)
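The predefined sets can be approximated in Python as follows. This is a sketch under assumptions: Python's \w also matches digits and the underscore, so [^\W\d_] is used to imitate a letters-only class, and the DOTALL flag is needed to make . match a newline.

```python
import re

# ASCII digit, Arabic-Indic digit three, space, Cyrillic letter zhe, newline
sample = "7\u0663 \u0436\n"

digits = re.findall(r"\d", sample)          # digits from any script
non_digits = re.findall(r"\D", sample)      # everything else
letters = re.findall(r"[^\W\d_]", sample)   # stand-in for a letters-only \w
spaces = re.findall(r"\s", sample)          # whitespace, including the newline

# DOTALL makes "." match the newline too, as the documented "." does.
any_chars = re.findall(r".", sample, flags=re.DOTALL)
```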
After a group or literal, the number of repetitions is added if more or fewer than a single match is needed. Possible commands:
*      matches from 0 to infinite times
+      matches from 1 to infinite times
?      matches from 0 to 1 times
{n}    matches exactly n times
{n,}   matches at least n times
{,m}   matches from 0 to m times
{n,m}  matches from n to m times
Example: (abc){3} would match abcabcabc, and (abc|def){2} would match abcabc, abcdef, defabc or defdef.
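The repetition commands and the two examples can be checked with a short sketch. Python's re module is a stand-in here and full() is a hypothetical helper; the {,m} form happens to be accepted by Python as well.

```python
import re

def full(pattern, text):
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

three_times = full(r"(abc){3}", "abcabcabc")   # True

candidates = ("abcabc", "abcdef", "defabc", "defdef", "abc")
two_of_either = [w for w in candidates if full(r"(abc|def){2}", w)]

star_empty = full(r"a*", "")         # True: zero repetitions are allowed
plus_empty = full(r"a+", "")         # False: at least one is required
optional_two = full(r"a?", "aa")     # False: at most one is allowed
at_least_two = full(r"a{2,}", "aaaa")    # True
at_most_two = full(r"a{,2}", "aaa")      # False
two_to_three = full(r"a{2,3}", "aaa")    # True
```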
After a quantization modifier, the software searches for the longest match (greedy). To modify this behaviour, use one of the following options:
- Lazy matching: When using ? the shortest match that fulfils the criteria is used.
- Possessive matching: When using + the longest match of the group/literal is used, unlike the greedy match type, which also checks the rest of the expression.
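The difference between the three match types is easiest to see on quoted text. The sketch below uses Python's re module as a stand-in; possessive quantifiers are only available there from Python 3.11 on, so that part is guarded, and the exact behaviour of the engine described here is assumed to be comparable.

```python
import re
import sys

text = '"a" and "b"'

# Greedy: .* takes as much as possible, then gives characters back
# until the closing quote fits.
greedy = re.search(r'".*"', text).group()    # '"a" and "b"'

# Lazy: .*? takes as little as possible.
lazy = re.search(r'".*?"', text).group()     # '"a"'

# Possessive: .*+ takes everything and never gives anything back,
# so the closing quote can no longer match and the search fails.
possessive_found = False
if sys.version_info >= (3, 11):
    possessive_found = re.search(r'".*+"', text) is not None
```

The possessive form trades matches like this one for speed, because the engine never has to revisit the consumed characters.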
With ^ the search starts at the next line start or file start, whereas $ checks for a line end or file end.
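A brief sketch of the line anchors. In Python's re module, used here as a stand-in, the MULTILINE flag is needed so that ^ and $ also match at line boundaries; the engine described here appears to do this by default.

```python
import re

text = "first line\nsecond line"

# MULTILINE lets ^ and $ match at every line start and line end,
# not only at the start and end of the whole text.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(line_starts)   # ['first', 'second']
print(line_ends)     # ['line', 'line']
```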
Backreferences are allowed in replacements and during matching outside the referenced group. Use \1 to \9 for the group number. Example: (\S)123(\S) with replacement \1\2 on abc123def results in abcdef.
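The replacement example can be reproduced with Python's re module, used here as a stand-in that shares the \1 to \9 syntax. The second pattern is an added illustration of a backreference used during matching.

```python
import re

# Backreferences in the replacement: keep both captured characters,
# drop the 123 between them.
replaced = re.sub(r"(\S)123(\S)", r"\1\2", "abc123def")

# Backreference during matching: \1 must repeat whatever group 1 matched,
# so this finds doubled characters.
doubled = [m.group() for m in re.finditer(r"(\w)\1", "letter book")]

print(replaced)   # abcdef
print(doubled)    # ['tt', 'oo']
```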
When using the "ignore case" option, the literal sets are expanded with their uppercase/lowercase variants. If the transformation results in more than one codepoint/character, everything is added. For example, the German ß is expanded to ss.
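The expansion can be sketched as follows. This is not the software's actual implementation: expand_literal_set and set_to_pattern are hypothetical helpers, and Python's own IGNORECASE flag does not expand ß, so the variants are built by hand from Python's string case methods.

```python
import re

def expand_literal_set(chars):
    """Hypothetical helper: each character plus all of its case variants."""
    variants = set()
    for ch in chars:
        # upper() and casefold() may return more than one character,
        # for example "SS" and "ss" for the German sharp s.
        variants.update({ch, ch.lower(), ch.upper(), ch.casefold()})
    return variants

def set_to_pattern(variants):
    """Hypothetical helper: turn the expanded set into an alternation."""
    # Longer variants first so a multi-character expansion is tried
    # before any shorter alternative.
    ordered = sorted(variants, key=lambda v: (-len(v), v))
    return "(?:" + "|".join(re.escape(v) for v in ordered) + ")"

expanded = expand_literal_set("a\u00df")          # the set [aß]
pattern = re.compile(set_to_pattern(expanded))

matches_ss = pattern.fullmatch("ss") is not None  # True
matches_b = pattern.fullmatch("b") is not None    # False
```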