Regaxor (RegExp Haxxor) is a regular expression fuzzer, written in ECMAScript 6.
Whatever you're coding, regular expressions come in handy in various situations and are often very useful but can also be very tricky to get right. Writing a regex that matches what you expect is easy; writing a regex that only matches what you expect is virtually impossible (except in trivial cases). That's where this tool comes into play—by fuzzing regular expressions, we can easily detect any issues/gotchas before learning about them the hard way.
The following are just some examples of common regex gotchas (NVM the funny titles):
- In the beginning was the Word
let badRegex = /https?:\/\/example\.com\/[\w]*/;
let str = 'Word\nhttps://example.com/';
str.match(badRegex);
// Output: ["https://example.com/", index: 5, input: "Word↵https://example.com/", groups: undefined]
let goodRegex = /^https?:\/\/example\.com\/[\w]*/;
str.match(goodRegex);
// Output: null
'https://example.com/'.match(goodRegex);
// Output: ["https://example.com/", index: 0, input: "https://example.com/", groups: undefined]
- Catch 22
let badRegex = /[123]|22/g;
badRegex.exec('22');
// Output: ["2", index: 0, input: "22", groups: undefined]
let goodRegex = /22|[123]/g;
goodRegex.exec('22');
// Output: ["22", index: 0, input: "22", groups: undefined]
- One sneaky dot
let str = 'https://exampleXcom';
let badRegex = /^\w+:\/\/example.com$/;
badRegex.exec(str);
// Output: ["https://exampleXcom", index: 0, input: "https://exampleXcom", groups: undefined]
let goodRegex = /^\w+:\/\/example\.com$/;
goodRegex.exec(str);
// Output: null
goodRegex.exec('https://example.com');
// Output: ["https://example.com", index: 0, input: "https://example.com", groups: undefined]
- All or nothing
let badRegex = /^\.*|\d+$/g;
'abc'.match(badRegex);
// Output: [""]
let goodRegex = /^[\d.]+$/g;
'abc'.match(goodRegex);
// Output: null
'12.3'.match(goodRegex);
// Output: ["12.3"]
- The word boundary trap
let badRegex = /word/;
badRegex.exec('aworda');
// Output: ["word", index: 1, input: "aworda", groups: undefined]
let goodRegex = /\bword\b/;
goodRegex.exec('aworda');
// Output: null
goodRegex.exec('a word');
// Output: ["word", index: 2, input: "a word", groups: undefined]
- Multiline confusion
let badRegex = /a.*b/;
badRegex.exec('a\nb');
// Output: null
let alsoBadRegex = /a.*b/m;
alsoBadRegex.exec('a\nb');
// Output: null
let goodRegex = /a[^]*b/;
goodRegex.exec('a\nb');
// Output: ["a↵b", index: 0, input: "a↵b", groups: undefined]
- One escape is not enough
let badRegex = 'x\.com';
new RegExp(badRegex).exec('xycom');
// Output: ["xycom", index: 0, input: "xycom", groups: undefined]
let goodRegex = 'x\\.com';
new RegExp(goodRegex).exec('xycom');
// Output: null
new RegExp(goodRegex).exec('x.com');
// Output: ["x.com", index: 0, input: "x.com", groups: undefined]
- Escaping the escaping
let str = 'double\\"quotes"';
// Bad.
str.replace(/"/g, '\\"');
// Output: "double\\"quotes\""
// Not bad but not recommended.
str.replace(/(\\|")/g, '\\$1');
// Output: "double\\\"quotes\""
// Better.
str.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
// Output: "double\\\"quotes\""
- Too greedy
let badRegex = /<.+><\/.+>/g;
let tags = '<tag attribute="foo"></tag><tag id="foo"></tag>';
badRegex.exec(tags);
// Output: ["<tag attribute="foo"></tag><tag id="foo"></tag>", index: 0, input: "<tag attribute="foo"></tag><tag id="foo"></tag>", groups: undefined]
let notBadRegex = /<.+?><\/.+?>/g;
notBadRegex.exec(tags);
// Output: ["<tag attribute="foo"></tag>", index: 0, input: "<tag attribute="foo"></tag><tag id="foo"></tag>", groups: undefined]
notBadRegex.exec(tags);
// Output: ["<tag id="foo"></tag>", index: 27, input: "<tag attribute="foo"></tag><tag id="foo"></tag>", groups: undefined]
- The misplaced hyphen
let badRegex = /[\w -$]+/;
'#'.match(badRegex);
// Output: ["#", index: 0, input: "#", groups: undefined]
let goodRegex = /[\w $-]+/;
'#'.match(goodRegex);
// Output: null
'$100 USD'.match(goodRegex);
// Output: ["$100 USD", index: 0, input: "$100 USD", groups: undefined]
At times, writing a regex can feel like walking in a minefield. At other times, regular expressions are the wrong answer—or as Jamie Zawinski puts it Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
. So, especially in security-sensitive contexts, you're probably better off not using regular expressions unless you really have to....