ReDoS in prism #2583
Thank you for reporting! I'll fix them immediately.

@yetingli How did you find the vulnerabilities? Some of the patterns in question contain backreferences (and assertions), and I don't know of any existing technique for the static analysis of regexes that can handle those.
I quickly want to point out that the 6th vulnerability doesn't trigger for
I fixed 5 of the 6 patterns. I can't reproduce the 3rd vulnerability: the attack string doesn't work, and I just don't see any ambiguity that could cause exponential backtracking.
Oops... this is a typo. What I wanted to write is
You can make the string longer; for example, the attack string

The sub-pattern

This fix is equivalent, and the repaired regex is safe and efficient. For example, for the attack string
Yes, I used my tool to detect ReDoS vulnerabilities. I will release the tool after some time, and you are welcome to use it :)
I can help detect whether the repaired patterns are safe.
I just understood what you mean by strings like

I verified and fixed the 6th vulnerability for your attack string.
I'm looking forward to it, but in that case I have a question: there are 2 more vulnerabilities in

If your tool failed to detect these because it couldn't extract the regexes from the source code, then I can help you with that. I can modify Prism's test suite to extract all of the 2500 unique regexes Prism uses (even the dynamically generated ones) and output all of them into a JSON file (or similar) (like I did here).
Thank you for your continued help! All of my fixes are in #2584.
I analyzed the pattern and the problem is the following: First of all, the pattern does not backtrack exponentially. All words of the language

Measurements

Code used:

const { performance } = require("perf_hooks");
const measure = (fn, samples = 1) => {
const start = performance.now();
for (let i = samples; i > 0; i--) {
fn();
}
return (performance.now() - start) / samples;
};
const samples = 10;
const re = /(\s*)(?:=+ +)+=+(?:(?:\r?\n|\r)\1.+)+(?:\r?\n|\r)\1(?:=+ +)+=+(?=(?:\r?\n|\r){2}|\s*$)/;
// warmup
measure(() => re.test("= "), 100);
const measurements = [];
const ns = [50, 100, 200, 500, 1000, 2000, 5000, 10000];
for (const n of ns) {
const text = "= ".repeat(n);
const t = measure(() => re.test(text), samples);
measurements.push([
n,
t,
t / n**2 * 1e6 // scaled up by a constant factor to get nicer numbers
]);
}
measurements.forEach(([n, t, r]) => {
console.log(`| ${n} | ${t.toFixed(3)}ms | ${r.toFixed(3)} |`)
});

Run with Node.js v13.12.0, but previous versions will also work.

That being said, I don't think that this can actually be fixed. The problem is that each suffix of the input string takes O(n) steps to reject, and it seems like the regex engine is forced to try O(n) many suffixes. Even extremely simple regexes like

I also want to point out that
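The suffix argument above can be made concrete with a toy step counter. This is my own sketch, not code from this thread; the pattern a+b and the helper name rejectionSteps are assumptions for illustration. A backtracking engine must try every start position, and each attempt scans the rest of the string, so even an unambiguous pattern rejects in quadratic time overall:

```javascript
// Toy model (not a real regex engine): count the character comparisons
// a backtracking engine performs to reject the pattern a+b on "a".repeat(n).
// For each of the n start positions, a+ greedily eats the remaining 'a's
// and then gives them back one by one while looking for 'b'.
function rejectionSteps(n) {
  const text = "a".repeat(n);
  let steps = 0;
  for (let start = 0; start < text.length; start++) {
    let i = start;
    while (i < text.length && text[i] === "a") { i++; steps++; } // greedy a+
    for (let j = i; j > start; j--) steps++; // backtrack, looking for 'b'
  }
  return steps;
}

console.log(rejectionSteps(10));  // 110
console.log(rejectionSteps(100)); // 10100 (~n^2 growth: ~92x for a 10x input)
```

This matches the measurements above: the per-match work is linear, but repeating it for every start position makes the total quadratic.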
It's very nice to receive your reply.
I want to point out that my tool does not support extracting regular expressions from projects; it only takes regular expressions as input and checks whether they are safe.
Great! You can send me these regexes and I am willing to check whether all regexes Prism uses are safe.
Indeed, they are not equivalent and you're right.
Thank you very much! Here is a JSON of all 2587 unique regexes. Each regex is mapped to all of its occurrences in Prism language definitions. All regexes were extracted from the most recent commit in #2584. If you need/want anything else, please don't hesitate to ask. This is a huge improvement for Prism, so I'm really thankful for the work you're doing.
I have detected 116 vulnerable regular expressions (see the link; there are some scripts for you to verify further). Feel free to contact me if you have any questions.
Thank you @yetingli! Sorry for the delay. I have looked at the files and analyzed them.

Changed regexes

Before I show the results, one question: Why did some regexes change?

Example of a changed regex
There are no other changes that have been made to the regexes. I also want to point out that the second change (non-capturing -> capturing) is not consistent across regexes. The second change is significantly more likely for the regexes with higher file numbers.

Extraction

While the script files were very nice in demonstrating the issue, they also made it hard to fix the issues because I had no idea where the regexes were coming from. The first thing I did was to extract the regex and the [x,y,z] tuple (the strings that generate the attack string

Measuring

I then measured the execution time of each regex on generated attack strings

Note: With linear, I mean

The runtime of both the extracted regex and the original regex was measured. I also measured the runtime of an anchored version of the regex (=

Results

5 regexes were changed enough to affect their runtime. This was usually due to a backreference not referring to the right group anymore.

5 regexes have exponential runtime. 4 of those I already detected (and fixed) with my method in #2590. The remaining one (

2 regexes have linear runtime. One attack string didn't work because a prefix of it was accepted. The other attack string didn't work because the regex had a Prism lookbehind group (just a normal capturing group) that prevented polynomial runtime.

Polynomial

Now to the polynomial ones. 104 regexes have polynomial runtime. I further subdivided this into 3 categories. The main idea is that some of the polynomial runtime isn't caused by backtracking but by the regex engine moving the pattern across the string to find a match. Even if the pattern only looks at each character once to reject a suffix of the string, it has to do so for

95 regexes have

1 regex has

8 regexes have polynomial backtracking not affected by moving.

Problems with your method

I want to point out that your method seems to favor polynomial runtime over exponential backtracking.
Some of the patterns reported as polynomial also have attack strings that can cause exponential backtracking. Examples:
This is concerning because people may be inclined to ignore fixing patterns that run in "only

I also want to point out that your method failed to detect a lot of other patterns with exponential backtracking. Examples:
It also failed to find a lot of cases of polynomial backtracking. See #2597 for examples.

Closing thoughts

While not perfect, your method has found a lot of cases of exponential and polynomial backtracking. All cases of exponential backtracking have been fixed in #2590, and I plan to fix all non-moving polynomial backtracking regexes in #2597.

Files

Here is a JSON of the results. Here is all the code I used to produce these results (scripts are supposed to be executed in the order extract -> measure -> group).
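As a sketch of the [x,y,z] attack-string construction mentioned above (my assumption: the tuple is used as a fixed prefix, a pumped middle, and a fixed suffix; the function name attackString is mine, not from the scripts in this thread):

```javascript
// Build an attack string from an [x, y, z] tuple by pumping y n times.
// Assumption (mine): tuple semantics are prefix + pumped part + rejecting suffix.
function attackString([x, y, z], n) {
  return x + y.repeat(n) + z;
}

// Textbook exponential-backtracking example (not one of Prism's regexes):
// /^(a+)+b$/ with the tuple ["", "a", "!"] pumps 'a' and ends with a
// character that forces rejection after heavy backtracking.
console.log(attackString(["", "a", "!"], 5)); // "aaaaa!"
```

Measuring the match time while increasing n then separates linear, polynomial, and exponential growth, as described in the Measuring section.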
Many thanks for your further verification and repair! @RunDevelopment
I'm very sorry that these changes have caused you such inconvenience in fixing the issues. I want to explain why I had to make these changes (non-capturing -> capturing). Existing static analysis techniques cannot handle non-capturing groups, backreferences, etc. well. Although existing dynamic fuzzing methods can process non-capturing groups, backreferences, etc., the detection is very time-consuming. Meanwhile, dynamic methods produce false negatives. So I would like to try a combination of dynamic and static methods; that is, some regexes are directly checked by static methods (preferred), and the rest are checked by dynamic fuzzing. I wanted to try to change the initial regex (e.g., non-capturing -> capturing) so that it could be detected by static methods as much as possible. Thank you for pointing out the problem (
This may be related to the detection strategy I mentioned earlier. If I used the dynamic fuzzing method directly, there would be no such problem. It seems that I still need to make a further trade-off between detection efficiency and effectiveness.
On the one hand, there are false negatives as mentioned above, and on the other hand, some characters (e.g.,

I really appreciate your responses :)
I'm glad that my results could be of help.
I suppose that's the reason the flags disappeared as well?

Regarding the non-capturing groups: they can easily be converted to capturing ones if backreferences are also changed so that they still point to the right group. Here is a little JS function that does exactly that. Given the source of the regex (everything between the two slashes) and whether the unicode flag is set, it performs the conversion:

const { RegExpParser, visitRegExpAST } = require("regexpp");
/**
* @param {string} source
* @param {boolean} unicode
* @returns {string}
*/
function toCapturing(source, unicode) {
const patternAst = new RegExpParser().parsePattern(source, undefined, undefined, unicode);
/** @type {Map<import("regexpp/ast").CapturingGroup, number>} */
const groupNumbers = new Map();
let groupCounter = 0;
visitRegExpAST(patternAst, {
onGroupEnter() {
groupCounter++;
},
onCapturingGroupEnter(node) {
groupCounter++;
groupNumbers.set(node, groupCounter);
}
});
/** @type {{ start: number; end: number; text: string; }[]} */
const changes = [];
visitRegExpAST(patternAst, {
onGroupEnter(node) {
changes.push({
start: node.start,
end: node.start + "(?:".length,
text: "("
});
},
onBackreferenceEnter(node) {
changes.push({
start: node.start,
end: node.end,
text: "\\" + groupNumbers.get(node.resolved)
});
}
});
changes.reverse();
for (const { start, end, text } of changes) {
source = source.substring(0, start) + text + source.substring(end);
}
return source;
}
// example
toCapturing(/(?:)(a)\1/.source, false)
// returns '()(a)\\2'

Flags are harder to remove. I don't want to shamelessly promote my own work, but you could use my library refa. If you were using existing static analyzers that only support ASCII regexes (e.g. RXXR2), then the AST could be transformed to derive an ASCII regex from any JS regex. That being said, this is probably too much work for too little gain, especially if your tool isn't focused on JS regexes. I just wanted to point out that it is possible.
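To see why the backreference renumbering in toCapturing matters, here is a small counterexample of my own (not from the thread): naively turning (?: into ( shifts the group numbers, so \1 silently refers to a different group.

```javascript
// Original pattern: \1 refers to the only capturing group, (a).
const original = /(?:x)(a)\1/;

// Naive conversion: (?:x) becomes a capturing group, so \1 now
// refers to (x) instead of (a) -- the pattern's meaning changed.
const naive = /(x)(a)\1/;

console.log(original.test("xaa")); // true  -- "x" + "a" + "a"
console.log(naive.test("xaa"));    // false -- \1 must now match "x"
console.log(naive.test("xax"));    // true
```

This is exactly the "backreference not referring to the right group anymore" failure mode observed in the changed regexes above, which is why the AST-based renumbering is needed.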
@yetingli Can you comment on how your technique differs from the work of Rathnayake / Weideman / Wustholz / Shen? (Might be easier for you to send me an email -- davisjam@purdue.edu)
Hi,
I would like to report 6 ReDoS vulnerabilities in prism (https://github.com/PrismJS/prism).
They can cause a denial of service when highlighting crafted code.
The vulnerable regular expression is
"(?:%\s*\n\s*%|%.|[^%"\r\n])*"
and is located in prism/components/prism-eiffel.js (line 16 in 38f42dd).
The ReDoS vulnerability can be exploited with the following crafted code string
The vulnerable regular expression is
(\s*)(?:\+[=-]+)+\+(?:\r?\n|\r)(?:\1(?:[+|].+)+[+|](?:\r?\n|\r))+\1(?:\+[=-]+)+\+
and is located in https://github.com/PrismJS/prism/blob/38f42dd668a7bf388dfe0f5ed4b07aacb23b0255/components/prism-rest.js#L4
The ReDoS vulnerability can be exploited with the following crafted code string
+=+\r++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++!
The vulnerable regular expression is
(\s*)(?:=+ +)+=+(?:(?:\r?\n|\r)\1.+)+(?:\r?\n|\r)\1(?:=+ +)+=+(?=(?:\r?\n|\r){2}|\s*$)
and is located in https://github.com/PrismJS/prism/blob/38f42dd668a7bf388dfe0f5ed4b07aacb23b0255/components/prism-rest.js#L11
The ReDoS vulnerability can be exploited with the following crafted code string
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
The vulnerable regular expression is
(^[ \t]*)\[(?!\[)(?:(["'$`])(?:(?!\2)[^\\]|\\.)*\2|\[(?:[^\]\\]|\\.)*\]|[^\]\\]|\\.)*\]
and is located in prism/components/prism-asciidoc.js (line 4 in 38f42dd).
The ReDoS vulnerability can be exploited with the following crafted code string
[1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1!
The vulnerable regular expression is
(^[^\S\r\n]*)---(?:\r\n?|\n)(?:.*(?:\r\n?|\n))*?[^\S\r\n]*\.\.\.$
and is located in prism/components/prism-tap.js (line 15 in 38f42dd).
The ReDoS vulnerability can be exploited with the following crafted code string
---\n\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r\r
The vulnerable regular expression is
((?:^|[&(])[ \t]*)if(?: ?\/[a-z?](?:[ :](?:"[^"]*"|\S+))?)* (?:not )?(?:cmdextversion \d+|defined \w+|errorlevel \d+|exist \S+|(?:"[^"]*"|\S+)?(?:==| (?:equ|neq|lss|leq|gtr|geq) )(?:"[^"]*"|\S+))
and is located in prism/components/prism-batch.js (line 41 in 38f42dd).
The ReDoS vulnerability can be exploited with the following crafted code string
'if'+'/? '*100+'!'
I think you can limit the input length or modify this regex.
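Of the two mitigations suggested above, the simpler one is a length cap before matching. This is only my sketch; MAX_SAFE_LENGTH and safeTest are hypothetical names, not Prism API:

```javascript
// Guard a known-vulnerable regex with an input-length cap so that the
// worst-case backtracking time stays bounded. 10000 is an arbitrary cap.
const MAX_SAFE_LENGTH = 10000;

function safeTest(regex, text) {
  if (text.length > MAX_SAFE_LENGTH) {
    return false; // skip matching (or skip highlighting) oversized input
  }
  return regex.test(text);
}

// Example with the Eiffel string pattern from the report above:
const eiffelString = /"(?:%\s*\n\s*%|%.|[^%"\r\n])*"/;
console.log(safeTest(eiffelString, '"hello"')); // true
```

A cap only bounds the damage, though; rewriting the regex to remove the ambiguity (the other suggestion) is the more robust fix.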
Steps To Reproduce:
https://drive.google.com/file/d/10RJ21Xr7NRKFBjtA7IoqaRuUVO3v050V/view?usp=sharing
test.html in the root directory