-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create differential fuzzers and add them to OSS-Fuzz #1044
base: master
Are you sure you want to change the base?
Conversation
b0fcd6f
to
0873b6f
Compare
Some last things I'm not quite sure what to do on:
|
Found a few disagreements already. Some of them I'm reasonably certain are just misuses of the API on my part. Others appear to be rather rare disagreements caused by unicode word boundary shenanigans again. Copying an example below (I can raise a separate issue but not sure if desired): #[test]
fn unicode_disagreement() {
let pattern = ".+\\b\u{a}";
let haystack = "7\u{ffff}77\u{a}";
let baseline = PikeVM::new(pattern).unwrap();
let mut cache = baseline.create_cache();
let re = Regex::new(pattern).unwrap();
let found1 = re.find(haystack);
let found2 = baseline.find(&mut cache, haystack);
if let Some(found1) = found1 {
let found2 = found2.expect("Found in target, but not in baseline!");
assert_eq!(found1.start(), found2.start());
assert_eq!(found1.end(), found2.end());
}
}
#[test]
fn unicode_disagreement2() {
let pattern = ".\u{7}+.\\B";
let haystack = "7\u{7}\u{7}\u{1}\u{ffff}";
let baseline = PikeVM::new(pattern).unwrap();
let mut cache = baseline.create_cache();
let re = Regex::new(pattern).unwrap();
let found1 = re.find(haystack);
let found2 = baseline.find(&mut cache, haystack);
if let Some(found1) = found1 {
let found2 = found2.expect("Found in target, but not in baseline!");
assert_eq!(found1.start(), found2.start());
assert_eq!(found1.end(), found2.end());
}
} Let me know how you want me to file these and I'll start working away at it, but I get the feeling that some of these are false positives... |
@addisoncrump I think that unit test depends on what use regex_automata::{meta::regex, nfa::thompson::pikevm::pikevm};
fn main() {
env_logger::init();
let pattern = r".+\b\n";
let haystack = "β77\n";
let baseline = pikevm::new(pattern).unwrap();
let mut cache = baseline.create_cache();
let re = regex::new(pattern).unwrap();
let found1 = re.find(haystack);
let found2 = baseline.find(&mut cache, haystack);
if let some(found1) = found1 {
let found2 = found2.expect("found in target, but not in baseline!");
assert_eq!(found1, found2);
}
} Also enabled the As for what to do here... Hmmm. I'd say this one is definitely deserving of its own bug ticket: #1046 I don't necessarily want you going through the rigamarole of creating a bug ticket for every case you find? I'm not sure. Depends on how many they are and whether they're all the same bug. |
There are lots of mismatches on DFA targets, but I think I'm misusing that API. I worry that I won't be able to distinguish between crashes or do effective root-cause analysis on some of these crashes as they are quite in the weeds 😅 I will do my best to minimise and look into it as much as I can, but if I can't figure it out I'll share a reproducer. |
Aye thanks. If you burn out of it (I know going through these test cases is a lot of work) then just sharing the cluster test inputs or whatever would be great. Then I'll be able to go through them (at some point). And yeah unfortunately the API for regex-automata is incredibly complex. There's a billion knobs. I think my docs should explain everything but, well, there are a lot of docs. |
A slight concern I have merging this too early: if we aren't reasonably certain that we haven't gotten rid of at least most of the mismatches before merging, I'm reasonably confident that OSS-Fuzz may fail to report some bugs... I need to look into it further, but I'm fairly certain it deduplicates by stacktrace -- and here, all of our stack traces will be the same for different bugs if they trigger the same assertion! I've asked the OSS-Fuzz folks and will think about how to avoid this problem a bit before merging. |
Yes I would probably run this locally (or use your test cases) before merging and try to squash some bugs first. Basically, get it to a point where I can run it over night and not have it find anything. (That's what I did with fuzzers currently on master IIRC.) |
Any interest in moving forward with this? We can simply not put them in OSS-Fuzz, as you prefer. |
dbb629d
to
312764d
Compare
I am definitely interested, but it might take me a while to get back around to this unfortunately. |
Okay, no issues. For the sake of making clear what remains: the DFA differential fuzzers seem to encounter issues almost immediately, but I'm not entirely sure if this is my fault. The other fuzzers execute as expected and have already found some bugs that were previously addressed. |
This attempts to implement #1003.