Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erroneous failure to match in simple regex #289

Closed
ConnorGray opened this issue Oct 26, 2016 · 13 comments · Fixed by #290
Closed

Erroneous failure to match in simple regex #289

ConnorGray opened this issue Oct 26, 2016 · 13 comments · Fixed by #290
Labels

Comments

@ConnorGray
Copy link

ConnorGray commented Oct 26, 2016

The simple regexes ((IMG|MVI|MGG)|DS)X and (((IMG|MVI)|MG)|DS)X both ought to match the string "MVIX, but only the second one does.

extern crate regex;

use regex::{Regex};

fn main() {
    simple_match_1();
    simple_match_2();
}

/// This example working on regex101: https://regex101.com/r/WJgwJ5/1
#[test]
fn simple_match_1() {
    let regex_str = "((IMG|MVI|MG)|DS)X";
    let text = "MVIX";

    let regex: Regex = Regex::new(regex_str).unwrap();

    // These both panic when they shouldn't
    assert!(regex.is_match(text));
    assert!(regex.find(text) != None);
}

/// This example *does not* panic
#[test]
fn simple_match_2() {
    let regex_str = "(((IMG|MVI)|MG)|DS)X";
    let text = "MVIX";

    let regex: Regex = Regex::new(regex_str).unwrap();

    // These both *work* (no panic)
    assert!(regex.is_match(text));
    assert!(regex.find(text) != None);
}
@mbrubeck
Copy link
Contributor

git bisect shows that this is a regression from 55a1fc9.

@mbrubeck
Copy link
Contributor

The minimal test case I've found for this bug is:

assert!(Regex::new("(ABC|CDA|BC)X").unwrap().is_match("CDAX"));

@BurntSushi BurntSushi added the bug label Oct 26, 2016
@BurntSushi
Copy link
Member

Gah. Probably a bug in the literal optimizations (again).

@BurntSushi
Copy link
Member

BurntSushi commented Oct 26, 2016

The problem here is that the literals returned by regex_syntax::Literals::unambiguous_prefixes for (ABC|CDA|BC)X are [Cut(A), Cut(BCX), Complete(CDAX)], which is wrong. Namely, it doesn't satisfy the documented contract of the function:

    /// Any substring match with a member of the set is returned is guaranteed
    /// to never overlap with a substring match of another member of the set
    /// at the same starting position.
    ///
    /// Given any two members of the returned set, neither is a substring of
    /// the other.

... because A is a substring of CDAX.

BurntSushi added a commit to BurntSushi/regex that referenced this issue Oct 26, 2016
Specifically, given the strings ABCX, CDAX and BCX, it was reporting
the unambiguous set as A, BCX and CDAX, which is wrong since A is a
substring of CDAX.

unambiguous_prefixes is now quite a bit of a mess, but so is the rest
of the literal extraction code. The only thing it has going for it is
a massive test suite.

Fixes rust-lang#289
@BurntSushi
Copy link
Member

OK, 0.1.78 should be on crates.io with this fix. @ConnorGray Thanks so much for reporting this! @mbrubeck Thanks so much for minimizing the example. :-)

@ConnorGray
Copy link
Author

No problem! :) Thanks for the quick fix!

@ConnorGray
Copy link
Author

I'm new to this so maybe I made a mistake, but this seems to still not be working in the original simple_match_1() case after updating to regex 0.1.78 on my machine.

@BurntSushi
Copy link
Member

@ConnorGray Wow. Serves me right for trusting the minimal example. Dammit. Your original example does indeed fail. sigh Working on it...

@BurntSushi
Copy link
Member

@ConnorGray Derp. The minimal example was good. I just messed up the release. 0.1.79 incoming...

@BurntSushi
Copy link
Member

@ConnorGray All right, 0.1.79 is out. Try again? :-)

@ConnorGray
Copy link
Author

ConnorGray commented Oct 26, 2016

@BurntSushi Looks good :)

@BurntSushi
Copy link
Member

Phew. Thanks for your patience!

@ConnorGray
Copy link
Author

Haha no problem, I was surprised it got fixed this quickly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants