char boundary byte indexing panic when using Regex::split #417

Closed
frewsxcv opened this issue Nov 24, 2017 · 3 comments

frewsxcv commented Nov 24, 2017

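extern crate regex;

fn main() {
    // Pattern and haystack as produced by the fuzzer.
    let a = std::str::from_utf8(b"\\B(?-u)|0").unwrap();
    // "\n" followed by U+0346, which is encoded as the two bytes 0xCD 0x86.
    let b = std::str::from_utf8(b"\n\xcd\x86").unwrap();
    let c = regex::Regex::new(a).unwrap();
    // Panics on affected versions: byte index 2 falls inside U+0346.
    c.split(b).collect::<Vec<_>>();
}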
thread 'main' panicked at 'byte index 2 is not a char boundary; it is inside '͆' (bytes 1..3) of `
͆`', src/libcore/str/mod.rs:2232:4
note: Run with `RUST_BACKTRACE=1` for a backtrace.
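
The panic message itself can be reproduced without the regex crate. What follows is only a minimal sketch of the underlying string-slicing rule, not the code path inside regex: it assumes the same haystack as the report, and `checked_prefix` is a hypothetical helper introduced here for illustration.

```rust
/// Hypothetical helper (not part of regex): returns the prefix of `s` that
/// ends at byte offset `end`, or `None` if `end` is out of range or falls
/// inside a multi-byte character.
fn checked_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    // Same haystack as the report: '\n' (1 byte) followed by U+0346
    // (2 bytes, 0xCD 0x86), so 3 bytes in total.
    let haystack = "\n\u{0346}";
    assert_eq!(haystack.len(), 3);

    // Report which byte offsets are valid slicing points.
    for i in 0..=haystack.len() {
        println!("offset {}: char boundary = {}", i, haystack.is_char_boundary(i));
    }

    // `&haystack[..2]` would panic with "byte index 2 is not a char boundary".
    // `str::get` reports the same condition as `None` instead of panicking.
    assert_eq!(checked_prefix(haystack, 1), Some("\n"));
    assert_eq!(checked_prefix(haystack, 2), None);
}
```

Any match offset that lands between the two bytes of U+0346 triggers the panic once it is used to slice the haystack, which is consistent with the message above.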

line where the panic happens

found via afl.rs using this fuzz target

frewsxcv changed the title from "char boundary byte indexing panic when using Regex::spit" to "char boundary byte indexing panic when using Regex::split" on Nov 24, 2017
BurntSushi commented Nov 24, 2017 via email

@frewsxcv

@BurntSushi yep, just confirmed it happens in master too. here's a backtrace

@BurntSushi

This was fixed in regex 0.2.7. In particular, negated word boundaries can match invalid UTF-8, which the new regex-syntax crate now detects correctly. Previously, it did not.
