
Fix TwoWaySearcher to work when used with periodic needles. #16612

Merged (2 commits), Aug 23, 2014
14 changes: 13 additions & 1 deletion src/libcore/str.rs
@@ -419,6 +419,8 @@ struct TwoWaySearcher {
memory: uint
}

// This is the Two-Way search algorithm, which was introduced in the paper:
// Crochemore, M., Perrin, D., 1991, Two-way string-matching, Journal of the ACM 38(3):651-675.
impl TwoWaySearcher {
fn new(needle: &[u8]) -> TwoWaySearcher {
let (critPos1, period1) = TwoWaySearcher::maximal_suffix(needle, false);
@@ -437,7 +439,14 @@ impl TwoWaySearcher {
let byteset = needle.iter()
.fold(0, |a, &b| (1 << ((b & 0x3f) as uint)) | a);

if needle.slice_to(critPos) == needle.slice_from(needle.len() - critPos) {

// The logic here (calculating critPos and period, the final if statement to see which
// period to use for the TwoWaySearcher) is essentially an implementation of the
// "small-period" function from the paper (p. 670)
//
// In the paper they check whether `needle.slice_to(critPos)` is a suffix of
// `needle.slice(critPos, critPos + period)`, which is precisely what this does
if needle.slice_to(critPos) == needle.slice(period, period + critPos) {
Review comment (Member):

Hm, I'm not 100% sure this handles not-exactly-periodic strings correctly, e.g. naXYZna vs. naxyzna (playpen).
(I don't know this algorithm, so it might be doing the right thing, but it certainly seems weird that those two differ.)

Reply (Contributor Author):

Your link is mangled for me, so I'm not sure what you're referring to. Are you worried about maximal_suffix returning different factorizations for "naXYZna" and "naxyzna"?

Reply (Contributor Author):

Actually, I'm worried about that. maximal_suffix is giving different periods for "naXYZna" and "naxyzna", which seems to contradict how my port of glibc's critical_factorization function works.

Reply (Contributor Author):

@huonw Never mind, my port had a bug. I think what I have in the PR is correct. maximal_suffix returns (i, p), where i is the starting position of the suffix it finds and p is the period of that suffix. So the factorization of "naxyzna" is ("naxy", "zna") and the factorization of "naXYZna" is ("na", "XYZna"), because the "maximal" in the function name refers to lexicographic order, not length.

And actually, my comment in the code was wrong: the check is really testing whether the suffix is periodic, which is why one needle is classified as periodic and the other is not. I will update the comments to make this clear.
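The factorizations quoted in this thread can be reproduced with a short sketch in current Rust. It is an illustration, not the libcore code: `maximal_suffix` follows the standard maximal-suffix scan the diff cites (p. 668 of the paper), with `reversed` flipping the byte order, and the hypothetical `factorize` helper assumes the searcher keeps whichever factorization has the larger critical position (that selection code is elided from the diff). The last step applies the PR's new test, `needle[..crit] == needle[period..period + crit]`.

```rust
/// Returns `(start, period)` of the maximal suffix of `arr`: `start` is the
/// index where the suffix begins and `period` is the period the scan finds.
/// With `reversed == false`, suffixes are compared under ordinary byte order;
/// with `reversed == true`, the byte order is flipped.
fn maximal_suffix(arr: &[u8], reversed: bool) -> (usize, usize) {
    let mut left = 0; // start of the best (maximal) suffix found so far
    let mut right = 1; // start of the candidate suffix compared against it
    let mut offset = 0; // number of bytes the two suffixes have matched
    let mut period = 1; // period of the best suffix so far

    while right + offset < arr.len() {
        let a = arr[right + offset]; // byte from the candidate suffix
        let b = arr[left + offset]; // byte from the best suffix
        let candidate_smaller = if reversed { a > b } else { a < b };
        if a == b {
            // Still inside a repetition of the current period.
            if offset + 1 == period {
                right += offset + 1;
                offset = 0;
            } else {
                offset += 1;
            }
        } else if candidate_smaller {
            // Candidate loses: the best suffix's period grows to cover it.
            right += offset + 1;
            offset = 0;
            period = right - left;
        } else {
            // Candidate wins: it becomes the new best suffix.
            left = right;
            right += 1;
            offset = 0;
            period = 1;
        }
    }
    (left, period)
}

/// Hypothetical helper: returns `(crit, period, periodic)` for a non-empty
/// needle, assuming the factorization with the larger critical position is kept.
fn factorize(needle: &[u8]) -> (usize, usize, bool) {
    assert!(!needle.is_empty(), "needle must be non-empty");
    let (crit1, period1) = maximal_suffix(needle, false);
    let (crit2, period2) = maximal_suffix(needle, true);
    let (crit, period) = if crit1 > crit2 { (crit1, period1) } else { (crit2, period2) };
    // The small-period test from this PR. `crit + period <= needle.len()`
    // always holds for the scan above, so this slice is in bounds.
    let periodic = needle[..crit] == needle[period..period + crit];
    (crit, period, periodic)
}
```

Under these assumptions "naxyzna" yields `(4, 3, false)`, i.e. ("naxy", "zna") failing the test, while "naXYZna" yields `(2, 5, true)`, i.e. ("na", "XYZna") passing it. Uppercase ASCII letters sort below lowercase ones, which is why the two needles factor differently.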

TwoWaySearcher {
critPos: critPos,
period: period,
@@ -508,6 +517,9 @@ impl TwoWaySearcher {
}
}

// returns (i, p) where i is the "critical position", the starting index of
// the maximal suffix, and p is the period of the suffix
// see p. 668 of the paper
#[inline]
fn maximal_suffix(arr: &[u8], reversed: bool) -> (uint, uint) {
let mut left = -1; // Corresponds to i in the paper
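To see why the old condition broke `"bananas".contains("nana")`, the sketch below runs a simplified Two-Way search over a given factorization. The helper `two_way_find` is hypothetical and is not the libcore searcher: it takes the critical position, the period, and the periodic/non-periodic classification as arguments, assumes the non-periodic case shifts by `max(crit, n - crit) + 1` as in the paper, and leaves out the `memory` optimization, so it stays correct but is not linear-time. For "nana" the factorization is crit = 1, period = 2. The old test compared `needle[..1]` ("n") with the last byte of the needle ("a") and called it non-periodic; the new test compares "n" with `needle[2..3]` ("n") and calls it periodic.

```rust
/// Hypothetical simplified Two-Way search (no `memory` optimization).
/// Requires `crit <= needle.len()` and `period >= 1`.
fn two_way_find(
    haystack: &[u8],
    needle: &[u8],
    crit: usize,
    period: usize,
    periodic: bool,
) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    // Shift applied when the right half matched but the left half did not.
    let shift = if periodic { period } else { crit.max(needle.len() - crit) + 1 };
    let mut pos = 0;
    while pos + needle.len() <= haystack.len() {
        // Compare the right half, left to right.
        let mut i = crit;
        while i < needle.len() && needle[i] == haystack[pos + i] {
            i += 1;
        }
        if i < needle.len() {
            // Mismatch in the right half: skip just past it.
            pos += i - crit + 1;
            continue;
        }
        // Compare the left half, right to left.
        let mut j = crit;
        while j > 0 && needle[j - 1] == haystack[pos + j - 1] {
            j -= 1;
        }
        if j == 0 {
            return Some(pos);
        }
        pos += shift;
    }
    None
}
```

With the non-periodic shift of max(1, 3) + 1 = 4, the left-half mismatch at position 0 jumps past the occurrence at index 2; shifting by the period 2 lands on it.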
20 changes: 20 additions & 0 deletions src/libcoretest/str.rs
@@ -8,7 +8,27 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

fn check_contains_all_substrings(s: &str) {
assert!(s.contains(""));
for i in range(0, s.len()) {
for j in range(i+1, s.len() + 1) {
assert!(s.contains(s.slice(i, j)));
}
}
}

#[test]
fn strslice_issue_16589() {
assert!("bananas".contains("nana"));

// prior to the fix for #16589, x.contains("abcdabcd") returned false
// test all substrings for good measure
check_contains_all_substrings("012345678901234567890123456789bcdabcdabcd");
}


#[test]
fn test_strslice_contains() {
let x = "There are moments, Jeeves, when one asks oneself, 'Do trousers matter?'";
check_contains_all_substrings(x);
}
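The test helper in this diff uses pre-1.0 APIs (`range`, `slice`). A rough equivalent in current Rust, offered as a sketch rather than the actual libcore test, might look like this; it additionally skips byte ranges that would split a multi-byte UTF-8 character, because slicing a `&str` at such an index panics.

```rust
/// Sketch of `check_contains_all_substrings` in current Rust: asserts that
/// `s` contains every one of its own non-empty substrings (and the empty one).
fn check_contains_all_substrings(s: &str) {
    assert!(s.contains(""));
    for i in 0..s.len() {
        for j in i + 1..=s.len() {
            // Only slice at UTF-8 character boundaries.
            if s.is_char_boundary(i) && s.is_char_boundary(j) {
                assert!(s.contains(&s[i..j]), "missing substring {:?}", &s[i..j]);
            }
        }
    }
}
```

Because every probe is a substring of `s` by construction, any failed assertion points directly at a search bug, which is how the exhaustive check over "...bcdabcdabcd" catches the misclassified periodic needle.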