Skip to content

Poor codegen for string search (i.e., memchr) with iterators #94573

Open
@mcy

Description

@mcy

A colleague provided me with this benchmark, which compares the following equivalent C++ and Rust:

bool HasMoreThanTwoDashes(std::string_view sv) {
    return sv.find_first_of('-', sv.find_first_of('-')+1) != std::string_view::npos;
}
fn has_more_than_two_dashes(s: &[u8]) -> bool {
  sl.iter().filter(|&&c| c == b'-').count() > 2
}

C++ performs 20x better in the microbenchmark. We haven't tried very hard to figure out why, but our suspicion is that Rust's implementation of memchr is worse than the one provided by the system that ran the benchmark. C++ also bails out early, and Rust does not, but that appears not to matter because the dashes are at the far end of the strings.

We're not actually sure what CPU quick-bench ran this with, but I can run it on my (icelake, I think) Xeon later. I think the difference in memchr implementations is worth investigating regardless.

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-discussionCategory: Discussion or questions that doesn't represent real issues.I-slowIssue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions