
<regex>: Optional empty repetitions are illegal #5490

@Alcaro

Describe the bug

NOTE 4
Step 1 of the RepeatMatcher's d closure states that, once the minimum number of repetitions has been satisfied, any more expansions of Atom that match the empty String are not considered for further repetitions.

~JS spec, https://262.ecma-international.org/5.1/#sec-15.10.2.5

MS-STL currently does not comply with that clause (and neither do libstdc++ or libc++).
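To make the quoted note concrete, here is a toy model of the spec's RepeatMatcher, restricted to a greedy quantifier with min = 0 and no upper bound, applied to the pattern (a*)*. It is only a sketch: `State`, `repeat_match`, `match_repeated_group` and the `reject_empty_iterations` flag are hypothetical names, not code from MS-STL or any other library, and the non-rejecting branch is merely an assumed stand-in for non-conforming behavior, included for contrast.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Toy model only: every name here is hypothetical, none of it is MS-STL code.
struct State {
    std::size_t end;                    // endIndex: how far the match has advanced
    std::optional<std::string> group1;  // capture 1; std::nullopt means "undefined"
};

using Result = std::optional<State>;  // std::nullopt models "failure"
using Continuation = std::function<Result(const State&)>;
using Matcher = std::function<Result(const State&, const Continuation&)>;

// RepeatMatcher restricted to a greedy quantifier with min = 0, max = infinity.
Result repeat_match(const Matcher& m, const State& x, const Continuation& c,
                    bool reject_empty_iterations) {
    // Closure d: runs after one more iteration of the atom has matched up to y.
    Continuation d = [&](const State& y) -> Result {
        if (y.end == x.end) {
            // The iteration consumed nothing. Step 1 of d in the spec fails here.
            // The other branch is an assumed lenient variant for contrast: it
            // keeps the empty iteration and stops repeating.
            return reject_empty_iterations ? Result{} : c(y);
        }
        return repeat_match(m, y, c, reject_empty_iterations);
    };
    const State xr{x.end, std::nullopt};  // captures inside the atom are reset
    if (Result z = m(xr, d)) {
        return z;
    }
    return c(x);  // no further iteration: continue with the state before it
}

// Models the pattern (a*)* matched at the start of `input`.
Result match_repeated_group(const std::string& input, bool reject_empty_iterations) {
    // Atom (a*): greedy run of 'a', captured as group 1, longest candidate first.
    Matcher atom = [&input](const State& x, const Continuation& c) -> Result {
        std::size_t e = x.end;
        while (e < input.size() && input[e] == 'a') {
            ++e;
        }
        for (;;) {
            if (Result r = c(State{e, input.substr(x.end, e - x.end)})) {
                return r;
            }
            if (e == x.end) {
                return std::nullopt;  // every length has been tried
            }
            --e;  // backtrack: give up one 'a'
        }
    };
    // Final continuation: accept whatever state the pattern ends in.
    Continuation accept = [](const State& s) -> Result { return s; };
    return repeat_match(atom, State{0, std::nullopt}, accept, reject_empty_iterations);
}
```

In this model, `match_repeated_group("b", true)` succeeds with an empty overall match and leaves `group1` unset, while passing `false` leaves `group1` set to the empty string. The rule only changes which captures survive, not whether the overall match succeeds.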

Command-line test case

#include <cstddef>
#include <exception>
#include <regex>
#include <stdio.h>
#include <string>

int main() {
    try {
        std::string s{"b"};
        std::regex r{"(a*)*"};
        std::smatch m;
        bool result = std::regex_search(s, m, r);
        printf("regex_search: %d\n", result);
        // Print every submatch, distinguishing unmatched groups from empty ones.
        for (std::size_t i = 0; i < m.size(); ++i) {
            printf("m[%zu]", i);
            if (m[i].matched) {
                printf(".str(): \"%s\"\n", m[i].str().c_str());
            } else {
                puts(".matched: false");
            }
        }
    } catch (const std::exception& e) {
        printf("Exception: %s\n", e.what());
    }
}

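This also affects whether a(b?)+c\1d matches abcd. It shouldn't: once (b?) has matched b, a further iteration could only match the empty string and is therefore rejected, so group 1 stays b and \1 cannot match at d. If the empty iteration were accepted instead, group 1 would be reset to the empty string, \1 would match trivially, and abcd would match. Various recently-fixed regex bugs make this hard to demonstrate on Godbolt.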
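Since the backreference case is hard to show on Godbolt, a small local probe may help when checking a particular toolchain. The helper below is a sketch, not part of the original report: `describe_full_match` is a hypothetical name, and it relies only on `std::regex_match` with the default ECMAScript grammar.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not from the issue): runs std::regex_match with the
// default ECMAScript grammar and summarizes the outcome as text.
std::string describe_full_match(const std::string& pattern, const std::string& subject) {
    const std::regex re{pattern};
    std::smatch m;
    if (!std::regex_match(subject, m, re)) {
        return "no match";
    }
    std::string out = "match";
    // List each capture group, distinguishing unmatched groups from empty ones.
    for (std::size_t i = 1; i < m.size(); ++i) {
        out += " g" + std::to_string(i) + "=";
        if (m[i].matched) {
            out += "\"" + m[i].str() + "\"";
        } else {
            out += "unmatched";
        }
    }
    return out;
}
```

Calling `describe_full_match(R"(a(b?)+c\1d)", "abcd")` should return "no match" on an implementation that follows the quoted note. The result on any given standard library version depends on which fixes it already contains.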

Expected behavior

Group 0 - empty string
Group 1 - no match

STL version

@muellerj2 says it's still present on main as of today

Additional context

https://godbolt.org/z/s7ejf7GKv

llvm/llvm-project#133314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120212

Labels: bug, fixed, regex
