<regex>: Implementation divergence for capture group behavior #5365

@StephanTLavavej

C:\Temp>type meow.cpp
#include <cstddef>
#include <exception>
#include <print>
#include <string>

#ifdef USE_BOOST
#include <boost/regex.hpp>
namespace std_or_boost = boost;
constexpr auto library = "boost";
#else
#include <regex>
namespace std_or_boost = std;
constexpr auto library = "std";
#endif

using std::exception, std::print, std::println, std::size_t, std::string;

int main() {
    try {
        println("Library: {}", library);
        const string s{"acbd"};
        const std_or_boost::regex r{"(?:(a)|(b)|(c)|(d))+"};
        std_or_boost::smatch m;
        const bool result{std_or_boost::regex_search(s, m, r)};
        println("regex_search: {}", result);
        for (size_t i = 0; i < m.size(); ++i) {
            print("m[{}]", i);
            if (m[i].matched) {
                println(R"(.str(): "{}")", m[i].str());
            } else {
                println(".matched: false");
            }
        }
    } catch (const exception& e) {
        println("Exception: {}", e.what());
    }
}

VS 2022 17.14 Preview 2 prints:

C:\Temp>cl /EHsc /nologo /W4 /std:c++latest /MTd /Od meow.cpp && meow
meow.cpp
Library: std
regex_search: true
m[0].str(): "acbd"
m[1].str(): "a"
m[2].str(): "b"
m[3].matched: false
m[4].str(): "d"

microsoft/STL main prints the exact same thing (as of f2a2933 with @muellerj2's amazing #5218 merged), so we haven't regressed or improved. #5218 did fix several other long-standing bugs in our internal database, so I was surprised to see that this one remained.

And we have implementation divergence! See: https://godbolt.org/z/cjz8PWaf7

libstdc++ 14.2 and Boost 1.87.0 agree, differing only in their Library output:

Library: std
regex_search: true
m[0].str(): "acbd"
m[1].str(): "a"
m[2].str(): "b"
m[3].str(): "c"
m[4].str(): "d"

Library: boost
regex_search: true
m[0].str(): "acbd"
m[1].str(): "a"
m[2].str(): "b"
m[3].str(): "c"
m[4].str(): "d"

But libc++ 20.1 says:

Library: std
regex_search: true
m[0].str(): "acbd"
m[1].matched: false
m[2].matched: false
m[3].matched: false
m[4].str(): "d"

Originally reported as VSO-110491 / AB#110491 (in 2014 or earlier via the now-defunct Microsoft Connect). The original user expected libstdc++/Boost's behavior.

Labels: bug, fixed, regex