
boost::regex match but not std::regex #39399

Open
llvmbot opened this issue Dec 17, 2018 · 13 comments
Labels
bugzilla Issues migrated from bugzilla libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. regex Issues related to regex

Comments

@llvmbot
Member

llvmbot commented Dec 17, 2018

Bugzilla Link 40052
Version 6.0
OS FreeBSD
Attachments Testcase
Reporter LLVM Bugzilla Contributor
CC @mclow

Extended Description

I have the attached program. It sets a global locale (de_DE.UTF-8).

I think this might be a bug in libc++, because on Windows (MSVC 2013 and MSVC 2017) and on Linux (GCC 8.2 + libstdc++) this regex (std::regex) matches when the global locale comes from Boost. The Boost regex also matches (i.e., when std::regex is replaced by boost::regex).

The bug triggers only when I use boost::locale (also on my own machine, and only on FreeBSD with clang and libc++). With std::locale(), the regex matches.

I already submitted this bug to FreeBSD and to boost.org.

For reference, here are the links:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233994

Boost.Locale says:
[quote]
Boost.Regex and Boost.Locale aren't related, the locale generated by Boost.Locale is "C" locale with addons unrelated to Boost.Regex
[/quote]
boostorg/locale#35

FreeBSD says it is a bug in Boost.Locale.

Since both of my direct upstream bug trackers seem to "dislike" this bug, I am reporting it to clang/libc++ directly.

@llvmbot
Member Author

llvmbot commented Dec 17, 2018

assigned to @mclow

@llvmbot
Member Author

llvmbot commented Dec 17, 2018

Expected Output of the Testcase:
All ok

Got:
Bug triggered

@mclow
Contributor

mclow commented Dec 17, 2018

My first thought was that you had a "high ascii" character in the test case, and that was getting treated differently by the locale. That appears not to be the case.

@llvmbot
Member Author

llvmbot commented Dec 17, 2018

@llvmbot
Member Author

llvmbot commented Dec 17, 2018

I also tried setting the facets one by one. It triggers only with:

  • all_characters
  • collation_facet
  • all_categories

I also updated the testcase to show how I did this.

@llvmbot
Member Author

llvmbot commented Dec 17, 2018

The bug also goes away if I remove the icase flag from the regex.

@mclow
Contributor

mclow commented Dec 17, 2018

Is it specific to the "de_DE.UTF-8" locale, or does it happen with others?
I'm thinking of other UTF-8 locales, like "en_US.UTF-8" or "fr_FR.UTF-8"

@mclow
Contributor

mclow commented Dec 17, 2018

Check all the characters for tolower
Here's a program that calls translate_nocase for all the possible character values.
Can you run that and email me the output, please?

On my (Mac OS) machine, the values C0 -> DE are translated.

@mclow
Contributor

mclow commented Dec 17, 2018

Whoops. I sent this to the wrong place. This should have been sent to https://reviews.llvm.org/D55746 instead. It may be related, but that's not for sure yet.

@llvmbot
Member Author

llvmbot commented Dec 18, 2018

Expanded testcase: https://user-images.githubusercontent.com/60944935/143758461-cba7d36c-a8ae-4849-88cd-758f0286b165.gz
I ran 13824 testcases: 192 locales (one locale segfaults if I include it) x 3 backends x 12 facets x icase on/off.

One locale segfaults. The failures always involve the facets all_characters, collation_facet, and all_categories. The C and POSIX locales always work.

@llvmbot
Member Author

llvmbot commented Dec 18, 2018

@llvmbot
Member Author

llvmbot commented Dec 18, 2018

Pivot Analysis of the result
A pivot analysis of the data shows that the bug always occurs with all_characters, collation_facet, and all_categories. The C and POSIX locales always work; every other locale is affected, even en_US.UTF-8.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@philnik777 philnik777 added the regex Issues related to regex label Jul 15, 2023
@Lord-Kamina

I have opened a new issue in the Boost.Locale repository for what I assume is the same issue, but on macOS with AppleClang:

boostorg/locale#249

Testing on Wandbox shows that the issue does not seem to affect GCC and libstdc++.
