support utf8 languages in std::chrono::time_zone::get_info
#3102
use `__std_code_page::_Utf8` instead of `__std_fs_code_page()` to support languages like Arabic. This is like what is used by `u8path`
|
Unfortunately, while I believe that using utf-8 everywhere would be nice, we can't break people who are using non-utf-8 code pages; I believe that the correct answer here is to instead set your application's code page to 65001 either by manifesting your application, or by calling |
|
This is not using UTF-8 everywhere; it is only for time zone strings, because some of them need UTF-8. The current behavior of the STL is insane: it encounters an error and does not report it correctly, and it has non-ASCII time zone names but treats them as if they were ASCII. Either those UTF-8 strings should not be returned (I think this is outside the STL's control) or a proper encoding should be used. Requiring the user to call |
|
I agree with @strega-nil-ms here:
We talked about this at the weekly maintainer meeting, and we believe that we need to fully understand the issue #3097, and have it be reliably reproducible, before attempting to make any changes here. (A proper fix may take an entirely different form, to avoid breaking the scenarios that Nicole mentioned.) |
|
You can try to reproduce it locally by using a language that gives you non-ASCII characters instead of "GMT" in the abbreviation. |
fsb4000
left a comment
Tests failed because clang-format formats that code slightly differently.
stl/src/tzdb.cpp
Outdated
  const auto _Result =
-     __std_fs_convert_wide_to_narrow(_Code_page, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
+     __std_fs_convert_wide_to_narrow(__std_code_page::_Utf8, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
- const auto _Result =
-     __std_fs_convert_wide_to_narrow(_Code_page, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
-     __std_fs_convert_wide_to_narrow(__std_code_page::_Utf8, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
+ const auto _Result = __std_fs_convert_wide_to_narrow(
+     __std_code_page::_Utf8, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
Can you fix this formatting issue?
You can run:
cmake -G Ninja -S . -B out\build\x64
cmake --build .\out\build\x64 --target format
|
The strange part, in my opinion, is even trying to get the abbreviated time zone name in localized form. I don't think the IANA tzdb that we're emulating has that data, just abbreviations like "PDT". Cairo's abbreviation, for example, is |
Co-authored-by: Igor Zhukov <fsb4000@yandex.ru>
|
It seems that HowardHinnant's date library (the origin of the C++20 time zone support) has been using |
|
Yeah, the |
|
This should instead probably be fixed by #3122; it sucks that the default code page isn't able to represent Arabic, but we also need to avoid breaking people who are depending on narrow encodings that are not UTF-8. Thank you so much @cppdev123 for bringing this to our attention! I'm glad that we seem to have been able to fix at least the problem that we were throwing "operation completed successfully". Could you please try out #3122 and tell me if that fixes it for you? |
|
@strega-nil-ms this fixes the empty error message, but the conversion still fails. Can you add a fallback to try UTF-8 when |
|
@cppdev123 mmh. That's an interesting idea; @microsoft/vclibs, how do we feel about doing this "fallback" if the unicode stuff fails? |
I think only the second one is viable. Unless I'm missing something, there's no way for the user to figure out if a "fallback to UTF-8" is taken, so they can't know which encoding was used. It needs to always be the same encoding, UTF-8 or the user's CP, and as @strega-nil-ms says, we're already locked into the latter. |
|
While we appreciate this PR, unfortunately, due to the reasons stated above, we cannot switch to using UTF-8 as the narrow code page. Thanks so much for bringing this to our attention, though, and we hope to fix the underlying issue soon! |
use `__std_code_page::_Utf8` instead of `__std_fs_code_page()` to support languages like Arabic. This is like what is used by `u8path`. This should fix #3097