support utf8 languages in `std::chrono::time_zone::get_info` #3102

cppdev123 · 2022-09-14T17:38:14Z

use __std_code_page::_Utf8 instead of __std_fs_code_page() to support languages like Arabic. This is like what is used by u8path.
This should fix #3097

use `__std_code_page::_Utf8` instead of `__std_fs_code_page()` to support languages like Arabic. This is like what is used by `u8path`

strega-nil-ms · 2022-09-14T18:17:55Z

Unfortunately, while I believe that using utf-8 everywhere would be nice, we can't break people who are using non-utf-8 code pages; I believe that the correct answer here is to instead set your application's code page to 65001 either by manifesting your application, or by calling setlocale(LC_ALL, ".UTF-8") very early in your program, in order to force UTF-8.

cppdev123 · 2022-09-14T21:36:34Z

This is not using utf-8 everywhere it is only for time zones strings because some of them need utf-8. The current behavior of the stl is insane because it encounters an error and do not report it correctly and because it does have non ascii time zones but it treats them as if they were ascii. Either those utf-8 strings should not be returned (I think this is out of stl control) or a proper encoding should be used. Requiring the user to call setlocale or using manifests which affect the whole program for only this feature to work is cumbersome and clearly not the way to go also not documented by the standard

StephanTLavavej · 2022-09-14T21:58:25Z

I agree with @strega-nil-ms here:

Unfortunately, while I believe that using utf-8 everywhere would be nice, we can't break people who are using non-utf-8 code pages

We talked about this at the weekly maintainer meeting, and we believe that we need to fully understand the issue #3097, and have it be reliably reproducible, before attempting to make any changes here. (A proper fix may take an entirely different form, to avoid breaking the scenarios that Nicole mentioned.)

cppdev123 · 2022-09-14T23:00:06Z

You can try to reproduce it locally by using a language that will give you non ascii chars instead of "GMT" in abbreviation.
Go to 'Control Panel\Clock and Region' and click on 'Change data, time or number formats' then in the 'Formats' tab select a language from 'Format' combo box in the top of the page. I was previously using 'Arabic (Egypt)' and had this odd error but now I changed it to 'English (United states) ' and the example now works flawlessly and produces this output:

current timezone name: Africa/Cairo
current system time: 2022-09-14 22:54:56.2932052
sys_info: offset: 7200s, abbrevGMT+2
current local time: 2022-09-15 00:54:56.2932052

fsb4000

Tests failed because clang-format formatted that code slightly different.

stl/src/tzdb.cpp

fsb4000 · 2022-09-15T03:04:55Z

stl/src/tzdb.cpp


        const auto _Result =
-            __std_fs_convert_wide_to_narrow(_Code_page, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
+            __std_fs_convert_wide_to_narrow(__std_code_page::_Utf8, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);


Suggested change

const auto _Result =

__std_fs_convert_wide_to_narrow(_Code_page, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);

__std_fs_convert_wide_to_narrow(__std_code_page::_Utf8, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);

const auto _Result = __std_fs_convert_wide_to_narrow(

__std_code_page::_Utf8, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);

can you solve this formatting issue ?

you can do:

cmake -G Ninja -S . -B out\build\x64 cmake --build .\out\build\x64 --target format

MattStephanson · 2022-09-15T05:52:46Z

The strange part, in my opinion, is even trying to get the abbreviated time zone name in localized form. I don't think the IANA tzdb that we're emulating has that data, just abbreviations like "PDT". Cario's abbreviation, for example, is ~~"EE[SD]T"~~ "EE[S]T" in the tzdb, for "Eastern European [Summer] Time". If ICU has something akin to a "C" locale, it might make sense to always get the names and abbreviations in that locale rather than the user's.

Co-authored-by: Igor Zhukov <fsb4000@yandex.ru>

stl/src/tzdb.cpp

Co-authored-by: Igor Zhukov <fsb4000@yandex.ru>

cppdev123 · 2022-09-16T12:11:21Z

It seems that HowardHinnant's date library (the original of c++20 time zone) has been using CP_UTF8 without issues. Also the setlocale(LC_ALL, ".UTF-8") trick worked for me.
Another thing is that when the conversion function returns error ERROR_NO_UNICODE_TRANSLATION it should use SetLastError or indicate in the return result that this error is not due to a winapi function failure. This will prevent failing with error code set to 0

strega-nil-ms · 2022-09-16T15:35:34Z

Yeah, the ERROR_NO_UNICODE_TRANSLATION -> 0 is definitely a bug.

strega-nil-ms · 2022-09-22T21:27:11Z

This should instead probably be fixed by #3122; it sucks that the default code page isn't able to represent Arabic, but we also need to avoid breaking people who are depending on narrow encodings that are not UTF-8.

Thank you so much @cppdev123 for bringing this to our attention! I'm glad that we seem to have been able to fix at least the problem that we were throwing "operation completed successfully".

Could you please try out #3122 and tell me if that fixes it for you?

cppdev123 · 2022-09-22T22:01:18Z

@strega-nil-ms this fix the empty error message but it is still failing to do the conversion. Can you add a fallback to try utf8 when ERROR_NO_UNICODE_TRANSLATION is returned or use a fallback string of format "GTM+{}" since the offset is already calculated. this is better than failing due to the abbreviation which is usually not needed but only the time zone difference

strega-nil-ms · 2022-09-23T16:08:45Z

@cppdev123 mmh. That's an interesting idea; @microsoft/vclibs, how do we feel about doing this "fallback" if the unicode stuff fails?

MattStephanson · 2022-09-23T17:32:42Z

Can you add a fallback to try utf8 when ERROR_NO_UNICODE_TRANSLATION is returned or use a fallback string of format "GTM+{}" since the offset is already calculated.

I think only the second one is viable. Unless I'm missing something, there's no way for the user to figure out if a "fallback to UTF-8" is taken, so they can't know which encoding was used. It needs to always be the same encoding, UTF-8 or the user's CP, and as @strega-nil-ms says, we're already locked into the latter.

strega-nil-ms · 2022-10-14T23:37:59Z

While we appreciate this PR, unfortunately, due to the reasons stated above, we cannot switch to using UTF-8 as the narrow code page. Thanks so much for bringing this to our attention, though, and we hope to fix the underlying issue soon!

support utf8 languages

446dc47

use `__std_code_page::_Utf8` instead of `__std_fs_code_page()` to support languages like Arabic. This is like what is used by `u8path`

cppdev123 requested a review from a team as a code owner September 14, 2022 17:38

support utf8 languages

fb4b469

StephanTLavavej added the bug Something isn't working label Sep 14, 2022

fsb4000 reviewed Sep 15, 2022

View reviewed changes

cppdev123 and others added 2 commits September 15, 2022 15:10

Update stl/src/tzdb.cpp

236ee24

Co-authored-by: Igor Zhukov <fsb4000@yandex.ru>

Update tzdb.cpp

d0f1c9d

fsb4000 reviewed Sep 15, 2022

View reviewed changes

stl/src/tzdb.cpp Outdated Show resolved Hide resolved

Update stl/src/tzdb.cpp

0081d71

Co-authored-by: Igor Zhukov <fsb4000@yandex.ru>

StephanTLavavej assigned strega-nil-ms Sep 21, 2022

strega-nil-ms mentioned this pull request Oct 14, 2022

<chrono>: time_zone::get_info fails when the current code page cannot represent the localized time zone short ID. #3158

Closed

strega-nil-ms closed this Oct 14, 2022

support utf8 languages in std::chrono::time_zone::get_info #3102

support utf8 languages in std::chrono::time_zone::get_info #3102

Uh oh!

Conversation

cppdev123 commented Sep 14, 2022

Uh oh!

strega-nil-ms commented Sep 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cppdev123 commented Sep 14, 2022

Uh oh!

StephanTLavavej commented Sep 14, 2022

Uh oh!

cppdev123 commented Sep 14, 2022

Uh oh!

fsb4000 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

fsb4000 Sep 15, 2022

Choose a reason for hiding this comment

Uh oh!

cppdev123 Sep 15, 2022

Choose a reason for hiding this comment

Uh oh!

fsb4000 Sep 15, 2022

Choose a reason for hiding this comment

Uh oh!

MattStephanson commented Sep 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

cppdev123 commented Sep 16, 2022

Uh oh!

strega-nil-ms commented Sep 16, 2022

Uh oh!

strega-nil-ms commented Sep 22, 2022

Uh oh!

cppdev123 commented Sep 22, 2022

Uh oh!

strega-nil-ms commented Sep 23, 2022

Uh oh!

MattStephanson commented Sep 23, 2022

Uh oh!

strega-nil-ms commented Oct 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

support utf8 languages in `std::chrono::time_zone::get_info` #3102

support utf8 languages in `std::chrono::time_zone::get_info` #3102

strega-nil-ms commented Sep 14, 2022 •

edited

Loading

MattStephanson commented Sep 15, 2022 •

edited

Loading

strega-nil-ms commented Oct 14, 2022 •

edited

Loading