Skip to content

Conversation

@cppdev123
Copy link

use __std_code_page::_Utf8 instead of __std_fs_code_page() to support languages like Arabic. This is like what is used by u8path.
This should fix #3097

 use `__std_code_page::_Utf8` instead of `__std_fs_code_page()` to support languages like Arabic. This is like what is used by `u8path`
@cppdev123 cppdev123 requested a review from a team as a code owner September 14, 2022 17:38
@strega-nil-ms
Copy link
Contributor

strega-nil-ms commented Sep 14, 2022

Unfortunately, while I believe that using utf-8 everywhere would be nice, we can't break people who are using non-utf-8 code pages; I believe that the correct answer here is to instead set your application's code page to 65001 either by manifesting your application, or by calling setlocale(LC_ALL, ".UTF-8") very early in your program, in order to force UTF-8.

@cppdev123
Copy link
Author

This is not using utf-8 everywhere it is only for time zones strings because some of them need utf-8. The current behavior of the stl is insane because it encounters an error and do not report it correctly and because it does have non ascii time zones but it treats them as if they were ascii. Either those utf-8 strings should not be returned (I think this is out of stl control) or a proper encoding should be used. Requiring the user to call setlocale or using manifests which affect the whole program for only this feature to work is cumbersome and clearly not the way to go also not documented by the standard

@StephanTLavavej StephanTLavavej added the bug Something isn't working label Sep 14, 2022
@StephanTLavavej
Copy link
Member

I agree with @strega-nil-ms here:

Unfortunately, while I believe that using utf-8 everywhere would be nice, we can't break people who are using non-utf-8 code pages

We talked about this at the weekly maintainer meeting, and we believe that we need to fully understand the issue #3097, and have it be reliably reproducible, before attempting to make any changes here. (A proper fix may take an entirely different form, to avoid breaking the scenarios that Nicole mentioned.)

@cppdev123
Copy link
Author

You can try to reproduce it locally by using a language that will give you non ascii chars instead of "GMT" in abbreviation.
Go to 'Control Panel\Clock and Region' and click on 'Change data, time or number formats' then in the 'Formats' tab select a language from 'Format' combo box in the top of the page. I was previously using 'Arabic (Egypt)' and had this odd error but now I changed it to 'English (United states) ' and the example now works flawlessly and produces this output:

current timezone name: Africa/Cairo
current system time: 2022-09-14 22:54:56.2932052
sys_info: offset: 7200s, abbrevGMT+2
current local time: 2022-09-15 00:54:56.2932052

Copy link
Contributor

@fsb4000 fsb4000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests failed because clang-format formatted that code slightly different.

stl/src/tzdb.cpp Outdated
Comment on lines 213 to 214

const auto _Result =
__std_fs_convert_wide_to_narrow(_Code_page, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
__std_fs_convert_wide_to_narrow(__std_code_page::_Utf8, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const auto _Result =
__std_fs_convert_wide_to_narrow(_Code_page, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
__std_fs_convert_wide_to_narrow(__std_code_page::_Utf8, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);
const auto _Result = __std_fs_convert_wide_to_narrow(
__std_code_page::_Utf8, _Input_as_wchar, _Input_len, _Data.get(), _Count_result._Len);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you solve this formatting issue ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do:

cmake -G Ninja -S . -B out\build\x64
cmake --build .\out\build\x64 --target format

@MattStephanson
Copy link
Contributor

MattStephanson commented Sep 15, 2022

The strange part, in my opinion, is even trying to get the abbreviated time zone name in localized form. I don't think the IANA tzdb that we're emulating has that data, just abbreviations like "PDT". Cario's abbreviation, for example, is "EE[SD]T" "EE[S]T" in the tzdb, for "Eastern European [Summer] Time". If ICU has something akin to a "C" locale, it might make sense to always get the names and abbreviations in that locale rather than the user's.

cppdev123 and others added 2 commits September 15, 2022 15:10
Co-authored-by: Igor Zhukov <fsb4000@yandex.ru>
Co-authored-by: Igor Zhukov <fsb4000@yandex.ru>
@cppdev123
Copy link
Author

It seems that HowardHinnant's date library (the original of c++20 time zone) has been using CP_UTF8 without issues. Also the setlocale(LC_ALL, ".UTF-8") trick worked for me.
Another thing is that when the conversion function returns error ERROR_NO_UNICODE_TRANSLATION it should use SetLastError or indicate in the return result that this error is not due to a winapi function failure. This will prevent failing with error code set to 0

@strega-nil-ms
Copy link
Contributor

Yeah, the ERROR_NO_UNICODE_TRANSLATION -> 0 is definitely a bug.

@strega-nil-ms
Copy link
Contributor

This should instead probably be fixed by #3122; it sucks that the default code page isn't able to represent Arabic, but we also need to avoid breaking people who are depending on narrow encodings that are not UTF-8.

Thank you so much @cppdev123 for bringing this to our attention! I'm glad that we seem to have been able to fix at least the problem that we were throwing "operation completed successfully".

Could you please try out #3122 and tell me if that fixes it for you?

@cppdev123
Copy link
Author

@strega-nil-ms this fix the empty error message but it is still failing to do the conversion. Can you add a fallback to try utf8 when ERROR_NO_UNICODE_TRANSLATION is returned or use a fallback string of format "GTM+{}" since the offset is already calculated. this is better than failing due to the abbreviation which is usually not needed but only the time zone difference

@strega-nil-ms
Copy link
Contributor

@cppdev123 mmh. That's an interesting idea; @microsoft/vclibs, how do we feel about doing this "fallback" if the unicode stuff fails?

@MattStephanson
Copy link
Contributor

Can you add a fallback to try utf8 when ERROR_NO_UNICODE_TRANSLATION is returned or use a fallback string of format "GTM+{}" since the offset is already calculated.

I think only the second one is viable. Unless I'm missing something, there's no way for the user to figure out if a "fallback to UTF-8" is taken, so they can't know which encoding was used. It needs to always be the same encoding, UTF-8 or the user's CP, and as @strega-nil-ms says, we're already locked into the latter.

@strega-nil-ms
Copy link
Contributor

strega-nil-ms commented Oct 14, 2022

While we appreciate this PR, unfortunately, due to the reasons stated above, we cannot switch to using UTF-8 as the narrow code page. Thanks so much for bringing this to our attention, though, and we hope to fix the underlying issue soon!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

std::chrono::time_zone::get_info throws an exception saying: The operation completed successfully

5 participants