<format>: Wide and multibyte versions different behavior for char arg. #2320

Closed
jovibor opened this issue Nov 3, 2021 · 7 comments · Fixed by #4189
Labels
fixed Something works now, yay! format C++20/23 format

Comments

@jovibor
Contributor

jovibor commented Nov 3, 2021

#include <iostream>
#include <format>

int main()
{
	char ch = -1;
	std::cout << std::format("cout: {:d}\r\n", ch);
	std::wcout << std::format(L"wcout: {:d}\r\n", ch);
}

The output:
[screenshot of the console output]

Shouldn't the result be the same?
VS 16.11.5
Unfortunately I can't test in VS 17 at the moment.

@CaseyCarter CaseyCarter added bug Something isn't working format C++20/23 format LWG issue needed A wording defect that should be submitted to LWG as a new issue and removed bug Something isn't working labels Nov 3, 2021
@jovibor jovibor changed the title **<format>**: Wide and multibyte versions different behavior for **char** arg. **&lt;format&gt;**: Wide and multibyte versions different behavior for char arg. Nov 3, 2021
@jovibor jovibor changed the title **&lt;format&gt;**: Wide and multibyte versions different behavior for char arg. <format>: Wide and multibyte versions different behavior for char arg. Nov 3, 2021
@CaseyCarter
Member

This is by design, but it seems like a defect to me. Per [format.arg]/5, initializing a basic_format_arg with a char c when char_type is wchar_t initializes the basic_format_arg's stored value with static_cast<wchar_t>(c). On our supported platforms, char is signed and wchar_t is unsigned and 16 bits wide, so static_cast<wchar_t>(char(-1)) is in fact wchar_t{65535}. Our behavior is what the Standard specifies. I observe that the specified behavior would be reasonable (1) if the type character in the format string were c instead of d, or (2) if char were unsigned (whether plain char is signed or unsigned is implementation-defined in C++).
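The conversion described above can be reproduced without `<format>` at all. The following is only a sketch: `narrow_t`, `wide_t`, `store_as_wide`, and `printed_value` are hypothetical names, and the two aliases are stand-ins (signed `char`, unsigned 16-bit `wchar_t`) chosen so the arithmetic matches the MSVC platforms discussed here regardless of where the snippet is compiled.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions for illustration only: these aliases model the MSVC targets,
// where plain char is signed and wchar_t is an unsigned 16-bit type.
using narrow_t = signed char;
using wide_t   = std::uint16_t;

// The storage rule quoted from [format.arg]/5: a plain cast to the wide type.
constexpr wide_t store_as_wide(narrow_t c) { return static_cast<wide_t>(c); }

// The number a {:d} specifier would then see.
constexpr long printed_value(narrow_t c) { return static_cast<long>(store_as_wide(c)); }

// Conversion to an unsigned type is modulo 2^16, so -1 becomes 65535.
static_assert(printed_value(-1) == 65535);
static_assert(printed_value(65) == 65);

// Runtime variant of the same check, for callers that prefer assert().
inline void check_at_runtime() {
    assert(printed_value(-128) == 65536 - 128);
}
```

Non-negative values pass through unchanged, which is why the discrepancy only shows up for `char` values below zero.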

I _think_ the fix is to use a different storage type that is properly value-preserving regardless of the signed-ness of char (e.g., int), which will then do the right thing whether formatted with d or c. We need to validate this suggestion, implement it, and submit an LWG issue.
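As a rough illustration of what "value-preserving" means in this suggestion, the sketch below contrasts the two storage choices. It is not the STL's implementation: `store_as_wide` and `store_as_int` are hypothetical helpers, and `narrow_t` / `wide_t` are assumed stand-ins for a signed `char` and a 16-bit unsigned `wchar_t`. Only the `d` case is modeled here.

```cpp
#include <cassert>
#include <cstdint>

using narrow_t = signed char;   // assumption: models a signed plain char
using wide_t   = std::uint16_t; // assumption: models a 16-bit unsigned wchar_t

// Storage as specified at the time: cast straight to the wide character type.
constexpr wide_t store_as_wide(narrow_t c) { return static_cast<wide_t>(c); }

// Hypothetical alternative: widen to int, which can represent every narrow_t value.
constexpr int store_as_int(narrow_t c) { return static_cast<int>(c); }

static_assert(store_as_wide(-1) == 65535); // the sign is lost
static_assert(store_as_int(-1) == -1);     // the sign survives

// Runtime variant of the same comparison.
inline void check_at_runtime() {
    assert(store_as_int(-128) == -128);
    assert(store_as_wide(-128) == 65408);
}
```

With `int` storage, a decimal rendering of the stored value matches what the narrow overload prints for the same `char`.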

CaseyCarter added a commit to CaseyCarter/STL that referenced this issue Nov 3, 2021
* `format(L"{}", c)` for a `char` `c` renders the single wide character `wchar_t{c}` when it should instead format `c` as an integer.
* `format(L"{:d}", c)` for a `char` `c` formats `65536 + c` instead of `c` when `c < 0`. This seems to be a defect in the Standard for platforms with signed `char`.

Fixes microsoft#2320
@jovibor
Contributor Author

jovibor commented Nov 3, 2021

The wording is here, but to me this is definitely wrong: it blindly converts from a signed to an unsigned type.
The fmt library, btw, returns the same result, -1, in both cases.

Update:
Interestingly, on godbolt fmt returns -1 in both cases (link)
But on Windows, locally, its results coincide with std::format:
[screenshot of the output]
That's so C++ish...😎

@jovibor
Contributor Author

jovibor commented Feb 3, 2022

Is there any news on the subject, or somewhere to track it?
Or is it gonna be another ABI-locked C++ quirk, etched in the stone of time😌?

@frederick-vs-ja
Contributor

frederick-vs-ja commented Sep 18, 2023

Why int? Is it because basic_format_arg behaves as-if it contains a variant, and the variant won't hold an unsigned char?

@vitaut, does P2909R1 intentionally make basic_format_arg hold an int when TD is char and char_type is wchar_t?

Oh, I found that it was a mistake. In P2909R2 the stored value is changed to static_cast<wchar_t>(static_cast<unsigned char>(v)), so the char value won't be handled as an int.
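For comparison, the two-step cast quoted from P2909R2 can be sketched as follows. The helper name `store_p2909r2` is hypothetical, the aliases are assumed stand-ins for a signed `char` and a 16-bit unsigned `wchar_t`, and the expected values assume an 8-bit `char`.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

using narrow_t = signed char;   // assumption: models a signed plain char
using wide_t   = std::uint16_t; // assumption: models a 16-bit unsigned wchar_t

static_assert(CHAR_BIT == 8, "the expected values below assume 8-bit chars");

// The conversion quoted from P2909R2: go through unsigned char first, so the
// code unit is never sign-extended into the wider type.
constexpr wide_t store_p2909r2(narrow_t v) {
    return static_cast<wide_t>(static_cast<unsigned char>(v));
}

static_assert(store_p2909r2(-1) == 255); // not 65535
static_assert(store_p2909r2(65) == 65);

// Runtime variant of the same check.
inline void check_at_runtime() {
    assert(store_p2909r2(-128) == 128);
}
```

The stored value stays within the range of a single narrow code unit, so it is still a wide character rather than an `int`.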

@StephanTLavavej StephanTLavavej removed the LWG issue needed A wording defect that should be submitted to LWG as a new issue label Nov 13, 2023
@StephanTLavavej StephanTLavavej added the fixed Something works now, yay! label Nov 17, 2023