Support UTF-8 Everywhere #269

4A696D · 2017-11-07T12:09:12Z

In order to support the UTF-8 Everywhere principle, please consider adding the following to hstring:

a constructor and assignment operator taking a std::string containing UTF-8 encoded text
a conversion operator returning a std::string containing UTF-8 encoded text

Thanks!

kennykerr · 2017-11-08T20:53:42Z

We are very concerned about UTF-8 support on Windows, but we are very unlikely to add implicit conversions, as they can have unintended performance consequences. That's the same reason there's no implicit conversion between std::string and std::wstring. We have some scenarios on Windows where this really hurts performance. It's a hard problem and we are working on it. Thanks for the feedback!
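
To make the performance concern concrete, here is a minimal sketch of how an implicit converting constructor can hide repeated work. The type `wide_text`, the counter `g_conversions`, and the functions `set_label` and `update_labels` are hypothetical stand-ins invented for illustration; they are not part of C++/WinRT, and the widening step only handles ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts how many narrow-to-wide conversions have run (illustration only).
std::size_t g_conversions = 0;

// Hypothetical stand-in for a UTF-16 string type; NOT the real winrt::hstring.
struct wide_text {
    std::u16string value;

    // Implicit (non-explicit) converting constructor: the pattern under discussion.
    wide_text(const std::string& narrow) {
        ++g_conversions;
        value.reserve(narrow.size());
        for (unsigned char c : narrow) {
            assert(c < 0x80 && "this sketch only widens ASCII");
            value.push_back(static_cast<char16_t>(c));
        }
    }
};

// Stand-in for an API that expects the wide type.
std::size_t set_label(const wide_text& text) { return text.value.size(); }

// Passes the same narrow string to the API several times.
std::size_t update_labels(const std::string& label, int times) {
    std::size_t total = 0;
    for (int i = 0; i < times; ++i) {
        total += set_label(label);  // compiles silently, converts again on every call
    }
    return total;
}
```

Nothing at the call site in `update_labels` hints that a conversion happens on each iteration. With an `explicit` constructor, the same call would fail to compile and the caller would have to spell out the conversion, which makes it easier to hoist out of the loop.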

4A696D · 2017-11-09T15:08:12Z

Surely it's up to the cppwinrt user to decide whether the cost of conversion is acceptable. I've been doing UTF-8 Everywhere for several years with WinAPI apps and it hasn't caused a single performance issue. And supposing such an issue did arise, it should be pretty simple to solve (for instance by using UTF-16 in the affected section of code).

I think the reason there's no conversion between std::string and std::wstring is mainly that the standard library can't assume any particular character encoding. But cppwinrt is certainly in a position to make such an assumption. Indeed, I notice that the current version already includes a function to_hstring, which takes a std::string_view that is assumed to reference UTF-8 data!

If cppwinrt included the implicit conversions I requested, it really would be a lot more attractive to folks like me who use UTF-8 everywhere.

MikeGitb · 2017-11-09T15:51:10Z

IMHO, things like string conversion should be explicit, as this can be not only a performance issue but also a correctness issue: a std::string might not actually contain a UTF-8 encoded string and, IIRC, NTFS paths might contain 16-bit values that don't form valid UTF-16 encoded code points (I hope I got the terminology correct).

That being said, those explicit transformations should be as convenient as possible.
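
As a rough illustration of an explicit conversion that also surfaces the correctness problem, the sketch below converts UTF-8 to UTF-16 and reports failure when the input is not well-formed UTF-8. The name `utf8_to_utf16_checked` is hypothetical and is not part of C++/WinRT; returning `std::optional` is just one possible error policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: converts UTF-8 to UTF-16, or returns std::nullopt
// if the input is not well-formed UTF-8.
std::optional<std::u16string> utf8_to_utf16_checked(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;      // code point being assembled
        std::size_t len = 0;       // length of this sequence in bytes
        std::uint32_t min_cp = 0;  // smallest code point allowed for this length
        if (b0 < 0x80)                { cp = b0;        len = 1; min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min_cp = 0x10000; }
        else return std::nullopt;  // stray continuation byte or invalid lead byte

        if (in.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp) return std::nullopt;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate code point
        if (cp > 0x10FFFF) return std::nullopt;                 // outside Unicode range

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

A caller has to look at the result before using it, for example `if (auto wide = utf8_to_utf16_checked(input)) { /* use *wide */ }`, so the question of what to do with a std::string that is not UTF-8 cannot be skipped silently.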

tim-weis · 2017-11-09T16:21:10Z

I'd make a strong point against implementing conversion constructors and operators. Beyond the performance implications there is the issue about correctness. While a wchar_t/std::wstring is implied to use UTF-16LE, there is no such convention for char/std::string. The latter could be ASCII, ANSI, UTF-8, Shift JIS, or any other character encoding (with ANSI being the most common).

Conversions must be explicit. Otherwise the ambiguity around char/std::string will make for subtle bugs: implicit conversions silence the compiler, yet they may or may not do what you want.

Besides, the "UTF-8 Everywhere" mantra is more dogmatic than convincing. UTF-8 is great for information interchange (writing files to disk, sending data across a network, etc.). For a Windows application I have yet to see a convincing argument against using UTF-16 internally throughout.
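
A small sketch of the kind of subtle bug meant here: the same bytes produce different text depending on which encoding a conversion assumes. The helper `widen_latin1` and the constant `kEAcuteUtf8` are hypothetical names for illustration, and ISO-8859-1 (Latin-1) is used as a simple stand-in for a legacy single-byte code page.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: widens bytes as ISO-8859-1 (Latin-1), where every
// byte maps to the code point with the same numeric value.
std::u16string widen_latin1(std::string_view bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}

// The letter U+00E9 (e with acute accent) encoded as UTF-8: bytes 0xC3 0xA9.
// Widening these bytes as Latin-1 yields the two code units U+00C3 U+00A9
// (mojibake) instead of the single code unit U+00E9.
const std::string kEAcuteUtf8 = "\xC3\xA9";
```

The conversion itself never fails, so nothing flags the mismatch; the text is simply wrong. An implicit conversion would apply whichever assumption the library picked at every call site without the caller seeing it.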

4A696D · 2017-11-09T17:02:20Z

Have you guys actually read the UTF-8 Everywhere manifesto?

tim-weis · 2017-11-09T17:16:48Z

Yes. And it isn't very convincing. UTF-8 is great for data interchange. It isn't exactly well suited as an internal representation for text in a Windows application.

As for this specific issue, you need to explain why assuming UTF-8 in a general-purpose library is more important than allowing it to easily interface with legacy code that uses ANSI encoding.

MikeGitb · 2017-11-09T19:09:28Z

@4A696D: Yes, I have, and in any code that is portable I try to follow it (doing so in C++ is not always easy, though, as long as there is no standardized UTF-8 string). However, the fact of the matter is that Windows APIs (as well as Java and Qt, for that matter) mostly use wchars / UTF-16, and the round trip Windows API string -> UTF-8 -> Windows API string is not efficient, not always correct (although those are probably mostly very specific corner cases or bugs) and often simply not necessary.

As I said, there definitely should be an easy way to do the conversion, but it should not be hidden.
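
One corner case where the round trip can lose information is a UTF-16 string containing an unpaired surrogate. The sketch below assumes a converter that substitutes U+FFFD for such code units, which is a common policy but not the only one; the name `utf16_to_utf8_lossy` is hypothetical and not a C++/WinRT or Windows API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: encodes UTF-16 as UTF-8, replacing every unpaired
// surrogate with U+FFFD (the replacement character).
std::string utf16_to_utf8_lossy(std::u16string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        const bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        if (is_high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: the original value is lost here
        }

        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Under this policy, two different ill-formed inputs (a lone 0xD800 and a lone 0xDC00, say) both encode to the same three bytes, so converting back to UTF-16 cannot reproduce the original string. Well-formed text is unaffected.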

MikeGitb · 2017-11-09T19:11:27Z

Also, we are coming pretty close to "If you have to do this all the time, you should rethink your design" territory.

kennykerr · 2017-11-14T19:13:00Z

Our internal builds now have winrt::to_hstring for std::string_view to winrt::hstring conversion as well as winrt::to_string for std::wstring_view to std::string conversion. winrt::hstring is convertible to std::wstring_view. This should make it a lot more convenient to work with UTF-8 in C++/WinRT apps while remaining explicit.

4A696D · 2017-11-21T20:11:51Z

Thanks Kenny. However, this doesn't really offer cppwinrt users anything they couldn't already do for themselves. To avoid having to litter UTF-8 based code with endless to_hstring and to_string calls, one must modify cppwinrt itself, which is hardly ideal.

kennykerr · 2017-11-21T20:14:46Z

Right, the helpers are merely provided as a convenience. Feel free to use them or not.

brodycj mentioned this issue Nov 8, 2017

Use UTF-8 encoding for Windows? storesafe/cordova-sqlite-storage#652

Open

kennykerr closed this as completed Nov 21, 2017

jonwis mentioned this issue Jun 7, 2023

Document in the C++ stdin/stdout host that it's UTF-8 jonwis/app2app#3

Closed
