-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What happened to wstring support in open() APIs? #2637
Comments
I can confirm that digiKam 8.1.0 pre-release based on Exiv2 0.28 is broken under windows to read metadata from images. https://bugs.kde.org/show_bug.cgi?id=469372 |
The intended way to use this is to set UTF-8 as the code page. See: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page edit: oh I see. digiKam is compiled with MSVCRT instead of UCRT. I don't think that can work. ping @piponazo |
No, digiKam is cross compiled under Windows using MXE (MinGW env - https://github.com/mxe/mxe) since 10 years now. The MSVC build is just another way to build the application in the KDE infra, it's not the official production release at all, because MSVC is unsafe,slow, and under-productive platform to compile huge code as digiKam, but it's another story... Exiv2 team, please revert this commit to drop the UNICODE support under Windows. It's not at all a "dead" code... or explain well the work around. Note: I'm surprised to not see an unit-tests crying for the UTF16 long file path under Windows... |
Read what I said again. MSVCRT and UCRT |
@neheb @piponazo
I see that the current recommendation is to "set UTF-8 as the code page". It is a bit reckless to ask users of Exiv2 to do that. There are some problems with this approach. First of all, this can perfectly work in tests but there is a reality when you have 150+ external libraries in your project and neither of them mentions that it works flawlessly with relatively new Microsoft invention. There are also loads of your code that relies on assumption "it is not UTF-8 there but something else" you have to check to make sure nothing is broken by forcing UTF-8 code page. The "UTF-8 as the code page" feature is available only since Windows 1803. For Windows 10 operating systems prior to 1803 (10.0.17134.0), only static linking of UCRT is supported.. Windows 10 version prior 1803 are still supported in LTSC channel and static linking of UCRT is discouraged by Microsoft for security reasons. I don't see a nice way to use 0.28 across all Windows 10 versions. I'm sharing these insights in the spirit of maintaining the functionality and usability of Exiv2 across a wide range of scenarios. I hope we can discuss this further and consider the potential benefits of restoring the code. Thank you for your attention and consideration. |
@maksim-petukhov more reading here: #2090 |
@neheb I read it. It is still doesn't answer why @piponazo deleted code for |
And digiKam also depend on Exiv2 in-deep, and we don't have this kind of problem with UTF8 under Windows when it's cross compiled under Linux MinGW. Please revert this code deletion, as the library is not suitable under Windows when it's cross compiled. I second also maksim-petukhov about the topic : "Why to drop working code tested since a vrey long time ?" NOTE: recommend me to compile whole digiKam with MSVC under Windows is not a solution for digiKam at all... Gilles Caulier |
You have the MSYS2 UCRT64 option (the default now for MSYS2). I think Octave recently switched to UCRT builds also using MXE - @mmuetzel can you please advise the digiKam devs? |
AFAICT, it seems Fedora's MinGW cross toolchain can also target UCRT since F37: https://fedoraproject.org/wiki/Changes/F37MingwUCRT (e.g. https://src.fedoraproject.org/rpms/mingw-zlib/pull-request/3) Please get in touch with any other distro not supporting and ask for UCRT targets (cf. Arch, Debian, don't know about Ubuntu...) |
Just my 2 cents without getting a complete picture yet: Switching Windows to a UTF-8 locale only affects binaries that use the new Universal C Runtime (UCRT) instead of the older MS Visual C Runtime (MSVCRT). But even then, there are still quite a few applications that don't work when switching Windows to a UTF-8 locale. (IIRC, Notepad++ had some issues last time I checked.) So, the only way to reliably support non-ASCII characters in file names on Windows is to use their wide character API. I guess that is what is meant when people talk about |
Correct. However, we're discussing here about transitioning exiv2 clients to UCRT builds and UTF-8 locale only, as exiv2 in the latest version dropped the W API*. digiKam is a good example, as it uses MXE builds like Octave. * Why, and why now is a separate discussion, but let's be pragmatic for a moment and focus on the UCRT build transition, as it'll have to be done by people sooner or later anyway... |
I understand that this issue took a turn to become about linking to the UCRT. But to come back to the original issue: |
How so? The UTF-8 Anyhow, as I said, I'm not too keen on getting into why (feel free to join the prior issues and PRs linked above), but trying to offer workarounds. |
Btw, even more than one year ago this was already less than 0.2% of all Windows 10/11 machines (since 2018 >90% out of those 2.4% were already on 1803). (Windows 7 & 8.1 are not counted here as they are no longer supported by MS.) If you say a lot of enterprise machines are probably also not counted here, what could it be realistically then given the 3-5 year enterprise cycles? Still less than 1%, no? What percentage of those will ever use exiv2? The UCRT toolchains are there (both native and cross). Let's move on. |
If you are sure that you won't restore using the wide character Windows API, that's your decision. (Cross-)compiling MinGW and GCC that default to linking to the UCRT should only require adding |
wstring API removal was premature looks like. Can probably be added back in terms of std::filesystem. Debian and hence Ubuntu do not have a UCRT target: https://groups.google.com/g/linux.debian.bugs.dist/c/vU9KxW1RTLg Fedora has one but it's very barebones. |
Yeah messing around with std::filesystem leads me to believe the wstring API should come back. fs::path is an std::(w)string based on the platform. For Windows it requires calling .string(), which does not work with references as it's a temporary. |
I encountered this regression as well after upgrading to v0.28.0 and would like to note that Microsoft added UTF-8 code page on top of core Windows functionality, which remains to be UTF-16 (well, UCS-2, actually), so it will end up calling the same Also, I imagine processes using a specific code page would need to check the code page inherited from the console and report an error if it doesn't match - wouldn't this cause more problems, compared to just using wide characters as 0.26.x did and not worrying about code pages? It would be great if this breaking change was reverted until a usable alternative is found, whatever it is. You can even make it smaller if you use I will also note that since UTF-8 being the target here, keep in mind that in C++20 |
Requires small API change. See #2800 |
While it's a good change, I don't mean it as just an improvement, but rather as a way to reduce the amount of conditionally compiled code, which in 0.26.x included entire methods. Having As a side note, you may want to revisit by-value |
@gh-andre public API is kept at C++11 and private is C++17. The ref to value change was because it does not work under Windows. The alternative is to reintroduce const std::string& and const std::wstring&. |
Not sure what wasn't working on Windows. Maybe it is a C++11 issue, I cannot say. Definitely has no problem in C++17, so I was able to create a patch to handle opening wide-character file paths for Exiv2 v0.28.0. It will only work for files because the implementation isn't cleanly separated for various derived classes of That is, some things seem like they got worse over time, such as calling Initially, I took a stab at the whole thing, similar to how previous It will apply against the released Exiv2 0.28.0 source in |
If I am not wrong, this could be fixed with #2800? Is there anything blocking it from being merged? Thanks! |
This is an alternative approach to using the std::wstring versions of exiv2 methods that are not available in v0.28.x (see Exiv2/exiv2#2637). It should also allow non ascii paths to be used when libexiv2 has been compiled without wstring methods enabled. The proper approach is to set the Windows code page to utf-8, but this probably won't work on old Windows, and it's not clear how to do it for a Python application. Instead we transcode paths from utf-8 to the current code page, which could fail.
Note : digiKam is now also compiled with VCPKG and MSVC compiler under Windows and the problem is also present in native build of Exiv2. https://bugs.kde.org/show_bug.cgi?id=483743 Gilles Caulier |
@cgilles As noted in the darktable pull request and elsewhere, please do not use the .manifest directly (MSVC only) - if you wrap it in a .rc, it works for both MinGW and MSVC (you'll need one extra option for MSVC though...) |
Are you going to fix this bug? |
Hey folks! Is there any hope this issue will be resolved? Can we merge |
Has that PR even been tested? |
Hi
I upgraded exiv2 to 0.28.0 just to find that
std::wstring
support has been nuked out in 7933ff4 but I could not find a matching PR, issue or at least a discussion that would explain why and what to do now.std::string
just does not work in Windows for non-latin1 paths forExiv2::ImageFactory::open()
(Visual Studio compiler).What is the intended way to use it now?
These are all the old issues that reference the problem:
The text was updated successfully, but these errors were encountered: