Support paths with UTF characters on Windows #681
Comments
Marking this as an enhancement since this is more of a general problem with MinGW/MSVCRT. From https://www.msys2.org/docs/environments/:
The easiest way to fix this (i.e. without modifying the Binutils and GCC themselves to use the Windows UTF-16LE API) would be to build the Windows Zephyr SDK binaries against the UCRT instead of the MSVCRT; but, this requires more investigation and discussion on the potential side effects. |
Could an alternative build linked with UCRT be provided? Currently, building Zephyr from a user directory whose name contains Unicode characters is broken on Windows due to this issue (among others), since absolute paths are used almost everywhere in Zephyr's build system. As for the GNU Arm Embedded toolchain, it seems like the new ARM GNU Toolchain might have fixed this problem. As a side note, in my case the issue is that the GCC preprocessor produces files with include paths in an "ANSI" character set instead of UTF-8. When the path contains characters that can be converted from Unicode to the compatibility "ANSI" character set (e.g., |
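To make the "ANSI" versus UTF-8 distinction above concrete, here is a minimal Python sketch. It assumes the legacy "ANSI" code page is Windows-1252 (the real one depends on the system locale), and the paths are hypothetical. Replacing unrepresentable characters with `?` only roughly mimics what a lossy Unicode-to-"ANSI" conversion does.

```python
# Assumption: the "ANSI" code page is Windows-1252; the actual code page
# depends on the system locale of the Windows machine.
ANSI_CODEPAGE = "cp1252"


def ansi_round_trip(path: str, codepage: str = ANSI_CODEPAGE) -> str:
    """Encode a path to the legacy code page and decode it back.

    Characters the code page cannot represent are replaced with '?'.
    """
    return path.encode(codepage, errors="replace").decode(codepage)


def survives_ansi(path: str, codepage: str = ANSI_CODEPAGE) -> bool:
    """Return True if the path is unchanged by the round trip."""
    return ansi_round_trip(path, codepage) == path


def readable_as_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical user directories.
latin_path = r"C:\Users\José\zephyrproject"
cjk_path = r"C:\Users\太郎\zephyrproject"

print(survives_ansi(latin_path))   # representable in cp1252
print(survives_ansi(cjk_path))     # not representable in cp1252
# Even a representable path is written with different bytes than UTF-8,
# so a tool that expects UTF-8 cannot read the "ANSI" output.
print(readable_as_utf8(latin_path.encode(ANSI_CODEPAGE)))
```

The last check illustrates why a preprocessed file containing "ANSI"-encoded include paths can break later tools that assume UTF-8, even when no character was lost in the conversion.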
I would like to point out that not supporting UNICODE in Windows API also leads to issues with path length restrictions. See https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation The functions that are affected by setting the registry value to override the length limitation are ONLY the wide versions, the narrow character versions still have the max length restriction of 260 and there is no way to change it. This has led to issues with path restrictions on Windows PCs that are not present on other systems because GCC can't open a file with a long path name. So I would say this is more akin to a bug than an Enhancement. |
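As an illustration of the limit described above, the following Python sketch checks a path against the legacy `MAX_PATH` value and builds the `\\?\` extended-length form that only the wide-character Windows APIs accept. The helper names are hypothetical, and this does not change how GCC opens files; it only shows the convention involved.

```python
from pathlib import PureWindowsPath

# Legacy limit for the narrow-character APIs, counting the terminating NUL.
MAX_PATH = 260

EXTENDED_PREFIX = "\\\\?\\"          # the four characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # used for \\server\share paths


def exceeds_max_path(path: str) -> bool:
    """Return True if the path does not fit in MAX_PATH including the NUL."""
    return len(path) >= MAX_PATH


def to_extended_length(path: str) -> str:
    """Return the extended-length form of an absolute Windows path."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended-length form
    windows_path = PureWindowsPath(path)
    if not windows_path.is_absolute():
        raise ValueError("extended-length paths must be absolute")
    normalized = str(windows_path)  # forward slashes become backslashes
    if normalized.startswith("\\\\"):
        # UNC path: drop the leading \\ and use the UNC variant of the prefix.
        return EXTENDED_UNC_PREFIX + normalized[2:]
    return EXTENDED_PREFIX + normalized


# Hypothetical deeply nested build directory.
long_path = "C:/work/" + "/".join(["generated"] * 40) + "/main.c"
print(exceeds_max_path(long_path))
print(to_extended_length("C:/work/build/main.c"))
```

A program that calls only the narrow-character functions cannot benefit from this form, which is the point made in the comment above.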
Note that the path length limitation issue will not be fully solved even if we make the SDK use the Unicode functions, because there are other components in the build system (notoriously, Ninja) that do not support long paths due to the very same underlying problem. |
Right but as long as the SDK is in a path accessible by Ninja, it will have no problem launching GCC. The source is passed as a string, so it will still work if only the source is in a long path. Some of the files generated by the build system have very long paths due to being added as relative to the build directory. If GCC is able to accept such a path, then it would work in a lot of places. If there's an issue with CMake or Ninja, then maybe they should be built with UNICODE as well; but it shouldn't stop this from being supported here. |
Turns out the issue wasn't just about UCRT; it also comes from the way command-line arguments are passed to GCC's main function, as explained here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865 . Additionally, for the build with UCRT to succeed, this commit is also required (also available in GCC 13.1): UCRT might not even be needed at all, not sure. Sadly, Zephyr's GCC hasn't been ported to GCC 13.1 yet, so I merged these commits on top of Zephyr's GCC fork for SDK v0.16.5-1: https://github.com/piernov/gcc/commits/fix/ucrt-utf8/ I then rebuilt a MinGW-w64 toolchain with UCRT and with win32 threads (instead of posix/winpthreads) on Arch Linux:
The next release of MinGW should default to UCRT: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/82b8edc101d7f8fefd44e84d2e24a6edd01901f9/ . However, sdk-ng uses Ubuntu 20.04 packages, so it may take a very long time until we see a UCRT build used by default. I hope there is a way the process can be sped up. Finally, I rebuilt Zephyr's SDK toolchain for ARM and obtained a GCC compiler that can take UTF-8 paths on the command line and that generates preprocessed files with UTF-8 paths as well. I uploaded my build here: https://github.com/piernov/sdk-ng/releases/download/v0.16.5-1-ucrt-utf8/toolchain_windows-x86_64_arm-zephyr-eabi.7z Lastly, in order to support running the toolchain from a path with Unicode characters in the Zephyr build system, I had to add |
For this particular issue, supporting UTF-8 encoded paths may fix the problem mentioned; but it still does not solve the issue of Windows API path length being different based on the setting of UNICODE, which affects the Windows API call (e.g., CreateFileW vs CreateFileA). If the arguments are passed in a separate file to GCC, then that would allow bypassing the path length restriction in a greater number of cases. |
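The "arguments in a separate file" idea above corresponds to GCC's `@file` option, which reads whitespace-separated options from a file. Below is a rough Python sketch of writing such a response file. The helper names and the example arguments are hypothetical, and writing the file as UTF-8 is an assumption: whether GCC interprets it correctly depends on the very encoding issue discussed in this thread.

```python
import os
import tempfile


def quote_for_response_file(arg: str) -> str:
    """Quote one argument for a GCC response file.

    Backslashes and double quotes are backslash-escaped and the whole
    argument is wrapped in double quotes, so Windows paths and paths
    with spaces survive the whitespace-based splitting.
    """
    escaped = arg.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def write_response_file(args: list[str], path: str) -> str:
    """Write args to `path`, one per line, and return the @file argument."""
    with open(path, "w", encoding="utf-8") as handle:  # assumption: UTF-8
        for arg in args:
            handle.write(quote_for_response_file(arg) + "\n")
    return "@" + path


# Hypothetical compiler arguments with a long include path.
compile_args = ["-c", r"C:\work\src\main.c", "-I", r"C:\work\build\generated dir"]

with tempfile.TemporaryDirectory() as tmp_dir:
    rsp_path = os.path.join(tmp_dir, "compile.rsp")
    at_file = write_response_file(compile_args, rsp_path)
    # The resulting command would be: <compiler> @.../compile.rsp
    print(at_file)
    with open(rsp_path, encoding="utf-8") as handle:
        print(handle.read())
```

This shortens the command line itself; the individual paths inside the file are still opened by GCC, so any limit in the file-opening calls continues to apply.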
Does UCRT solve that or not? Can you confirm your bug still exists in my build? |
Yes, it is technically a separate issue. I have solved it locally by building from a shorter path on my local filesystem, so it's not a blocker. Maybe a "nice-to-have". |
MinGW 12.0.0 is released and defaults to UCRT https://sourceforge.net/p/mingw-w64/mailman/message/58776404/ GCC 14.2 is also released https://lists.nongnu.org/archive/html/info-gnu/2024-08/msg00000.html so hopefully there'll be some progress on #740 . That said I'm not sure the build environment targeting Windows will be updated to use Ubuntu 24.10 MinGW packages, so it may still take quite a long time before we see an official SDK build for Windows with these fixes. |
@stephanosio thanks, I pushed an sdk-ng branch that uses the still-in-development Ubuntu 24.10 image, and also backported some patches for gdb, gcc and crosstool-ng to build with the newer MinGW: I also pushed a mingw-12-act branch for sdk-ng that uses nektos/act to run the build locally. However, I haven't had the time to check that it works properly. In the meantime I was debugging a CMake issue ( https://gitlab.kitware.com/cmake/cmake/-/issues/26262 ) which basically means that a bunch of |
A more up-to-date version of MinGW-w64 toolchain, which uses UCRT by default, has been added to the sdk-build Docker image. This will be integrated as part of the Clang/LLVM toolchain support in Zephyr SDK planned for 0.18.0 release. For more details, see #830 (comment) |
@stephanosio thanks for your work. Until the porting to GCC 14 is done, would it be possible to backport the following patches to the GCC 12 branch as well? With this I think we should be close to having a UTF-8-compatible Zephyr toolchain on Windows. The CMake issue should be solved upstream and an updated package is available in Chocolatey, and the pull request for updating dtc has been merged. |
+1 from my side on this proposal by @piernov as well @stephanosio. |
Note that GNU Arm Embedded doesn't work either: