Bazel fails to start if username contains accented characters on Windows #18293
Labels
help wanted
Someone outside the Bazel team could own this
P2
We'll consider working on this in future. (Assignee optional)
platform: windows
team-OSS
Issues for the Bazel OSS team: installation, release process, Bazel packaging, website
type: bug
Comments
Pavank1992 added the platform: windows and team-OSS labels on May 3, 2023
Maybe related to #18254
meteorcloudy added the P2 and help wanted labels and removed the untriaged label on May 9, 2023
This was referenced Oct 28, 2024
copybara-service bot pushed a commit that referenced this issue on Nov 5, 2024

This change patches the app manifest of the `java.exe` launcher in the embedded JDK to always use the UTF-8 codepage on Windows 1903 and later. This is necessary because the launcher sets `sun.jnu.encoding` to the system code page, which by default is a legacy code page such as Cp1252 on Windows. This causes the JVM to be unable to interact with files whose paths contain Unicode characters not representable in the system code page, as well as command-line arguments and environment variables containing such characters.

The Windows VMs in CI are not running Windows 1903 or later yet, so this change can currently only be tested locally by running `bazel info character-encoding` and verifying that it prints `sun.jnu.encoding = UTF-8`.

Work towards #374
Work towards #18293
Work towards #23859
Closes #24172.

PiperOrigin-RevId: 693466466
Change-Id: I4914c21e846493a8880ac8c6f5e1afa9fae87366
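To make "not representable in the system code page" concrete, here is a minimal Python sketch. It is only an illustration of the idea, not Bazel or JDK code; the helper name `representable` and the sample paths are hypothetical.

```python
def representable(text: str, code_page: str = "cp1252") -> bool:
    """Return True if every character of `text` can be encoded in `code_page`."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical home directories, used only for illustration.
samples = [
    "C:\\Users\\Rene",
    "C:\\Users\\René",
    "C:\\Users\\Łukasz",
    "C:\\Users\\太郎",
]

for path in samples:
    for code_page in ("cp1252", "utf-8"):
        print(f"{path!r} in {code_page}: {representable(path, code_page)}")
```

Every sample is representable in UTF-8, which is why switching the launcher to the UTF-8 code page removes this class of failure, whereas a legacy code page covers only a subset of characters.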
bazel-io pushed a commit to bazel-io/bazel that referenced this issue on Nov 6, 2024

This change patches the app manifest of the `java.exe` launcher in the embedded JDK to always use the UTF-8 codepage on Windows 1903 and later. This is necessary because the launcher sets `sun.jnu.encoding` to the system code page, which by default is a legacy code page such as Cp1252 on Windows. This causes the JVM to be unable to interact with files whose paths contain Unicode characters not representable in the system code page, as well as command-line arguments and environment variables containing such characters.

The Windows VMs in CI are not running Windows 1903 or later yet, so this change can currently only be tested locally by running `bazel info character-encoding` and verifying that it prints `sun.jnu.encoding = UTF-8`.

Work towards bazelbuild#374
Work towards bazelbuild#18293
Work towards bazelbuild#23859
Closes bazelbuild#24172.

PiperOrigin-RevId: 693466466
Change-Id: I4914c21e846493a8880ac8c6f5e1afa9fae87366
github-merge-queue bot pushed a commit that referenced this issue on Nov 7, 2024

This change patches the app manifest of the `java.exe` launcher in the embedded JDK to always use the UTF-8 codepage on Windows 1903 and later. This is necessary because the launcher sets `sun.jnu.encoding` to the system code page, which by default is a legacy code page such as Cp1252 on Windows. This causes the JVM to be unable to interact with files whose paths contain Unicode characters not representable in the system code page, as well as command-line arguments and environment variables containing such characters.

The Windows VMs in CI are not running Windows 1903 or later yet, so this change can currently only be tested locally by running `bazel info character-encoding` and verifying that it prints `sun.jnu.encoding = UTF-8`.

Work towards #374
Work towards #18293
Work towards #23859
Closes #24172.

PiperOrigin-RevId: 693466466
Change-Id: I4914c21e846493a8880ac8c6f5e1afa9fae87366

Commit 7bb8d2b
Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue on Nov 8, 2024

Bazel aims to support arbitrary file system path encodings (even raw byte sequences) by attempting to force the JVM to use a Latin-1 locale for OS interactions. As a result, Bazel internally encodes `String`s as raw byte arrays with a Latin-1 coder and no encoding information. Whenever it interacts with encoding-aware APIs, this may require a reencoding of the `String` contents, depending on the OS and availability of a Latin-1 locale.

This PR introduces the concepts of *internal*, *Unicode*, and *platform* strings and adds dedicated optimized functions for converting between these three types (see the class comment on the new `StringEncoding` helper class for details). These functions are then used to standardize and fix conversion throughout the code base. As a result, a number of new end-to-end integration tests for the handling of Unicode in file paths, command-line arguments and environment variables now pass.

Full support for Unicode beyond the current active code page on Windows is left to a follow-up PR as it may require patching the embedded JDK.

* Replace ad-hoc conversion logic with the new consistent set of helper functions.
* Make more parts of the Bazel client's Windows implementation Unicode-aware. This also fixes the behavior of `SetEnv` on Windows, which previously would remove an environment variable if passed an empty value for it, which doesn't match the Unix behavior.
* Drop the `charset` parameter from all methods related to parameter files. The `ISO-8859-1` vs. `UTF-8` choice was flawed since Bazel's internal string representation doesn't maintain any encoding information - `ISO-8859-1` just meant "write out raw bytes", which is the only choice that matches what arguments would look like if passed on the command line.
* Convert server args to the internal string representation. The arguments for requests to the server were already converted to Bazel's internal string representation, which resulted in a mismatch between `--client_cwd` and `--workspace_directory` if the workspace path contains non-ASCII characters.
* Read the downloader config using Bazel's filesystem implementation.
* Make `MacOSXFsEventsDiffAwareness` UTF-8 aware. It previously used the `GetStringUTF` JNI method, which, despite its name, doesn't return the UTF-8 representation of a string, but modified CESU-8 (nobody ever wants this).
* Correctly reencode path strings for `LocalDiffAwareness`.
* Correctly reencode the value of `user.dir`.
* Correctly turn `ExecRequest` fields into strings for `ProcessBuilder` for `bazel --batch run`. This makes it possible to reenable the `test_consistent_command_line_encoding` test, fixing bazelbuild#1775.
* Fix encoding issues in `TargetCompleteEvents`.
* Fix encoding issues in `SubprocessFactory` implementations.
* Drop obsolete warning if `file.encoding` doesn't equal `ISO-8859-1` as file names are encoded with `sun.jnu.encoding` now.
* Consistently reencode internal strings passed into and out of `FileSystem` implementations, e.g. if reading a symlink target.

Tests are added that verify the interaction between `FileSystem` implementations and the Java (N)IO APIs on Unicode file paths.

Fixes bazelbuild#1775.
Fixes bazelbuild#11602.
Fixes bazelbuild#18293.
Work towards #374.
Work towards bazelbuild#23859.
Closes bazelbuild#24010.

PiperOrigin-RevId: 694114597
Change-Id: I5bdcbc14a90dd1f0f34698aebcbd07cd2bde7a23
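The distinction between *internal* and *Unicode* strings in the commit message above can be sketched in Python. This is a rough illustration under one assumption that the commit message does not state: that the raw bytes of a path are UTF-8. The function names are hypothetical and are not the API of Bazel's `StringEncoding` class.

```python
def unicode_to_internal(text: str) -> str:
    """Store the UTF-8 bytes of `text` one byte per character (Latin-1)."""
    # Assumption: the platform's raw path bytes are UTF-8.
    return text.encode("utf-8").decode("latin-1")


def internal_to_unicode(internal: str) -> str:
    """Recover the original text from a byte-per-character internal string."""
    return internal.encode("latin-1").decode("utf-8")


path = "C:/Users/René"  # hypothetical path
internal = unicode_to_internal(path)

print(repr(internal))                  # raw bytes viewed as Latin-1 characters
print(internal_to_unicode(internal))   # round-trips back to the original
print(len(path), len(internal))        # the internal form has one char per byte
```

Latin-1 maps every byte value 0 to 255 to a code point, so the round trip never loses information. The cost is that any API expecting real Unicode must receive the reencoded form, which is the conversion the commit message describes standardizing.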
Description of the bug:
All Bazel Windows executables fail during startup if the path to the user's home directory contains accented characters. The accented character in most cases comes from the username. Crashes occur either because Bazel can't create necessary files and folders, or because it can't load certain dynamic libraries because the accent is mishandled in the path.
Bazelisk is also affected.
Changing the default Windows code page to UTF-8 fixes some of the early start-up crashes, but Bazel eventually mishandles a path and fails anyway. To change the code page, open the system's Language settings, select Administrative language settings, click Change system locale..., check the "Beta: Use Unicode UTF-8 for worldwide language support" box, and restart the system.
I also tested this with the 7.0.0 prerelease build, which fails as well, although it has improved: with the code page change described above, Bazel finishes launching but still emits an error. See the output section at the end of this document.
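As a quick way to see whether a machine matches the condition described above, the following Python sketch lists the non-ASCII characters in the current user's home directory path. It is a conservative check, not a diagnosis: the report only covers accented characters, and the helper name `non_ascii_chars` is hypothetical.

```python
import os


def non_ascii_chars(path: str) -> list[str]:
    """Return the distinct non-ASCII characters in `path`, sorted."""
    return sorted({ch for ch in path if ord(ch) > 127})


home = os.path.expanduser("~")
offenders = non_ascii_chars(home)

if offenders:
    print(f"Home directory {home!r} contains non-ASCII characters: {offenders}")
else:
    print(f"Home directory {home!r} is pure ASCII")
```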
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Which operating system are you running Bazel on?
Windows 10/11
What is the output of `bazel info release`?
WARNING: Invoking Bazel in batch mode since it is not invoked from within a workspace (below a directory having a WORKSPACE file).
Error occurred during initialization of VM
Unable to load native library:
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
N/A
What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD`?
Have you found anything relevant by searching the web?
Similar previous issue #3821
Any other information, logs, or outputs that you want to share?
Example invocations with errors:
Example invocations running from an elevated prompt. This will create the incorrect paths from above, and will reveal other ways in which the path is mishandled.
Invocations with experimental UTF-8 support turned on. See the bug description.