Bazel fails to start if username contains accented characters on Windows #18293

Closed
LakatosI opened this issue May 3, 2023 · 1 comment
Labels: help wanted (Someone outside the Bazel team could own this), P2 (We'll consider working on this in future. Assignee optional), platform: windows, team-OSS (Issues for the Bazel OSS team: installation, release process, Bazel packaging, website), type: bug

LakatosI commented May 3, 2023

Description of the bug:

All Bazel Windows executables fail during startup if the path to the user's home directory contains accented characters. The accented character in most cases comes from the username. Crashes occur either because Bazel can't create necessary files and folders, or because it can't load certain dynamic libraries because the accent is mishandled in the path.

Bazelisk is also affected.

Changing the default Windows code page to UTF-8 fixes some of the early startup crashes, but Bazel still eventually mishandles a path and fails. To change the code page, open the system's Language settings, select Administrative language settings, click Change system locale..., check the "Beta: Use Unicode UTF-8 for worldwide language support" box, and restart the system.

I also tested this with the 7.0.0 prerelease build, which fails as well, although it shows some improvement: with the code page change described above, Bazel finishes launching but still emits an error. See the output section at the end of this report.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  1. Create a new user with an accented character in their username ("István" in my case).
  2. Download the latest version of Bazel (6.1.2 as of writing).
  3. Invoke the downloaded binary.
  4. Bazel will crash during startup. The error message says it wasn't able to create its directories due to a malformed path.

Which operating system are you running Bazel on?

Windows 10/11

What is the output of bazel info release?

WARNING: Invoking Bazel in batch mode since it is not invoked from within a workspace (below a directory having a WORKSPACE file).
Error occurred during initialization of VM
Unable to load native library:

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

N/A

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

N/A

Have you found anything relevant by searching the web?

Similar previous issue #3821

Any other information, logs, or outputs that you want to share?

Example invocations with errors:

C:\Users\István\Downloads>.\bazel-6.1.2-windows-x86_64.exe
FATAL: MakeDirectories(C:\Users\Istv�n\_bazel_Istv�n) failed: (error: 5): Access is denied.
C:\Users\István\Downloads>.\bazel-7.0.0-pre.20230420.2-windows-x86_64.exe
FATAL: MakeDirectories(C:\Users\Istv�n\_bazel_Istv�n) failed: (error: 5): Access is denied.

Example invocations from an elevated prompt. These create the incorrect paths shown above and reveal other ways in which the path is mishandled.

C:\Users\István\Downloads>.\bazel-6.1.2-windows-x86_64.exe
WARNING: Invoking Bazel in batch mode since it is not invoked from within a workspace (below a directory having a WORKSPACE file).
Error: Unable to access jarfile C:\\Users\\Istv?n\\_bazel_Istv?n\\install\\e76193b64e56978bde876acece865a29\\A-server.jar
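
The two garbled spellings in the output above ("Istv�n" and "Istv?n") match two common encoding mistakes. The following is a minimal sketch of how such strings can arise in general; it is an assumption about the mechanism, not something confirmed by the Bazel source. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Bazel.
final class MojibakeDemo {
    // Bytes written in a single-byte legacy encoding but decoded as UTF-8:
    // the lone byte 0xE1 is malformed UTF-8 and becomes U+FFFD (the "�" glyph).
    static String latin1BytesReadAsUtf8(String s) {
        byte[] legacyBytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(legacyBytes, StandardCharsets.UTF_8);
    }

    // Encoding into a charset that lacks the character: the encoder
    // substitutes '?' for anything it cannot represent.
    static String forcedThroughAscii(String s) {
        byte[] asciiBytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(asciiBytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String name = "Istv\u00e1n"; // "István", escaped to avoid source-encoding issues
        System.out.println(latin1BytesReadAsUtf8(name).equals("Istv\uFFFDn"));
        System.out.println(forcedThroughAscii(name).equals("Istv?n"));
    }
}
```

ISO-8859-1 stands in here for whichever legacy code page the system actually uses; the effect on "á" is the same for Cp1252.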

Invocations with experimental UTF-8 support turned on (see the bug description):

C:\Users\István\Downloads>bazel-7.0.0-pre.20230420.2-windows-x86_64.exe
WARNING: Invoking Bazel in batch mode since it is not invoked from within a workspace (below a directory having a WORKSPACE file).
Extracting Bazel installation...
OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
java.util.logging.ErrorManager: 4: Failed to open log file
java.io.FileNotFoundException: c:\users\istván\_bazel_istván\u5mzpase\java.log.windev2303eval.István.log.java.20230503-035546.5940 (The system cannot find the path specified)
        at java.base/java.io.FileOutputStream.open0(Native Method)
        at java.base/java.io.FileOutputStream.open(Unknown Source)
        at java.base/java.io.FileOutputStream.<init>(Unknown Source)
        at com.google.devtools.build.lib.util.SimpleLogHandler$Output.open(SimpleLogHandler.java:741)
        at com.google.devtools.build.lib.util.SimpleLogHandler.openOutputIfNeeded(SimpleLogHandler.java:828)
        at com.google.devtools.build.lib.util.SimpleLogHandler.publish(SimpleLogHandler.java:433)
        at java.logging/java.util.logging.Logger.log(Unknown Source)
        at com.google.common.flogger.backend.system.AbstractBackend.log(AbstractBackend.java:76)
        at com.google.common.flogger.backend.system.SimpleLoggerBackend.log(SimpleLoggerBackend.java:31)
        at com.google.common.flogger.AbstractLogger.write(AbstractLogger.java:137)
        at com.google.common.flogger.LogContext.logImpl(LogContext.java:566)
        at com.google.common.flogger.LogContext.log(LogContext.java:686)
        at com.google.devtools.build.lib.analysis.BlazeVersionInfo.logVersionInfo(BlazeVersionInfo.java:65)
        at com.google.devtools.build.lib.analysis.BlazeVersionInfo.setBuildInfo(BlazeVersionInfo.java:80)
        at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:94)
                                            [bazel release 7.0.0-pre.20230420.2]
C:\Users\István\Downloads>bazel-6.1.2-windows-x86_64.exe
WARNING: Invoking Bazel in batch mode since it is not invoked from within a workspace (below a directory having a WORKSPACE file).
Error occurred during initialization of VM
Unable to load native library:
Pavank1992 added the platform: windows and team-OSS labels May 3, 2023
@meteorcloudy
Member

Maybe related to #18254

meteorcloudy added the P2 and help wanted labels and removed the untriaged label May 9, 2023
copybara-service bot pushed a commit that referenced this issue Nov 5, 2024
This change patches the app manifest of the `java.exe` launcher in the embedded JDK to always use the UTF-8 codepage on Windows 1903 and later.

This is necessary because the launcher sets sun.jnu.encoding to the system code page, which by default is a legacy code page such as Cp1252 on Windows. This causes the JVM to be unable to interact with files whose paths contain Unicode characters not representable in the system code page, as well as command-line arguments and environment variables containing such characters.

The Windows VMs in CI are not running Windows 1903 or later yet, so this change can currently only be tested locally by running `bazel info character-encoding` and verifying that it prints `sun.jnu.encoding = UTF-8`.

Work towards #374
Work towards #18293
Work towards #23859

Closes #24172.

PiperOrigin-RevId: 693466466
Change-Id: I4914c21e846493a8880ac8c6f5e1afa9fae87366
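
As a rough illustration of the problem this commit message describes, the sketch below reads `sun.jnu.encoding` from a running JVM and checks whether a given path can be encoded in a given charset. The class name, the `representable` helper and the sample paths are hypothetical; ISO-8859-1 is used only as a stand-in for a legacy system code page, and `sun.jnu.encoding` is a JDK-internal property that may be absent on some runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Bazel or the JDK.
final class JnuEncodingCheck {
    // True if every character of the path can be encoded in the given charset.
    static boolean representable(String path, Charset charset) {
        return charset.newEncoder().canEncode(path);
    }

    public static void main(String[] args) {
        // JDK-internal property; treat a missing value as "unknown".
        String jnu = System.getProperty("sun.jnu.encoding");
        System.out.println("sun.jnu.encoding = " + jnu);

        // Stand-in for a legacy code page (assumption for this sketch).
        Charset legacy = StandardCharsets.ISO_8859_1;
        String accented = "C:\\Users\\Istv\u00e1n";           // representable in Latin-1
        String polish = "C:\\Users\\\u0141\u00f3d\u017a";     // "Łódź": not representable in Latin-1
        System.out.println(representable(accented, legacy));
        System.out.println(representable(polish, legacy));
        System.out.println(representable(polish, StandardCharsets.UTF_8));
    }
}
```

A path that is not representable in the charset named by `sun.jnu.encoding` is the kind of path the JVM cannot open, which is why forcing that property to UTF-8 helps.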
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Nov 6, 2024
github-merge-queue bot pushed a commit that referenced this issue Nov 7, 2024

Commit
7bb8d2b

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Nov 8, 2024
Bazel aims to support arbitrary file system path encodings (even raw byte sequences) by attempting to force the JVM to use a Latin-1 locale for OS interactions. As a result, Bazel internally encodes `String`s as raw byte arrays with a Latin-1 coder and no encoding information. Whenever it interacts with encoding-aware APIs, this may require a reencoding of the `String` contents, depending on the OS and availability of a Latin-1 locale.

This PR introduces the concepts of *internal*, *Unicode*, and *platform* strings and adds dedicated optimized functions for converting between these three types (see the class comment on the new `StringEncoding` helper class for details). These functions are then used to standardize and fix conversion throughout the code base. As a result, a number of new end-to-end integration tests for the handling of Unicode in file paths, command-line arguments and environment variables now pass.
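
To make the distinction between *internal* and *Unicode* strings concrete, here is a minimal sketch of what such a conversion could look like, assuming the raw bytes held by an internal string are UTF-8. The class and method names are illustrative only, and the code is not taken from the `StringEncoding` class, which the PR describes as using dedicated optimized functions.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; not Bazel's actual implementation.
final class InternalStringSketch {
    // Unicode -> internal: take the UTF-8 bytes and store each byte as one
    // Latin-1 char, so the String acts as a raw byte container.
    static String unicodeToInternal(String unicode) {
        return new String(unicode.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Internal -> Unicode: recover the raw bytes and decode them as UTF-8
    // (assumption: the bytes really are UTF-8).
    static String internalToUnicode(String internal) {
        return new String(internal.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String unicode = "Istv\u00e1n";            // "István", 6 chars
        String internal = unicodeToInternal(unicode);
        System.out.println(internal.length());     // one char per UTF-8 byte
        System.out.println(internalToUnicode(internal).equals(unicode));
    }
}
```

Passing an internal string straight to an encoding-aware API without the reverse conversion would show "Ã¡" in place of "á", which is the class of mismatch the PR sets out to standardize away.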

Full support for Unicode beyond the current active code page on Windows is left to a follow-up PR as it may require patching the embedded JDK.

* Replace ad-hoc conversion logic with the new consistent set of helper functions.
* Make more parts of the Bazel client's Windows implementation Unicode-aware. This also fixes the behavior of `SetEnv` on Windows, which previously would remove an environment variable if passed an empty value for it, which doesn't match the Unix behavior.
* Drop the `charset` parameter from all methods related to parameter files. The `ISO-8859-1` vs. `UTF-8` choice was flawed since Bazel's internal string representation doesn't maintain any encoding information - `ISO-8859-1` just meant "write out raw bytes", which is the only choice that matches what arguments would look like if passed on the command line.
* Convert server args to the internal string representation. The arguments for requests to the server were already converted to Bazel's internal string representation, which resulted in a mismatch between `--client_cwd` and `--workspace_directory` if the workspace path contains non-ASCII characters.
* Read the downloader config using Bazel's filesystem implementation.
* Make `MacOSXFsEventsDiffAwareness` UTF-8 aware. It previously used the `GetStringUTF` JNI method, which, despite its name, doesn't return the UTF-8 representation of a string, but modified CESU-8 (nobody ever wants this).
* Correctly reencode path strings for `LocalDiffAwareness`.
* Correctly reencode the value of `user.dir`.
* Correctly turn `ExecRequest` fields into strings for `ProcessBuilder` for `bazel --batch run`. This makes it possible to reenable the `test_consistent_command_line_encoding` test, fixing bazelbuild#1775.
* Fix encoding issues in `TargetCompleteEvents`.
* Fix encoding issues in `SubprocessFactory` implementations.
* Drop obsolete warning if `file.encoding` doesn't equal `ISO-8859-1` as file names are encoded with `sun.jnu.encoding` now.
* Consistently reencode internal strings passed into and out of `FileSystem` implementations, e.g. if reading a symlink target. Tests are added that verify the interaction between `FileSystem` implementations and the Java (N)IO APIs on Unicode file paths.
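
The `MacOSXFsEventsDiffAwareness` item above hinges on the difference between standard UTF-8 and the modified encoding used by JNI string functions. The sketch below shows that difference from pure Java, using `DataOutputStream.writeUTF`, which writes the JDK's modified UTF-8 format. The class and method names are hypothetical, and this does not call JNI itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Bazel.
final class ModifiedUtf8Demo {
    // Encode with the JDK's modified UTF-8 and strip the 2-byte length
    // prefix that writeUTF emits before the payload.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600, outside the Basic Multilingual Plane
        // Standard UTF-8 uses one 4-byte sequence; modified UTF-8 encodes
        // each surrogate separately as 3 bytes.
        System.out.println(standardUtf8(emoji).length);
        System.out.println(modifiedUtf8(emoji).length);
        // For ASCII-only strings the two encodings are identical.
        System.out.println(Arrays.equals(standardUtf8("path"), modifiedUtf8("path")));
    }
}
```

A file name containing such a character would therefore come back as different bytes than the file system uses, unless it is reencoded.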

Fixes bazelbuild#1775.

Fixes bazelbuild#11602.

Fixes bazelbuild#18293.

Work towards #374.

Work towards bazelbuild#23859.

Closes bazelbuild#24010.

PiperOrigin-RevId: 694114597
Change-Id: I5bdcbc14a90dd1f0f34698aebcbd07cd2bde7a23