Use UTF-8 on Windows 10 Version 1903, fix #1195 #1915

Merged
merged 1 commit into from
Feb 23, 2021


Collaborator

@jhasse jhasse commented Feb 17, 2021

Allows Ninja to use descriptions, filenames and environment variables with characters outside of the ANSI codepage on Windows. Build manifests are now UTF-8 by default (this change needs to be emphasized in the release notes).

WriteConsoleOutput doesn't support UTF-8, but it's deprecated on newer Windows 10 versions anyway (or as Microsoft likes to put it: "no longer a part of our ecosystem roadmap"). We'll use the VT100 sequence just as we do on Linux and macOS.

https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page
https://docs.microsoft.com/en-us/windows/console/writeconsoleoutput
https://docs.microsoft.com/de-de/windows/console/console-virtual-terminal-sequences
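As a rough illustration of the VT100 approach described above, the sketch below builds a status line that returns to column 0, prints the text, and then clears the rest of the line with the `ESC [ K` (erase in line) sequence. The function name `status_line`, the `width` parameter and the simple truncation are hypothetical and only meant to show the idea; this is not Ninja's actual code.

```python
# VT100 "erase in line" sequence: clears from the cursor to the end of the line.
ERASE_TO_END_OF_LINE = "\x1b[K"


def status_line(text: str, width: int = 80) -> str:
    """Return a string that overwrites the current console line with `text`.

    Hypothetical helper: truncates to `width` characters so the status
    never wraps onto a second line.
    """
    if len(text) > width:
        text = text[:width]
    # "\r" moves the cursor back to column 0 without starting a new line.
    return "\r" + text + ERASE_TO_END_OF_LINE


if __name__ == "__main__":
    import sys

    sys.stdout.write(status_line("[1/2] CC foo.o"))
    sys.stdout.write(status_line("[2/2] LINK app"))
    sys.stdout.write("\n")
```

On a console with virtual terminal processing enabled, the second call replaces the first status on the same line.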

Fixes #1195 (I hope).

@jhasse jhasse merged commit ec8de9c into ninja-build:master Feb 23, 2021
@jhasse jhasse deleted the windows-utf8 branch February 23, 2021 09:00
@tristanlabelle

@jhasse I'm happy to see Ninja supporting Unicode, but I want to point out consequences of this change that should be taken into account:

This is a breaking change for Ninja build files. Since Ninja feeds the raw bytes from those files into the -A Win32 functions, Ninja build files will now have to be encoded as UTF-8, not ANSI (which is good, but a breaking change). In particular, CMake will have to be updated so that it no longer generates Ninja build files as ANSI.

There might also be repercussions with the "include prefix" feature. Ninja does binary comparison of the string coming from its build file and that coming from the cl.exe process output. Previously the first of these strings was ANSI and now it will be UTF-8, so this might cause mismatches depending on the encoding that cl.exe uses.
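To make the include-prefix concern concrete, here is a minimal sketch of a byte-wise prefix comparison. The prefix text, the header path, and the assumption that the compiler emits its output in the ANSI code page (modelled here as cp1252) are all illustrative; the thread does not state which encoding cl.exe actually uses.

```python
# Illustrative localized prefix containing one non-ASCII character (assumption).
prefix = "Nota: inclusión del archivo:"
path = " C:\\src\\foo.h"

# The same prefix as it would appear in a manifest saved as ANSI vs. UTF-8.
manifest_ansi = prefix.encode("cp1252")
manifest_utf8 = prefix.encode("utf-8")

# Assumption: the compiler writes its output in the ANSI code page.
compiler_line = (prefix + path).encode("cp1252")

# Byte-wise comparison, standing in for the binary comparison described above.
ansi_match = compiler_line.startswith(manifest_ansi)
utf8_match = compiler_line.startswith(manifest_utf8)

# A pure-ASCII prefix has identical bytes in both encodings.
ascii_prefix = "Note: including file:"
ascii_same = ascii_prefix.encode("cp1252") == ascii_prefix.encode("utf-8")

if __name__ == "__main__":
    print("ANSI manifest matches: ", ansi_match)
    print("UTF-8 manifest matches:", utf8_match)
    print("ASCII prefix unchanged:", ascii_same)
```

Under these assumptions the UTF-8 manifest no longer matches the compiler output, while an ASCII-only prefix is unaffected.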

@jhasse
Collaborator Author

jhasse commented Feb 23, 2021

Indeed, thanks for pointing that out.

Note that this will only cause issues when the build manifest contains non-ASCII characters. That would have caused problems on Windows anyway, so I doubt that many people relied on it. It will still be the first point in the release notes ;)
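One way to check whether a given manifest is affected at all is to scan it for bytes outside the ASCII range. The helper below is a hypothetical sketch (the name `non_ascii_lines` is not part of Ninja or CMake); a manifest for which it returns an empty list has the same bytes in ANSI and in UTF-8.

```python
from typing import List


def non_ascii_lines(manifest: bytes) -> List[int]:
    """Return the 1-based numbers of lines that contain any byte >= 0x80."""
    return [
        number
        for number, line in enumerate(manifest.splitlines(), start=1)
        if any(byte >= 0x80 for byte in line)
    ]


if __name__ == "__main__":
    # Read the raw bytes so no decoding assumptions are made.
    with open("build.ninja", "rb") as f:
        affected = non_ascii_lines(f.read())
    print("lines with non-ASCII bytes:", affected)
```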

@bradking
Contributor

I've opened CMake Issue 21866 for this, thanks.

@bradking
Contributor

#1918 proposes an additional tool to help generators determine the correct encoding for build.ninja files.

@penagos

penagos commented Aug 27, 2021

It appears this fix isn't providing full UTF-8 support on Windows 10 (build 19043). Take the simple build.ninja file below:

cflags = -Wall
CC = gcc

rule cc
  command = $CC $cflags -c $in -o $out
build foo°.o: cc foo.c

And a source file, foo.c below:

int main(int argc, char** argv) { return 0; }

ninja incorrectly generates an intermediate .o file with the name foo°.o. build.ninja was saved using the UTF-8 encoding. Similarly, ninja does not seem to correctly handle input source files with UTF-8 filenames. If instead we use the following build.ninja file:

cflags = -Wall
CC = gcc

rule cc
  command = $CC $cflags -c $in -o $out
build foo.o: cc foo°.c

and rename the aforementioned C source file to foo°.c, ninja produces the build error:

ninja: error: 'foo┬░.c', needed by 'foo.o', missing and no known rule to make it

Note that the lexer used by ninja did not encounter issues when the build.ninja file had comments, environment variables or target names including UTF-8 characters. The two failure patterns above are reproducible in both cmd and PowerShell. Is this change intended to support UTF-8 characters in filenames / filepaths, or to lay the groundwork for such future support (it wasn't immediately clear from the PR description)?
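The garbled name in the error message above is consistent with the UTF-8 bytes of `foo°.c` being interpreted in a legacy code page. The sketch below reproduces that pattern; which code page was actually active on the reporter's machine is an assumption (cp437 is used for the OEM case, cp1252 for the ANSI case), and the sketch does not establish where in Ninja the mismatch occurs.

```python
name = "foo°.c"

# The degree sign is encoded as the two bytes 0xC2 0xB0 in UTF-8.
utf8_bytes = name.encode("utf-8")

# Interpreting those bytes in an OEM console code page (assumed: cp437).
as_oem = utf8_bytes.decode("cp437")

# Interpreting the same bytes in an ANSI code page (assumed: cp1252).
as_ansi = utf8_bytes.decode("cp1252")

if __name__ == "__main__":
    print(utf8_bytes)
    print(as_oem)
    print(as_ansi)
```

The cp437 interpretation matches the string shown in the error message, which suggests the UTF-8 bytes were at some point treated as text in a non-UTF-8 code page.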

@jhasse
Collaborator Author

jhasse commented Aug 27, 2021

Is this change intended to support UTF-8 characters in filenames / filepaths [...]?

Yes. Looks like you've found a bug.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request May 27, 2022
release 1.11

This release adds Validation Nodes which are a new way to add jobs like linters or static analyzers to the build graph. They are added using |@ and don't produce any outputs. You can read more about the motivation and the syntax here: ninja-build/ninja#1800

Another big change is that Ninja now uses UTF-8 on Windows. Previous versions of Ninja used the local ANSI encoding; it now always uses UTF-8, which allows filenames and output with special characters. For this to work you'll need Windows 10 Version 1903 or newer, and for the console output to show Unicode characters you'll need to set the codepage to 65001. More information at: ninja-build/ninja#1915

Note that this is a breaking change if you relied on non-ASCII characters of the local codepage! If your generator needs to know which codepage Ninja uses, call `ninja -t wincodepage` and act accordingly.
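A generator could wrap the `ninja -t wincodepage` query roughly as follows. This is a sketch under assumptions: the output is taken to contain a line of the form `Build file encoding: <codepage>` with `<codepage>` being `UTF-8` or `ANSI`, which is how the tool's output is recalled from the Ninja manual rather than something stated in this thread. The helper names are hypothetical, and `manifest_encoding` only makes sense on a Windows host with Ninja 1.11 or newer.

```python
import subprocess


def parse_wincodepage(output: str) -> str:
    """Extract the code page name from the tool output (assumed format)."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Build file encoding":
            return value.strip()
    raise ValueError("no 'Build file encoding' line found")


def manifest_encoding(ninja: str = "ninja") -> str:
    """Return a Python codec name to use when writing build.ninja.

    Hypothetical helper; "mbcs" is Python's Windows-only codec for the
    system ANSI code page.
    """
    result = subprocess.run(
        [ninja, "-t", "wincodepage"],
        capture_output=True,
        text=True,
        check=True,
    )
    codepage = parse_wincodepage(result.stdout)
    return "utf-8" if codepage == "UTF-8" else "mbcs"
```

Ignoring unrecognized lines keeps the parser tolerant of output that a later Ninja version might add.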

There are also two new tools:
missingdeps: ninja-build/ninja#1331
inputs: ninja-build/ninja#1730

And as it was often requested, ninja now has a --quiet flag :)

For a complete list of changes see https://github.com/ninja-build/ninja/milestone/3?closed=1
@loriab loriab mentioned this pull request Nov 30, 2023
Successfully merging this pull request may close these issues.

Unicode support on Windows