[GR-52826] Non-ASCII characters in command line arguments are replaced by U+FFFD in Windows (native-image) #8593
Comments
Thanks a lot for bringing this to our attention, @ackasaber! Let me try to reproduce, debug, and fix this...
Ok, so I'm having trouble reproducing this. I tried changing my system language, but that did not seem to work. In PowerShell, I can run this:
> python -c "import sys; print(sys.argv)" Привет, мир
['-c', 'Привет', 'мир']
> java "-Dfile.encoding=UTF8" LogArguments Привет, мир
> cat .\arguments.txt
??????
???
With and without Do you have any idea what is going on?
Ok, so I got this working after enabling the beta support for UTF-8, but the native executable also seems to do the right thing (I added a simple Could you please provide us with more details on how to reproduce the inconsistent behavior you're observing?
Very nice! You've found THE dialog. I confirm that if UTF-8 is activated there, both the classic launch and the native-image build work consistently and correctly. However, the UTF-8 option is not the default, and the majority of users don't know about this dialog. This dialog is a legacy of pre-Unicode days. As you see, it's intended for "non-Unicode programs". Java is Unicode, right? Therefore it shouldn't depend on this option! The default value in the dialog is set during the Windows installation according to the Windows localization. On my Russian-localized Windows, it's "Russian (Russia)". This means that the legacy A-versions of Windows API functions that take or return strings encode them in Windows-1251. I intentionally didn't use anything that depends on I also tested running the sample with "English (USA)" for "Language for non-Unicode programs" and got
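To illustrate why the "Language for non-Unicode programs" setting matters, here is a minimal sketch (not taken from GraalVM or the JDK) that pushes the word "Привет" through a single-byte code page and back, roughly the way an A-version API would see it. The class and method names are hypothetical, and the string is written with Unicode escapes so the source compiles regardless of file encoding. It assumes the `windows-1251` and `windows-1252` charsets are available in the running JDK.

```java
import java.nio.charset.Charset;

public class CodePageRoundTrip {
    // Encode the text with the given code page and decode it again,
    // imitating a trip through an ANSI ("A") Windows API.
    // Characters the code page cannot represent are replaced by the
    // charset's default replacement byte during encoding.
    static String roundTrip(String text, String codePage) {
        Charset cs = Charset.forName(codePage);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // The Russian word used in the thread, written as Unicode escapes.
        String arg = "\u041f\u0440\u0438\u0432\u0435\u0442";

        // Windows-1251 contains Cyrillic, so the text survives.
        System.out.println(roundTrip(arg, "windows-1251").equals(arg)); // prints "true"

        // Windows-1252 has no Cyrillic, so every letter is lost.
        System.out.println(roundTrip(arg, "windows-1252")); // prints "??????"
    }
}
```

The second result matches the `??????` seen in the PowerShell session earlier in the thread. Whether the U+FFFD characters in the native-image case come from a similar lossy conversion is not established here.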
By the way, your Python snippet works well regardless of the "Language for non-Unicode programs" setting, at least in
Thanks for the info, @ackasaber. I can finally reproduce the issue after changing the system locale (there are just too many ways to change languages on Windows these days). Now looking into a fix.
Ok, so I have discussed this internally: The JDK seems to convert arguments in its app launchers: https://github.com/openjdk/jdk/blob/700d2b91defd421a2818f53830c24f70d11ba4f6/src/jdk.jpackage/windows/native/common/WinSysInfo.cpp#L137 Instead of doing this, we can avoid the additional overhead (and potential for errors) by switching to wmain on Windows. This will also allow us to provide other features on Windows, such as a javaw.exe-like entry point that allows running an app without a command prompt. We currently have no ETA for this, but we will update this ticket when we do.
@CJ-Chen A nice workaround! But not a good general solution for GraalVM. I would be surprised if this bug gets a fix this year. It just can't be a priority: it only affects Windows, while the majority of Java apps run on Linux, and it's in command line parsing, which the majority of Java apps don't do much of.
Apparently Microsoft has made a U-turn on its zoo of encodings and now promotes using UTF-8 for new applications via a dedicated manifest property. The introduced
The article goes so far as to call "Win32 API [that] might only understand WCHAR" legacy. It's also possible to slap this manifest onto an existing exe. I'll try it in the meantime. See also a blog post from Raymond Chen about this feature.
Describe the issue
It seems that the command line arguments aren't properly decoded when building via native-image on Windows. A simple one-liner that dumps the arguments into a text file demonstrates this.
I observed this when trying to pass a Russian-localized "My Pictures" path to a command-line utility.
The issue doesn't happen when building the classic way.
Steps to reproduce the issue
LogArguments.java source file. It dumps the command line arguments to a file arguments.txt in the current directory, one per line and as is (in UTF-16).
javac LogArguments.java
native-image LogArguments
logarguments.exe as follows
arguments.txt file. The hex dump verifies that all non-ASCII characters got replaced by the Unicode U+FFFD character:
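The original LogArguments.java attachment and the hex dump are not reproduced in this text. As a rough stand-in, the following is a hypothetical sketch of a program that behaves as described in the steps: it writes each argument on its own line in UTF-16 and then reports how many arguments contain U+FFFD. The class name `LogArgumentsSketch`, the helper names, and the choice of `StandardCharsets.UTF_16` (big-endian with a byte order mark) are assumptions, not the reporter's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LogArgumentsSketch {
    // The Unicode replacement character that shows up in the hex dump.
    static final char REPLACEMENT = '\uFFFD';

    // Write each argument on its own line, encoded as UTF-16.
    static void dump(Path file, String... args) throws IOException {
        Files.write(file, List.of(args), StandardCharsets.UTF_16);
    }

    // Count the lines read back from the file that contain U+FFFD.
    static long countDamaged(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_16).stream()
                .filter(line -> line.indexOf(REPLACEMENT) >= 0)
                .count();
    }

    public static void main(String[] args) throws IOException {
        // Assumed output location: arguments.txt in the current directory.
        Path file = Path.of("arguments.txt");
        dump(file, args);
        System.out.println(countDamaged(file) + " argument(s) contain U+FFFD");
    }
}
```

Running the same class on the JVM and as a native-image executable with identical non-ASCII arguments would give a quick comparison without needing a hex editor.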
Describe GraalVM and your environment:
I reproduced this with the graalvm-community-openjdk-23+12.1 build.
More details
I glanced over GraalVM sources and didn't find any GetCommandLineW mentions, so there is a good chance the thing was never implemented in Windows and main arguments are used.