
SmallRye OpenAPI outputs garbled Chinese characters in JSON #44569

Closed
MorganMaohong opened this issue Nov 19, 2024 · 40 comments · Fixed by #45993

@MorganMaohong

MorganMaohong commented Nov 19, 2024

Description

I downloaded a template project from the official website, added the SmallRye OpenAPI extension, and added a @Tag annotation with a Chinese name to the REST class. When I accessed /q/openapi, the response was garbled. I assumed the project was UTF-8 encoded, and I tried setting the encoding to UTF-8 everywhere I could, but the Chinese text still did not display correctly.
By contrast, Chinese text in the responses of regular REST requests displays normally.

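import jakarta.inject.Inject;
import jakarta.persistence.EntityManager;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import java.util.List;
import org.eclipse.microprofile.openapi.annotations.tags.Tag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Path("/hello")
@Tag(name = "测试模块") // "Test module": this name appears garbled in /q/openapi
public class ExampleResource {
    private static final Logger log = LoggerFactory.getLogger(ExampleResource.class);

    @Inject
    EntityManager entityManager;

    @POST
    @Path("/1")
    @Produces(MediaType.TEXT_PLAIN)
    public String hello() {
        // User is the project's JPA entity (not shown)
        List<User> result = entityManager.createQuery("select e from User e", User.class).getResultList();
        log.info("{}", result);
        // Chinese text in a regular REST response displays correctly
        return "success哈哈哈哈";
    }
}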


quarkus-bot bot commented Nov 19, 2024

/cc @EricWittmann (openapi), @Ladicek (smallrye), @MikeEdgar (openapi), @jmartisk (smallrye), @phillip-kruger (openapi,smallrye), @radcortez (smallrye)

@MorganMaohong
Author

Image
Normally, the text highlighted in red should show Chinese characters. If the output were UTF-8 encoded, this should not happen.

@gsmet
Member

gsmet commented Nov 19, 2024

Can you have a look at the Content-Type HTTP header of your response?

@yuhaibohotmail

yuhaibohotmail commented Nov 19, 2024

@MorganMaohong There is no problem in my environment.
@Tag(name = "CategoryService服务", description = "分类服务")

Image

@gsmet gsmet added the triage/needs-feedback We are waiting for feedback. label Nov 19, 2024
@MorganMaohong
Author

MorganMaohong commented Nov 20, 2024


Can you have a look at the Content-Type HTTP header of your response?

Image

The charset in the Content-Type header is UTF-8, but the Chinese characters are still garbled.

@MorganMaohong
Author

@MorganMaohong There is no problem in my environment. @Tag(name = "CategoryService服务", description = "分类服务")

Image

Thank you for your reply. Could you provide a template project with the SmallRye OpenAPI extension so that I can test it in my local environment? Thank you very much!

@yuhaibohotmail

code-with-quarkus.zip

@MorganMaohong Here is a simple example.

@MorganMaohong
Author

@gsmet @yuhaibohotmail
After downloading and running your project (I also reset my Maven environment), the Chinese characters were still garbled. Finally, I changed the Windows system encoding to UTF-8, restarted the computer, and ran the project again, and the text displayed correctly!
Image
Image

@MorganMaohong
Author

@gsmet This extension is very useful. It would be perfect if this encoding setting could be exposed and changed through the configuration file.

@geoand
Contributor

geoand commented Nov 22, 2024

Which extension are you referring to @MorganMaohong ?

@geoand geoand removed the triage/needs-feedback We are waiting for feedback. label Nov 22, 2024
@MorganMaohong
Author

@geoand In this extension, when I use the @Tag and @Operation annotations with Chinese text, the content returned by /q/openapi is garbled:


        <dependency>
            <groupId>io.quarkus</groupId>
            <artifactId>quarkus-smallrye-openapi</artifactId>
        </dependency>

@geoand
Contributor

geoand commented Nov 22, 2024

But it sounds like it's an issue with your environment, no?

@MorganMaohong
Author

@geoand Sorry, it is indeed a problem with my environment: the extension uses the system's default encoding. I'd suggest providing a configuration property to avoid this problem; it would be even better if the extension used UTF-8 by default.

@geoand
Contributor

geoand commented Nov 22, 2024

Let's see what @MikeEdgar thinks about that

@MorganMaohong
Author

Thank you so much

@MikeEdgar
Contributor

@MorganMaohong, I'm curious whether, after you run mvn package, the characters are also garbled in META-INF/quarkus-generated-openapi-doc.json (or the YAML version) inside the resulting application jar. That will help pinpoint the step where the characters are incorrectly interpreted.
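To run that check without unzipping the jar by hand, here is a minimal Java sketch (class and method names are illustrative, not part of Quarkus). It reads the entry named above, decodes it explicitly as UTF-8 so the result does not depend on the OS locale, and reports whether any U+FFFD replacement characters are present. It assumes the jar path is passed as the first argument, and it prints a flag rather than the document itself, because printing to a Windows console may garble the text on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class GeneratedDocCheck {
    // Entry name taken from the comment above
    static final String ENTRY = "META-INF/quarkus-generated-openapi-doc.json";

    /** Reads the generated OpenAPI document from the jar, decoding it explicitly as UTF-8. */
    static String readGeneratedDoc(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(ENTRY);
            if (entry == null) {
                throw new IOException(ENTRY + " not found in " + jarPath);
            }
            try (InputStream in = jar.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    /** U+FFFD appears where bytes that are not valid UTF-8 were decoded as UTF-8. */
    static boolean containsReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: GeneratedDocCheck <path-to-application.jar>");
            return;
        }
        String doc = readGeneratedDoc(args[0]);
        System.out.println("contains U+FFFD: " + containsReplacementChar(doc));
    }
}
```

The flag is only a rough signal: text that was misdecoded and then re-encoded as UTF-8 is itself valid UTF-8 and contains no U+FFFD, so it is still worth opening the file in a UTF-8-aware editor.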

@MorganMaohong
Author

@MikeEdgar, thank you for the pointer. I think I understand why this happens. I checked the file as you suggested: with VS Code's default UTF-8 encoding it displays correctly, but when I switch VS Code to the Chinese GBK encoding, it shows the same garbled characters as the /q/openapi endpoint. I also downloaded the file from the Dev UI and found that it had no file extension; after adding a .yaml or .json extension, it displayed as garbled. My guess is that when the I/O stream is written without an explicit encoding, or when the file has no extension, the system default encoding is used. My computer runs a Chinese-language Windows, so the default is GBK, which is why the garbled characters disappear when I switch the system to Unicode (UTF-8).

The file found after mvn package

quarkus-generated-openapi-doc.JSON

OpenAPI file downloaded from the Dev UI (GitHub cannot upload .yaml files, so I renamed it to .json):

openapi.json

These are my guesses; please forgive me if any of them are wrong.

@MikeEdgar
Contributor

I downloaded the file in the dev interface and found that there was no suffix. After adding the suffix YAML or JSON, it was displayed as garbled characters. I guess that when the IO stream is written without setting the encoding format or the file without the suffix, the system default encoding format will be used.

The response declares UTF-8 in the Content-Type header regardless of how the OpenAPI document is requested (with or without a suffix, with or without an Accept header). I'm curious whether you see the correct data written to disk or to the terminal if you fetch /q/openapi with curl or another tool besides the browser.

resp.headers().set("Content-Type", format.getMimeType() + ";charset=UTF-8");
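Since the server always declares charset=UTF-8, one way to separate a client-side problem from a server-side one is to decode the raw body exactly as the header says. A minimal Java sketch, assuming you already have the body bytes and the Content-Type value (for example saved with curl); the class name is illustrative and the parser is deliberately simplified (it strips quotes but ignores other quoted-string rules). If the text is still garbled after decoding this way, the bytes the server sent were already wrong and the client is not at fault.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {
    /**
     * Returns the charset parameter of a Content-Type value such as
     * "application/json;charset=UTF-8", or the fallback when none is declared.
     * Simplified: surrounding quotes are stripped, other quoted-string rules are ignored.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type itself
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim().replace("\"", "");
                return Charset.forName(value);
            }
        }
        return fallback;
    }

    /** Decodes a response body with the charset the server declared (UTF-8 if none). */
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }
}
```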

@MorganMaohong
Author

curl or some other tool besides the browser to fetch /q/openapi whether you will see the correct data being written to disk or the terminal.

@MikeEdgar, I switched the Windows language setting back to Chinese and tested with curl and Postman, and the garbled characters still appeared. When I switch to Unicode (UTF-8), no garbled characters appear, no matter which method I use.

using curl

Image

Using Postman

Image

The Windows "Beta: Use Unicode UTF-8 for worldwide language support" setting:

Image

@MikeEdgar
Contributor

@MorganMaohong does it happen with other endpoints in the application, or just the OpenAPI response? For example, if you create a simple REST endpoint that returns a byte[] containing the same UTF-8 encoded strings, do they appear correctly in your HTTP clients?

@MorganMaohong
Author

@MikeEdgar As you suggested, I created a test REST endpoint and it worked correctly; I also tested it with various REST tools and it was fine there too.
There are no problems with other REST endpoints in the application, whether GET, POST, or other methods.

Image

@MikeEdgar
Contributor

@MorganMaohong here's one more thing to try, to understand where the problem occurs. Create a META-INF/openapi.json that contains some properly encoded Chinese characters. After running mvn package, look again at META-INF/quarkus-generated-openapi-doc.json. You previously confirmed that the generated JSON was correct when using only annotations; now we'll check whether combining annotations with a static file causes the problem. If so, I have an idea where the problem is.

@MorganMaohong
Author

@MikeEdgar I set up the application to load a static openapi.json file. The static file and the annotations were merged, and, importantly, the characters are correct.

Image

After mvn package, the merged quarkus-generated-openapi-doc.json contains the /hello REST endpoint, and the OpenAPI version changed from 3.0.3 to 3.0.1, which shows that the merge worked:
quarkus-generated-openapi-doc.JSON

@MikeEdgar
Contributor

And just to confirm, you did that using the Chinese Windows environment, correct?

@MorganMaohong
Author

@MikeEdgar Yes, I'm sure. I checked: GBK is the default encoding on Chinese Windows systems.

Image

@MikeEdgar
Contributor

@MorganMaohong , since some of the characters do appear to be displaying properly, can you let me know in particular which character(s) are having the issue? Or, are some characters being turned into other (valid) Chinese characters, in addition to those being shown as invalid?

E.g., in your earlier example you had this in your "before" file:

"tags" : [ {
    "name" : "测试",
    "description" : "这是一个简单的测试"
} ],

And in the "after" file:

tags:
- name: 娴嬭瘯
  description: 杩欐槸涓�涓畝鍗曠殑娴嬭瘯

Should these have been the same? Even the tag name is a different length.

@MorganMaohong
Author

@MikeEdgar All Chinese characters are converted incorrectly, because each Chinese character occupies multiple bytes in these encodings (three bytes in UTF-8, two in GBK).
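The length difference noticed above follows directly from this. A minimal Java sketch, assuming UTF-8 bytes are read by a GBK decoder (GBK being the Chinese Windows default, per the thread): the tag name 测试 is 2 characters but 6 UTF-8 bytes, and GBK consumes those bytes 2 at a time, producing 3 wrong characters, which matches the 3-character 娴嬭瘯 in the garbled output. The class name is illustrative, and the source uses \u escapes so it compiles regardless of the source encoding javac assumes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // GBK is the default charset of Chinese-language Windows (per the thread)
    static final Charset GBK = Charset.forName("GBK");

    /** Encodes text as UTF-8, then decodes those bytes with the wrong charset. */
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrongCharset);
    }

    public static void main(String[] args) {
        String tagName = "\u6d4b\u8bd5"; // the tag name from the example above, 2 characters
        System.out.println(tagName.getBytes(StandardCharsets.UTF_8).length); // 6: 3 bytes per character
        System.out.println(tagName.getBytes(GBK).length);                    // 4: 2 bytes per character
        String garbled = misdecode(tagName, GBK);
        System.out.println(garbled.length());        // 3: the 6 bytes are read as 3 two-byte characters
        System.out.println(garbled.equals(tagName)); // false
    }
}
```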

Should these have been the same?

They are different, and I can't even read the converted Chinese characters.

@MorganMaohong
Author

MorganMaohong commented Nov 25, 2024

@MikeEdgar I suddenly had an idea and switched the system language to Japanese, using a tag name and description with the same meanings as the Chinese ones ("Test" and "This is a simple test"). The same encoding error occurred.

@Path("/hello")
@Tag(name = "テスト", description = "これは簡単なテストです")
public class GreetingResource {

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    @Operation(summary = "こんにちは世界")
    public String hello() {
        return "Hello RESTEasy";
    }
}

Resulting OpenAPI output:

openapi: 3.0.3
info:
  title: code-with-quarkus API
  version: 1.0.0-SNAPSHOT
tags:
- name: 繝�繧ケ繝�
  description: 縺薙l縺ッ邁。蜊倥↑繝�繧ケ繝医〒縺�
paths:
  /hello:
    get:
      tags:
      - 繝�繧ケ繝�
      summary: 縺薙s縺ォ縺。縺ッ荳也阜
      description: Hello
      responses:
        "200":
          description: OK
          content:
            text/plain:
              schema:
                type: string

@MikeEdgar
Contributor

@MorganMaohong have you tried using -Dfile.encoding=UTF-8 for the JVM?
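Whether such a flag actually reaches the JVM is easy to check. A minimal Java sketch (class name illustrative) that prints the encoding settings the running JVM sees; run it with the same JVM and options as the build. On JDK 18 and later the default is UTF-8 (JEP 400), so there the result should be UTF-8 regardless of the Windows locale.

```java
import java.nio.charset.Charset;

final class DefaultCharsetReport {
    /** Collects the encoding-related settings the running JVM actually uses. */
    static String report() {
        return "java.version=" + System.getProperty("java.version")
                + "\nfile.encoding=" + System.getProperty("file.encoding")
                + "\nCharset.defaultCharset()=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

For example, `java -Dfile.encoding=UTF-8 DefaultCharsetReport.java` (single-file launch, Java 11+) should report UTF-8 for both values if the flag took effect.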

@Niavana97

I have the same problem after upgrading Quarkus from 3.13.2 to 3.17.3.

@Niavana97

@MorganMaohong have you tried using -Dfile.encoding=UTF-8 for the JVM?

That doesn't solve the problem.

@decha-n

decha-n commented Jan 29, 2025

Hello,
We have the same problem generating the API Description with French characters.

We are on JDK 17, and our default charset on Windows is windows-1252.

The problem does not occur with JDK versions newer than 17, because since JDK 18 the default charset is UTF-8 (JEP 400). Upgrading is a workaround for us.

Inspecting the code, the problem appears with version 4.x of smallrye-open-api.

Class: io.smallrye.openapi.api.SmallRyeOpenAPI
Method : private <V, A extends V, O extends V, AB, OB> void addStaticModel(BuildContext<V, A, O, AB, OB> ctx, InputStream stream, String source, Format fileFormat) (line 588 in version 4.0.5)

The InputStream is wrapped in an InputStreamReader without specifying a charset, so it inherits Charset.defaultCharset().
The sources are compiled as UTF-8 by Maven, but the JVM on Windows does not use the same charset (it uses windows-1252).
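The difference comes down to one constructor argument. A minimal Java sketch of the pattern, not the actual SmallRye OpenAPI code (class and method names are illustrative): `new InputStreamReader(stream)` uses `Charset.defaultCharset()`, so on a windows-1252 or GBK system UTF-8 input is misdecoded, while passing `StandardCharsets.UTF_8` makes the result independent of the OS locale.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StaticFileReading {
    /** Reads the whole stream as text using the given charset. */
    static String read(InputStream stream, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(stream, charset)) {
            StringWriter out = new StringWriter();
            reader.transferTo(out); // Reader.transferTo exists since Java 10
            return out.toString();
        }
    }

    /** Problematic pattern: equivalent to new InputStreamReader(stream), i.e. the OS default. */
    static String readWithPlatformDefault(InputStream stream) throws IOException {
        return read(stream, Charset.defaultCharset());
    }

    /** Fixed pattern: UTF-8 is explicit, so the OS locale no longer matters. */
    static String readAsUtf8(InputStream stream) throws IOException {
        return read(stream, StandardCharsets.UTF_8);
    }
}
```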

In version 3.x, the JSON/YAML was read directly from an InputStream.

A fix could be to force the charset to UTF-8 and allow overriding it via the properties file.
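A minimal Java sketch of that suggestion: resolve the charset from an optional setting and fall back to UTF-8, never to the platform default. The setting is hypothetical; the method only takes its raw string value, and no such property is claimed to exist in SmallRye OpenAPI or Quarkus. The class name is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

final class StaticFileCharset {
    /**
     * Resolves the charset used to read static OpenAPI files.
     * 'configured' stands for the raw value of a HYPOTHETICAL config property;
     * null or blank means "not set".
     */
    static Charset resolve(String configured) {
        if (configured == null || configured.isBlank()) {
            return StandardCharsets.UTF_8; // default: UTF-8, never Charset.defaultCharset()
        }
        try {
            return Charset.forName(configured.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            throw new IllegalArgumentException(
                    "Unsupported charset for static OpenAPI files: " + configured, e);
        }
    }
}
```

Defaulting to UTF-8 rather than the platform charset is what removes the dependency on the Windows locale; the override only matters for files deliberately saved in another encoding.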

FYI, we can't force default charset via JVM parameters either.

@gsmet
Member

gsmet commented Jan 29, 2025

@decha-n nice detective work. It certainly looks like something we need to fix in SmallRye OpenAPI.

Would you be willing to create a small PR, given you did all the work? If not I can do it.

Project is here: https://github.com/smallrye/smallrye-open-api/ .

@gsmet
Member

gsmet commented Jan 29, 2025

From what I can see, we would need to fix it in a few other areas too:

https://github.com/search?q=repo%3Asmallrye%2Fsmallrye-open-api%20%20InputStreamReader&type=code

(at least the one in JsonIO as it's runtime code)

@gsmet
Member

gsmet commented Jan 29, 2025

What I'm not sure of though is if we should enforce UTF-8 or if we need a config there.

@MikeEdgar would know better.

@decha-n

decha-n commented Jan 29, 2025

I suggested forcing it to UTF-8 because I understand that this is already the case for properties files.

This would be sufficient for French accented characters, but it might not be sufficient for Asian characters. The most flexible solution would be to make the charset configurable.

@gsmet
Member

gsmet commented Jan 29, 2025

I would recommend doing a first iteration with UTF-8 that we can easily backport.

And then we could improve on it.

But I will let @MikeEdgar chime in as he's the expert.

@MikeEdgar
Contributor

I agree with using UTF-8 to start and enhancing later if necessary. Let me know if you'll be opening a PR @decha-n, otherwise I will do it. It looks like just two locations of non-test code need the encoding added (JsonIO and SmallRyeOpenAPI).

@decha-n

decha-n commented Jan 29, 2025

I've only just seen your comment, @MikeEdgar. I also see that the PR is already done.

Thanks for the quick analysis and implementation of this fix.

@MikeEdgar
Contributor

I've only just seen your comment @MikeEdgar . I also see that the PR is already done .

Thanks for the quick analysis and implementation of this fix.

No problem, and thanks go to you for the analysis 👍
