SmallRye OpenAPI outputs Chinese garbled characters in Json #44569

MorganMaohong · 2024-11-19T07:05:46Z

Description

I downloaded a template project from the official website, added the SmallRye OpenAPI extension, annotated the REST class with @Tag, and used Chinese text in the annotation. When I accessed /q/openapi, the Chinese characters in the response were garbled. I believe the project itself is UTF-8 encoded. I then set the encoding to UTF-8 everywhere I could find, but the Chinese text is still not displayed correctly.
By contrast, Chinese text in ordinary REST responses is displayed correctly.

@Path("/hello")
@Tag(name = "测试模块")
public class ExampleResource {
    private static final Logger log = LoggerFactory.getLogger(ExampleResource.class);
    @Inject
    EntityManager entityManager;

    @POST
    @Path("/1")
    @Produces(MediaType.TEXT_PLAIN)
    public String hello() {
        String s = new String("".getBytes(StandardCharsets.UTF_8));
        List<User> result = entityManager.createQuery("select e from User e", User.class).getResultList();
        log.info("{}", result);
        return "success哈哈哈哈";
    }
}

Implementation ideas

No response

quarkus-bot · 2024-11-19T07:05:50Z

/cc @EricWittmann (openapi), @Ladicek (smallrye), @MikeEdgar (openapi), @jmartisk (smallrye), @phillip-kruger (openapi,smallrye), @radcortez (smallrye)

MorganMaohong · 2024-11-19T08:30:48Z

Under normal circumstances, red Chinese characters will be displayed. If it is in utf-8 encoding format, this should not happen.

gsmet · 2024-11-19T09:23:36Z

Can you have a look at the Content-Type HTTP header of your response?

yuhaibohotmail · 2024-11-19T13:46:23Z

@MorganMaohong There is no problem in my environment.
@Tag(name = "CategoryService服务", description = "分类服务")

MorganMaohong · 2024-11-20T00:39:08Z

Can you have a look at the Content-Type HTTP header of your response?

I can see that the Content-Type charset is utf-8, but the Chinese characters are still garbled.

MorganMaohong · 2024-11-20T00:42:58Z

@MorganMaohong There is no problem in my environment. @Tag(name = "CategoryService服务", description = "分类服务")

Thank you for your reply. Can you provide me with a template project with smallrye-openApi extension so that I can test it in my local environment? Thank you very much!

yuhaibohotmail · 2024-11-20T11:17:57Z

code-with-quarkus.zip

@MorganMaohong Here is a simple one.

MorganMaohong · 2024-11-21T07:27:19Z

@gsmet @yuhaibohotmail
My god, I downloaded your project and ran it, and I also reset my Maven environment, but the Chinese characters were still garbled. Finally, I changed the Windows system encoding to UTF-8, restarted the computer, ran the project again, and the display was correct!

MorganMaohong · 2024-11-21T07:30:11Z

@gsmet This extension is very useful. I think it would be perfect if these configurations could be exposed and modified through configuration files.

geoand · 2024-11-22T07:09:53Z

Which extension are you referring to @MorganMaohong ?

MorganMaohong · 2024-11-22T07:17:29Z

@geoand I mean this extension. When I use the @Tag and @Operation annotations, the Chinese text in the content returned by /q/openapi is garbled.


        <dependency>
            <groupId>io.quarkus</groupId>
            <artifactId>quarkus-smallrye-openapi</artifactId>
        </dependency>

geoand · 2024-11-22T07:19:33Z

But it sounds like it's an issue with your environment, no?

MorganMaohong · 2024-11-22T07:30:43Z

@geoand Sorry, it is indeed a problem with my environment. The extension picks up the system encoding. I would like to suggest exposing the encoding as a configuration option to avoid this problem. It would be even better if the extension used UTF-8 by default.

geoand · 2024-11-22T07:32:03Z

Let's see what @MikeEdgar thinks about that

MorganMaohong · 2024-11-22T07:40:15Z

Thank you so much

MikeEdgar · 2024-11-22T13:27:21Z

@MorganMaohong , I'm curious if you run mvn package and check the resulting application jar file at META-INF/quarkus-generated-openapi-doc.json (or the YAML version) whether the characters are also garbled there. That will help pinpoint the step where the characters are incorrectly interpreted.

MorganMaohong · 2024-11-22T15:14:02Z

@MikeEdgar, thank you for the pointer. I think I understand why this happens. I checked the file as you suggested. Opened in VS Code with the default UTF-8 encoding, it displays correctly. When I switch the editor to the Chinese GBK encoding, it shows the same garbled characters as the /q/openapi endpoint.

I also downloaded the file from the dev interface and found that it had no suffix. After adding a .yaml or .json suffix, it was displayed as garbled characters. My guess is that when the IO stream is written without an explicit encoding, or the file has no suffix, the system default encoding is used. My computer runs a Chinese operating system, so the default is GBK. That is why the garbled characters no longer appear once I switch the system to Unicode (UTF-8).

The file found after mvn package

quarkus-generated-openapi-doc.JSON

OpenAPI file downloaded from the dev interface. GitHub cannot upload .yaml files, so I changed the suffix to .json

openapi.json

These are my speculations, please forgive me if there are any mistakes.

MikeEdgar · 2024-11-22T16:08:08Z

I downloaded the file in the dev interface and found that there was no suffix. After adding the suffix YAML or JSON, it was displayed as garbled characters. I guess that when the IO stream is written without setting the encoding format or the file without the suffix, the system default encoding format will be used.

The response will use UTF-8 in the Content-type regardless of how the OpenAPI doc is requested (suffix or not, Accept header or not). I'm curious if you use curl or some other tool besides the browser to fetch /q/openapi whether you will see the correct data being written to disk or the terminal.

quarkus/extensions/smallrye-openapi/runtime/src/main/java/io/quarkus/smallrye/openapi/runtime/OpenApiHandler.java

Line 59 in 8df922d

resp.headers().set("Content-Type", format.getMimeType() + ";charset=UTF-8");

MorganMaohong · 2024-11-23T01:10:13Z

curl or some other tool besides the browser to fetch /q/openapi whether you will see the correct data being written to disk or the terminal.

@MikeEdgar, I switched the Windows language setting back to the Chinese environment and tested with both curl and Postman, but the garbled characters still appeared. With the Unicode setting enabled, no garbled characters appear no matter which method I use.

using curl

Using Postman

Windows Unicode beta testing feature

MikeEdgar · 2024-11-23T13:45:30Z

@MorganMaohong does it happen with other endpoints in the application or just the OpenAPI response? For example, if you create a simple REST endpoint that returns a byte[] that contains the same UTF-8 encoded strings, do they appear correct to your HTTP clients?

MorganMaohong · 2024-11-23T14:07:18Z

@MikeEdgar Following your suggestion, I created a test REST endpoint and its output was correct. I also tested it with various REST tools, and the output was correct there too.
There are no problems with other REST requests in the application, regardless of whether they use GET, POST, or another request method.

MikeEdgar · 2024-11-23T14:20:43Z

@MorganMaohong here's one more thing to try to understand where the problem is occurring. Create a META-INF/openapi.json that contains some properly-formatted Chinese characters. After running mvn package, look again at META-INF/quarkus-generated-openapi-doc.json. You previously confirmed that generated JSON was correct when using only annotations, but now we'll confirm if both annotations + other static file result in the problem. If so, I have an idea where the problem is.

MorganMaohong · 2024-11-23T14:52:35Z

@MikeEdgar I configured the application to load a static openapi.json file. The static file and the annotations were merged, and the key point is that the characters are all displayed correctly.

After mvn package, the merged quarkus-generated-openapi-doc.json contains the /hello REST endpoint, and the openapi version changed from 3.0.3 to 3.0.1, which shows that the merge happened correctly.
quarkus-generated-openapi-doc.JSON

MikeEdgar · 2024-11-23T15:03:05Z

And just to confirm, you did that using the Chinese Windows environment, correct?

MorganMaohong · 2024-11-23T15:32:18Z

@MikeEdgar Yes, I'm sure. I checked: GBK is the default encoding on Chinese Windows systems.

MikeEdgar · 2024-11-25T13:09:37Z

@MorganMaohong , since some of the characters do appear to be displaying properly, can you let me know in particular which character(s) are having the issue? Or, are some characters being turned into other (valid) Chinese characters, in addition to those being shown as invalid?

E.g., in your earlier example you had this in your "before" file:

"tags" : [ {
    "name" : "测试",
    "description" : "这是一个简单的测试"
} ],

And in the "after" file:

tags:
- name: 娴嬭瘯
  description: 杩欐槸涓�涓畝鍗曠殑娴嬭瘯

Should these have been the same? Even the tag name is a different length.

MorganMaohong · 2024-11-25T13:44:09Z

@MikeEdgar All Chinese characters are converted incorrectly, because a Chinese character usually occupies multiple bytes in the encoding.

Should these have been the same?

They are different, and I can't even read the converted Chinese characters.
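
The change in length can be reproduced outside Quarkus. The following is only a sketch, assuming the generated document is UTF-8 and is later decoded as GBK; the class and method names are illustrative and not part of SmallRye OpenAPI. The two-character tag name occupies six bytes in UTF-8, and GBK reads those six bytes as three two-byte characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {

    // Encode the text as UTF-8, then decode the same bytes with a different charset.
    static String misdecode(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        // "\u6D4B\u8BD5" is the tag name from the "before" file, written with
        // Unicode escapes so the demo does not depend on the source-file encoding.
        String original = "\u6D4B\u8BD5";

        // GBK is an optional charset and may be missing from a trimmed-down runtime.
        if (!Charset.isSupported("GBK")) {
            System.out.println("GBK is not available in this runtime");
            return;
        }

        String garbled = misdecode(original, Charset.forName("GBK"));
        System.out.println(original.length() + " chars -> "
                + garbled.length() + " chars: " + garbled);
    }
}
```

Because every UTF-8 byte is consumed as half of a GBK character, all Chinese text is affected, not only particular characters.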

MorganMaohong · 2024-11-25T13:56:58Z

@MikeEdgar I suddenly had an idea and switched the system language to Japanese. The tag name and description have the same meaning as in the Chinese example ("Test" and "This is a simple test"), and the same encoding error occurred.

@Path("/hello")
@Tag(name = "テスト", description = "これは簡単なテストです")
public class GreetingResource {

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    @Operation(summary = "こんにちは世界")
    public String hello() {
        return "Hello RESTEasy";
    }
}

openapi: 3.0.3
info:
  title: code-with-quarkus API
  version: 1.0.0-SNAPSHOT
tags:
- name: 繝�繧ｹ繝�
  description: 縺薙ｌ縺ｯ邁｡蜊倥↑繝�繧ｹ繝医〒縺�
paths:
  /hello:
    get:
      tags:
      - 繝�繧ｹ繝�
      summary: 縺薙ｓ縺ｫ縺｡縺ｯ荳也阜
      description: Hello
      responses:
        "200":
          description: OK
          content:
            text/plain:
              schema:
                type: string

MikeEdgar · 2024-12-12T12:00:44Z

@MorganMaohong have you tried using -Dfile.encoding=UTF-8 for the JVM?

Niavana97 · 2024-12-13T13:38:23Z

I have the same problem after upgrading Quarkus from 3.13.2 to 3.17.3.

Niavana97 · 2024-12-13T13:42:30Z

@MorganMaohong have you tried using -Dfile.encoding=UTF-8 for the JVM?

That does not solve the problem.

decha-n · 2025-01-29T09:15:32Z

Hello,
We have the same problem generating the API Description with French characters.

We are on JDK 17, and the default charset on Windows is windows-1252 for us.

The problem no longer exists with a JDK higher than 17. Note that the default charset has been forced to UTF-8 since JDK18. It's a workaround for us.
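
To check which charset a given JVM actually picks up, a small diagnostic like the one below can be run on the affected machine. This is a minimal sketch and the class name is illustrative; it only reads standard system properties and `Charset.defaultCharset()`.

```java
import java.nio.charset.Charset;

final class DefaultCharsetReport {

    // Collect the values that decide how charset-less readers decode bytes.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", feature=" + Runtime.version().feature()
                + ", file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it with the same JVM and options that run the build shows whether the default charset differs from UTF-8.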

On inspection of the code, the problem appeared with version 4.X.X of smallrye-open-api.

Class: io.smallrye.openapi.api.SmallRyeOpenAPI
Method : private <V, A extends V, O extends V, AB, OB> void addStaticModel(BuildContext<V, A, O, AB, OB> ctx, InputStream stream, String source, Format fileFormat) (line 588 in version 4.0.5)

The InputStream is wrapped in an InputStreamReader without specifying a charset, so it uses Charset.defaultCharset().
The code is compiled as UTF-8 by Maven, but the JVM on Windows does not use the same charset (windows-1252).
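
The effect can be shown with the JDK alone. This is a sketch rather than the SmallRye code: `ReaderCharsetDemo` and `readAll` are illustrative names, and ISO-8859-1 stands in for a non-UTF-8 platform default such as windows-1252. The single-argument `InputStreamReader(InputStream)` constructor decodes with the default charset, which is what the second call imitates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReaderCharsetDemo {

    // Read the whole stream as text, decoding with the given charset.
    static String readAll(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder out = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00E9" ends in an accented e; the bytes are UTF-8, as in the packaged document.
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8: the text survives regardless of the platform default.
        String right = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Assumed stand-in for a non-UTF-8 default: the two-byte sequence becomes two characters.
        String wrong = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(right.length() + " chars when read as UTF-8");
        System.out.println(wrong.length() + " chars when read as ISO-8859-1");
    }
}
```

Passing the charset explicitly removes the dependency on the machine running the build.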

In version 3.X.X, JSON/YAML was delivered via an InputStream.

A fix could be to force the charset to UTF-8 and allow overloading via the properties file.
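
As a rough illustration of that idea, the charset could be resolved from an optional setting and fall back to UTF-8. Everything here is hypothetical: the property name `example.openapi.charset` and the `CharsetConfig` class do not exist in Quarkus or SmallRye OpenAPI.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetConfig {

    // Hypothetical property name, used only for this sketch.
    static final String PROPERTY = "example.openapi.charset";

    // Use UTF-8 unless a charset name is explicitly configured.
    static Charset resolve(String configured) {
        if (configured == null || configured.isBlank()) {
            return StandardCharsets.UTF_8;
        }
        // Throws an unchecked exception if the name is illegal or unsupported.
        return Charset.forName(configured.trim());
    }

    public static void main(String[] args) {
        Charset charset = resolve(System.getProperty(PROPERTY));
        System.out.println("Static OpenAPI files would be read as " + charset);
    }
}
```

With a UTF-8 fallback, the platform default never influences the result unless someone opts in to another charset.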

FYI, we can't force default charset via JVM parameters either.

gsmet · 2025-01-29T09:22:03Z

@decha-n nice detective work. It certainly looks like something we need to fix in SmallRye OpenAPI.

Would you be willing to create a small PR, given you did all the work? If not I can do it.

Project is here: https://github.com/smallrye/smallrye-open-api/ .

gsmet · 2025-01-29T09:23:48Z

From what I can see, we would need to fix it in a few other areas too:

https://github.com/search?q=repo%3Asmallrye%2Fsmallrye-open-api%20%20InputStreamReader&type=code

(at least the one in JsonIO as it's runtime code)

gsmet · 2025-01-29T09:27:53Z

What I'm not sure of though is if we should enforce UTF-8 or if we need a config there.

@MikeEdgar would know better.

decha-n · 2025-01-29T09:34:46Z

I suggested forcing it to UTF-8 because I understand that this is already the case for properties files.

This would be sufficient for French accented characters. It may not be sufficient for Asian characters. The most flexible solution would be to make the charset configurable.

gsmet · 2025-01-29T17:44:52Z

I would recommend doing a first iteration using UTF-8 that we can easily backport.

And then we could improve on it.

But I will let @MikeEdgar chime in as he's the expert.

MikeEdgar · 2025-01-29T17:49:43Z

I agree with using UTF-8 to start and enhancing later if necessary. Let me know if you'll be opening a PR @decha-n, otherwise I will do it. It looks like just two locations of non-test code need the encoding added (JsonIO and SmallRyeOpenAPI).

decha-n · 2025-01-29T19:06:14Z

I've only just seen your comment @MikeEdgar. I also see that the PR is already done.

Thanks for the quick analysis and implementation of this fix.

MikeEdgar · 2025-01-29T19:10:43Z

I've only just seen your comment @MikeEdgar . I also see that the PR is already done .

Thanks for the quick analysis and implementation of this fix.

No problem, and thanks go to you for the analysis 👍

MorganMaohong added the kind/enhancement New feature or request label Nov 19, 2024

quarkus-bot bot added area/openapi area/smallrye labels Nov 19, 2024

gsmet added the triage/needs-feedback We are waiting for feedback. label Nov 19, 2024

geoand removed the triage/needs-feedback We are waiting for feedback. label Nov 22, 2024

yuhaibohotmail mentioned this issue Dec 20, 2024

encoding problems in console of IntelliJ IDEA 2024 (Community Edition) #45229

Closed

MikeEdgar mentioned this issue Jan 29, 2025

fix: use UTF-8 for InputStreamReaders relying on platform encoding smallrye/smallrye-open-api#2182

Merged

MikeEdgar mentioned this issue Jan 31, 2025

Bump smallrye-open-api.version from 4.0.7 to 4.0.8 #45993

Merged

gsmet closed this as completed in #45993 Jan 31, 2025

quarkus-bot bot added this to the 3.19 - main milestone Jan 31, 2025

gsmet modified the milestones: 3.19 - main, 3.18.2 Feb 4, 2025
