Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logback should default to JVM's default charset instead of ASCII #27230

Closed
gbaso opened this issue Jul 9, 2021 · 10 comments
Closed

Logback should default to JVM's default charset instead of ASCII #27230

gbaso opened this issue Jul 9, 2021 · 10 comments
Assignees
Labels
type: bug A general bug
Milestone

Comments

@gbaso
Copy link

gbaso commented Jul 9, 2021

Following #23827, default charset where defined for a base logback configuration in 5e26954, as ${CONSOLE_LOG_CHARSET:-default} and ${FILE_LOG_CHARSET:-default}.

Unfortunately, logback interprets that as Charset.forName("default"), which taking a look at sun.nio.cs.StandardCharsets$Aliases is hardcoded to return ASCII. I feel like Charset.defaultCharset() would be a more natural choice.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jul 9, 2021
@wilkinsona wilkinsona changed the title Logback shoud default on default JVM charset instead of ASCII Logback should default to JVM's default charset instead of ASCII Jul 16, 2021
@wilkinsona
Copy link
Member

Thanks for the report, @gbaso. Can you please provide a bit more information about when this has caused a problem? LoggingSystemProperties should set the FILE_LOG_CHARSET and CONSOLE_LOG_CHARSET system properties so I'd like to understand exactly when you're seeing the fallback to default occurring.

@wilkinsona wilkinsona added the status: waiting-for-feedback We need additional information before we can continue label Jul 16, 2021
@gbaso
Copy link
Author

gbaso commented Jul 17, 2021

It seems I was mistaken, and my issue was actually related to openjdk/jdk#4733 (comment).

I do, however, still think that spring boot should take a more opinionated approach than defaulting to the "default" charset (which, as I said, is ASCII), using UTF-8 instead.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 17, 2021
@wilkinsona
Copy link
Member

@gbaso When the system properties are in effect, it should already default to UTF-8. To make sure that we fully understand the problem I'd like to know more about your situation and why that apparently wasn't happening. Can you please provide a minimal example that reproduces the problem that you're having?

@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jul 19, 2021
@gbaso
Copy link
Author

gbaso commented Jul 20, 2021

@wilkinsona I apologize if I was not clear enough, spring boot is working correctly, it was a problem on my end (and in the mechanism the JVM sets the default charset).

However by reading the code, specifically https://github.com/spring-projects/spring-boot/blob/main/spring-boot-project/spring-boot/src/main/resources/org/springframework/boot/logging/logback/defaults.xml, it looks like if for some reasons FILE_LOG_CHARSET and CONSOLE_LOG_CHARSET are unset, they default to the string default:

<property name="FILE_LOG_CHARSET" value="${FILE_LOG_CHARSET:-default}"/>

which is interpreted as Charset.forName("default") and is an alias for ASCII.

Given that even OpenJDK is moving to UTF-8 by default, I think we should replace
value="${FILE_LOG_CHARSET:-default}" with value="${FILE_LOG_CHARSET:-UTF-8}".

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 20, 2021
@wilkinsona
Copy link
Member

it looks like if for some reasons FILE_LOG_CHARSET and CONSOLE_LOG_CHARSET are unset, they default to the string default

Yes, that's right. What I'd like to understand is why, in your case, those system properties were unset. Can you please provide a minimal example that reproduces that behaviour?

@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jul 20, 2021
@gbaso
Copy link
Author

gbaso commented Jul 21, 2021

What I'd like to understand is why, in your case, those system properties were unset

They weren't, I misattributed the source of my issue

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 21, 2021
@liuanxin
Copy link

liuanxin commented Oct 23, 2021

I am a developer from China, US-ASCII encoding will cause Chinese garbled, below is my code

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster

$ java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-post-Debian-1deb10u1)
OpenJDK 64-Bit Server VM (build 11.0.11+9-post-Debian-1deb10u1, mixed mode, sharing)

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <!-- This problem occurs from 2.4.0 ~ 2.5.6, version 2.3.12.RELEASE is normal -->
        <version>2.5.6</version>
        <relativePath/>
    </parent>

    <groupId>org.example</groupId>
    <artifactId>test-charset</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <java.version>11</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
    </dependencies>
</project>

logback.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <include resource="org/springframework/boot/logging/logback/defaults.xml" />
    <property name="CONSOLE_LOG_PATTERN" value="[%d] [%t\\(%logger\\) : %p] %class.%method\\(%file:%line\\)%n%m%n"/>
    <!--<property name="CONSOLE_LOG_CHARSET" value="UTF-8"/>-->
    <include resource="org/springframework/boot/logging/logback/console-appender.xml" />

    <root level="debug">
        <appender-ref ref="CONSOLE"/>
    </root>
</configuration>

Test

import lombok.extern.slf4j.Slf4j;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Slf4j
public class Nil {

    private static final Logger LOG = LoggerFactory.getLogger(Nil.class);

    public static void main(String[] args) {
        log.info("中文 abc");
        LOG.info("abc 中文");
    }
}

output

[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:13)
?? abc

[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:14)
abc ??

Delete logback.xml, or unpack the commented content <property name="CONSOLE_LOG_CHARSET" value="UTF-8"/>, the problem of garbled characters will disappear.

[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:13)
中文 abc

[xxxx-xx-xx xx:xx:xx,xxx] [main(com.example.Nil) : INFO] com.example.Nil.main(Nil.java:14)
abc 中文

ch.qos.logback.core.encoder.LayoutWrappingEncoder#convertToByte, when the value of charset is US-ASCII, it causes garbled characters, normal if it is not set or set to UTF8

@wilkinsona
Copy link
Member

@liuanxin You aren't using Spring Boot's logging system as your main method isn't doing anything with Spring Boot. As a result, CONSOLE_LOG_CHARSET hasn't been set so ASCII is being used. This is the situation that @gbaso described above.

If you want to include Spring Boot's Logback configuration, you should make sure that you're also using Spring Boot's logging system.

@wilkinsona
Copy link
Member

Given that even OpenJDK is moving to UTF-8 by default, I think we should replace value="${FILE_LOG_CHARSET:-default}" with value="${FILE_LOG_CHARSET:-UTF-8}".

Thanks for the suggestion, @gbaso. The switch to UTF-8 is coming in Java 18. Spring Boot 3 will require Java 17 so it won't be until Spring Boot 4 when we raise the baseline beyond 17 that every supported version of Java defaults to UTF-8. As such, I think it's probably a bit early for us to make that switch. I'll flag this for team attention to see what everyone else thinks.

@wilkinsona wilkinsona added for: team-attention An issue we'd like other members of the team to review and removed status: feedback-provided Feedback has been provided labels Oct 25, 2021
@philwebb philwebb added type: bug A general bug and removed for: team-attention An issue we'd like other members of the team to review status: waiting-for-triage An issue we've not yet triaged labels Oct 27, 2021
@philwebb philwebb added this to the 2.4.x milestone Oct 27, 2021
@wilkinsona
Copy link
Member

While I don't think we can use Charset.defaultCharset().name() in our defaults.xml, I think we can simulate its behaviour. The JDK uses the file.encoding system property to determine the default charset and, just in case it's not set, it then falls back to UTF-8 as a last resort.

@wilkinsona wilkinsona self-assigned this Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A general bug
Projects
None yet
Development

No branches or pull requests

5 participants