forked from adrienverge/yamllint
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
decoder: Autodetect detect encoding of YAML files
Before this change, yamllint would open YAML files using open()’s default encoding. As long as UTF-8 mode isn’t enabled, open() defaults to using the system’s locale encoding [1][2]. Most of the time, the locale encoding on Linux systems is UTF-8 [3][4], but it doesn’t have to be [5]. Additionally, the locale encoding on Windows systems is the system’s ANSI code page [6]. As a result, you would have to either enable UTF-8 mode, give Python a custom manifest or enable a beta feature in Windows settings in order to lint UTF-8 YAML files on Windows [2][7]. Finally, using open()’s default encoding is a violation of the YAML spec. Chapter 5.2 says: “On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported. If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (x00) characters.” [8] This change fixes all of those problems by implementing the YAML spec’s character encoding detection algorithm. Now, as long as YAML files begins with either a byte order mark or an ASCII character, yamllint will automatically detect them as being UTF-8, UTF-16 or UTF-32. Other character encodings are not supported at the moment. Fixes adrienverge#218. Fixes adrienverge#238. Fixes adrienverge#347. [1]: <https://docs.python.org/3.12/library/functions.html#open> [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode> [3]: <https://sourceware.org/glibc/manual/html_node/Extended-Char-Intro.html> [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale> [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f> [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding> [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page> [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
- Loading branch information
1 parent
0770315
commit 30e43b3
Showing
7 changed files
with
457 additions
and
5 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.