MD013: Unicode character width #564
Comments
In my mind, rules about line length are meant to ensure that everything fits on the screen or that line lengths are consistent when using a monospaced font. With that understanding, the current implementation of this rule seems correct: it counts the number of visible characters. Basing a rule like this on the underlying encoding or representation of the data does not seem generally useful. Your example shows a complication: some characters may render wider than the default monospace character width. However, that is rendering behavior that varies by font, program, and operating system, and it does not seem like something a rule could address. As such, I feel the current implementation is valid.
Agreed, but displayed line lengths are inconsistent when writing CJK text, since the majority of monospaced CJK fonts are actually duospaced.
Rather than changing the rule, I would probably want an option to customize how line length is measured: by the number of characters or by Unicode width.
Do you know if there is a RegExp character class to identify "wide" characters?
Hmm, no. I searched and found cjk-regex, which matches CJK characters, but not all CJK characters are necessarily double width (e.g., single width
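To illustrate why script-based matching is only an approximation, here is a minimal sketch using Unicode property escapes in a JavaScript `RegExp` (the `u` flag with `\p{Script=...}`). As far as I know, JavaScript regular expressions expose script and general-category properties but not East Asian Width, so this pattern identifies CJK scripts rather than wide characters. The name `cjkScript` and the sample characters are chosen here for illustration only.

```javascript
// Approximation only: this matches by SCRIPT, not by display width.
const cjkScript =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/u;

const samples = {
  han: "\u6F22",            // U+6F22, a Han ideograph (typically double width)
  hiragana: "\u3042",       // U+3042 HIRAGANA LETTER A (typically double width)
  halfwidthKana: "\uFF76",  // U+FF76 HALFWIDTH KATAKANA LETTER KA (single width)
  fullwidthLatin: "\uFF21", // U+FF21 FULLWIDTH LATIN CAPITAL LETTER A (double width)
  ascii: "A"                // U+0041 (single width)
};

// Record which samples the script-based pattern matches.
const matches = Object.fromEntries(
  Object.entries(samples).map(([name, ch]) => [name, cjkScript.test(ch)])
);

console.log(matches);
```

The pattern matches the halfwidth katakana letter even though it is single width, and it misses the fullwidth Latin letter even though it is double width, so a script-based class errs in both directions.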
This part of the Unicode spec seems relevant: As do these packages: However, I have a strict "no dependencies" rule and I don't see a clean way of referencing this data otherwise.
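One dependency-free possibility is to embed a small range table directly in the code. The sketch below assumes the relevant data is the Unicode East Asian Width property and uses a hand-picked, non-exhaustive set of ranges that are commonly Wide or Fullwidth. The names `WIDE_RANGES`, `isWide`, and `lineLength`, and the `"chars"`/`"width"` modes, are hypothetical and not part of markdownlint.

```javascript
// Illustrative, NON-exhaustive ranges of code points that are commonly Wide or
// Fullwidth. The authoritative data is the East_Asian_Width property in the
// Unicode Character Database; this table is an assumption for the sketch.
// Ranges are sorted and non-overlapping so they can be binary searched.
const WIDE_RANGES = [
  [0x1100, 0x115f],   // Hangul Jamo initial consonants
  [0x2e80, 0x303e],   // CJK radicals, symbols and punctuation
  [0x3041, 0x33ff],   // Hiragana, Katakana, CJK compatibility
  [0x3400, 0x4dbf],   // CJK Unified Ideographs Extension A
  [0x4e00, 0x9fff],   // CJK Unified Ideographs
  [0xa000, 0xa4cf],   // Yi syllables and radicals
  [0xac00, 0xd7a3],   // Hangul syllables
  [0xf900, 0xfaff],   // CJK compatibility ideographs
  [0xfe30, 0xfe4f],   // CJK compatibility forms
  [0xff01, 0xff60],   // Fullwidth forms
  [0xffe0, 0xffe6],   // Fullwidth signs
  [0x20000, 0x3fffd]  // Supplementary ideographic planes
];

// Binary search: true if the code point falls inside any listed range.
function isWide(codePoint) {
  let lo = 0;
  let hi = WIDE_RANGES.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [start, end] = WIDE_RANGES[mid];
    if (codePoint < start) {
      hi = mid - 1;
    } else if (codePoint > end) {
      lo = mid + 1;
    } else {
      return true;
    }
  }
  return false;
}

// "chars" counts code points; "width" counts wide code points as two columns.
function lineLength(line, mode = "chars") {
  let total = 0;
  for (const ch of line) { // iterates by code point, not UTF-16 code unit
    total += mode === "width" && isWide(ch.codePointAt(0)) ? 2 : 1;
  }
  return total;
}

console.log(lineLength("日本語 text", "chars")); // 8
console.log(lineLength("日本語 text", "width")); // 11
```

This sketch ignores combining marks, zero-width characters, and ambiguous-width characters, so it only approximates what a terminal or editor would actually display.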
Writing text with non-ASCII characters in Markdown is daily practice worldwide. Some people, like me, want to lint Markdown files written in non-ASCII characters.
Regarding MD013, the current implementation in markdownlint seems to check the character count of each line, based on a regular expression. However, the display width of Unicode characters varies. For example, CJK characters often have double width.
Imagine that you configured MD013 as
and you have example.md:
It is expected that MD013 warns on both lines 4 and 7. However,
Related: psf/black#1197