Why are control characters treated as zero-width for strings? #6

typesanitizer · 2018-10-11T02:35:15Z

From the docs:

fn width<'a>(&'a self) -> usize
Returns the string's displayed width in columns.
Control characters are treated as having zero width.

(Ignore '\0' for the points below as it has special treatment.)

This seems inconsistent with the behaviour for individual chars, where None is returned in case you have a control character. For consistency, I would expect (A) for a string, if any character has a width of None, the result should have width None XOR (B) control characters always have width Some(0).

IIUC, the second option hasn't been taken for consistency with wcwidth, which returns -1 for control characters. However, not taking the first option can lead to non-intuitive behaviour that can go unnoticed.
E.g. if the text has LF/TAB/DEL in it, then you can get a width that doesn't make much sense.

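Moreover, this violates an embedding law that one might expect to hold: width(format!("{}", c)) == width(c). As things stand, the law cannot even be stated, because it doesn't type-check: the string side is a usize and the char side is an Option<usize>.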
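
To make the comparison concrete, here is a minimal Rust sketch of the two string behaviours. It does not use the library itself: char_width is a hypothetical toy stand-in (None for control characters, Some(0) for '\0', Some(1) for everything else, ignoring wide and combining characters), and the names str_width_lenient and str_width_strict are made up for illustration.

```rust
/// Toy per-char width (an assumption, not the library's table):
/// `Some(0)` for NUL, `None` for other control characters, `Some(1)` otherwise.
fn char_width(c: char) -> Option<usize> {
    match c {
        '\0' => Some(0),
        c if c.is_control() => None,
        _ => Some(1),
    }
}

/// The documented string behaviour: control characters count as zero width.
fn str_width_lenient(s: &str) -> usize {
    s.chars().map(|c| char_width(c).unwrap_or(0)).sum()
}

/// Option (A): if any character has width `None`, the whole string does.
/// Summing an iterator of `Option<usize>` yields `None` on the first `None`.
fn str_width_strict(s: &str) -> Option<usize> {
    s.chars().map(char_width).sum()
}
```

Under these assumptions, str_width_lenient("a\tb") silently reports 2, while str_width_strict("a\tb") reports None. With option (A) the embedding law also becomes expressible: str_width_strict(&c.to_string()) == char_width(c) for every char c.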

What is the reasoning behind the current behaviour?

P.S. I'm not asking for the library's behaviour to be changed. I'm writing a Haskell implementation and ran into this while looking at the test cases. My library follows (A) because it seemed like the right choice, so I wanted to know why you didn't pick (A).

jquast · 2023-12-17T16:30:08Z

What is the reasoning behind the current behaviour?

I may be able to answer this for you. From jquast/wcwidth#54 (comment)

I just want to also add that this cannot be fixed in the wcwidth() and wcswidth() functions, as they intend to exactly match function signature and behavior of the POSIX functions.

The reason that C0 and C1 control characters return -1 is that the intended application, a terminal emulator especially, should handle these characters in a stream and remove them from the string before passing it on to wcswidth, especially items like \n, \b, and \t. These get complicated because their effect depends on the current position of the cursor and on terminal settings: for example, \b can wrap to the previous row if the cursor is at column 0, and the number of spaces produced by '\t' depends on the tab stop setting and the current cursor position. Control characters like '\x1b' (ESC, a C0 character) may begin a terminal escape sequence, and that too should be processed before sending to wcswidth, etc.
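
As a rough illustration of the preprocessing described above, the following Rust sketch expands tabs relative to the cursor column and drops other control characters before any width is measured. The function name sanitize_for_width and its parameters are hypothetical, and it assumes every remaining character occupies exactly one column, which does not hold for wide or combining characters. It also does not parse escape sequences: it only drops the ESC byte itself, whereas a real terminal would consume the whole sequence.

```rust
/// Expand tabs to spaces and drop other control characters, tracking the
/// cursor column as we go. Hypothetical helper for illustration only.
fn sanitize_for_width(s: &str, tab_stop: usize, start_col: usize) -> String {
    assert!(tab_stop > 0, "tab_stop must be non-zero");
    let mut out = String::new();
    let mut col = start_col;
    for c in s.chars() {
        match c {
            '\t' => {
                // The number of spaces depends on the current column.
                let spaces = tab_stop - (col % tab_stop);
                out.extend(std::iter::repeat(' ').take(spaces));
                col += spaces;
            }
            c if c.is_control() => {
                // A terminal would interpret these ('\n', '\x08', ESC, ...);
                // this sketch simply removes them.
            }
            c => {
                // Simplification: treat every other character as one column.
                out.push(c);
                col += 1;
            }
        }
    }
    out
}
```

For example, with a tab stop of 4, "a\tb" starting at column 0 becomes "a" followed by three spaces and "b", but starting at column 3 the same tab expands to four spaces. That position dependence is why a single fixed width for '\t' cannot be correct.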

mibac138 mentioned this issue Mar 23, 2020

Tabs mess up bar width calculation console-rs/indicatif#150

Closed

jlkiri mentioned this issue Oct 1, 2021

Weird tab character rendering zkat/miette#73

Closed

devashishdxt mentioned this issue May 15, 2022

Tabs in source text result in misaligned table borders devashishdxt/cli-table#25

Closed

kdheepak mentioned this issue Jan 27, 2024

Can't render tab character with Paragraph. ratatui/ratatui#876

Open

endbr64 mentioned this issue Nov 25, 2024

fix: replace tabs with spaces before rendering Data tangramdotdev/tangram#351

Merged
