Fix Unicode support #69

gchp · 2014-12-29T18:36:46Z

Since #50 was merged, Unicode support is broken. @P1start mentioned in the comments that fixing this shouldn't be too involved.

suhr · 2015-02-16T14:12:01Z

There are actually two issues:

Buffer content is displayed as ISO-8859-1 text (one character per byte?).
Unicode input is misinterpreted as ASCII characters. For example, `йцукенгшщзхъфывапролджэ\ячсмитьбю.` is interpreted as `9FC:5=3HI7EJDK20?@>;46M\OGA<8BL1N.`.
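The second issue follows a clear pattern: each Cyrillic letter comes out as the ASCII character whose value is the low byte of the letter's code point (й is U+0439 and 0x39 is `9`; ц is U+0446 and 0x46 is `F`). That is consistent with input code points being truncated to 8 bits somewhere on the input path, although this thread does not establish where. The first issue is the classic symptom of decoding UTF-8 bytes one at a time as ISO-8859-1. The Rust sketch below reproduces both symptoms; the function names are hypothetical and not taken from the editor's code.

```rust
/// Symptom 1 (hypothetical helper): treat every byte as its own character.
/// Mapping a byte to U+0000..=U+00FF is exactly ISO-8859-1 decoding, so a
/// two-byte UTF-8 sequence shows up as two unrelated Latin-1 characters.
fn decode_as_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Symptom 2 (hypothetical helper): keep only the low 8 bits of each code
/// point. `c as u8` truncates, so U+0439 (й) becomes 0x39 ('9').
fn truncate_to_low_byte(input: &str) -> String {
    input.chars().map(|c| c as u8 as char).collect()
}
```

With these, `decode_as_latin1("й".as_bytes())` yields `"Ð¹"` (the bytes 0xD0 0xB9 read as Latin-1), and `truncate_to_low_byte` turns suhr's Cyrillic sample into exactly the garbled string reported above.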

crespyl · 2015-02-16T22:06:48Z

#95 (specifically crespyl@e643737) has some changes that should fix Unicode rendering (they seem to work for the minimal cases in the buffer.rs tests section).
I'm not sure what to do about input: does termbox support Unicode in the first place, or might we need to fix rustbox?

gchp · 2015-02-23T12:39:19Z

From @crespyl on Gitter:

Due to the nature of UTF-8, the nth char in a buffer is not necessarily at the nth byte.
It should be possible to use something like `self.char_indices().take(n).last().map(|(byte_index, _)| byte_index)` to correctly handle multi-byte characters.

This is related to cursor movement over multi-byte characters.
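A minimal sketch of the same idea in Rust, assuming the cursor is stored as a byte offset into a `&str` that always sits on a char boundary. The helper names are hypothetical, not the editor's API. Moving left or right steps by the UTF-8 length of the neighbouring char instead of by one byte.

```rust
/// Byte offset of the `n`th char (0-based), or `None` if there is no such char.
fn byte_index_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(byte_index, _)| byte_index)
}

/// Cursor one char to the right of `pos`, or `None` at the end of the text.
/// `pos` must be a char boundary, otherwise slicing panics.
fn next_char_boundary(s: &str, pos: usize) -> Option<usize> {
    s[pos..].chars().next().map(|c| pos + c.len_utf8())
}

/// Cursor one char to the left of `pos`, or `None` at the start of the text.
fn prev_char_boundary(s: &str, pos: usize) -> Option<usize> {
    s[..pos].chars().next_back().map(|c| pos - c.len_utf8())
}
```

In `"aйb"`, `й` occupies bytes 1 and 2, so moving right from byte 1 lands on byte 3 and moving left from byte 3 lands back on byte 1; the cursor never stops inside the two-byte sequence.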

ghost · 2015-05-13T03:54:04Z

I've been experimenting with adding Unicode support, and it is turning out to be complicated. The biggest problem I have found is that termbox expects each cell to hold a single codepoint, even though a cell sometimes needs several (for example, a base letter followed by a combining accent). It probably wouldn't be too hard to modify termbox to store each cell as an array of chars rather than a single char, although that would take away some of the library's simplicity. UIBuffer would, of course, have to do the same.
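To illustrate why a single codepoint per cell is not enough, here is a hypothetical `Cell` that owns a whole `String`, so a base letter and its combining marks share one cell. The grouping rule is a deliberate simplification: only the Combining Diacritical Marks block (U+0300 to U+036F) is attached to the preceding char. Real segmentation follows Unicode's grapheme cluster rules (UAX #29), which a crate such as `unicode-segmentation` implements.

```rust
/// Hypothetical cell type: holds a whole grapheme instead of one `char`.
#[derive(Debug, PartialEq)]
struct Cell {
    grapheme: String,
}

/// Crude combining-mark test covering only U+0300..=U+036F.
fn is_combining(c: char) -> bool {
    ('\u{300}'..='\u{36F}').contains(&c)
}

/// Split text into cells, appending combining marks to the previous cell.
fn to_cells(text: &str) -> Vec<Cell> {
    let mut cells: Vec<Cell> = Vec::new();
    for c in text.chars() {
        if is_combining(c) {
            if let Some(cell) = cells.last_mut() {
                cell.grapheme.push(c);
                continue;
            }
        }
        // Base character (or a stray mark at the very start): new cell.
        cells.push(Cell { grapheme: c.to_string() });
    }
    cells
}
```

`"e\u{301}a"` (an `e` followed by a combining acute accent, then `a`) is three codepoints but only two cells, and the first cell carries two codepoints.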

I think some problems could be solved by iterating over cells (where 1 cell = 1 character width) rather than over bytes, chars, or graphemes. For example, an iterator yielding `Option<&str>` could, for each grapheme, yield the grapheme first and then yield `None` for each extra character width the grapheme takes up.
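A minimal sketch of that cell iterator, under two loudly stated assumptions: every `char` is treated as its own grapheme, and display width comes from a hypothetical `display_width` that only knows a few common wide ranges (CJK, Hangul, fullwidth forms). Real code would use Unicode's East Asian Width data, for example via the `unicode-width` crate.

```rust
/// Crude width guess (assumption): 2 columns for a few wide ranges, else 1.
fn display_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F | 0x2E80..=0xA4CF | 0xAC00..=0xD7A3 | 0xF900..=0xFAFF
        | 0xFF00..=0xFF60 | 0xFFE0..=0xFFE6 => 2,
        _ => 1,
    }
}

/// One item per terminal column: `Some(text)` where a character starts,
/// then `None` for each extra column it covers.
fn cells(s: &str) -> impl Iterator<Item = Option<&str>> + '_ {
    s.char_indices().flat_map(move |(i, c)| {
        let text = &s[i..i + c.len_utf8()];
        let extra = display_width(c) - 1; // width is always >= 1 here
        std::iter::once(Some(text)).chain(std::iter::repeat(None).take(extra))
    })
}
```

For `"a漢b"` this yields `Some("a")`, `Some("漢")`, `None`, `Some("b")`: counting items gives the width in columns, and a `None` tells the renderer that the column is already covered by the wide character before it.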

I'm guessing it would be easiest to make Buffer an abstraction layer over all the byte-level details and let every other part of the code deal in characters/graphemes. This would, of course, require heavy changes to the interface of Buffer... but so would changing the data structure backing it, which might happen eventually anyway.
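A sketch of what such an interface could look like, assuming a `String` as the backing store. The `Buffer` below and all of its methods are hypothetical, not the editor's actual type: callers pass and receive char indices only, and the char-to-byte translation lives in one private method.

```rust
/// Hypothetical Buffer: public methods use char indices, never byte offsets.
struct Buffer {
    text: String,
}

impl Buffer {
    fn new(text: &str) -> Self {
        Buffer { text: text.to_string() }
    }

    fn as_str(&self) -> &str {
        &self.text
    }

    fn len_chars(&self) -> usize {
        self.text.chars().count()
    }

    fn char_at(&self, idx: usize) -> Option<char> {
        self.text.chars().nth(idx)
    }

    /// The only place that knows about bytes. `idx == len_chars()` maps to
    /// the end of the text so that appending works.
    fn byte_offset(&self, idx: usize) -> Option<usize> {
        self.text
            .char_indices()
            .map(|(byte, _)| byte)
            .chain(std::iter::once(self.text.len()))
            .nth(idx)
    }

    /// Insert `c` before the char at `idx`; returns false if `idx` is out of range.
    fn insert_char(&mut self, idx: usize, c: char) -> bool {
        match self.byte_offset(idx) {
            Some(byte) => {
                self.text.insert(byte, c);
                true
            }
            None => false,
        }
    }

    /// Remove and return the char at `idx`, if any.
    fn remove_char(&mut self, idx: usize) -> Option<char> {
        if idx >= self.len_chars() {
            return None;
        }
        let byte = self.byte_offset(idx)?;
        Some(self.text.remove(byte))
    }
}
```

Each call here is a linear scan, but because only `byte_offset` touches bytes, swapping the `String` for a different backing structure later would not change any caller.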

In summary, it seems like an implementation of Unicode support could start in one of two places: termbox or Buffer.

Fixing the display of @suhr's example text wasn't too hard, but the fix shows why it is probably important not to make code outside of Buffer deal with data on the byte level.

[see spaghetti code here] [see screen shot here]

gchp mentioned this issue on Jan 6, 2015 in #76, Word movement & deletion vi commands (merged).

gchp added the enhancement and bug labels on Jan 6, 2015.
