Fix unicode support #69

Open
gchp opened this issue Dec 29, 2014 · 4 comments

Comments

@gchp
Owner

gchp commented Dec 29, 2014

Since #50 was merged, Unicode support is broken. @P1start mentioned in the comments that fixing this shouldn't be too involved.

@suhr

suhr commented Feb 16, 2015

There are actually two issues:

  • Buffer content is shown as ISO 8859-1 text (one character per byte?)
  • Unicode input is misinterpreted as ASCII characters. For example, йцукенгшщзхъфывапролджэ\ячсмитьбю. is interpreted as 9FC:5=3HI7EJDK20?@>;46M\OGA<8BL1N..
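Both symptoms can be reproduced in a few lines, which may be handy as regression-test fixtures. The second one appears consistent with each code point being truncated to its low byte instead of being decoded (й is U+0439, and 0x39 is '9'), though that is an inference from the example above, not a confirmed diagnosis of the input path. The function names are hypothetical.

```rust
/// Shows every UTF-8 *byte* as its own Latin-1 (ISO 8859-1) character,
/// which mimics the first symptom: "й" (bytes D0 B9) becomes "Ð¹".
fn bytes_as_latin1(s: &str) -> String {
    s.bytes().map(|b| b as char).collect()
}

/// Keeps only the low byte of each code point, which mimics the second
/// symptom: 'й' (U+0439) becomes '9' (0x39).
fn truncate_to_low_byte(s: &str) -> String {
    s.chars().map(|c| c as u8 as char).collect()
}

fn main() {
    println!("{}", bytes_as_latin1("й")); // Ð¹
    println!("{}", truncate_to_low_byte("йцукенгшщзхъфывапролджэ\\ячсмитьбю."));
}
```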

@crespyl
Contributor

crespyl commented Feb 16, 2015

#95 (specifically crespyl@e643737) has some changes that will hopefully fix Unicode rendering (it seems to work for the minimal cases in the buffer.rs tests section).
I'm not sure what to do about input; does termbox work with Unicode in the first place, and might we need to fix rustbox?

@gchp
Owner Author

gchp commented Feb 23, 2015

From @crespyl on Gitter:

due to the nature of UTF-8, the nth char in a buffer is not necessarily at the nth byte
it should be possible to use something like self.chars().indices().take(n).last().map(|(byte_index, character)| byte_index) to correctly handle multi-byte characters

Related to cursor movement over multi-byte characters.
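A sketch of the same idea using the standard library's `str::char_indices`, plus the matching one-character cursor steps. Note that `take(n).last()` in the quoted snippet yields the n-th char counting from 1; this version uses `nth(n)` with a 0-based `n` and also accepts the position just past the last char, where a cursor can sit. The function names are hypothetical, not taken from the project.

```rust
/// Byte offset of the `n`th char (0-based) in `s`. `n == char count`
/// maps to `s.len()` (cursor after the last char); larger `n` gives `None`.
fn char_to_byte_index(s: &str, n: usize) -> Option<usize> {
    s.char_indices()
        .map(|(byte_index, _)| byte_index)
        .chain(std::iter::once(s.len()))
        .nth(n)
}

/// Byte offset one char to the right of `byte_pos`, or `None` at the end.
/// Panics if `byte_pos` is not on a char boundary (like string slicing).
fn next_char_boundary(s: &str, byte_pos: usize) -> Option<usize> {
    s[byte_pos..].chars().next().map(|c| byte_pos + c.len_utf8())
}

/// Byte offset one char to the left of `byte_pos`, or `None` at the start.
fn prev_char_boundary(s: &str, byte_pos: usize) -> Option<usize> {
    s[..byte_pos].chars().next_back().map(|c| byte_pos - c.len_utf8())
}

fn main() {
    let line = "a\u{439}b"; // "aйb": 'й' takes two bytes
    println!("{:?}", char_to_byte_index(line, 2)); // Some(3), not Some(2)
    println!("{:?}", next_char_boundary(line, 1)); // Some(3)
    println!("{:?}", prev_char_boundary(line, 3)); // Some(1)
}
```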

@ghost

ghost commented May 13, 2015

I've been messing around with adding Unicode support, and it is turning out to be complicated. The biggest problem I have found is that termbox expects each cell to be a single codepoint, even though a cell sometimes needs to hold multiple codepoints (for example, a base letter followed by a combining accent). It probably wouldn't be too hard to modify termbox to store each cell as an array of chars rather than a single char, although that would take away some of the simplicity of the library. And, of course, UIBuffer would have to do this as well.
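A minimal sketch of the proposed cell layout, assuming one heap-allocated `String` per cell (a hypothetical `Cell` type, not termbox's actual struct). It shows why one `char` per cell is not enough: the precomposed é (U+00E9) is a single code point, but the same letter written as e followed by U+0301 COMBINING ACUTE ACCENT is two.

```rust
/// Hypothetical cell that stores a whole grapheme (one or more code
/// points) instead of a single `char`.
#[derive(Clone, Debug, PartialEq)]
struct Cell {
    grapheme: String,
}

impl Cell {
    fn new(grapheme: &str) -> Self {
        Cell { grapheme: grapheme.to_string() }
    }

    /// True if the old one-`char`-per-cell layout could hold this grapheme.
    fn fits_in_single_char(&self) -> bool {
        self.grapheme.chars().count() == 1
    }
}

fn main() {
    let precomposed = Cell::new("\u{e9}"); // é as one code point
    let combining = Cell::new("e\u{301}"); // e + combining acute accent
    println!("{} {}", precomposed.fits_in_single_char(), combining.fits_in_single_char()); // true false
}
```

A fixed-size inline array per cell would avoid the allocation, at the cost of capping how many code points a cell can hold.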

I think that some problems could be solved by iterating over cells (where 1 cell = 1 terminal column) rather than over bytes, chars, or graphemes. For example, an iterator yielding Option<&str> which, for each grapheme, yields the grapheme first and then yields None for each extra column the grapheme takes up.
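The cell iterator described above could look roughly like this. Splitting text into graphemes and measuring display width both need Unicode data tables that the standard library doesn't ship (crates such as unicode-segmentation and unicode-width provide them), so this sketch assumes the line is already split into graphemes and takes the width function as a parameter. `approx_width` is a crude, hypothetical stand-in that only knows a few wide ranges.

```rust
/// Crude, hypothetical width table: two columns if the grapheme starts with
/// a char from a few common East Asian wide ranges, otherwise one column.
fn approx_width(grapheme: &str) -> usize {
    match grapheme.chars().next() {
        Some(c)
            if ('\u{1100}'..='\u{115F}').contains(&c) // Hangul Jamo initials
                || ('\u{2E80}'..='\u{A4CF}').contains(&c) // CJK, kana, Yi
                || ('\u{AC00}'..='\u{D7A3}').contains(&c) // Hangul syllables
                || ('\u{FF01}'..='\u{FF60}').contains(&c) => 2, // fullwidth forms
        _ => 1,
    }
}

/// One item per terminal cell: `Some(g)` where grapheme `g` starts, then
/// `None` for each extra column it covers.
fn cells<'a>(
    graphemes: &'a [&'a str],
    width: impl Fn(&str) -> usize + 'a,
) -> impl Iterator<Item = Option<&'a str>> + 'a {
    graphemes.iter().flat_map(move |&g| {
        // Treat zero-width graphemes as one cell so nothing is dropped.
        let w = width(g).max(1);
        std::iter::once(Some(g)).chain(std::iter::repeat(None).take(w - 1))
    })
}

fn main() {
    let line = ["a", "日", "e\u{301}"];
    let row: Vec<Option<&str>> = cells(&line, approx_width).collect();
    println!("{:?}", row);
}
```

Passing the width function in keeps the iterator independent of whichever width table the editor ends up using.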

I'm guessing it would be easiest to have Buffer be an abstraction layer for all the byte-level stuff and let every other part of the code deal in characters/graphemes. This would, of course, require heavy changes to the interface of Buffer... but so would changing the data structure backing it, which may well happen anyway.
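A minimal sketch of that layering, assuming a single line stored as a `String`. This hypothetical `Buffer` is not the project's actual type; it only shows callers passing character positions while the conversion to byte offsets stays inside the type.

```rust
/// Hypothetical line buffer: the public methods take char positions;
/// byte offsets are only used internally.
struct Buffer {
    text: String,
}

impl Buffer {
    fn new(text: &str) -> Self {
        Buffer { text: text.to_string() }
    }

    /// Length in chars, not bytes.
    fn char_len(&self) -> usize {
        self.text.chars().count()
    }

    /// Internal helper: char position -> byte offset (end of text allowed).
    fn byte_offset(&self, char_pos: usize) -> Option<usize> {
        self.text
            .char_indices()
            .map(|(i, _)| i)
            .chain(std::iter::once(self.text.len()))
            .nth(char_pos)
    }

    /// Inserts `c` before the char at `char_pos`; false if out of range.
    fn insert_char(&mut self, char_pos: usize, c: char) -> bool {
        match self.byte_offset(char_pos) {
            Some(i) => {
                self.text.insert(i, c);
                true
            }
            None => false,
        }
    }

    /// Removes and returns the char at `char_pos`, if there is one.
    fn remove_char(&mut self, char_pos: usize) -> Option<char> {
        let i = self.byte_offset(char_pos)?;
        if i == self.text.len() {
            return None;
        }
        Some(self.text.remove(i))
    }

    fn as_str(&self) -> &str {
        &self.text
    }
}

fn main() {
    let mut buf = Buffer::new("h\u{e9}llo"); // "héllo": 5 chars, 6 bytes
    buf.insert_char(2, 'x');
    let removed = buf.remove_char(1);
    println!("{} {:?} {}", buf.as_str(), removed, buf.char_len());
}
```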

In summary, it seems like an implementation of unicode support could start from two places: termbox and Buffer.

Fixing the display of @suhr's example text wasn't too hard, but the fix shows why it is probably important not to make code outside of Buffer deal with data on the byte level.

[see spaghetti code here] [see screen shot here]
