unicode chars doesn't work #193
Comments
Thanks for reporting. I think this is an encoding issue. The original IRC protocol specifications (RFC 1459 and RFC 2812) are ASCII-based, so the encoding of Unicode characters is not specified. I may be wrong, but I suspect current servers do not care about the encoding of user messages: if a user sends Unicode characters encoded in, say, UTF-16, servers are happy to forward them to receivers in the same encoding (is there even a way for servers to figure out the encoding of messages, in general?). Because no standard encoding is specified, a client that expects messages to be encoded in UTF-8 (e.g. tiny) will decode such messages incorrectly. I think this is precisely what's happening here (except the sender's encoding may not be UTF-16 but something else). It'd be good to know:
|
How can I give you that?
I don't have that problem with |
irssi makes an effort to detect what encoding was used, and re-encode the message. Weechat probably does too. That sort of approach is based on heuristics so is imperfect, and I'm not sure how complicated it might be to implement. |
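To make the heuristic idea concrete, here is a minimal sketch of one very simple fallback scheme: try strict UTF-8 first, and if that fails, assume ISO-8859-1 (Latin-1). This is not irssi's or Weechat's actual algorithm; the function name `decode_with_fallback` and the choice of Latin-1 as the fallback are assumptions made purely for illustration.

```rust
/// Hypothetical helper (not part of tiny, irssi, or Weechat).
/// Decodes a raw message payload, guessing the encoding with a crude heuristic.
fn decode_with_fallback(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        // The bytes are valid UTF-8, so use them as-is.
        Ok(s) => s.to_owned(),
        // Assumption: anything that is not valid UTF-8 is treated as
        // ISO-8859-1, where each byte maps to the code point of the same value.
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}
```

The weakness is the one mentioned above: the fallback is a guess. Text in any other legacy encoding would be decoded without error but shown as the wrong characters.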
@eoli3n how are you testing this in weechat and irssi? I'd like to use a similar setup to test tiny. |
Actually... on 2 hosts, without changing anything, it seems to work now... there's something I don't get here. |
I briefly looked at hexchat and irssi for how they decode incoming messages.
Implementing (2) is trivial as we already have the function in std. I think we should just do that. |
The IRC protocol is ASCII-based and the encoding of non-ASCII (e.g. Unicode) characters is not specified. We expect UTF-8, but previously did not handle other cases correctly and unsafely generated UTF-8 strings from wire messages. This caused #194. We now remove all unchecked indexing and unchecked conversion to UTF-8, and use "lossy" conversion, which generates a valid UTF-8 string even in the presence of invalid UTF-8 sequences. Each invalid sequence is replaced with U+FFFD REPLACEMENT CHARACTER. Fixes #194. See also the discussion in #193.
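For readers unfamiliar with lossy conversion, the sketch below shows the behaviour using the standard library's `String::from_utf8_lossy`, which is presumably the std function referred to earlier in the thread. The wrapper name `decode_lossy` is hypothetical and is not taken from tiny's source.

```rust
use std::borrow::Cow;

/// Hypothetical wrapper for illustration only.
/// Converts raw wire bytes to text without ever failing: valid UTF-8 is
/// borrowed unchanged, and each invalid sequence becomes U+FFFD.
fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

For example, the Latin-1 bytes `[0x68, 0xE9, 0x6C, 0x6C, 0x6F]` ("héllo") are not valid UTF-8, so `decode_lossy` returns `"h\u{FFFD}llo"`. Valid input such as `b"hello"` comes back as a borrowed `&str` with no allocation.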
I'm closing this as we don't have a reproducer, and @eoli3n mentioned above that they can't reproduce this. UTF-8-encoded characters always worked fine, for non-UTF-8 encodings, we previously did some unsafe stuff which I just fixed, in the worst case you should now see Note that we also assume that the terminal encoding is UTF-8 so if your terminal is not configured for that, changing that may fix the original problem you reported. @eoli3n please re-open if you have this problem again. |
At the bottom you can see that the same chars are working in my terminal.