Description
USB is a protocol ready for the 21st century, so one might be tempted to use that by using friendly labels:
Sadly, the result reminds one of a dark era of the electronics industry, when lsusb reports Schöner USB-Stick.
A classic case of "written in Unicode, interpreted in latin1".
Interestingly, there is never any latin1 involved -- what happens is that the string enters C as a UTF-8 string (as intended), but when it passes through _cpy_str_to_utf16 in usbus_control.c, each byte is emitted as the low half of a UTF-16 word (yes, that's what USB is using internally ... not so 21st century any more, but at least it's real UTF-16 and not UCS-2).
What I'd expect to happen
Yeah, if it were that easy I'd send a PR rather than a rant.
Options are:
Do nothing, and point out in the documentation that only ASCII is allowed.
Do nothing, and point out in the documentation that only latin1 is allowed (which, by construction of Unicode code points 128-255, also works -- but that's a character encoding I'd rather not have in 21st century documentation).
Don't copy over non-ASCII bytes (failing safe -- never mojibake, but it may go unnoticed).
Add Unicode support. It's not terribly much code (see the decode_code_point of https://gist.github.com/tylerneylon/9773800, and to support more than BMP, e.g. 💾 inside a model name, needs another 3-or-so bit-shift-plus-addition lines and 4 rather than 2 calls to usb_control_slicer_put_char).
@bergzand: preferences?
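To make the last option concrete, here is a rough sketch of what the conversion could look like. It is not RIOT code: utf8_decode, utf8_to_utf16 and bytes_to_utf16_naive are hypothetical names, the naive function is only a simplified stand-in for the byte-widening behaviour described above, and output goes into a plain uint16_t buffer rather than through usb_control_slicer_put_char, whose signature is not shown here. Overlong UTF-8 encodings are not rejected, so treat it as an illustration rather than a validated decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the current behaviour: every input byte becomes
 * one 16-bit unit, so a two-byte UTF-8 sequence turns into two characters. */
static size_t bytes_to_utf16_naive(const char *s, uint16_t *out, size_t cap)
{
    size_t n = 0;
    while (*s != '\0' && n < cap) {
        out[n++] = (unsigned char)*s++;
    }
    return n;
}

/* Hypothetical helper: decode one UTF-8 sequence starting at s into *cp.
 * Returns the number of bytes consumed, or 0 at the terminating NUL.
 * Malformed or truncated sequences decode to U+FFFD. */
static size_t utf8_decode(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len;

    if (p[0] == 0) {
        return 0;
    }
    if (p[0] < 0x80) {
        *cp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0) {
        *cp = p[0] & 0x1F;
        len = 2;
    } else if ((p[0] & 0xF0) == 0xE0) {
        *cp = p[0] & 0x0F;
        len = 3;
    } else if ((p[0] & 0xF8) == 0xF0) {
        *cp = p[0] & 0x07;
        len = 4;
    } else {
        *cp = 0xFFFD; /* stray continuation byte or invalid lead byte */
        return 1;
    }
    for (size_t i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) {
            *cp = 0xFFFD; /* sequence ended early */
            return i;
        }
        *cp = (*cp << 6) | (p[i] & 0x3F);
    }
    return len;
}

/* Convert a NUL-terminated UTF-8 string into UTF-16 code units.
 * Writes at most cap units, never splits a surrogate pair, and returns
 * the number of units written. */
static size_t utf8_to_utf16(const char *s, uint16_t *out, size_t cap)
{
    size_t n = 0;
    size_t used;
    uint32_t cp;

    while ((used = utf8_decode(s, &cp)) > 0) {
        s += used;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD; /* not a valid Unicode scalar value */
        }
        if (cp < 0x10000) {
            if (n + 1 > cap) {
                break;
            }
            out[n++] = (uint16_t)cp;
        } else {
            if (n + 2 > cap) {
                break;
            }
            cp -= 0x10000; /* beyond the BMP: emit a surrogate pair */
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return n;
}
```

With this sketch, the UTF-8 bytes C3 B6 (ö) come out of the naive path as the two units 0x00C3 0x00B6, which is exactly the "interpreted in latin1" look, while utf8_to_utf16 yields the single unit 0x00F6. The 💾 case (F0 9F 92 BE) becomes the surrogate pair 0xD83D 0xDCBE.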