usb: allow UTF16LE characters in string descriptors #6649
Comments
Hello, As a note, Kconfiglib itself has no problem with UTF-8 in I could force the encoding to UTF-8 via All mainstream distributions sanely default to a UTF-8 locale (by setting the related environment variables), so I've never seen this issue until now. Having something like this in the
That's a good idea for Unicode support in general, as many other tools besides Python will use the encoding specified in the environment. A quickfix for Python 3 is to add PEP 538 is related, and specifically talks about Docker. Googling "docker utf-8" indicates that the ASCII default has caused a lot of pain for other tools and languages as well (Java, SQL, Ruby, ...). |
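As an illustration of the locale point above, here is a minimal sketch, assuming Python 3, of reading a file with an explicit encoding so the result does not depend on the environment. The helper name `read_text_utf8` and the sample Kconfig fragment are hypothetical, and this is not Kconfiglib's actual code.

```python
import io
import os
import tempfile


def read_text_utf8(path):
    """Read a text file as UTF-8, ignoring the locale's preferred encoding."""
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which may be ASCII in a minimal container environment.
    with io.open(path, encoding="utf-8") as f:
        return f.read()


# Hypothetical Kconfig fragment containing a non-ASCII default value.
sample = u'config USB_DEVICE_PRODUCT\n\tstring "Product"\n\tdefault "Iv\u00e1n"\n'

# Write the fragment as raw UTF-8 bytes, then read it back explicitly as UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".Kconfig") as tmp:
    tmp.write(sample.encode("utf-8"))

text = read_text_utf8(tmp.name)
os.remove(tmp.name)
```

Passing the encoding at the call site keeps the behaviour the same whether or not the environment variables select a UTF-8 locale.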
Hi @IvanSanchez , AFAICT this is not accurate: "This is currently blocked by ulfalizer/Kconfiglib#41 (AKA "UTF-8 support in Kconfiglib")." PR 41 is now abandoned in favour of #6716 If you are not observing any issues with UTF-8 in Kconfig comments, but are observing issues with UTF-8 in Kconfig values, then I would bet that the root cause is not related to PR 41 or #6716, but rather some other mechanism that hasn't been tested with UTF-8 yet. |
Sorry about the confusion about ulfalizer/Kconfiglib#41 - I haven't been following that very closely. Will try to make some tests in the upcoming weeks and see if there's something blocking this. |
Should work fine with a UTF-8 locale, including for string values. I've done some testing locally. Kconfiglib doesn't interpret text between quotes (except for escapes and the special values "n", "m", and "y"), so you get it for free from Python. |
Thanks for the analysis @ulfalizer. I can also report that Kconfiglib is behaving as expected, but another piece of our infrastructure is not dealing with UTF8 values well. |
Added a test for having UTF-8 characters in Kconfig values. This ensures that issue #6649 does not affect any supported platforms and that it does not re-appear in the future. Signed-off-by: Sebastian Bøe <sebastian.boe@nordicsemi.no>
@IvanSanchez and I have been discussing this. Essentially, the user wants to be able to input strings through some build-time interface. He does NOT want to do runtime conversion, e.g. from UTF-8 to UTF-16-little-endian, for performance reasons. And he does NOT want to use the 'gen_inc_file' to include binary data in his application, because it is inconvenient to save this UTF string in a dedicated file. I'm not sure what the cleanest solution would be. Perhaps Kconfig could have an "encoding" attribute for strings. |
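One way to picture the build-time conversion described here is a small script that re-encodes the configured string and emits a C initializer. This is only a sketch under stated assumptions: the function name `utf16le_c_initializer` and the C variable name `product_utf16le` are hypothetical, and nothing like this is confirmed to exist in the Zephyr build system.

```python
def utf16le_c_initializer(text):
    """Return a C initializer list holding `text` as UTF-16LE bytes."""
    # The "utf-16-le" codec emits little-endian code units without a BOM.
    data = text.encode("utf-16-le")
    return "{ " + ", ".join("0x%02x" % b for b in bytearray(data)) + " }"


# Hypothetical value taken from a Kconfig string option.
product = u"Iv\u00e1n"

# Emit a line that a generated header could contain.
line = "static const unsigned char product_utf16le[] = %s;" % (
    utf16le_c_initializer(product)
)
print(line)
```

Because the conversion runs on the build host, the firmware would only see a constant byte array and would need no runtime re-encoding.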
One thing that gets a bit messy is that you might end up with Maybe escapes could be used ( If an |
That would depend on the semantics of the I'm worried that other implementations of Kconfig parsers might choke on "non-standard" option fields, though. But once again, this might be the path of least resistance. |
Hmm... I guess those double |
@IvanSanchez
Yeah, I'm not sure whether Kconfig is the best place to handle this either. |
Maybe some kind of CPP macro, e.g. ENCODING_CONVERT_FROM_UTF8_TO_UTF_16_LITTLE_ENDIAN(CONFIG_USB_DESCRIPTOR), which either does the conversion in CPP (if technically possible) or marks the string with an attribute that is detected and processed in one of the linker passes. Or we do this in the Kconfig GUI frontend: have the GUI convert to the ASCII equivalent and have the underlying tools just see weirdly formatted ASCII. |
Though maybe this could work: Have You can then easily edit configuration files by hand. I could add that upstream if it turns out to be the nicest solution. Python 2/3 compatibility and Unicode is always a bit of a pain, but it shouldn't get that bad. |
I considered this approach with C macros, but the inability to make the preprocessor loop over characters of a constant string (plus my general lack of experience about C/C++) made me desist. I know If I've overlooked something there, please prove me wrong - I'd love to see a UTF re-encoding implementation with pure macros. |
I just saw #6762 - maybe that would help solve the encoding issue? |
@ulfalizer : Did merging #7296 resolve this issue? |
@SebastianBoe Kconfiglib itself never had a problem with non-ASCII characters. It now also defaults to UTF-8 on Python 3 (see ulfalizer/Kconfiglib@da40c01). |
Closing for now. Feel free to reopen. In retrospect, I think the best way to deal with obscure cases like this might be to require the bytes to be input individually or the like ("\x12\x73", etc.) |
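To make the "input the bytes individually" idea concrete, here is a minimal sketch, assuming Python 3, of turning a value written purely as `\xNN` escapes into raw bytes. The helper name `parse_hex_escapes` is hypothetical, and this is not how Kconfiglib handles escapes.

```python
import re

# A value is accepted only if it consists entirely of \xNN escapes.
_ALL_HEX_ESCAPES = re.compile(r"(?:\\x[0-9a-fA-F]{2})*")
_ONE_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def parse_hex_escapes(value):
    """Convert a string such as r'\\x12\\x73' into the bytes it spells out."""
    if _ALL_HEX_ESCAPES.fullmatch(value) is None:
        raise ValueError("expected only \\xNN escapes, got %r" % value)
    return bytes(int(h, 16) for h in _ONE_HEX_ESCAPE.findall(value))


# The example from the comment above: two explicit bytes.
raw = parse_hex_escapes(r"\x12\x73")

# Bytes spelled out this way can already be UTF-16LE, e.g. "I" followed by "á".
name = parse_hex_escapes(r"\x49\x00\xe1\x00").decode("utf-16-le")
```

The trade-off is that the configuration stays pure ASCII and unambiguous, at the cost of being unreadable to whoever edits it.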
Feature request:
I want to be able to write my name (Iván, with á) as a UTF-8 string in the Kconfig files as part of CONFIG_USB_DEVICE_PRODUCT, and have it properly converted into a UTF-16 little-endian string in the USB descriptor structures.
This is currently blocked by ulfalizer/Kconfiglib#41 (AKA "UTF-8 support in Kconfiglib").
This depends on #6716 (AKA "force UTF-8 encoding on Kconfig files").
Note that right now the string descriptors are stored as ascii-7, and undergo a naive conversion into utf16le during the initialization of the USB subsystem, at
zephyr/subsys/usb/usb_descriptor.c
Line 644 in 7f28edc
u16 * constants containing non-null-terminated UTF16LE-encoded strings, and using those u16 *s in the descriptors.
Relates to #4661.
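The following sketch shows why a naive widening only works for 7-bit ASCII. It assumes that "naive conversion" means appending a zero byte after every input byte; that is an interpretation of the description above, not a transcription of usb_descriptor.c, and the function name `naive_widen` is hypothetical.

```python
def naive_widen(raw):
    """Widen each input byte to a 16-bit little-endian unit (byte, then 0x00)."""
    out = bytearray()
    for b in bytearray(raw):
        out.append(b)
        out.append(0x00)
    return bytes(out)


# For pure ASCII the result matches a real UTF-16LE encoding.
ascii_ok = naive_widen(b"Ivan") == u"Ivan".encode("utf-16-le")

# For UTF-8 input, the two bytes of "á" (0xC3 0xA1) become two separate
# code units, so the descriptor would spell a different string.
utf8_name = u"Iv\u00e1n".encode("utf-8")
mangled = naive_widen(utf8_name).decode("utf-16-le")
```

Here `mangled` contains five characters instead of four, because each byte of the multi-byte UTF-8 sequence is treated as its own character.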