Character count component counts code points, not characters #1104
Comments
Another good article on the subject: https://blog.jonnew.com/posts/poo-dot-length-equals-two |
@36degrees this seems like it could end up being quite a serious bug if used in a service with multiple language support. If we can't fix it should it be documented? |
Seems like a robust solution is very code heavy which would not be suitable for clientside. I think Dave's suggestion of leaving it as is but documenting how it works would be the best way forwards... |
We noticed a similar issue in the character counts on the GOV.UK Notify service when sending non-English characters to the service. It turns out that Notify was counting bytes and not characters. This was fixed by the team. |
MDN suggests that you can use the string iterator to count characters
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length Might be worth spiking.... |
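For anyone spiking this, here is a minimal sketch of the string iterator approach. The function name `countCodePoints` is made up for illustration. Note that iterating a string yields code points, so this fixes surrogate pairs (most emoji) but still counts combining marks and joined emoji sequences separately.

```javascript
// Count Unicode code points by iterating the string instead of reading .length.
// The string iterator yields one item per code point, so a surrogate pair
// (two UTF-16 code units) is counted once.
function countCodePoints(text) {
  let count = 0;
  for (const _codePoint of text) {
    count += 1;
  }
  return count;
}

const poo = '\u{1F4A9}'; // pile of poo emoji, one code point, two code units
console.log(poo.length);           // 2
console.log(countCodePoints(poo)); // 1
```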
It's been available for a short time in Chromium browsers (Chrome and Edge 87, Opera 73, Samsung 14) and Safari (14.1), but is not yet supported in Firefox. A potential issue with this is that it doesn't count new lines. New lines are registered as a code point, but are not considered graphemes as they are not "user-perceivable" in the same way something like a space character is (they have a blank glyph and no width). Relatedly, do we need to be sure that service teams aren't using the character count to convey technical limitations? For example, if a database column can only support a maximum of 512 characters, then they do want to limit the input to 512 code points, not 512 graphemes. Would this need to be a configuration option? |
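A rough sketch of what such a configuration option could look like. This assumes the API under discussion is `Intl.Segmenter` (the browser versions listed match it, but that is my assumption), and the `unit` parameter and its values are hypothetical names, not an existing option of the component.

```javascript
// Count text in a configurable unit. 'unit' and its values are hypothetical.
function countCharacters(text, unit = 'grapheme') {
  if (unit === 'codeUnit') {
    return text.length; // UTF-16 code units
  }
  if (unit === 'codePoint') {
    return Array.from(text).length; // Array.from iterates by code point
  }
  // Grapheme clusters, where Intl.Segmenter is available.
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
    let count = 0;
    for (const _segment of segmenter.segment(text)) {
      count += 1;
    }
    return count;
  }
  // Fallback for browsers without Intl.Segmenter: count code points instead.
  return Array.from(text).length;
}

// Family emoji: three emoji joined by two zero-width joiners.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';
console.log(countCharacters(family, 'codeUnit'));  // 8
console.log(countCharacters(family, 'codePoint')); // 5
```

A service limited by a database column could then pick the code point unit, while a service that only cares about what the user sees could pick graphemes.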
I'd suggest making it possible to pass a custom counting function – see #1364. When we do change the counting implementation we should treat it as a breaking change – and we might want to do #1364 first, so service teams can 'override back' to the current code point-based approach. |
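To make the idea concrete, a minimal sketch of a pluggable counting function. `createCharacterCount`, `maxLength` and `countFunction` are hypothetical names for illustration only and do not reflect the component's real API or whatever #1364 ends up proposing.

```javascript
// Default mirrors the string.length behaviour described in this issue.
function defaultCount(text) {
  return text.length;
}

// Hypothetical factory: the caller may supply its own counting function.
function createCharacterCount(options = {}) {
  const maxLength = options.maxLength;
  const countFunction = options.countFunction || defaultCount;
  return {
    // Negative result means the input is over the limit.
    remaining(text) {
      return maxLength - countFunction(text);
    }
  };
}

// A service that wants code point counting passes its own function.
const byCodePoint = createCharacterCount({
  maxLength: 10,
  countFunction: (text) => Array.from(text).length
});
const byDefault = createCharacterCount({ maxLength: 10 });

console.log(byDefault.remaining('\u{1F4A9}'));   // 8
console.log(byCodePoint.remaining('\u{1F4A9}')); // 9
```

If the default ever changes, teams relying on the old behaviour could pass `defaultCount`-style logic explicitly to 'override back'.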
Thanks to @querkmachine for linking me to this issue.
We were both thinking a recently spotted issue in Internet Explorer 8 is likely new lines being counted as two characters. With new lines either as
Grapheme counting code examples look huge, but it would be great to align client- and server-side counts. Having the "custom counting function" as a Promise would allow a
Google Chrome: Shows "You have 23 characters too many"
Internet Explorer |
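One way to take new lines out of the equation is to normalise them before counting, so that every browser and the server agree on how long a line break is. A minimal sketch, assuming the service decides up front whether a line break counts as one character (`\n`) or two (`\r\n`); both function names are made up for illustration.

```javascript
// Replace every line break style (\r\n, lone \r, lone \n) with one chosen form.
function normaliseNewlines(text, lineBreak = '\n') {
  return text.replace(/\r\n|\r|\n/g, lineBreak);
}

// Count code units after normalising line breaks.
function countWithNewlines(text, lineBreak = '\n') {
  return normaliseNewlines(text, lineBreak).length;
}

console.log(countWithNewlines('a\r\nb'));       // 3, line break counted once
console.log(countWithNewlines('a\nb', '\r\n')); // 4, line break counted twice
```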
Think we can let IE8 off here
In 2016 the HTML Standard switched Consensus wasn't found on characters, code points and grapheme clusters: Interesting that WebKit is sticking with grapheme clusters to avoid user confusion:
|
Here's a recent comment on the character count backlog issue: alphagov/govuk-design-system-backlog#67 (comment) Here the issue doesn't appear to be related to a specific browser, but rather that the frontend counts |
We've had a user report of this issue in production today - the frontend character count not matching the backend validation rule. It's deeply confusing for the end user and isn't a great look for our service when it appears it can't even count words consistently. |
We should make sure to benchmark its performance, especially on lower-powered devices and in some of the older browsers that include it. We may also need to look at reducing the number of times the count function is called. |
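As one possible way to cut down on calls, the count could be skipped whenever the input has not changed since the last check. A minimal sketch; `createCachedCounter` is a hypothetical helper, and it would sit alongside (not replace) any benchmarking or debouncing.

```javascript
// Wrap a counting function so it only runs when the text actually changes.
function createCachedCounter(countFunction) {
  let lastText = null;
  let lastCount = 0;
  return function cachedCount(text) {
    if (text !== lastText) {
      lastText = text;
      lastCount = countFunction(text);
    }
    return lastCount;
  };
}

// Track how often the (potentially expensive) count really runs.
let calls = 0;
const cachedCount = createCachedCounter((text) => {
  calls += 1;
  return Array.from(text).length;
});

cachedCount('hello');
cachedCount('hello'); // served from the cache
cachedCount('hello!');
console.log(calls); // 2
```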
The character count currently uses string.length to establish the length of the user input.
string.length counts code units, not characters, and this can lead to some confusing results when using certain strings.
You can see this by entering the following strings into the character count component:
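A few illustrative strings and what `string.length` reports for them. These particular strings are my own picks for demonstration and are written as escape sequences so they survive copy and paste.

```javascript
// Each of these looks like a single character to a user.
const examples = [
  { label: 'ASCII letter', text: 'a' },
  { label: 'pile of poo emoji', text: '\u{1F4A9}' },
  { label: 'e + combining acute accent', text: 'e\u0301' },
  { label: 'family emoji (three emoji joined by ZWJ)', text: '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}' }
];

// string.length reports UTF-16 code units for each one.
const lengths = examples.map((example) => example.text.length);

examples.forEach((example, index) => {
  console.log(example.label + ': ' + lengths[index]);
});
// ASCII letter: 1
// pile of poo emoji: 2
// e + combining acute accent: 2
// family emoji (three emoji joined by ZWJ): 8
```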
We should probably find a less naive way to count characters in strings, but we also need to work out how this will work with any backend validation or data storage on a service, which may already be using a different definition of a 'character' (for example, where the backend or storage treats one character as one byte).
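To show how far these definitions can drift apart, here is a small sketch comparing code units, code points and UTF-8 bytes for the same input. `measure` is a hypothetical helper, and UTF-8 is an assumption about what a byte-counting backend would use.

```javascript
// Measure one string under three different definitions of 'length'.
function measure(text) {
  return {
    codeUnits: text.length,                         // what string.length reports
    codePoints: Array.from(text).length,            // Unicode code points
    utf8Bytes: new TextEncoder().encode(text).length // bytes once encoded as UTF-8
  };
}

console.log(measure('a'));         // { codeUnits: 1, codePoints: 1, utf8Bytes: 1 }
console.log(measure('\u00E9'));    // { codeUnits: 1, codePoints: 1, utf8Bytes: 2 }
console.log(measure('\u{1F4A9}')); // { codeUnits: 2, codePoints: 1, utf8Bytes: 4 }
```

A frontend limit of 10 "characters" could therefore be anywhere from 10 to 40 bytes on a byte-counting backend, which is why the two sides need an agreed definition.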
Further reading: