Web interface does not render composed unicode characters correctly #19913
Comments
Could you provide a demo on try.gitea.io? Then people can work on it directly.
Thank you. I did a quick test on GitHub
GitHub doesn't report the warning.
Yes, it always worked in GitHub.
GitHub is wrong not to report that there is something odd. ë and ë are not the same characters. The problem stems from Safari's rendering. The code looks like: <span class="n">s<span class="escaped-code-point" data-escaped="[U+0304]"><span class="char">̄</span></span>_b</span> The Now, how could we fix this? Without having access to a Safari browser I'm not sure. Is there any way to get Safari to just do the right thing with the spacing of the combining character here? We do this splitting because it makes writing the escaping/unescaping extremely easy for combining marks: gitea/modules/charset/escape.go Lines 166 to 179 in ac88f21
To avoid the splitting, we'd have to coalesce the characters that combine and emit a single escape block for them together, e.g. <span class="n"><span class="escaped-code-point" data-escaped="s[U+0304]"><span class="char">s̄</span></span>_b</span> That coalescing would require us to write the escaper to understand the rules for rendering combining characters and to keep the state for handling them. For example if we had: If it is possible to get Safari to just render the combining character in the right place, that would be deeply helpful instead.
Hi @zeripath! Thanks! Indeed, in Chrome it works fine. It shows the warning (which should be OK), but the symbol is displayed correctly.
I guess we should coalesce these.
Ugh, it's even harder because we run this on already-rendered data.
I don't think it's possible. It seems Safari's Unicode composition does not work across tags, so all joined characters must be rendered as one continuous string, e.g. Also, a separate issue: this is another false positive for the Bidi warning; that code seems to be way too aggressive in its detection.
It's not a BIDI warning. It's a hidden character warning.
So it's definitely a Safari bug: https://jsfiddle.net/9j0zc4su/ Safari rendering: Firefox rendering:
Okay, but there's nothing actually "hidden", is there?
But, should |
I think composition should not be influenced by tag boundaries and other browsers seem to agree as well.
Which of these are the same ëëëëëëëëëë? |
I'd call them "ambiguous characters" and I question whether we should actually warn on them. Hidden is something different in my eyes, like zero width space 😉.
This PR rewrites the invisible unicode detection algorithm to more closely match that of the Monaco editor on the system. It provides a technique for detecting ambiguous characters and relaxes the detection of combining marks. Control characters are additionally detected as invisible in this implementation, whereas they are not in Monaco, but this is related to font issues. Close go-gitea#19913 Signed-off-by: Andrew Thornton <art27@cantab.net>
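As a rough illustration of the three behaviours the PR description lists (invisible characters, ambiguous characters, relaxed combining marks), here is a sketch in Go. It is not the PR's implementation and does not reproduce Monaco's tables: `classify`, the `class` type, and the three-entry `confusables` map are all hypothetical, and the choice to exempt tab, newline, and carriage return is an assumption made for the example.

```go
package main

import (
	"fmt"
	"unicode"
)

type class int

const (
	plain class = iota
	invisible
	ambiguous
)

// confusables is a tiny hypothetical sample, not a real confusables table:
// each key looks like the Latin letter it maps to.
var confusables = map[rune]rune{
	'\u0430': 'a', // CYRILLIC SMALL LETTER A
	'\u0435': 'e', // CYRILLIC SMALL LETTER IE
	'\u043e': 'o', // CYRILLIC SMALL LETTER O
}

// classify is a hypothetical classifier for a single rune.
func classify(r rune) class {
	switch {
	case r == '\t' || r == '\n' || r == '\r':
		return plain // ordinary whitespace controls (assumed exempt)
	case unicode.IsMark(r):
		return plain // combining marks are not flagged (the "relaxed" rule)
	case unicode.IsControl(r) || unicode.Is(unicode.Cf, r):
		return invisible // control (Cc) and format (Cf) characters
	}
	if _, ok := confusables[r]; ok {
		return ambiguous
	}
	return plain
}

func main() {
	names := map[class]string{plain: "plain", invisible: "invisible", ambiguous: "ambiguous"}
	// s, combining macron, _, b, zero width space, Cyrillic a.
	for _, r := range "s\u0304_b\u200b\u0430" {
		fmt.Printf("%U %s\n", r, names[classify(r)])
	}
}
```

Under these assumptions the `s̄_b` from the report no longer triggers a warning, while a zero width space still counts as hidden and a look-alike letter is reported separately as ambiguous.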
Description
All composed UTF-8 characters, like s̄_b, ṡ_b, etc., are not rendered correctly in Gitea.
Screenshots
See how s̄_b is being rendered. It even shows that there are hidden Unicode characters in this line.
Gitea Version
1.16.8
Can you reproduce the bug on the Gitea demo site?
Yes
Operating System
macOS
Browser Version
Safari 15.4