Regression with display of some wide glyphs/DBCS unicode characters #4375

ramsay-vibes · 2020-01-28T07:31:00Z

Environment

Windows build number: [10.0.18363.592]
Windows Terminal version: [0.8.10261.0]

Steps to reproduce

From an Ubuntu shell cat the attached file (cat utf82.txt).

utf82.txt

Expected behavior

Actual behavior

Description

With WT 0.8.10091.0 and recent earlier versions, the expected behavior occurs. Since release of WT 0.8.10261.0, the actual behavior occurs. Although there are inconsistencies elsewhere, observe in particular that the Khmer, Lao and Thai languages do not display properly.

The text was updated successfully, but these errors were encountered:

DHowett-MSFT · 2020-01-28T19:31:11Z

@milizhang this looks like it could be a regression from #4081. I'm particularly concerned that during rendering we're now ignoring the calculated backing buffer size... this may even cause a recrudescence of #633...

milizhang · 2020-02-02T07:00:22Z

Yes, this is indeed an issue caused by #4081 . That change was designed to only handle one-to-n mapping between glyph and character, but in reality for many scripts it can be a n-to-m mapping.

As we dicussed before, with the current column buffer system it is simply not possible to get an ideal solution for this issue.

The solution I am leaning toward is to respect both glyph cluster and column width. This means we place the glyph cluster at the center of expected space, so that the glyph clusters can be kept together likely in a recognizable way (though things can still get bad in cases that fontScale is applied). The side effect, however, is that it could left whitespace around the cluster.

@encladeus @DHowett-MSFT I have a prototype for this solution. Please let me know if you think this is reasonable.

Without fontScale:

With fontScale:

Screenshot that matches original post:

DHowett-MSFT · 2020-02-02T07:07:44Z

I love it. I know we used to try to crush glyphs into either one or two columns when we shipped the May renderer changes. Perhaps this will make it work more generally?

DHowett-MSFT · 2020-02-02T07:08:26Z

As an aside: what font is that? It looks pretty true to the standard VGA font.

milizhang · 2020-02-02T07:28:24Z

If I understand what you are talking correctly, "crushing glyphs into columns" is exactly what code around Run::fontScale is trying to do. Though I think there's an arguably better way to implement the scaling, as far as we want the display to match the column grid some scaling is likely still needed.
By implementing support of glyph cluster, this change might reduce the usage of fontScale, but sadly it does not eliminate it (as you can see in the screenshots). I think only purposefully designed fonts plus correct column width assignment can land us there.

BTW The font is Fixedsys Excelsior, which is basically Fixedsys + programming ligature. You can find it at kika/fixedsys.

@DHowett-MSFT

## Summary of the Pull Request - Improves the correction of the scaling and spacing that is applied to glyphs if they are too large or too small for the number of columns that the text buffer is expecting ## References - Supersedes #4438 Co-authored-by: Mili (Yi) Zhang <milizhang@gmail.com> - Related to #4704 (#4731) ## PR Checklist * [x] Closes #696 * [x] Closes #4375 * [x] Closes #4708 * [x] Closes a crash that @DHowett-MSFT complained about with `"x" * ($Host.UI.RawUI.BufferSize.Width - 1) + "`u{241b}"` * [x] Eliminates an exception getting thrown with the U+1F3E0 emoji in `_CorrectGlyphRun` * [x] Corrects several graphical issues that occurred after #4731 was merged to master (weird repeats and splits of runs) * [x] I work here. * [x] Tested manually versus given scenarios. * [x] Documentation written into comments in the code. * [x] I'm a core contributor. ## Detailed Description of the Pull Request / Additional comments - The `_CorrectGlyphRun` function now walks through and uses the `_glyphClusters` map to determine the text span and glyph span for each cluster so it can be considered as a single unit for scaling. - The total number of columns expected across the entire cluster text/glyph unit is considered for the available spacing for drawing - The total glyph advances are summed to see how much space they will take - If more space than necessary to draw, all glyphs in the cluster are offset into the center and the extra space is padded onto the advance of the last glyph in the range. - If less space than necessary to draw, the entire cluster is marked for shrinking as a single unit by providing the initial text index and length (that is back-mapped out of the glyph run) up to the parent function so it can use the `_SetCurrentRun` and `_SplitCurrentRun` existing functions (which operate on text) to split the run into pieces and only scale the one glyph cluster, not things next to it as well. - The scale factor chosen for shrinking is now based on the proportion of the advances instead of going through some font math wizardry - The parent that calls the run splitting functions now checks to not attempt to split off text after the cluster if it's already at the end. This was @DHowett-MSFT's crash. - The split run function has been corrected to fix the `glyphStart` position of the back half (it failed to `+=` instead of `=` which resulted in duplicated text, sometimes). - Surrogate pair emoji were not allocating an appropriate number of `_textClusterColumns`. The constructor has been updated such that the trailing half of surrogate pairs gets a 0 column width (as the lead is marked appropriately by the `GetColumns()` function). This was the exception thrown. - The `_glyphScaleCorrections` array stored up over the calls to `_CorrectGlyphRun` now uses a struct `ScaleCorrection` as we're up to 3 values. - The `ScaleCorrection` values are named to clearly indicate they're in relation to the original text span, not the glyph spans. - The values that are used to construct `ScaleCorrection`s within `_CorrectGlyphRun` have been double checked and corrected to not accidentally use glyph index/counts when text index/counts are what's required. ## Validation Steps Performed - Tested the utf82.txt file from one of the linked bugs. Looked specifically at Burmese through Thai to ensure restoration (for the most part) of the behavior - Ensured that U+1f3e0 emoji (🏠) continues to draw correctly - Checked Fixedsys Excelsior font to ensure it's not shrinking the line with its ligatures - Checked ligatureness of Cascadia Code font - Checked combining characters U+0300-U+0304 with a capital A

ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Jan 28, 2020

DHowett-MSFT added Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal. labels Jan 31, 2020

ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Jan 31, 2020

DHowett-MSFT removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Jan 31, 2020

DHowett-MSFT modified the milestones: Terminal v1.0, Terminal v0.9 Jan 31, 2020

DHowett-MSFT changed the title ~~Regression with display of some multi-byte unicode characters~~ Regression with display of some wide glyphs/DBCS unicode characters Jan 31, 2020

milizhang mentioned this issue Feb 2, 2020

Adding primitive support for glyph clusters and n-to-m mapping. #4438

Closed

5 tasks

DHowett-MSFT closed this as completed in 04955a4 Feb 12, 2020

ghost added the Needs-Tag-Fix Doesn't match tag requirements label Feb 12, 2020

milizhang mentioned this issue Feb 24, 2020

Shaping and display of certain language is completely incorrect. #4708

Closed

miniksa mentioned this issue Feb 28, 2020

Improve glyph scaling correction #4747

Merged

10 tasks

ghost added In-PR This issue has a related PR Resolution-Fix-Committed Fix is checked in, but it might be 3-4 weeks until a release. and removed In-PR This issue has a related PR labels Feb 28, 2020

miniksa mentioned this issue Mar 2, 2020

RTL is worse after #4747 #4779

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Regression with display of some wide glyphs/DBCS unicode characters #4375

Regression with display of some wide glyphs/DBCS unicode characters #4375

ramsay-vibes commented Jan 28, 2020 •

edited

Loading

DHowett-MSFT commented Jan 28, 2020 •

edited

Loading

milizhang commented Feb 2, 2020

DHowett-MSFT commented Feb 2, 2020

DHowett-MSFT commented Feb 2, 2020 •

edited

Loading

milizhang commented Feb 2, 2020 •

edited

Loading

Regression with display of some wide glyphs/DBCS unicode characters #4375

Regression with display of some wide glyphs/DBCS unicode characters #4375

Comments

ramsay-vibes commented Jan 28, 2020 • edited Loading

Environment

Steps to reproduce

Expected behavior

Actual behavior

Description

DHowett-MSFT commented Jan 28, 2020 • edited Loading

milizhang commented Feb 2, 2020

DHowett-MSFT commented Feb 2, 2020

DHowett-MSFT commented Feb 2, 2020 • edited Loading

milizhang commented Feb 2, 2020 • edited Loading

ramsay-vibes commented Jan 28, 2020 •

edited

Loading

DHowett-MSFT commented Jan 28, 2020 •

edited

Loading

DHowett-MSFT commented Feb 2, 2020 •

edited

Loading

milizhang commented Feb 2, 2020 •

edited

Loading