Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix column count issues with certain ligature. #4081

Merged
2 commits merged into from
Jan 21, 2020

Conversation

milizhang
Copy link
Contributor

@milizhang milizhang commented Dec 30, 2019

Summary of the Pull Request

This change tries to fix column size calculation when shaping return glyphs that represents multiple characters (e.g. ligature).

References

This should fix #696.

PR Checklist

  • Closes #xxx
  • CLA signed. If not, go over here and sign the CLA
  • Tests added/passed
  • Requires documentation to be updated
  • I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

Currently, it seems like CustomTextLayout::_CorrectGlyphRun generally assumes that glyphs and characters have a 1:1 mapping relationship - which holds true for most trivial scenarios with basic western scripts, and also many, but unfortunately not all, monospace "programming" fonts with programming ligatures.

This change makes terminal correctly processes glyphs that represents multiple characters, by properly accumulating the column counts of all these characters together (which I believe is more close to what this code originally intended to do).

There are still many issues existing in both CustomTextLayout as well as the TextBuffer, and the correct solution to them will likely demand large-scale changes, at least at the scale of #3578. I wish small changes like this can serve as a stop gap solution while we take our time to work on the long-term right thing.

Validation Steps Performed

Builds and runs. Manual testing confirmed that it solves #696 with both LigConsalata and Fixedsys Excelsior.

Copy link
Member

@miniksa miniksa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me what this code snippet is actually doing. There's not really an explanation of the codepoints in question and what the font is saying/doing that is making this appear incorrect. Nor is there a description of what the code is correcting here. Given this is a tricky subject, this code snippet needs at the minimum about 500 words more of comments explaining what is going on here, why it's broken, why this fixes it, and why it may or may not be a long term solution.

I did some looking into where the Column count is coming from in the first place, and it's basically straight out of the narrow/wide indicator inside the text buffer. So the characters are correctly narrow, I suspect, but something about the ligating operation is putting it off. How exactly are you detecting the ligation and why is it only applicable in these fonts? Is it something about how these two fonts are authored or is it a natural property of ligatures? Also, why are these fonts subject to this problem, but other fonts with ligatures like our Cascadia Code do not have this problem?

I'd want to understand all of those things before going further with this fix.

I do like things that provide incremental improvement and stifle the pain before a comprehensive solution. Do not get me wrong. I just want to make sure that we're not covering this with a bandage that will make it more complicated to solve holistically later and that will be difficult to understand for whomever does come through later with a more comprehensive solution.

src/renderer/dx/CustomTextLayout.cpp Outdated Show resolved Hide resolved
src/renderer/dx/CustomTextLayout.cpp Outdated Show resolved Hide resolved
@ghost ghost added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Dec 30, 2019
@milizhang
Copy link
Contributor Author

I am trying to explain things a little bit more verbosely here, so excuse me if you already know about anything I am talking about below.

  • IDWriteTextAnalyzer::GetGlyphs in DirectWrite (as well as any other font/shaping engine that supports things like GSUB) might return result that the number of glyph is less (or more) than the number of character codes.
  • Hence for many scenarios such as selection, a map from character codes to corresponding glyph is necessary.
    • To solve this issue, IDWriteTextAnalyzer::GetGlyphs included clusterMap as one of its outputs.
  • clusterMap is a map from character index to glyph index.
    • Which means the array index is the character index, and the array content is the corresponding glyph index of such character.
  • To my knowledge both _textClusterColumns and _glyphClusters are arranged on character index, not glyph index.
    • Therefore I believe would be incorrect to access them using i, which is glyph index in this context.
    • This is why charIndex is calculated using std::distance from the beginning of _glyphClusters.
  • The added code does the following:
    • Find the chatacter range in _glyphClusters that represents current run. (runStartIterator, runEndIterator)
    • Within the run range, find the character range that is associated with this glyph from _glyphClusters.
      • Note that since _glyphClusters has results of multiple runs stored together in the same array, on beginning of each run glyph index is resetted to 0. Hence i - run.glyphStart.
    • For each character representing the current glyph, find their column count from _textClusterColumns and add them together into columns.
  • I believe the reason why fonts like Cascadia Code work is because those fonts handles ligatures differently.
    • Instead of returning a glyph with multiple width, Cascadia Code returns multiple glyphs of equal advance width.
    • In example of "===", the first two glyphs are left blank. The last glyph has significant amount of negative left side bearing.
      • This negative LSB allows the glyph to visually occupy the space of all three glyph, while only report advance of one standard width.

Hope this helps. Please let me know if you need any other clarification.

@ghost ghost removed the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Dec 31, 2019
@miniksa
Copy link
Member

miniksa commented Dec 31, 2019

I am trying to explain things a little bit more verbosely here, so excuse me if you already know about anything I am talking about below.

  • IDWriteTextAnalyzer::GetGlyphs in DirectWrite (as well as any other font/shaping engine that supports things like GSUB) might return result that the number of glyph is less (or more) than the number of character codes.

  • Hence for many scenarios such as selection, a map from character codes to corresponding glyph is necessary.

    • To solve this issue, IDWriteTextAnalyzer::GetGlyphs included clusterMap as one of its outputs.
  • clusterMap is a map from character index to glyph index.

    • Which means the array index is the character index, and the array content is the corresponding glyph index of such character.
  • To my knowledge both _textClusterColumns and _glyphClusters are arranged on character index, not glyph index.

    • Therefore I believe would be incorrect to access them using i, which is glyph index in this context.
    • This is why charIndex is calculated using std::distance from the beginning of _glyphClusters.
  • The added code does the following:

    • Find the chatacter range in _glyphClusters that represents current run. (runStartIterator, runEndIterator)

    • Within the run range, find the character range that is associated with this glyph from _glyphClusters.

      • Note that since _glyphClusters has results of multiple runs stored together in the same array, on beginning of each run glyph index is resetted to 0. Hence i - run.glyphStart.
    • For each character representing the current glyph, find their column count from _textClusterColumns and add them together into columns.

  • I believe the reason why fonts like Cascadia Code work is because those fonts handles ligatures differently.

    • Instead of returning a glyph with multiple width, Cascadia Code returns multiple glyphs of equal advance width.

    • In example of "===", the first two glyphs are left blank. The last glyph has significant amount of negative left side bearing.

      • This negative LSB allows the glyph to visually occupy the space of all three glyph, while only report advance of one standard width.

Hope this helps. Please let me know if you need any other clarification.

That's excellent. Can you please embed much of this as comments inside the code block so it can be referred to in the future, especially when someone gets the idea to further revise this code? Thank you!

@skyline75489
Copy link
Collaborator

I vaguely remembered I did something similar in #3578 and wrote a bunch of weird code. It definitely needs more comment for other people to fully understand the layout process.

@reli-msft
Copy link

There may be cases that a character is transformed into multiple glyphs, or even multiple characters converted into multiple glyphs (it may be uncommon for LGC but there may be cases that a console app outputs complex script text). Also, should test that RTL runs like Arabic won't crash the console.
image
image

@milizhang
Copy link
Contributor Author

@reli-msft I am well aware of those issues. However, I personally consider the ability to handle those things (especially BiDi) as part of "large scale right-things" instead of something that I can easily handle as part of this CL. Therefore I will only ensure things does not crash at this moment.

To handle multi-glyph and N-to-M mapping scenarios, I think the best solution is to properly introduce a concept of cluster. (If my memory is correct I think @skyline75489 did something similar in #3578?) It is also necessary to have better respect to advances and offsets coming from DWrite, in order to deal with things like diacritic glyphs.

It seems like constraints coming from legacy console APIs like ReadConsoleOutput could cause significant trouble with things like Bidi/RTL languages and non-BMP glyphs. Those issues do need very careful designs, as most likely we will need to have some systematic way to manage compatibility. Personally I think these are out of scope for this PR, and we probably should talk about it somewhere else.

@skyline75489
Copy link
Collaborator

@reli-msft provides several enlightening comments in #3546, which are the inspiration of #3578. I have a mixed feeling about it, considering I've opened #3458 which may even increase the complication of the layout problem.

@milizhang milizhang requested a review from miniksa January 9, 2020 09:52
@miniksa
Copy link
Member

miniksa commented Jan 14, 2020

@milizhang I have not forgotten about this. My son was sick on Friday so I stayed home with him and the daycare has been closed every work day since thanks to our area snow. I want to review this, but I need more than 75 seconds of uninterrupted brain power to reconcile what @skyline75489 has said with your change before approving, so it will probably be a few more days before I get to this... also known as whenever the daycare opens again and I can sit at my desk in peace and think. :)

Copy link
Member

@miniksa miniksa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'm fine taking this to solve the immediate problem.

I think we'll probably have to do more consideration on #3458 anyway, @skyline75489. Last thing I saw, we were still concerned about some of the performance aspects and holding the PR anyway.

Also, I'm not certain that this fix is going to be the end-all-and-be-all of what we do with correcting glyph runs based on the mappings between characters, glyphs, and columns. It seems small enough right now to fix the immediate problem and I don't want to stand in the way of a bit of incremental progress any longer. With the comments embedded, we should be able to capture this additional requirement into whatever future solution we craft in this area.

I also don't see it hurting anything that currently works in the Terminal at this time.

@miniksa miniksa added the Needs-Second It's a PR that needs another sign-off label Jan 16, 2020
Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm certainly okay ingesting this code change early in the cycle to give us more time to let it bake, in case it does end up breaking something. But the change looks pretty atomic and should make things better, so lets :shipit:

@zadjii-msft zadjii-msft added Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues AutoMerge Marked for automatic merge by the bot when requirements are met Product-Terminal The new Windows Terminal. labels Jan 21, 2020
@ghost
Copy link

ghost commented Jan 21, 2020

Hello @zadjii-msft!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 027f122 into microsoft:master Jan 21, 2020
DHowett-MSFT pushed a commit that referenced this pull request Jan 24, 2020
<!-- Enter a brief description/summary of your PR here. What does it fix/what does it change/how was it tested (even manually, if necessary)? -->
## Summary of the Pull Request
This change tries to fix column size calculation when shaping return glyphs that represents multiple characters (e.g. ligature).

<!-- Other than the issue solved, is this relevant to any other issues/existing PRs? -->
## References

This should fix #696.

<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [ ] Closes #xxx
* [X] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Requires documentation to be updated
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

<!-- Provide a more detailed description of the PR, other things fixed or any additional comments/features here -->
## Detailed Description of the Pull Request / Additional comments

Currently, it seems like CustomTextLayout::_CorrectGlyphRun generally assumes that glyphs and characters have a 1:1 mapping relationship - which holds true for most trivial scenarios with basic western scripts, and also many, but unfortunately not all, monospace "programming" fonts with programming ligatures.

This change makes terminal correctly processes glyphs that represents multiple characters, by properly accumulating the column counts of all these characters together (which I believe is more close to what this code originally intended to do).

There are still many issues existing in both CustomTextLayout as well as the TextBuffer, and the correct solution to them will likely demand large-scale changes, at least at the scale of #3578. I wish small changes like this can serve as a stop gap solution while we take our time to work on the long-term right thing.

<!-- Describe how you validated the behavior. Add automated tests wherever possible, but list manual validation steps taken as well -->
## Validation Steps Performed

Builds and runs. Manual testing confirmed that it solves #696 with both LigConsalata and Fixedsys Excelsior.

(cherry picked from commit 027f122)
@ghost
Copy link

ghost commented Jan 27, 2020

🎉Windows Terminal Preview v0.8.10261.0 has been released which incorporates this pull request.:tada:

Handy links:

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues AutoMerge Marked for automatic merge by the bot when requirements are met Needs-Second It's a PR that needs another sign-off Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unbroken runs containing ligatures occasionally cause font size flickering
5 participants