Partially Addresses #2616. Support combining sequences that don't normalize #2932
PR to this PR incoming with suggested fixes to Unit Tests...
Updates unit tests to further decouple application from consoledriver
Please merge #2934 and see my comment.
I don't think this fix does what you think it does.
@tig I already finished this.
I don't think your test in TabView is working. Here's what it should look like:
Here's what I see:
Another test (in Unicode scenario):
sb.Append ('e');
sb.Append ('\u0301');
sb.Append ('\u0301');
testlabel = new Label ($"Should be an e with two accents: {sb}") { X = 20, Y = Pos.Y (label), Width = Dim.Percent (50), CanFocus = true, HotKeySpecifier = new Rune ('&') };
Win.Add (testlabel);
Like this:
I still don't think this is working. See my comments.
There is no problem on the AddRune because on the
Yes, I believe that is it.
Do you have an objection to replacing the current code in this PR with the code I suggested above that removes the calls to normalize? (with modifications to deal with
@tig did you already see what it does? You are replacing the previous p
This is worse.
@tig sorry, but I don't have the headspace to deal with this any further. I think it's reasonable for now, given the degree of difficulty you added to this.
I dove in deep this morning and I think I understand how all this should work much better. I composed a message on my home computer but didn't have a chance to finish my thoughts before I had to run. As soon as I get home I'll finish it. The net is: We can make combining marks work on both NetDriver and WindowsDriver, but the current design of
@@ -175,11 +175,22 @@ public void AddRune (Rune rune)
// Normalize to Form C (Canonical Composition)
string normalized = combined.Normalize (NormalizationForm.FormC);

Contents [Row, Col - 1].Rune = (Rune)normalized [0]; ;
if (Contents [Row, Col - 1].Rune != default && Contents [Row, Col - 1].Rune != (Rune)' ') {
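To illustrate what the hunk above does with a sequence that does not fully compose, here is a minimal sketch. It is not the PR's code: the class name `NormalizeFirstRune` and the method `Split` are hypothetical, and the input is assumed to be non-empty. It normalizes to Form C and separates the first rune (what `normalized [0]` keeps) from whatever is left over.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class NormalizeFirstRune {
    // Normalizes to Form C, then returns the first rune and any runes after it.
    // Assumes 'combined' is non-empty.
    public static (Rune Head, Rune [] Rest) Split (string combined)
    {
        string normalized = combined.Normalize (NormalizationForm.FormC);
        Rune [] runes = normalized.EnumerateRunes ().ToArray ();
        return (runes [0], runes.Skip (1).ToArray ());
    }
}
```

For `"e\u0301"` the whole sequence composes to U+00E9 and `Rest` is empty. For `"e\u0301\u0301"` (the two-accent test from the Unicode scenario) the second U+0301 has nothing to compose with, so it ends up in `Rest`; code that stores only `normalized [0]` would lose it.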
This is not sufficient. Many, many characters cannot be combined with a combining mark, not just ' '. Try, for example, [.
WT (and other platforms) have sophisticated algorithms for figuring out whether a combining mark can combine with a base character. I am trying to find more details on these to see what we can do.
You can easily test this in WT and PS (with CaskaydiaCove Nerd Font):
In Unicode, not all base characters can be combined with combining characters. Whether a base character can be combined with a combining character depends on the rules defined in the Unicode Standard. The Unicode Consortium, the organization responsible for maintaining the Unicode Standard, specifies these rules.
Here are some general guidelines to determine if a base character can be combined with a combining character:
Character Composition Model: Unicode follows a character composition model, which means that some characters can be composed from a base character and one or more combining characters. This allows for a wide range of characters and diacritics to be represented in text.
Compatibility: Unicode defines compatibility characters and compatibility composition rules to ensure that combining characters work as expected with base characters. If a base character is defined as a "combining character target," it means that it can be combined with one or more combining characters.
Normalization Forms: Unicode defines normalization forms (NFC, NFD, NFKC, NFKD) to handle character composition and decomposition. NFC (Normalization Form C) is the most commonly used form for combining characters, and it ensures that text is represented in a composed form when possible.
Character Properties: Unicode assigns specific properties to each character, including whether it can be used as a base character and whether it can combine with combining characters. You can refer to the Unicode Character Database (UCD) to check the properties of individual characters.
Combining Class: Combining characters are assigned a canonical combining class value. It determines how a run of marks is ordered during normalization and whether an intervening mark blocks composition with the base character; it does not by itself say which base characters a mark can attach to.
Compatibility Decomposition: Some base characters can be combined using compatibility decomposition mappings, even if they don't have explicit combining characters. These mappings are defined in the Unicode Standard.
To determine whether a specific base character can be combined with a combining character, you can consult the Unicode Standard documentation, specifically the Unicode Character Database (UCD) and the Unicode Technical Reports related to normalization and composition. Additionally, Unicode-aware text processing libraries and programming languages often provide functions or methods for handling character composition and decomposition, making it easier to work with combined characters in text processing.
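As a rough illustration of the guidelines above, the following sketch uses only standard .NET calls (`Rune.GetUnicodeCategory` and `string.Normalize`). The class and method names are hypothetical. Note the assumption: "composes under NFC" only tells you whether Unicode defines a precomposed character for the pair. It says nothing about whether a given terminal or font will render the uncomposed sequence correctly.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class CombiningChecks {
    // True if the rune is in one of the three Unicode "Mark" categories (Mn, Mc, Me).
    public static bool IsCombiningMark (Rune r)
    {
        UnicodeCategory cat = Rune.GetUnicodeCategory (r);
        return cat == UnicodeCategory.NonSpacingMark
            || cat == UnicodeCategory.SpacingCombiningMark
            || cat == UnicodeCategory.EnclosingMark;
    }

    // True if base + mark normalizes (Form C) to a single rune,
    // i.e. a precomposed character exists for the pair.
    public static bool ComposesUnderNfc (Rune baseRune, Rune mark)
    {
        string pair = baseRune.ToString () + mark.ToString ();
        return pair.Normalize (NormalizationForm.FormC).EnumerateRunes ().Count () == 1;
    }
}
```

With this, `ComposesUnderNfc (new Rune ('e'), new Rune ('\u0301'))` is true, while the same call with `'['` as the base is false because no precomposed form exists.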
@tig I only had to do that because the CharacterMap scenario uses a space to separate the runes to avoid incorrect output. I don't have enough knowledge to do better.
Happy to. May be a few days though. I'm not seeing how this is relevant to combining sequences though. Can you explain?
No, it's not. Forget it.
However, it ALSO moves the cursor to the right for each CM. Thus Note, this is the same thing you see with If I In other words, internally AtlasEngine is holding each CM as a used column. AtlasEngine also treats individual CMs ( Unless I'm missing something, until AtlasEngine changes, on Windows, we cannot really support NON-NORMALIZED combining marks in a way that users would expect. Any glyph made up of a base char and n combining chars will use base.ColumnWidth() + n columns. I think for us to support combining marks correctly we need to do the following:
Forces non-normalized CMs to be ignored.
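The column accounting described in the AtlasEngine comment above (base.ColumnWidth() + n) can be sketched as follows. This is an estimate under stated assumptions, not the PR's code and not AtlasEngine's: the class name `ColumnEstimate` is hypothetical, the base width is passed in by the caller (defaulting to 1), and only marks that survive Form C normalization are counted as extra columns.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ColumnEstimate {
    // Estimates the columns one base character plus its marks would occupy
    // if each combining mark left over after NFC consumes a column of its own.
    // Assumes 'textElement' is non-empty and starts with the base character.
    public static int ColumnsUsed (string textElement, int baseWidth = 1)
    {
        string normalized = textElement.Normalize (NormalizationForm.FormC);
        int leftoverMarks = normalized.EnumerateRunes ().Skip (1).Count (r => {
            UnicodeCategory cat = Rune.GetUnicodeCategory (r);
            return cat == UnicodeCategory.NonSpacingMark
                || cat == UnicodeCategory.SpacingCombiningMark
                || cat == UnicodeCategory.EnclosingMark;
        });
        return baseWidth + leftoverMarks;
    }
}
```

Under these assumptions `"e\u0301"` costs one column (it composes fully), `"e\u0301\u0301"` costs two, and `"[\u0301\u0301\u0301"` costs four. Ignoring non-normalized CMs, as the commit above does, avoids the extra columns at the price of dropping the marks.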
@tig why did you write
It was just an example of 3 CMs.
I see. Surely it could be normalized. Do you know whether there are 3 different combining marks that normalize in the terminal? That would be a better example.
You can pile on as many
Partially addresses (but does not fix) #2616
Pull Request checklist:
CTRL-K-D to automatically reformat your files before committing.
dotnet test before commit
/// style comments)