Parse UTF-16 surrogate pairs for calculating pattern's position (#11915)

## Summary of the Pull Request

Properly handle UTF-16 surrogate pairs when calculating the position of a matched pattern.

Fix #8709

## References
https://github.com/microsoft/terminal/blob/b88ffb21b0725331877ba76bac5a79a4c21eaa03/src/buffer/out/search.cpp#L335-L339

## PR Checklist
* [ ] Closes #8709
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

## Detailed Description of the Pull Request / Additional comments
Use `Utf16Parser::Parse` to handle code points from U+10000 to U+10FFFF, which UTF-16 encodes as surrogate pairs (two code units per code point).
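
To illustrate why iterating one code unit at a time is not enough, here is a minimal sketch of grouping UTF-16 code units into glyphs. It is not the terminal's `Utf16Parser`: `SplitUtf16Glyphs` is a hypothetical helper, it uses `char16_t` so that it behaves the same on platforms where `wchar_t` is not 16 bits wide, and it keeps an unpaired surrogate as a single unit, which may differ from what the real parser does.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (an assumption for illustration, not Utf16Parser::Parse):
// splits a UTF-16 string into glyphs, keeping each high/low surrogate pair together.
std::vector<std::u16string_view> SplitUtf16Glyphs(std::u16string_view text)
{
    std::vector<std::u16string_view> glyphs;
    std::size_t i = 0;
    while (i < text.size())
    {
        // A high surrogate (U+D800..U+DBFF) followed by a low surrogate
        // (U+DC00..U+DFFF) encodes one code point above U+FFFF.
        const bool isHigh = text[i] >= 0xD800 && text[i] <= 0xDBFF;
        const bool hasLow = i + 1 < text.size() &&
                            text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF;
        const std::size_t len = (isHigh && hasLow) ? 2 : 1;
        glyphs.push_back(text.substr(i, len));
        i += len;
    }
    return glyphs;
}
```

For the input `u"a\U0001F600b"`, which holds four code units, this helper yields three glyphs, and the middle one spans two code units.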

## Validation Steps Performed

![image](https://user-images.githubusercontent.com/1068203/145421736-c842c7d4-0136-42d0-ad72-f004f58d9e3b.png)

Also verified against the case reported by @mas90 in #8709 (comment):

![image](https://user-images.githubusercontent.com/1068203/145420264-3fe220b4-42c5-44ac-aa94-4e604b164ed3.png)
comzyh authored Dec 9, 2021
1 parent 509ecb1 commit a2d96d6
Showing 1 changed file with 7 additions and 5 deletions.
12 changes: 7 additions & 5 deletions src/buffer/out/textBuffer.cpp
@@ -8,6 +8,7 @@

 #include "../types/inc/utils.hpp"
 #include "../types/inc/convert.hpp"
+#include "../../types/inc/Utf16Parser.hpp"
 #include "../../types/inc/GlyphWidth.hpp"

 #pragma hdrstop
@@ -2585,16 +2586,17 @@ PointTree TextBuffer::GetPatterns(const size_t firstRow, const size_t lastRow) c
 // match and the previous match, so we use the size of the prefix
 // along with the size of the match to determine the locations
 size_t prefixSize = 0;
-
-for (const auto ch : i->prefix().str())
+for (const std::vector<wchar_t> parsedGlyph : Utf16Parser::Parse(i->prefix().str()))
 {
-    prefixSize += IsGlyphFullWidth(ch) ? 2 : 1;
+    const std::wstring_view glyph{ parsedGlyph.data(), parsedGlyph.size() };
+    prefixSize += IsGlyphFullWidth(glyph) ? 2 : 1;
 }
 const auto start = lenUpToThis + prefixSize;
 size_t matchSize = 0;
-for (const auto ch : i->str())
+for (const std::vector<wchar_t> parsedGlyph : Utf16Parser::Parse(i->str()))
 {
-    matchSize += IsGlyphFullWidth(ch) ? 2 : 1;
+    const std::wstring_view glyph{ parsedGlyph.data(), parsedGlyph.size() };
+    matchSize += IsGlyphFullWidth(glyph) ? 2 : 1;
 }
 const auto end = start + matchSize;
 lenUpToThis = end;
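
The position arithmetic in this change can be sketched outside the terminal code base as follows. Everything here is an assumption for illustration: `NextCodePoint`, `IsWide`, `ColumnWidth`, and `MatchColumns` are hypothetical names, and `IsWide` is a toy rule standing in for `IsGlyphFullWidth`, whose real behavior is not shown in this commit.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>

// Decodes the code point starting at text[i] and advances i past it.
// An unpaired surrogate is returned as-is (a simplification).
char32_t NextCodePoint(std::u16string_view text, std::size_t& i)
{
    const char16_t lead = text[i++];
    if (lead >= 0xD800 && lead <= 0xDBFF && i < text.size())
    {
        const char16_t trail = text[i];
        if (trail >= 0xDC00 && trail <= 0xDFFF)
        {
            ++i;
            return 0x10000 + ((static_cast<char32_t>(lead) - 0xD800) << 10) +
                   (static_cast<char32_t>(trail) - 0xDC00);
        }
    }
    return lead;
}

// Toy width rule (assumption): CJK Unified Ideographs and a common emoji
// range occupy two columns; everything else occupies one.
bool IsWide(char32_t cp)
{
    return (cp >= 0x4E00 && cp <= 0x9FFF) || (cp >= 0x1F300 && cp <= 0x1F64F);
}

// Sums column widths per code point rather than per UTF-16 code unit.
std::size_t ColumnWidth(std::u16string_view text)
{
    std::size_t width = 0;
    for (std::size_t i = 0; i < text.size();)
    {
        width += IsWide(NextCodePoint(text, i)) ? 2 : 1;
    }
    return width;
}

// Same shape as the diff: start = columns so far + prefix width,
// end = start + match width.
std::pair<std::size_t, std::size_t> MatchColumns(std::size_t lenUpToThis,
                                                 std::u16string_view prefix,
                                                 std::u16string_view match)
{
    const std::size_t start = lenUpToThis + ColumnWidth(prefix);
    const std::size_t end = start + ColumnWidth(match);
    return { start, end };
}
```

Under this toy width rule, `MatchColumns(0, u"\U0001F600 ", u"http://a")` returns a start of 3 and an end of 11: the emoji counts once as two columns instead of being counted separately for each of its two code units.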