-
Notifications
You must be signed in to change notification settings - Fork 8.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ICU for text search #15858
Use ICU for text search #15858
Conversation
This comment has been minimized.
This comment has been minimized.
template<typename T> | ||
T _adjustBackward(T column) const noexcept; | ||
template<typename T> | ||
T _adjustForward(T column) const noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were made into templates so that I can use ptrdiff_t
indices with them.
src/buffer/out/search.cpp
Outdated
_needle(s_CreateNeedleFromString(str)), | ||
_renderData(renderData), | ||
_coordAnchor(s_GetInitialAnchor(renderData, direction)) | ||
// - reverse - True when searching backward/upwards in the buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After going through all the callers of Search
I found that just using bool
parameters is more ergonomic. None of the callers, not even conhost
, actually used the two enums. They just converted them back/forth.
I would suggest reviewing this file without the diff viewer. It's a mostly new implementation.
_fpDirectionDownAverage = 0; | ||
_fpMatchCaseAverage = 0; | ||
_uiFindNextClickedTotal = 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems no one ever noticed that we passed reverse == true
(= upwards_search == true
) as the fDirectionDown
parameter. In other words, the telemetry for DirectionDownAverage
was broken. Since I feel like no one is using this telemetry anyways, I chose to remove it instead of fixing it.
src/types/UiaTextRangeBase.cpp
Outdated
auto searchAnchor = _start; | ||
if (searchBackward) | ||
const std::wstring_view queryText{ text, SysStringLen(text) }; | ||
const auto results = _pData->GetTextBuffer().SearchText(queryText, ignoreCase != FALSE, _start.y, _end.y + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this code could use the Search
class, because Search
would stop searching once it found something. The new Search
class will go through the entire buffer however.
To fix that we could either add 2 parameters to Search
to specify a bounding range for the search or use TextBuffer
's function directly. I felt like the latter is a better option, because while the 2 for
loops below are verbose and somewhat difficult to understand initially, they're also easier to verify for their correctness IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlos-zamora I didn't quite understand the UiaTextRangeBase::FindText
implementation. Could you review that file for me? 🙂 Also, how do I best test that function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing 😊
As for how to test it on your own, you can do that using "Accessibility Insights":
- open A11y insights
- Click on the terminal
- Under "Patterns", Click "Explore..." button next to "TextPattern"
- A "TextPattern Explorer" window should appear
- Click the "..." button next to the "Document" range
- Click "Find..."
This new window should allow you to directly interact with the API much like any screen reader could.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems Accessibility Insights is kinda buggy? I can't really tell how it works, but if you open the Find dialog and ever reach a "not found" situation, it'll continue to say "not found" forever.
After testing it, I think that my implementation works correctly. If I type "eeeee" the find dialog will find all "e"s no matter where they are or whether I use forwards or backwards search. So there don't seem to be any off-by-1 errors in the code.
However, there's one big difference to the old code: My new implementation returns a "not found" result when it has reached the end of the buffer. The old code continued to circle around the buffer if you reached the end. (This didn't always work for the backward search due to the start > _start
check. It's supposed to be start >= _start
, otherwise it can't find results that are at the very start of the buffer at 0,0
.)
Was that intentional? Should the UIA search circle around in the buffer when it has reached either end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this code could use the
Search
class, becauseSearch
would stop searching once it found something. The newSearch
class will go through the entire buffer however.
Won't this cause a decent performance hit? _getOptimizedBufferSize()
used to reduce the search space by ignoring anything past the mutable bottom. Now that we're searching through the whole buffer again, won't FindText('a', ...)
be slower. And if the user calls that function repeatedly, the user may have to wait quite a bit before being able to perform the next search? (FYI I haven't verified the performance question via a11y insights yet, so this could all be false)
It seems Accessibility Insights is kinda buggy? I can't really tell how it works, but if you open the Find dialog and ever reach a "not found" situation, it'll continue to say "not found" forever.
You could also use inspect.exe. It's the predecessor to a11y insights and can be found in the Windows SDK's \bin\<version>\<platform> folder. To interact with the API, do the following:
- open inspect.exe
- click on the terminal (the terminal should be highlighted in the UIA tree in inspect.exe)
- Click "Action" > "Text Pattern Explorer..." (a new "Text Explorer" window should appear)
- Click "Range" > "Find..."
Was that intentional? Should the UIA search circle around in the buffer when it has reached either end?
Interesting! I was originally going to say that we should keep the same behavior as before, but it looks like this experience is different from MS Word's! There were two notable differences with your implementation:
- circling the buffer: MS Word doesn't circle the text area whereas we do in Terminal (old and new impl). So, we probably shouldn't do that anymore.
- no more results: MS Word returns something that is interpreted by inspect.exe as "ProcessFind: FindText couldn't find matching text.". But Terminal (new impl) is interpreted as "ProcessFind: MoveEndpointByRange failed. Error -2147024809".
We should align these experiences (which is a bit annoying because your PR wasn't really intended to deal with any of this). If that's easy to do, it'd be best to do that as a part of this PR or in a follow-up before the next release. If not, I think we should keep the old experience, but I know that's not really possible anymore given that the Search
class is now completely different. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this code could use the
Search
class, becauseSearch
would stop searching once it found something. The newSearch
class will go through the entire buffer however.Won't this cause a decent performance hit?
_getOptimizedBufferSize()
used to reduce the search space by ignoring anything past the mutable bottom.
The new ICU-based searcher can search the entire text buffer in 4ms.
It also avoids searching any part of the text buffer that has not been "committed" to memory. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ICU also has usearch_previous
. We could use it for UIA specifically since it doesn't need the total number of results unlike WT (which is why it uses uregex
because it's way way faster when searching for all results). Just like uregex
it's very easy to use. I can make it fit. 🙂
And I'll try to make our UIA function behave like Word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, it seems that searching through a character-iterator is a C++ API feature and Windows only ships the C API. Well it's probably fine. I feel like this is not going to be a bottleneck no matter what we do.
Does it just so happen to work across wrapped lines now? @asklar literally reported that to me on Teams this weekend 😄 |
Oh wait, I think you meant a different problem, right? I think that's something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving notes before lunch
@@ -122,7 +124,7 @@ class TextBuffer final | |||
ROW& GetScratchpadRow(); | |||
ROW& GetScratchpadRow(const TextAttribute& attributes); | |||
const ROW& GetRowByOffset(til::CoordType index) const; | |||
ROW& GetRowByOffset(til::CoordType index); | |||
ROW& GetMutableRowByOffset(til::CoordType index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like a ton of this diff is just this rename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. But it's a necessary change because many callers have a TextBuffer&
but only need a const ROW&
. (TextBufferCellIterator
for instance is widely used and does this.) 😟
src/buffer/out/search.cpp
Outdated
_results = textBuffer.SearchText(str, caseInsensitive); | ||
_mutationCount = textBuffer.GetMutationCount(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably, the caller has a write lock on the text buffer so the MutationCount can't change between the search and getting the count, but figured I'd mention it here. If there's a way to guarantee that, it might be worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could only assert that the lock is being held, but I think that this might be better as a separate PR that does this for all the parts that implicitly rely on the console lock being held. The mutation count could be made atomic, but this won't realistically improve anything, because you need to hold the lock anyways to initiate any kind of search (either to do a new search or to select stuff in the buffer).
src/buffer/out/Row.cpp
Outdated
const til::CoordType columnCount = _columnCount; | ||
const til::CoordType scale = _lineRendition != LineRendition::SingleWidth; | ||
const til::CoordType padding = _doubleBytePadded; | ||
return (columnCount - padding) >> scale; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised padding comes before un-scaling, but I think it makes sense? A double-byte padded character that doesn't fit on a double-width row didn't fit because it took up 4 columns.. hm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was really unsure in what order these need to be, and I still am a little.
Thinking about it some more, does it have to be like this?
const auto padding = _doubleBytePadded << scale;
return (columnCount - padding) >> scale;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be the same as...
return (columnCount >> scale) - padding;
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite since the rounding is different. In your code it would subtract 2 no matter what. For instance, imagine padding is 2 and columnCount is 11. In my code snippet it would return 4 and with yours it would return 3. I think 4 is correct because 4*2+3 = 11 (3 being the max. wide glyph padding).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call.
@@ -65,6 +65,28 @@ struct RowCopyTextFromState | |||
til::CoordType sourceColumnEnd = 0; // OUT | |||
}; | |||
|
|||
// This structure is basically an inverse of ROW::_charOffsets. If you have a pointer | |||
// into a ROW's text this class can tell you what cell that pointer belongs to. | |||
struct CharToColumnMapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nobody consumes this externally, do they? Can it be a forward declaration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to use this class in other parts of the code base in the future (the renderer primarily).
@@ -26,6 +26,7 @@ | |||
<ClCompile Include="..\precomp.cpp"> | |||
<PrecompiledHeader>Create</PrecompiledHeader> | |||
</ClCompile> | |||
<ClCompile Include="..\UTextAdapter.cpp" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update sources
! and probably the sources
for the UT too.z
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
27/35
_searcherGoForward = goForward; | ||
_searcherCaseSensitive = caseSensitive; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you don't want to make these getters on Search
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking about it some more, I'd like to do that a different time if anything. Other editors reset the search when there's any change whatsoever. Technically we shouldn't need to track these members - we should just reset the search whenever any field in the dialog is changed. I just haven't done the work for that.
src/buffer/out/textBuffer.hpp
Outdated
@@ -373,12 +372,15 @@ class TextBuffer final | |||
|
|||
TextAttribute _currentAttributes; | |||
til::CoordType _firstRow = 0; // indexes top row (not necessarily 0) | |||
uint64_t _mutationCount = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can fathom a textbuffer having more than 900 quin-zill-trillion mutations. Is this a count, or an opaque number? If it's an opaque number, which is not orderable, wraparound is implicitly OK.
If somebody is going to use >
on it, we're in trouble.
if (text.size() == 0) | ||
auto lock = _terminal->LockForWriting(); | ||
|
||
if (_searcher.IsStale() || _searcherText != text || _searcherGoForward != goForward || _searcherCaseSensitive != caseSensitive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. we need some way to tell the searcher that it is stale when we switch into the alt buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alt buffer will not have the same mutation count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a one in 900 quin-jillion-millio nchance that it will, but... fair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll take that chance! 😄 I mean the worst-case scenario is that the search finds results that are stale.
@lhecker yes this sounds about right - will this PR address hyperlinks being broken up across lines (and launching to the wrong url)? if not, is there a bug tracking that already? |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it. The mutable row stuff is great, and Search is completely understandable by a single human now. Thank you.
I need @carlos-zamora to sign off on the UIA changes -- my biggest concern is searching backwards off a previous search (?) because of accidental off-by-one errors. I am not sure how that API is supposed to work.
Blocking until Carlos signs, but otherwise I am a ✔️
src/buffer/out/Row.cpp
Outdated
const til::CoordType columnCount = _columnCount; | ||
const til::CoordType scale = _lineRendition != LineRendition::SingleWidth; | ||
const til::CoordType padding = _doubleBytePadded; | ||
return (columnCount - padding) >> scale; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call.
// | ||
// @param ut the UText to get the length of. | ||
// @return the length, in the native units of the original text string. | ||
static int64_t U_CALLCONV utextNativeLength(UText* ut) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these functions return a failure if the underlying buffer was mutated since the utext adapter was created? Or is that not related to the purpose of these guys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like that this would provide a false sense of security that people might end up relying on. Unlocking the console at all during a search is likely a bug or at least a bad idea because it's pretty error prone (not all state relevant to searching is contained in TextBuffer after all - selections for instance). I think we could make it an assertion? Not sure if it should be assert()
or WI_ASSERT()
. Probably the former, because searching without holding the lock is still somewhat predictable.
|
||
// An excerpt from the ICU documentation: | ||
// | ||
// Extract text from a UText into a UChar buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you complete this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I'll keep the code but remove it from the utextFuncs
struct and add a comment that the correctness of the function hasn't been verified yet. I don't think it's worth it to finish that function, because we don't need it right now. But I do think it's worth keeping because the code has already been written and is probably 99% correct (if not 100%). I initially thought I had to implement it so that uregex
can work, but it seems I was wrong about that. It uses the ut->chunkContents
directly (via UTEXT_NEXT32
and friends).
Basically same here. I think I've got like 3 files left but I didn't want to ✅ without him reading that (cause obviously no one knows how that works) |
src/types/UiaTextRangeBase.cpp
Outdated
auto searchAnchor = _start; | ||
if (searchBackward) | ||
const std::wstring_view queryText{ text, SysStringLen(text) }; | ||
const auto results = _pData->GetTextBuffer().SearchText(queryText, ignoreCase != FALSE, _start.y, _end.y + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto results = _pData->GetTextBuffer().SearchText(queryText, ignoreCase != FALSE, _start.y, _end.y + 1); | |
const auto results = _pData->GetTextBuffer().SearchText(queryText, !!ignoreCase, _start.y, _end.y + 1); |
I think we do !!
generally. I'm not really attached to this one though. Idk if one is better than the other.
src/types/UiaTextRangeBase.cpp
Outdated
auto searchAnchor = _start; | ||
if (searchBackward) | ||
const std::wstring_view queryText{ text, SysStringLen(text) }; | ||
const auto results = _pData->GetTextBuffer().SearchText(queryText, ignoreCase != FALSE, _start.y, _end.y + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this code could use the
Search
class, becauseSearch
would stop searching once it found something. The newSearch
class will go through the entire buffer however.
Won't this cause a decent performance hit? _getOptimizedBufferSize()
used to reduce the search space by ignoring anything past the mutable bottom. Now that we're searching through the whole buffer again, won't FindText('a', ...)
be slower. And if the user calls that function repeatedly, the user may have to wait quite a bit before being able to perform the next search? (FYI I haven't verified the performance question via a11y insights yet, so this could all be false)
It seems Accessibility Insights is kinda buggy? I can't really tell how it works, but if you open the Find dialog and ever reach a "not found" situation, it'll continue to say "not found" forever.
You could also use inspect.exe. It's the predecessor to a11y insights and can be found in the Windows SDK's \bin\<version>\<platform> folder. To interact with the API, do the following:
- open inspect.exe
- click on the terminal (the terminal should be highlighted in the UIA tree in inspect.exe)
- Click "Action" > "Text Pattern Explorer..." (a new "Text Explorer" window should appear)
- Click "Range" > "Find..."
Was that intentional? Should the UIA search circle around in the buffer when it has reached either end?
Interesting! I was originally going to say that we should keep the same behavior as before, but it looks like this experience is different from MS Word's! There were two notable differences with your implementation:
- circling the buffer: MS Word doesn't circle the text area whereas we do in Terminal (old and new impl). So, we probably shouldn't do that anymore.
- no more results: MS Word returns something that is interpreted by inspect.exe as "ProcessFind: FindText couldn't find matching text.". But Terminal (new impl) is interpreted as "ProcessFind: MoveEndpointByRange failed. Error -2147024809".
We should align these experiences (which is a bit annoying because your PR wasn't really intended to deal with any of this). If that's easy to do, it'd be best to do that as a part of this PR or in a follow-up before the next release. If not, I think we should keep the old experience, but I know that's not really possible anymore given that the Search
class is now completely different. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm giving this a ✅ on the assumption that the UIA stuff is fine. I don't feel like any of these comments are blocking. And this makes searchv2 so much nicer
src/buffer/out/Row.cpp
Outdated
auto mapper = _createCharToColumnMapper(offset); | ||
return mapper.GetLeadingColumnAt(offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems weird to me that you need to pass the offset into the ctor (via _createCharToColumnMapper
), then immediately just call Get*ColumnAt(offset)
with the same offset. I don't think I'll block over that but it feels weird
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_createCharToColumnMapper
only uses the offset to guess the initial column. GetLeadingColumnAt
will then actually compute the proper offset. This approach avoids GetLeadingColumnAt
being O(n)
most of the time.
The reason the CharToColumnMapper
exists as a separate class is because I want to use it for DirectWrite to map the character positions that DirectWrite returns directly back to columns. Instead, right now, we got a really ugly and inefficient approach. You can see it if you search for bufferLineColumn
.
@@ -1348,6 +1345,97 @@ void Terminal::_updateUrlDetection() | |||
} | |||
} | |||
|
|||
// Interns URegularExpression instances so that they can be reused. This method is thread-safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so I thinks I get what this does, but it might deserve a bit of a longer comment than this.
If I'm not mistaken, this basically creates a process-wide cache of unique_uregex
s. That way, if someone comes asking for a particular regex, (on any thread), we can hand back the pre-compiled version of it, if we've already seen it. There's also some affordance for flushing out old cache entries, if there's more than 128.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird one. In the compiler/language runtime field, it pretty much means means "caching". You almost always only hear it in the context of strings (e.g. "string interning").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL i guess. Feel free to ignore me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I turned it into a small class so we can more easily extract it in the future. I left it in the same place though because I still feel like it's better to keep this relatively "new" code close to where we're going to add custom patterns for the first time soon.
const auto guard = shared.lock.lock_shared(); | ||
if (const auto it = shared.set.find(pattern); it != shared.set.end()) | ||
{ | ||
return ICU::unique_uregex{ uregex_clone(it->second.re.get(), &status) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to get std::move
'd on the way out? I always forget how exactly move semantics work
(and below too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returned objects are always being moved, but with much stronger guarantees than regular std::move
. return
will behave as if there was no function call at all. In other words, returning a std::string
by value is like constructing that string in the caller directly.
std::move
is basically just a T&
with extra syntactic sugar. So, if you return std::move(string);
it will actually be forced to call the string(string&&)
constructor on the way out, because you're not returning an object value anymore - you're returning a reference which now needs to be converted to a value. That's why one should never return std::move
.
The reason it can be seen anyways sometimes is because when you have code like this:
struct StringWrapper {
// implicit constructor!
StringWrapper(std::string&& str) {}
};
StringWrapper foobar() {
std::string str = "...";
// calls the implicit constructor
return std::move(str);
}
getFontCollection()
in FontCache.h
is an example for that.
This comment has been minimized.
This comment has been minimized.
// | ||
// An alternative approach would be to not make this method thread-safe and give each | ||
// Terminal instance its own cache. I'm not sure which approach would've been better. | ||
ICU::unique_uregex Intern(const std::wstring_view& pattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI suppress whitespace changes. The conversion to a class barely changed anything.
This is a resurrection of #8588. That PR became painfully stale after the `ControlCore` split. Original description: > ## Summary of the Pull Request > This is a PoC for: > * Search status in SearchBox (aka number of matches + index of the current match) > * Live search (aka search upon typing) > ## Detailed Description of the Pull Request / Additional comments > * Introduced this optionally (global setting to enable it) > * The approach is following: > * Every time the filter changes, enumerate all matches > * Upon navigation just take the relevant match and select it > I cleaned it up a bit, and added support for also displaying the positions of the matches in the scrollbar (if `showMarksOnScrollbar` is also turned on). It's also been made SUBSTANTIALLY easier after #15858 was merged. Similar to before, searching while there's piles of output running isn't _perfect_. But it's pretty awful currently, so that's not the end of the world. Gifs below. * closes #8631 (which is a bullet point in #3920) * closes #6319 Co-authored-by: Don-Vito <khvitaly@gmail.com> --------- Co-authored-by: khvitaly <khvitaly@gmail.com>
…row (#16775) ## Summary of the Pull Request URL detection was broken again in #15858. When the regex matched, we calculate the column(cell) by its offset, we use forward or backward iteration of the column to find the correct column that displays the glyphs of `_chars[offset]`. https://github.com/microsoft/terminal/blob/abf5d9423a23b13e9af4c19ca35858f9aaf0a63f/src/buffer/out/Row.cpp#L95-L104 However, when calculating the `currentOffset` we forget that MSB of `_charOffsets[col]` could be `1`, or col is pointing to another glyph in preceding column. https://github.com/microsoft/terminal/blob/abf5d9423a23b13e9af4c19ca35858f9aaf0a63f/src/buffer/out/Row.hpp#L223-L226
The ultimate goal of this PR was to use ICU for text search to
Previously we used
towlower
and only supported BMP glphs.This allows us to search for all results in the entire text buffer
at once without having to do so asynchronously.
Unfortunately, this required some significant changes too:
mapped back to buffer coordinates. This required the introduction of
CharToColumnMapper
to implement sort of a reverse-_charOffsets
mapping. It turns text (character) positions back into coordinates.
It used the current selection as the starting position for the new
search. But since ICU's
uregex
cannot search backwards we'rerequired to accumulate all results in a vector first and so we
need to cache that vector in between searches.
to track any changes made to
TextBuffer
. The way this commit solvesit is by splitting
GetRowByOffset
intoGetRowByOffset
forconst ROW
access andGetMutableRowByOffset
which increments amutation counter on each call. The
Search
instance can then compareits cached mutation count against the previous mutation count.
Finally, this commit makes 2 semi-unrelated changes:
text search anyways. This significantly improves performance at
large window sizes.
UiaTracing
were fixed. In particular2 functions which passed strings as
wstring
by copy are nowusing
wstring_view
andTraceLoggingCountedWideString
.Related to #6319 and #8000
Validation Steps Performed