Add an efficient text stream write function #14821
uint16_t column = 0;
for (const auto& ch : std::wstring_view{ L"AB\u304bC\u304dDE " })
{
    const uint16_t width = ch >= 0x80 ? 2 : 1;
    pRow->ReplaceCharacters(column, width, { &ch, 1 });
    column += width;
}
I forgot why I made this change... But it's much shorter now, so there's that at least.
src/buffer/out/Row.cpp
@@ -65,6 +68,65 @@ constexpr OutIt copy_n_small(InIt first, Diff count, OutIt dest)
    return dest;
}

RowTextIterator::RowTextIterator(std::span<const wchar_t> chars, std::span<const uint16_t> charOffsets, uint16_t offset) noexcept :
RowTextIterator isn't used anywhere yet, but I did use it during development as a debug aid. I expect this struct to be used in the future for things like CHAR_INFO reads.
I began writing a new implementation for Unfortunately it still pegs an entire CPU core, but it'll be equally fun and satisfying to fix that: We can still...
Unfortunately, for the life of me, I can't figure out how to properly implement cursor wrapping. From what I learned about our VT code I would've assumed that I need to use |
I think I understand this, and I need a second adult to think they also understand it.
@@ -499,6 +739,11 @@ void ROW::_resizeChars(uint16_t colExtEnd, uint16_t chExtBeg, uint16_t chExtEnd,
    }
}

til::small_rle<TextAttribute, uint16_t, 1>& ROW::Attributes() noexcept
used?
I'm currently using it in a follow-up branch. I can remove it for this PR, but I also feel like it doesn't hurt keeping it in.
I've just tried out this branch, and there is one change in behavior which I think is potentially bad. If you have a wide glyph output with a black background, and you overwrite the start of it with a narrow glyph using a white background, both cells that were previously occupied by the wide glyph are changed from black to white. In the previous implementation, the second cell would remain black, which I think was the correct behavior. An example of where this can be a problem is when you have a popup window with a white background on top of some wide glyphs using a black background. Whenever the border of the window intersects with a wide glyph, it'll end up filling more cells than intended, giving the window a broken effect.
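The expected behavior described in this comment can be sketched with a toy cell model. This is only an illustration under stated assumptions: `Cell`, `Color` and `WriteNarrow` are hypothetical names and do not reflect the actual ROW storage, which keeps text and attributes in separate structures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified cell model (not the real ROW layout).
enum class Color { Black, White };

struct Cell
{
    std::u32string glyph = U" ";     // text shown in this column
    bool trailing = false;           // true for the second column of a wide glyph
    Color background = Color::Black; // stand-in for the full text attribute
};

// Writes a narrow glyph at `column`. If it lands on one half of a wide glyph,
// the other half is blanked, but that other half keeps its own background.
void WriteNarrow(std::vector<Cell>& row, std::size_t column, char32_t ch, Color background)
{
    assert(column < row.size());

    if (row[column].trailing && column > 0)
    {
        // We hit the trailing half: blank the leading half to its left.
        row[column - 1].glyph = U" ";
    }
    else if (column + 1 < row.size() && row[column + 1].trailing)
    {
        // We hit the leading half: blank the trailing half to its right.
        row[column + 1].glyph = U" ";
        row[column + 1].trailing = false;
    }

    row[column].glyph.assign(1, ch);
    row[column].trailing = false;
    row[column].background = background; // only the written cell is recolored
}
```

In this model, overwriting the first column of a black-background wide glyph with a white-background "a" leaves the second column as a blank that is still black, which is the popup-border case the comment is concerned about.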
Hmm unless I forgot something I don't think we need the ability to write text without modifying attributes right now, do we? At least not in any fast path I mean (in a hypothetical, seldom-used slow path we can just make a backup of the row and restore the attributes from the backup). Because then I could just move writing attributes into the
I'm not positive, but I thought that's what |
I've just noticed another problem (possibly related to the issue above). When you write a narrow glyph over the start of a wide glyph, the cursor position is advanced by two columns instead of one. So if you try to write a line of English text over some Japanese text, you end up with everything double-spaced!
I've tested the first issue with this:
    printf '\x1b7猫咪\x1b8\x1b[107ma\x1b[m\n'
And the second issue with this:
    stdbuf -o0 printf '\x1b7猫咪\x1b8a'; printf 'b\n'
As far as I can tell the latest commit fixes both issues. Thank you so much for finding these!
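The cursor problem comes down to keeping two quantities apart: the range of columns whose contents changed, and the distance the cursor should move. A minimal sketch of that distinction follows, assuming a hypothetical per-column `CellKind` tag and a hypothetical `WriteNarrowAt` helper (neither is taken from the PR):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-column tag; the real ROW stores this differently.
enum class CellKind : std::uint8_t { Narrow, WideLeading, WideTrailing };

struct WriteResult
{
    std::size_t dirtyBegin; // first column whose contents changed
    std::size_t dirtyEnd;   // one past the last column whose contents changed
    std::size_t cursor;     // column where the next glyph should be written
};

// Writes one narrow glyph at `column`. Overwriting half of a wide glyph
// dirties both of its columns, but the cursor still advances by one only.
WriteResult WriteNarrowAt(std::vector<CellKind>& row, std::size_t column)
{
    assert(column < row.size());

    std::size_t begin = column;
    std::size_t end = column + 1;
    if (row[column] == CellKind::WideTrailing && column > 0)
    {
        begin = column - 1;
    }
    else if (row[column] == CellKind::WideLeading)
    {
        end = std::min(column + 2, row.size());
    }

    for (std::size_t i = begin; i < end; ++i)
    {
        row[i] = CellKind::Narrow;
    }

    // The advance is the width of what was written, not the dirtied extent.
    return { begin, end, column + 1 };
}
```

With a row that starts with two wide glyphs, writing "a" at column 0 dirties columns 0 and 1 but returns a cursor of 1, so a following "b" lands directly next to it rather than one column further right.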
// Exclude rarely-used stuff from Windows headers
#define WIN32_LEAN_AND_MEAN
This prevents the inclusion of commdlg.h in the BufferOut project, which causes a conflict with ROW::ReplaceText (ReplaceText is a macro). I wish we could somehow disable all these TCHAR macros in the /um/ headers.
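The kind of conflict described here can be reproduced without any Windows headers. The sketch below assumes the header maps the name with a plain object-like macro (`#define ReplaceText ReplaceTextW` in Unicode builds); the `STRINGIFY` helpers and the `Row` struct are hypothetical and exist only to make the renaming visible.

```cpp
#include <cassert>
#include <string_view>

#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// Stand-in for the TCHAR-style mapping (assumption: mirrors the header's
// Unicode variant; no Windows header is actually included here).
#define ReplaceText ReplaceTextW

// While the macro is active, every use of the identifier is rewritten by the
// preprocessor, including member function declarations and calls.
constexpr std::string_view nameWithMacro = STRINGIFY(ReplaceText);

#undef ReplaceText

// Once the macro is gone (or the header was never included), the name survives.
constexpr std::string_view nameWithoutMacro = STRINGIFY(ReplaceText);

// Hypothetical class: this declaration is only safe because the macro is not
// defined at this point in the translation unit.
struct Row
{
    int ReplaceText() const { return 1; }
};

static_assert(nameWithMacro == "ReplaceTextW", "identifier was renamed by the macro");
static_assert(nameWithoutMacro == "ReplaceText", "identifier is left untouched");
```

If some translation units see the macro and others do not, a member declared as `ReplaceText` in one may be looked up as `ReplaceTextW` in another, which is why keeping the header out of the project entirely is the simpler fix.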
(I'll finish fixing the unit tests Monday and add some tests for the 2 issues. I'll also probably split off the pch.h changes and create an extra PR for that.)
For now I kept the existing approach of having separate text- and attribute-writing methods on But I'm really not sure what the best approach is for the new "batch writing" APIs I'm adding in this PR. I don't like the design this PR uses at all, but I'm also failing to come up with anything better. There's of course the option to keep something like While initially coding this, I thought that splitting it up into distinct purposes (making text and attribute writing separate) would allow us to better compose them back together to implement our various APIs, make it easier to test the now smaller behavior, implement extended grapheme clusters (which requires access to full strings and not just iterators). Now, I'm not too sure anymore about the design aspect. Although I am already quite happy with how much it simplified |
I've read through everything now, but the ROW internals were a bit over my head so I'm afraid I just skimmed most of that.
Other than the comments I made in the code, there was one other regression I noticed in testing, but I couldn't see why it was failing. It's quite possible it's not this PR that's to blame, but I'm just making a note of it here so it doesn't get lost.
printf "\e[15832;16;27;22;54\$x"
That should fill a block with a width of 14 glyphs, taking up 28 columns, but it's now writing out 28 glyphs, taking up 56 columns.
You can ignore this. It seems I checked out your branch when there was a "wip" commit that was the source of this issue.
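For reference, the numbers quoted in that report follow directly from the rectangle bounds in the sequence, which has the shape of a DECFRA (fill rectangular area) request: character; top; left; bottom; right. A small sketch of the arithmetic, where `FillRect` and both helpers are hypothetical names and the glyph width of 2 is taken from the comment rather than computed:

```cpp
#include <cassert>

// Hypothetical holder for the five parameters of the sequence above.
struct FillRect
{
    int character;
    int top;
    int left;
    int bottom;
    int right;
};

// Columns covered on each row; both bounds are inclusive.
int ColumnsPerRow(const FillRect& rect)
{
    return rect.right - rect.left + 1;
}

// Number of glyphs that fit on each row for a given glyph width in columns.
int GlyphsPerRow(const FillRect& rect, int glyphWidth)
{
    assert(glyphWidth > 0);
    return ColumnsPerRow(rect) / glyphWidth;
}
```

For `{15832, 16, 27, 22, 54}` this gives 28 columns per row and, with two-column glyphs, 14 glyphs per row, matching the expected output described in the comment.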
I think I get it now. I really appreciate @j4james giving this a once over, seems like a bunch of edge cases were already caught.
Excited to see what's in store next 😉
@@ -376,6 +376,25 @@ bool TextBuffer::_PrepareForDoubleByteSequence(const DbcsAttribute dbcsAttribute
    return fSuccess;
}

void TextBuffer::ConsumeGrapheme(std::wstring_view& chars) noexcept
kinda weird that this is a static on textbuffer when it just does a til::utf16_pop, but presumably this does more in the future?
In the future I'll replace this function with some ICU code that actually advances the string by an entire grapheme cluster. So, if you have something like å ("a˚") it'll advance by 2 code points and not just 1 like it does now.
It's possible to just put the ICU code into til and use it throughout the code directly, just like we do it right now with the UTF-16 helpers. But from now on, I'd like to avoid doing that, even if it means writing such static methods, because I'd like to keep everything string handling related as close and tight as possible in the future. I think OutputCellIterator had the same intention originally and was well meant, but it suffers from being a leaky abstraction. Lots of code is now built around an implicit assumption that OutputCellIterator will always advance by exactly 1 or 2 columns. Using the til UTF-16 helpers directly elsewhere in our code would have a similar effect in my opinion, because it would equally leak (and potentially incorrectly leak) any assumptions about how TextBuffer handles incoming text under normal circumstances.
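To make the difference between code points and grapheme clusters concrete, here is a minimal sketch of a "consume one code point" helper. It is not the til implementation: `ConsumeCodePoint` is a hypothetical name, and it uses `char16_t` instead of `wchar_t` so the behavior is the same on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

constexpr bool IsLeadingSurrogate(char16_t ch) noexcept
{
    return ch >= 0xD800 && ch <= 0xDBFF;
}

constexpr bool IsTrailingSurrogate(char16_t ch) noexcept
{
    return ch >= 0xDC00 && ch <= 0xDFFF;
}

// Removes one code point from the front of `chars` and returns it as a view.
// A valid surrogate pair is consumed as a unit; anything else, including an
// unpaired surrogate, is consumed as a single code unit.
std::u16string_view ConsumeCodePoint(std::u16string_view& chars) noexcept
{
    std::size_t length = chars.empty() ? 0 : 1;
    if (chars.size() >= 2 && IsLeadingSurrogate(chars[0]) && IsTrailingSurrogate(chars[1]))
    {
        length = 2;
    }

    const auto head = chars.substr(0, length);
    chars.remove_prefix(length);
    return head;
}
```

Called on "a" followed by U+030A (combining ring above), this returns the "a" first and the combining mark on a second call, which is exactly the case a grapheme cluster iterator would instead hand back as one unit.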
@@ -229,6 +301,36 @@ void ROW::TransferAttributes(const til::small_rle<TextAttribute, uint16_t, 1>& a
    _attr.resize_trailing_extent(gsl::narrow<uint16_t>(newWidth));
}

til::CoordType ROW::NavigateToPrevious(til::CoordType column) const noexcept
nit: may want to rename to something like NavigateToPreviousGlyph or Codepoint or whatever the technically correct name is for the unit we're moving backward by here. I instinctively thought this just did a clamp(column-1, 0, row.length).
Or at least add a comment above it (and the same below in NavigateToNext).
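The difference from a plain `clamp(column - 1, ...)` is easiest to see with a small model. The sketch below assumes a hypothetical per-column flag that marks the trailing half of a wide glyph; the function names are borrowed from the diff, but the bodies are illustrative and not the PR's implementation (in particular, returning the row width from `NavigateToNext` at the end of the row is an assumption).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Moves `column` left until it sits on the first column of a glyph.
std::size_t AlignToGlyphStart(const std::vector<bool>& isTrailing, std::size_t column)
{
    while (column > 0 && isTrailing[column])
    {
        --column;
    }
    return column;
}

// Returns the first column of the glyph before the one covering `column`.
std::size_t NavigateToPrevious(const std::vector<bool>& isTrailing, std::size_t column)
{
    assert(!isTrailing.empty());
    column = std::min(column, isTrailing.size() - 1);
    column = AlignToGlyphStart(isTrailing, column);
    return column == 0 ? 0 : AlignToGlyphStart(isTrailing, column - 1);
}

// Returns the first column after the glyph covering `column`.
std::size_t NavigateToNext(const std::vector<bool>& isTrailing, std::size_t column)
{
    assert(!isTrailing.empty());
    column = std::min(column, isTrailing.size() - 1) + 1;
    while (column < isTrailing.size() && isTrailing[column])
    {
        ++column;
    }
    return column;
}
```

For a row laid out as A, B, か (two columns), C, stepping back from column 4 yields column 2, the start of the wide glyph, whereas a naive `column - 1` would land on column 3 in the middle of it.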
This PR adds a couple of foundational functions and classes to make our TextBuffer more performant and allow us to improve our Unicode correctness in the future, by getting rid of our dependence on OutputCellIterator. In the future we can then replace the simple UTF-16 code point iterator with a proper grapheme cluster iterator.
While my focus is technically on Unicode correctness, the ~4x VT throughput increase in OpenConsole is pretty nice too.
This PR adds:
- NavigateToPrevious, NavigateToNext: They're based on functions that align the cursor to the start/end of the current cell, so such functions can be added as well.
- ReplaceText to write a raw string of text with the possibility to specify a right margin.
- CopyRangeFrom will allow us to make reflow much faster, as it's able to bulk-copy already measured strings without re-measuring them.
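The idea of writing a raw string up to a right margin can be sketched as follows. This is a simplified model, not the PR's ReplaceText: `WriteState`, its fields and the cell vector are hypothetical, the width rule is the same `ch >= 0x80 ? 2 : 1` heuristic used in the test snippet earlier in the review, and overwriting existing wide glyphs is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical in/out state for one write call.
struct WriteState
{
    std::u16string_view text;  // in/out: remaining text, consumed from the front
    std::size_t columnBegin;   // in: column at which writing starts
    std::size_t columnLimit;   // in: right margin (exclusive)
    std::size_t columnEnd = 0; // out: one past the last column written
};

// Writes as much of `state.text` into `cells` as fits before the margin.
// A wide glyph that would cross the margin is left unwritten in `state.text`.
void ReplaceText(std::vector<std::u16string>& cells, WriteState& state)
{
    assert(state.columnBegin <= cells.size());

    const std::size_t limit = std::min(state.columnLimit, cells.size());
    std::size_t column = state.columnBegin;

    while (!state.text.empty())
    {
        const char16_t ch = state.text.front();
        const std::size_t width = ch >= 0x80 ? 2 : 1; // crude width heuristic
        if (column + width > limit)
        {
            break; // does not fit; the caller decides whether to wrap
        }

        cells[column].assign(1, ch);
        if (width == 2)
        {
            cells[column + 1].clear(); // trailing half carries no text of its own
        }

        column += width;
        state.text.remove_prefix(1);
    }

    state.columnEnd = column;
}
```

Because the unwritten remainder stays in `state.text`, a caller can move to the next row and call the function again with the same state, which is the kind of loop that lets a whole stream of text be written in bulk rather than one cell at a time.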
Related to #8000
Validation Steps Performed
wide glyph reflow at the end of a row ✅
background color at the end of the line, and "ん" on the next line: