Add an efficient text stream write function #14821

lhecker · 2023-02-10T16:43:30Z

This PR adds a couple of foundational functions and classes to make
our TextBuffer more performant and allow us to improve our Unicode
correctness in the future, by getting rid of our dependence on
OutputCellIterator. In the future we can then replace the simple
UTF-16 code point iterator with a proper grapheme cluster iterator.

While my focus is technically on Unicode correctness, the ~4x VT
throughput increase in OpenConsole is pretty nice too.

This PR adds:

* A new, simpler ROW iterator (unused in this PR)
* Cursor movement functions (NavigateToPrevious, NavigateToNext)
  They're based on functions that align the cursor to the start/end
  of the current cell, so such functions can be added as well.
* ReplaceText to write a raw string of text with the possibility to
  specify a right margin.
* CopyRangeFrom will allow us to make reflow much faster, as it's able
  to bulk-copy already measured strings without re-measuring them.

Related to #8000

Validation Steps Performed

enwik8.txt, zhwik8.txt, and emoji-test.txt all work with proper
wide glyph reflow at the end of a row ✅
This produces "a ん" where only "a" has a white background:
```
printf '\e7こん\e8\x1b[107ma\x1b[m\n'
```

This produces "abん":

```
stdbuf -o0 printf '\x1b7こん\x1b8a'; printf 'b\n'
```

This produces "xy" at the end of the line:

```
stdbuf -o0 printf '\e[999C\bこ\bx'; printf 'y\n'
```

This produces red whitespace followed by "こ " in the default
background color at the end of the line, and "ん" on the next line:
```
printf '\e[41m\e[K\e[m\e[999C\e[2Dこん\n'
```

zadjii-msft · 2023-02-10T17:10:21Z

me reading this pr body rn

lhecker · 2023-03-09T14:19:09Z

src/inc/test/CommonState.hpp

+        uint16_t column = 0;
+        for (const auto& ch : std::wstring_view{ L"AB\u304bC\u304dDE      " })
+        {
+            const uint16_t width = ch >= 0x80 ? 2 : 1;
+            pRow->ReplaceCharacters(column, width, { &ch, 1 });
+            column += width;
+        }


I forgot why I made this change... But it's much shorter now, so there's that at least.

lhecker · 2023-03-09T14:20:52Z

src/buffer/out/Row.cpp

@@ -65,6 +68,65 @@ constexpr OutIt copy_n_small(InIt first, Diff count, OutIt dest)
    return dest;
 }

+RowTextIterator::RowTextIterator(std::span<const wchar_t> chars, std::span<const uint16_t> charOffsets, uint16_t offset) noexcept :


RowTextIterator isn't used anywhere yet, but I did use it during development as a debug aid. I expect this struct to be used in the future for things like CHAR_INFO reads.
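For readers unfamiliar with the `chars` / `charOffsets` pair in that constructor, here is a minimal sketch of how iterating over such a pair could work. It is not the PR's RowTextIterator: it assumes `charOffsets[col]` is the index in `chars` of the glyph covering column `col`, with one extra trailing entry marking the end of the text, and the real ROW may encode this differently. `GlyphSpan` and `collectGlyphs` are hypothetical names, and `char16_t` stands in for `wchar_t` so the sketch behaves the same on every platform.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified description of one glyph inside a row.
struct GlyphSpan
{
    std::u16string_view text; // the code units that make up the glyph
    uint16_t colBeg; // first column covered by the glyph
    uint16_t colEnd; // one past the last column covered by the glyph
};

// Collects every glyph of the row together with the columns it occupies.
// Assumption: charOffsets has (columns + 1) entries, see the text above.
inline std::vector<GlyphSpan> collectGlyphs(std::u16string_view chars, const std::vector<uint16_t>& charOffsets)
{
    assert(!charOffsets.empty());
    const auto columns = static_cast<uint16_t>(charOffsets.size() - 1);
    std::vector<GlyphSpan> glyphs;

    for (uint16_t colBeg = 0; colBeg < columns;)
    {
        // All columns of one glyph share the same offset, so the glyph ends
        // at the first column whose offset differs.
        auto colEnd = static_cast<uint16_t>(colBeg + 1);
        while (colEnd < columns && charOffsets[colEnd] == charOffsets[colBeg])
        {
            ++colEnd;
        }

        const size_t chBeg = charOffsets[colBeg];
        const size_t chEnd = charOffsets[colEnd];
        glyphs.push_back({ chars.substr(chBeg, chEnd - chBeg), colBeg, colEnd });
        colBeg = colEnd;
    }

    return glyphs;
}
```

With the text "aこb" and the offsets `{ 0, 1, 1, 2, 3 }`, the wide glyph is reported once and spans two columns, which is the kind of information a CHAR_INFO read would need.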

lhecker · 2023-03-12T16:57:22Z

I began writing a new implementation for TextBuffer::Reflow based on first principles over at b6c9e3f. Even at roughly 400x200 large viewports I can now resize OpenConsole at >120 FPS, whereas it previously ran at about 10 FPS for me. The difference is quite satisfying.

Unfortunately it still pegs an entire CPU core, but it'll be equally fun and satisfying to fix that: We can still...

* implement lazy initialization for ROW so that we don't have to first fill all those chars with whitespace and charOffsets with 0, 1, 2, 3, 4, ...
* try to replace GetLastNonSpaceCharacter with our existing virtual bottom logic and/or avoid running it twice on OpenConsole's side by better integrating SCREEN_INFORMATION::ResizeWithReflow
* vectorize ROW::WriteHelper::CopyRangeFrom, because it is the innermost loop during resize and vectorization is quite trivial there (I added the code already, it's commented out though)
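To illustrate the first idea in the list, here is a rough sketch of what lazy initialization could look like. `LazyRow` and all of its members are hypothetical and only model the "fill with whitespace and 0, 1, 2, 3, ..." step described above; it assumes one trailing offset entry marks the end of the text and says nothing about how ROW would actually implement this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical row that defers the expensive fill until it is first touched.
class LazyRow
{
public:
    explicit LazyRow(uint16_t columns) :
        _columns{ columns }
    {
    }

    // Cheap: only marks the row as blank instead of rewriting every cell.
    void Reset() noexcept { _initialized = false; }

    bool IsInitialized() const noexcept { return _initialized; }

    // Anything that needs the actual cells goes through these accessors.
    const std::u16string& Chars()
    {
        _ensureInitialized();
        return _chars;
    }

    const std::vector<uint16_t>& CharOffsets()
    {
        _ensureInitialized();
        return _charOffsets;
    }

private:
    void _ensureInitialized()
    {
        if (_initialized)
        {
            return;
        }
        // A blank row is all whitespace, one code unit per column...
        _chars.assign(_columns, u' ');
        // ...so the column-to-character mapping is simply 0, 1, 2, 3, ...
        // (plus one trailing entry marking the end of the text).
        _charOffsets.resize(static_cast<size_t>(_columns) + 1);
        std::iota(_charOffsets.begin(), _charOffsets.end(), uint16_t{ 0 });
        _initialized = true;
    }

    uint16_t _columns;
    bool _initialized = false;
    std::u16string _chars;
    std::vector<uint16_t> _charOffsets;
};
```

Rows that are reset during a resize but never read or written again would then skip the fill entirely.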

Unfortunately, for the life of me, I can't figure out how to properly implement cursor wrapping. From what I learned about our VT code I would've assumed that I need to use Cursor::DelayEOLWrap(), but the old code didn't use it at all. So I'm left wondering now how the old code "unwrapped" cursors when widening the viewport. 😅 I mean I do get how the old code works, but I'm failing to understand how it works arithmetically, so that I can rewrite it.

DHowett

I think I understand this, and I need a second adult to think they also understand it.

DHowett · 2023-03-17T22:39:17Z

src/buffer/out/Row.cpp

@@ -499,6 +739,11 @@ void ROW::_resizeChars(uint16_t colExtEnd, uint16_t chExtBeg, uint16_t chExtEnd,
    }
 }

+til::small_rle<TextAttribute, uint16_t, 1>& ROW::Attributes() noexcept


I'm currently using it in a follow-up branch. I can remove it for this PR, but I also feel like it doesn't hurt keeping it in.

j4james · 2023-03-18T00:01:32Z

I've just tried out this branch, and there is one change in behavior which I think is potentially bad. If you have a wide glyph output with a black background, and you overwrite the start of it with a narrow glyph using a white background, both cells that were previously occupied by the wide glyph are changed from black to white. In the previous implementation, the second cell would remain black, which I think was the correct behavior.

An example of where this can be a problem is when you have a popup window with a white background on top of some wide glyphs using a black background. Whenever the border of the window intersects with a wide glyph it'll end up filling more cells than intended, giving the window a broken effect.

lhecker · 2023-03-18T00:08:13Z

Hmm unless I forgot something I don't think we need the ability to write text without modifying attributes right now, do we? At least not in any fast path I mean (in a hypothetical, seldom-used slow path we can just make a backup of the row and restore the attributes from the backup).

Because then I could just move writing attributes into the ROW::Write function which gives it access to the colEnd variable (the last written column, excluding padding space).

j4james · 2023-03-18T00:14:41Z

> Hmm unless I forgot something I don't think we need the ability to write text without modifying attributes right now, do we?

I'm not positive, but I thought that's what WriteConsoleOutputCharacter does.

j4james · 2023-03-18T00:17:59Z

I've just noticed another problem (possibly related to the issue above). When you write a narrow glyph over the start of a wide glyph, the cursor position is moved up by two. So if you try to write a line of English text over some Japanese text, you end up with everything double-spaced!

lhecker · 2023-03-18T14:38:19Z

I've tested the first issue with this:

printf '\x1b7猫咪\x1b8\x1b[107ma\x1b[m\n'

And the second issue with this:

stdbuf -o0 printf '\x1b7猫咪\x1b8a'; printf 'b\n'

As far as I can tell the latest commit fixes both issues. Thank you so much for finding these!
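For anyone following along, the expected behavior in both tests can be modeled roughly like this. It is a simplified sketch and not the ROW code from this PR: `Cell`, `eraseGlyphText`, `writeWide` and `writeNarrow` are hypothetical, and the background is reduced to a single integer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Cell
{
    char16_t ch = u' ';
    bool trailer = false; // true for the second column of a wide glyph
    int background = 0; // 0 = default, 1 = white in this sketch
};

// Replaces the text of whatever glyph covers `column` with spaces.
// Attributes are deliberately left alone.
inline void eraseGlyphText(std::vector<Cell>& row, size_t column)
{
    const size_t beg = row[column].trailer ? column - 1 : column;
    const size_t end = (beg + 1 < row.size() && row[beg + 1].trailer) ? beg + 2 : beg + 1;
    for (size_t i = beg; i < end; ++i)
    {
        row[i].ch = u' ';
        row[i].trailer = false;
    }
}

// Writes a wide glyph at `column` and returns the new cursor column.
inline size_t writeWide(std::vector<Cell>& row, size_t column, char16_t ch, int background)
{
    assert(column + 1 < row.size());
    eraseGlyphText(row, column);
    eraseGlyphText(row, column + 1);
    row[column] = { ch, false, background };
    row[column + 1] = { ch, true, background };
    return column + 2;
}

// Writes a narrow glyph at `column` and returns the new cursor column.
inline size_t writeNarrow(std::vector<Cell>& row, size_t column, char16_t ch, int background)
{
    assert(column < row.size());
    // Overwriting half of a wide glyph blanks the text of both halves...
    eraseGlyphText(row, column);
    row[column].ch = ch;
    // ...but only the column that was actually written gets recolored,
    row[column].background = background;
    // and the cursor advances by the width of the new glyph, not the old one.
    return column + 1;
}
```

Writing two wide glyphs, moving back to column 0 and writing a white "a" leaves column 1 blank with its old background, and a following "b" lands directly in column 1.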

lhecker · 2023-03-18T14:41:24Z

src/inc/LibraryIncludes.h

+// Exclude rarely-used stuff from Windows headers
+#define WIN32_LEAN_AND_MEAN


This prevents the inclusion of commdlg.h in the BufferOut project, which causes a conflict with ROW::ReplaceText (ReplaceText is a macro). I wish we could somehow disable all these TCHAR macros in the /um/ headers.
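A small sketch of why such a macro is a problem. The `#define` below is a stand-in for what commdlg.h does in a UNICODE build; the real header is not included so that this compiles anywhere, and `Row` and `callRenamedMember` are hypothetical.

```cpp
#include <cassert>

// Stand-in for the macro that <commdlg.h> defines in a UNICODE build.
#define ReplaceText ReplaceTextW

struct Row
{
    // The preprocessor silently rewrites this declaration to `ReplaceTextW`.
    int ReplaceText(int columns) { return columns; }
};

#undef ReplaceText

// From here on the macro is gone, but the member is still called ReplaceTextW.
// Code that never saw the macro would look for Row::ReplaceText and not find it.
inline int callRenamedMember(Row& row)
{
    return row.ReplaceTextW(3);
}
```

Whether a given file sees the renamed or the original member then depends on its include order, which is why keeping the header out of the project entirely is the simpler fix.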

lhecker · 2023-03-18T15:12:04Z

(I'll finish fixing the unit tests Monday and add some tests for the 2 issues. I'll also probably split off the pch.h changes and create an extra PR for that.)

> I'm not positive, but I thought that's what WriteConsoleOutputCharacter does.

For now I kept the existing approach of having separate text- and attribute-writing methods on ROW. TextBuffer can then call them as it sees fit. So in case of WriteConsoleOutputCharacter I imagine we can just add another TextBuffer method which only calls ReplaceText and handles line wrapping in a loop. There's also the option to make backups of the row's attributes and restore the backup afterwards. I'm personally not too concerned about the overhead there, because the overhead of the cell-wise iterator-based APIs we have right now is way beyond anything that involves making a few memory copies. Basically, I'm confident that I'll be able to replicate anything that OutputCellIterator can do now in the future as well, no matter what.
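As a rough sketch of the "only calls ReplaceText and handles line wrapping in a loop" idea: the code below is a toy model, not a proposal for the actual TextBuffer API. `Row`, `replaceText` and `writeTextOnly` are hypothetical, every code unit is assumed to be one column wide, and attributes are simply not part of the model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a row: text only, one column per code unit.
using Row = std::u16string;

// Copies as much of `text` as fits into `row` starting at `column` and
// returns the number of code units it consumed.
inline size_t replaceText(Row& row, size_t column, std::u16string_view text)
{
    if (column >= row.size())
    {
        return 0;
    }
    const size_t count = std::min(text.size(), row.size() - column);
    std::copy_n(text.begin(), count, row.begin() + static_cast<std::ptrdiff_t>(column));
    return count;
}

// Writes `text` starting at row `y` and column `column`, continuing on the
// following rows until the text or the buffer runs out.
// Returns the number of code units written.
inline size_t writeTextOnly(std::vector<Row>& rows, size_t y, size_t column, std::u16string_view text)
{
    size_t written = 0;
    while (!text.empty() && y < rows.size())
    {
        const size_t consumed = replaceText(rows[y], column, text);
        text.remove_prefix(consumed);
        written += consumed;
        // Wrap to the start of the next row.
        column = 0;
        ++y;
    }
    return written;
}
```

The point is only that the wrapping logic lives in the caller, so the row-level function never needs to know whether attributes are being written alongside the text.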

But I'm really not sure what the best approach is for the new "batch writing" APIs I'm adding in this PR. I don't like the design this PR uses at all, but I'm also failing to come up with anything better. There's of course the option to keep something like OutputCellIterator and continue to only have a single ROW::Write function, which then internally has a switch/case to specialize on all the different kinds an OutputCellIterator can be (text writing, attribute writing, text and attribute writing, filling with characters, copying from another row, etc.).

While initially coding this, I thought that splitting it up into distinct purposes (making text and attribute writing separate) would allow us to better compose them back together to implement our various APIs, make it easier to test the now smaller behavior, and implement extended grapheme clusters (which requires access to full strings and not just iterators). Now, I'm not too sure anymore about the design aspect.

Although I am already quite happy with how much it simplified CommandListPopup.cpp despite these shortcomings: ae0f682#diff-0e5f3869db09d92c3424d43a535efe53edfff1a1a4b8fb95081f39839a172179

j4james

I've read through everything now, but the ROW internals were a bit over my head so I'm afraid I just skimmed most of that.

Other than the comments I made in the code, there was one other regression I noticed in testing, but I couldn't see why it was failing. It's quite possible it's not this PR that's to blame, but I'm just making a note of it here so it doesn't get lost.

~~printf "\e[15832;16;27;22;54\$x"~~

~~That should fill a block with a width of 14 glyphs, taking up 28 columns, but it's now writing out 28 glyphs, taking up 56 columns.~~

j4james · 2023-03-19T02:32:02Z

> That should fill a block with a width of 14 glyphs, taking up 28 columns, but it's now writing out 28 glyphs, taking up 56 columns.

You can ignore this. It seems I checked out your branch when there was a "wip" commit that was the source of this issue.

lhecker · 2023-03-22T19:44:32Z

diddly.

zadjii-msft

I think I get it now. I really appreciate @j4james giving this a once over, seems like a bunch of edge cases were already caught.

Excited to see what's in store next 😉

zadjii-msft · 2023-03-13T19:11:15Z

src/buffer/out/textBuffer.cpp

@@ -376,6 +376,25 @@ bool TextBuffer::_PrepareForDoubleByteSequence(const DbcsAttribute dbcsAttribute
    return fSuccess;
 }

+void TextBuffer::ConsumeGrapheme(std::wstring_view& chars) noexcept


kinda weird that this is a static on textbuffer when it just does a til::utf16_pop, but presumably this does more in the future?

In the future I'll replace this function with some ICU code that actually advances the string by an entire grapheme cluster. So, if you have something like å ("a˚") it'll advance by 2 code points and not just 1 like it does now.
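To make the difference concrete, here is a minimal sketch of what advancing by a single UTF-16 code point means. `popCodePoint` is a hypothetical stand-in written for this comment; it is not til::utf16_pop and it does no grapheme clustering.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Removes one UTF-16 code point from the front of `chars` and returns it.
// A valid surrogate pair counts as one code point, everything else is
// consumed one code unit at a time.
inline std::u16string_view popCodePoint(std::u16string_view& chars) noexcept
{
    size_t length = chars.empty() ? 0 : 1;
    if (chars.size() >= 2)
    {
        const bool leadingSurrogate = chars[0] >= 0xD800 && chars[0] <= 0xDBFF;
        const bool trailingSurrogate = chars[1] >= 0xDC00 && chars[1] <= 0xDFFF;
        if (leadingSurrogate && trailingSurrogate)
        {
            length = 2;
        }
    }
    const auto codePoint = chars.substr(0, length);
    chars.remove_prefix(length);
    return codePoint;
}
```

For "a" followed by U+030A (combining ring above), this returns just "a" and leaves the combining mark behind as if it were a separate glyph. A grapheme cluster iterator would consume both code points in one step.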

It's possible to just put the ICU code into til and use it throughout the code directly, just like we do it right now with the UTF-16 helpers. But from now on, I'd like to avoid doing that, even if it means writing such static methods, because I'd like to keep everything string handling related as close and tight as possible in the future. I think OutputCellIterator had the same intention originally and was well meant, but it suffers from being a leaky abstraction. Lots of code is now built around an implicit assumption that OutputCellIterator will always advance by exactly 1 or 2 columns. Using the til UTF-16 helpers directly elsewhere in our code would have a similar effect in my opinion, because it would equally leak (and potentially incorrectly leak) any assumptions about how TextBuffer handles incoming text under normal circumstances.

zadjii-msft · 2023-03-13T19:34:46Z

src/buffer/out/Row.cpp

@@ -229,6 +301,36 @@ void ROW::TransferAttributes(const til::small_rle<TextAttribute, uint16_t, 1>& a
    _attr.resize_trailing_extent(gsl::narrow<uint16_t>(newWidth));
 }

+til::CoordType ROW::NavigateToPrevious(til::CoordType column) const noexcept


nit: may want to rename to something like NavigateToPreviousGlyph or Codepoint or whatever the technically correct name is for the unit we're moving backward by here. I instinctively thought this just did a clamp(column-1, 0, row.length)

Or at least add a comment above it

(and the same below in NavigateToNext)
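A minimal sketch of how glyph-wise navigation differs from `clamp(column - 1, ...)`, following the PR description's note that these functions are based on aligning the cursor to the start of the current cell. It is not the ROW implementation: `Trailers`, `alignToGlyphStart`, `navigateToPrevious` and `navigateToNext` are hypothetical, and the row is reduced to one flag per column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// trailer[col] is true if column `col` is the second half of a wide glyph.
using Trailers = std::vector<bool>;

// Moves `column` left until it sits on the first column of its glyph.
inline int alignToGlyphStart(const Trailers& trailer, int column)
{
    assert(!trailer.empty());
    column = std::clamp(column, 0, static_cast<int>(trailer.size()) - 1);
    while (column > 0 && trailer[static_cast<size_t>(column)])
    {
        --column;
    }
    return column;
}

// Returns the first column of the glyph before the one covering `column`.
inline int navigateToPrevious(const Trailers& trailer, int column)
{
    const int start = alignToGlyphStart(trailer, column);
    return start == 0 ? 0 : alignToGlyphStart(trailer, start - 1);
}

// Returns the first column after the glyph covering `column`.
// In this sketch that can be one past the last column.
inline int navigateToNext(const Trailers& trailer, int column)
{
    const int width = static_cast<int>(trailer.size());
    int next = alignToGlyphStart(trailer, column) + 1;
    while (next < width && trailer[static_cast<size_t>(next)])
    {
        ++next;
    }
    return next;
}
```

For a row holding "aこb", moving back from the "b" in column 3 lands on column 1 (the start of the wide glyph) rather than on column 2, which is the difference from a plain decrement.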

lhecker marked this pull request as draft February 10, 2023 16:43

lhecker force-pushed the dev/lhecker/8000-stream-writing branch from db90593 to 1a24ff3 Compare March 6, 2023 17:09

Add an efficient text stream write function

a6c5e7a

lhecker force-pushed the dev/lhecker/8000-stream-writing branch from 1a24ff3 to a6c5e7a Compare March 6, 2023 17:15

lhecker marked this pull request as ready for review March 6, 2023 17:20

j4james reviewed Mar 7, 2023

DHowett reviewed Mar 7, 2023

Address feedback, Simplify code

f8e6884

lhecker added 2 commits March 9, 2023 15:16

Make spell-check happy

3d04d37

Fix AuditMode failures

99f19a9

lhecker commented Mar 9, 2023

Add CopyRangeFrom to later implement TextBuffer::Reflow

173e916

Remove debug helpers

5040328

lhecker added 2 commits March 12, 2023 18:08

Fix AuditMode failures

17e3ca5

Fix compilation

f9fbb8a

zadjii-msft added a commit that referenced this pull request Mar 13, 2023

PRE-MERGE #14821 Add an efficient text stream write function

cf1bc0a

DHowett reviewed Mar 17, 2023

DHowett approved these changes Mar 17, 2023

lhecker added 3 commits March 18, 2023 15:20

Simplify member naming scheme, Improve performance

d1163b5

Fix cursor movement and attribute writing

94fbb52

Merge remote-tracking branch 'origin/main' into dev/lhecker/8000-stream-writing

3c5c3d4

lhecker commented Mar 18, 2023

Fix unit tests

934a06d

Fix unit tests

a99b9d6

j4james reviewed Mar 19, 2023

Address feedback by j4james, Potentially fix exact line wrapping

99e669a

j4james reviewed Mar 19, 2023

Revert exact wrap, Remove unused code, Improve tests

db89261

Make the spelling-bot happy

e77f2be

Make the spelling-bot happpier?

0ca2286

zadjii-msft approved these changes Mar 24, 2023

Address feedback

4b070fd

zadjii-msft added a commit that referenced this pull request Mar 24, 2023

PRE-MERGE #14821 Add an efficient text stream write function

aae06a4

DHowett approved these changes Mar 24, 2023

DHowett merged commit f20cd3a into main Mar 24, 2023

DHowett deleted the dev/lhecker/8000-stream-writing branch March 24, 2023 22:20

zadjii-msft mentioned this pull request Apr 6, 2023

Don't reflow a line as wrapped if it broke on the last cell #5368

Closed

5 tasks

zadjii-msft mentioned this pull request Sep 5, 2023

Very slow rendering of colored text #4129

Closed
