Reduce cost of cursor invalidation #15500

Merged
merged 6 commits into from
Apr 10, 2024

lhecker
Member

@lhecker lhecker commented Jun 2, 2023

Performance of printing enwik8.txt at the following block sizes:
4KiB (printf): 53MB/s -> 58MB/s
128KiB (cat): 170MB/s -> 235MB/s

This commit is imperfect. Support for more than one rendering
engine was "hacked" into Renderer and is not quite correct.
As such, this commit cannot fix cursor invalidation correctly either,
and while some bugs are fixed (engines may see highly inconsistent
TextBuffer and Cursor states), it introduces others (an error in the
first engine may result in the second engine not executing).
Neither of those is good and the underlying issue remains to be fixed.

Validation Steps Performed

  • Seems ok? ✅

@lhecker lhecker added the Product-Conhost (For issues in the Console codebase) and Area-Performance (Performance-related issue) labels Jun 2, 2023
@lhecker lhecker force-pushed the dev/lhecker/vt-perf3 branch 2 times, most recently from aaba650 to de72cb6 Compare June 30, 2023 15:01
@zadjii-msft
Member

ah this needs merges into it before it's reviewable, doesn't it

Base automatically changed from dev/lhecker/vt-perf3 to main July 5, 2023 19:26
@lhecker lhecker force-pushed the dev/lhecker/vt-perf4 branch from 3cb78a4 to 7aa3731 Compare February 21, 2024 22:33
@lhecker lhecker marked this pull request as ready for review February 21, 2024 22:34
@zadjii-msft zadjii-msft added this to the Terminal v1.21 milestone Feb 27, 2024
@zadjii-msft zadjii-msft self-assigned this Mar 25, 2024
Member

@zadjii-msft zadjii-msft left a comment

10-40% throughput improvement in conhost with this? Yea this is worth it IMO.

Only holding off ✅ because the WaitForPaintCompletionAndDisable thing scares me


if (_renderer)
{
_renderer->TriggerTeardown();
Member

no longer need to _pThread->WaitForPaintCompletionAndDisable(INFINITE);?

Member Author

Sort of. The destructor blocks until the renderer is fully shut down, which is similar to WaitForPaintCompletionAndDisable but simpler and faster.

I'll re-add the explicit destructor calls here just to be sure nothing regresses. We use a lot of plain/unsafe pointers after all.
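
A minimal sketch of the "destructor blocks until shutdown" pattern described above. `FakeRenderThread` and `DestructorWaitsForThread` are hypothetical names used only for this illustration and are not the actual Renderer implementation; the sketch assumes a plain `std::thread` worker.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical render thread wrapper for illustration; not the actual Renderer.
class FakeRenderThread
{
public:
    explicit FakeRenderThread(std::atomic<bool>& finishedFlag) :
        _finished{ finishedFlag },
        _thread{ [this] { _run(); } }
    {
    }

    // The destructor asks the worker to stop and then blocks in join()
    // until the worker has fully exited.
    ~FakeRenderThread()
    {
        _keepRunning.store(false);
        _thread.join();
    }

private:
    void _run()
    {
        while (_keepRunning.load())
        {
            std::this_thread::yield();
        }
        _finished.store(true);
    }

    // Declaration order matters: _thread is last so that the other
    // members are initialized before the worker starts.
    std::atomic<bool>& _finished;
    std::atomic<bool> _keepRunning{ true };
    std::thread _thread;
};

// Returns true if the worker had finished by the time the destructor returned.
inline bool DestructorWaitsForThread()
{
    std::atomic<bool> finished{ false };
    {
        FakeRenderThread thread{ finished };
    } // destructor runs here and blocks until the worker is done
    return finished.load();
}
```

With this shape, code that destroys the object can rely on the worker no longer touching shared state afterwards, which is the property an explicit wait call would otherwise provide.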

@@ -64,44 +64,90 @@ Renderer::~Renderer()
// - HRESULT S_OK, GDI error, Safe Math error, or state/argument errors.
[[nodiscard]] HRESULT Renderer::PaintFrame()
{
Member

much more legible w/o whitespace:
image

@zadjii-msft zadjii-msft removed their assignment Mar 26, 2024
@lhecker
Member Author

lhecker commented Mar 26, 2024

> 10-40% throughput improvement in conhost with this? Yea this is worth it IMO.

FYI the next best area to optimize would be AdaptDispatch::_DoLineFeed which would bring up to +32%. Its cost is a combination of overly zealous invalidation (currently required by ConPTY, however), our current hyperlink implementation via hashmaps, and a lot of smaller flaws.

For non-English text the cost distribution is vastly different, and there we would get the biggest benefit via async rendering with buffer snapshots (+34%) and improving IsGlyphFullWidth (binary tree --> trie; +27%).

Comment on lines 24 to 30
constexpr til::rect ScreenToBufferLine(const til::rect& line, const LineRendition lineRendition)
{
// Use shift right to quickly divide the Left and Right by 2 for double width lines.
const auto scale = lineRendition == LineRendition::SingleWidth ? 0 : 1;
return { line.left >> scale, line.top, line.right >> scale, line.bottom };
}

Collaborator

I think the right hand side may be off by one when you're dealing with exclusive coordinates with an odd value. For example, if the right screen coordinate is 7 exclusive (6 inclusive), that should map to a buffer coordinate of 4 exclusive (3 inclusive). A simple right shift works for inclusive coordinates (6 >> 1 = 3), but not for exclusive coordinates (7 >> 1 = 3, but we want 4).
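
The off-by-one described above can be checked with a small standalone sketch. The helper names `ScreenToBufferInclusive` and `ScreenToBufferExclusive` are hypothetical and not part of the Terminal codebase; they only illustrate that an exclusive right edge needs to be rounded up when it is halved for a double-width line.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not the actual Terminal API.
// Both map a screen column on a double-width line to a buffer column.

// Inclusive coordinate: the last covered screen column. Rounding down is correct.
constexpr int ScreenToBufferInclusive(int screenColumn)
{
    return screenColumn >> 1;
}

// Exclusive coordinate: one past the last covered screen column.
// Rounding up keeps a half-covered buffer column inside the range.
constexpr int ScreenToBufferExclusive(int screenColumn)
{
    return (screenColumn + 1) >> 1;
}

// Screen columns [0, 7) are 0..6 inclusive, which touch buffer columns 0..3.
static_assert(ScreenToBufferInclusive(6) == 3);
static_assert(ScreenToBufferExclusive(7) == 4);
// A plain shift on the exclusive value yields 3 and drops buffer column 3.
static_assert((7 >> 1) == 3);
// Even exclusive values are unaffected by the rounding.
static_assert(ScreenToBufferExclusive(8) == 4);
```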

Member Author

Hmm... I think the current approach is valid as well, in a certain way. Currently, these two related functions round down the width of the buffer:

  • TextBuffer::GetLineWidth
  • ROW::GetReadableColumnCount

If this function were to round up the right coordinate, then passing a viewport sized til::rect will end up having a different size than the size reported by the above two functions.

I do prefer your suggestion, but do we need to change other code first before we can round exclusive coordinates up here?

Member Author

Oh, I guess this means we've had this inconsistency for a while right? Because the inclusive_rect variant of ScreenToBufferLine may have reported an inclusive .right of 59 while the above two functions reported a max. width of 59. Hmm... I'm not sure how to best resolve this.

Collaborator

I think they're expected to be inconsistent. GetLineWidth tells you how many buffer columns you can fit on the screen, so it has to round down if only half of the last column will fit. But ScreenToBufferLine is used to determine how much of the buffer is required to cover a given screen area. If you round down, you won't get enough buffer content, and the last screen cell won't be updated.
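
The two questions being answered here ("how many columns fit" versus "how many columns are needed to cover an area") can be sketched side by side. The names `ColumnsThatFit` and `ColumnsNeededToCover` are hypothetical, and the width of 119 screen cells is an assumed example value.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not the actual Terminal API.
// Both take a width in screen cells on a double-width line, where each
// buffer column occupies two screen cells.

// How many whole buffer columns fit into the given width: round down,
// because only half of the last column would fit otherwise.
constexpr int ColumnsThatFit(int screenCells)
{
    return screenCells / 2;
}

// How many buffer columns are needed to cover the given width: round up,
// otherwise the last screen cell is left without buffer content.
constexpr int ColumnsNeededToCover(int screenCells)
{
    return (screenCells + 1) / 2;
}

// For an odd width the two answers differ by one column.
static_assert(ColumnsThatFit(119) == 59);
static_assert(ColumnsNeededToCover(119) == 60);
// For an even width they agree.
static_assert(ColumnsThatFit(120) == ColumnsNeededToCover(120));
```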

Member Author

@lhecker lhecker Mar 30, 2024

Hmm, yeah I see what you mean. However, I'm also a little squeamish about what you said. This function isn't used anywhere yet and so I'll remove it for now and we can revisit it later. 🙂

Idle thoughts: Personally speaking, I would prefer if we could consistently use til::rect with its exclusive coordinates everywhere, even in areas of our code where inclusive coordinates would be a better fit, just so that everything works consistently. I wonder if this issue would go away with such a change as well... Probably not really, but it may still avoid some potential for confusion.

}
}

FOREACH_ENGINE(pEngine)
{
RETURN_IF_FAILED(pEngine->Present());
Member

one big functional change here is that we now Present under lock. We did not do that before - theoretically it allowed us to yeet bits at the GPU (or whatever) while the console continued working. It was built on the idea that the slow operation would be finalization.

Is this no longer required? Is there a risk to making this locking change?

Member Author

The _pData->LockConsole(); call (and unlock) is inside an extra scope. So this will still run without the lock being held.
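
A minimal sketch of the scoping described above, assuming an RAII guard in place of the explicit lock and unlock calls. `TrackedLock`, `FakeFrame` and this `PaintFrame` are hypothetical stand-ins and not the actual Renderer code; the point is only that work placed after the inner scope runs with the lock released.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for illustration; not the actual Renderer types.

// A BasicLockable wrapper that records whether the lock is currently held.
struct TrackedLock
{
    void lock()
    {
        _mutex.lock();
        held = true;
    }
    void unlock()
    {
        held = false;
        _mutex.unlock();
    }
    bool held = false;

private:
    std::mutex _mutex;
};

struct FakeFrame
{
    bool lockHeldDuringPaint = false;
    bool lockHeldDuringPresent = false;
};

inline FakeFrame PaintFrame(TrackedLock& consoleLock)
{
    FakeFrame frame;
    {
        // The lock lives only for this inner scope.
        std::lock_guard<TrackedLock> guard{ consoleLock };
        frame.lockHeldDuringPaint = consoleLock.held; // painting happens here
    } // guard is destroyed here, releasing the lock

    // Presenting happens here, after the lock has been released.
    frame.lockHeldDuringPresent = consoleLock.held;
    return frame;
}
```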

 {
-LOG_IF_FAILED(pEngine->PaintCursor(cursorInfo.value()));
+LOG_IF_FAILED(pEngine->PaintCursor(_currentCursorOptions.value()));
Member

nit: can use *_currentCursorOptions (i think it skips a check where .value() makes a check? but we just checked)

Member Author

The compiler will be able to optimize redundant trivially inlinable branches away if there isn't a call in between that it can't inspect (the compiler must assume that memory may have mutated at any time during an external call). In this case it'll be able to optimize away the check for sure (just checked it), however this still isn't the case for Debug builds and so tl;dr: Yeah 100% agreed.
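
For reference, the difference between the two accessors on `std::optional` can be sketched as follows: `value()` checks for emptiness and throws `std::bad_optional_access`, while `operator*` performs no check and relies on the caller having tested the optional first. The function names `ValueThrowsWhenEmpty` and `ReadChecked` are hypothetical.

```cpp
#include <cassert>
#include <optional>

// Returns true if value() on an empty optional throws,
// which demonstrates the check built into value().
inline bool ValueThrowsWhenEmpty()
{
    const std::optional<int> empty;
    try
    {
        (void)empty.value();
    }
    catch (const std::bad_optional_access&)
    {
        return true;
    }
    return false;
}

// After an explicit check, operator* reads the value without a second check.
inline int ReadChecked(const std::optional<int>& opt, int fallback)
{
    if (opt)
    {
        return *opt;
    }
    return fallback;
}
```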


if (_currentCursorOptions)
{
_currentCursorOptions->coordCursor += delta;
Member

wat? why weren't we doing this before

Member Author

_currentCursorOptions is only a member to smuggle state up outside a function call and back into an entirely unrelated one at a later time. It's never used beyond that. I.e. it wasn't preserved between render passes.
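
A rough sketch of why a cursor position that is kept across render passes needs the scroll delta applied, as in the `coordCursor += delta` line under discussion. `Point` and `CursorCache` are hypothetical types for this illustration only, and the sketch assumes the remembered position is later used to invalidate the cell where the cursor was painted.

```cpp
#include <cassert>
#include <optional>

// Hypothetical types for illustration; not the actual Renderer members.
struct Point
{
    int x = 0;
    int y = 0;
};

struct CursorCache
{
    // Position at which the cursor was last painted, if any.
    std::optional<Point> lastPainted;

    // When the contents scroll by `delta`, move the remembered position
    // along with them so a later invalidation targets the right cell.
    void OnScroll(const Point delta)
    {
        if (lastPainted)
        {
            lastPainted->x += delta.x;
            lastPainted->y += delta.y;
        }
    }
};
```

If the position only lived for the duration of a single pass, there would be nothing to adjust, which matches the explanation given above.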

@DHowett DHowett enabled auto-merge April 2, 2024 18:46
@DHowett DHowett added this pull request to the merge queue Apr 10, 2024
Merged via the queue into main with commit 20b0bed Apr 10, 2024
20 checks passed
@DHowett DHowett deleted the dev/lhecker/vt-perf4 branch April 10, 2024 19:27
lhecker added a commit that referenced this pull request Jul 26, 2024
Regressed in #15500, incorrectly fixed in #17332, exposed by #17583.
My ineptitude on full display. If this isn't the last cursor
invalidation bug I'm going to cry.

Closes #17615

## Validation Steps Performed
* cmd.exe
* a directory with 6 files
* 80x24 viewport
* run `cls`
* run `dir` twice