-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce cost of cursor invalidation #15500
Conversation
aaba650
to
de72cb6
Compare
ah this needs merges into it before it's reviewable, doesn't it |
3cb78a4
to
7aa3731
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10-40% throughput improvement in conhost with this? Yea this is worth it IMO.
Only holding off ✅ because the WaitForPaintCompletionAndDisable
thing scares me
|
||
if (_renderer) | ||
{ | ||
_renderer->TriggerTeardown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer need to _pThread->WaitForPaintCompletionAndDisable(INFINITE);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of. The destructor blocks until the renderer is fully shut down, which is similar to WaitForPaintCompletionAndDisable
but simpler and faster.
I'll re-add the explicit destructor calls here just to be sure nothing regresses. We use a lot of plain/unsafe pointers after all.
@@ -64,44 +64,90 @@ Renderer::~Renderer() | |||
// - HRESULT S_OK, GDI error, Safe Math error, or state/argument errors. | |||
[[nodiscard]] HRESULT Renderer::PaintFrame() | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI the next best area to optimize would be For non-English text the cost distribution is wastly different and there we would get the biggest benefit via async rendering with buffer snapshots (+34%) and improving |
src/buffer/out/LineRendition.hpp
Outdated
constexpr til::rect ScreenToBufferLine(const til::rect& line, const LineRendition lineRendition) | ||
{ | ||
// Use shift right to quickly divide the Left and Right by 2 for double width lines. | ||
const auto scale = lineRendition == LineRendition::SingleWidth ? 0 : 1; | ||
return { line.left >> scale, line.top, line.right >> scale, line.bottom }; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the right hand side may be off by one when you're dealing with exclusive coordinates with an odd value. For example, if the right screen coordinate is 7 exclusive (6 inclusive), that should map to a buffer coordinate of 4 exclusive (3 inclusive). A simple right shift works for inclusive coordinates (6 >> 1 = 3
), but not for exclusive coordinates (7 >> 1 = 3
, but we want 4).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I think the current approach is valid as well, in a certain way. Currently, these two related functions round down the width of the buffer:
TextBuffer::GetLineWidth
ROW::GetReadableColumnCount
If this function where to round up the right
coordinate, then passing a viewport sized til::rect
will end up having a different size than the size reported by the above two functions.
I do prefer your suggestion, but do we need to change other code first before we can round exclusive coordinates up here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I guess this means we've had this inconsistency for a while right? Because the inclusive_rect
variant of ScreenToBufferLine
may have reported an inclusive .right
of 59 while the above two functions reported a max. width of 59. Hmm... I'm not sure how to best resolve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they're expected to be inconsistent. GetLineWidth
tells you how many buffer columns you can fit on the screen, so it has to round down if only half of the last column will fit. But ScreenToBufferLine
is used to determine how much of the buffer is required to cover a given screen area. If you round down, you won't get enough buffer content, and the last screen cell won't be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah I see what you mean. However, I'm also a little squirmish about what you said. This function isn't used anywhere yet and so I'll remove it for now and we can revisit it later. 🙂
Idle thoughts: Personally speaking, I would prefer if we could consistently use til::rect
with its exclusive coordinates everywhere, even in areas of our code where inclusive coordinates would be a better fit, just so that everything works consistently. I wonder if this issue would go away with such a change as well... Probably not really, but it may still avoid some potential for confusion.
} | ||
} | ||
|
||
FOREACH_ENGINE(pEngine) | ||
{ | ||
RETURN_IF_FAILED(pEngine->Present()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one big functional change here is that we now Present under lock. We did not do that before - theoretically it allowed us to yeet bits at the GPU (or whatever) while the console continued working. It was built on the idea that the slow operation would be finalization.
Is this no longer required? Is there a risk to making this locking change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _pData->LockConsole();
call (and unlock) is inside an extra scope. So this will still run without the lock being held.
src/renderer/base/renderer.cpp
Outdated
{ | ||
LOG_IF_FAILED(pEngine->PaintCursor(cursorInfo.value())); | ||
LOG_IF_FAILED(pEngine->PaintCursor(_currentCursorOptions.value())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can use *_currentCursorOptions
(i think it skips a check where .value()
makes a check? but we just checked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler will be able to optimize redundant trivially inlinable branches away if there isn't a call in between that it can't inspect (the compiler must assume that memory may have mutated at any time during an external call). In this case it'll be able to optimize away the check for sure (just checked it), however this still isn't the case for Debug builds and so tl;dr: Yeah 100% agreed.
|
||
if (_currentCursorOptions) | ||
{ | ||
_currentCursorOptions->coordCursor += delta; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wat? why weren't we doing this before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_currentCursorOptions
is only a member to smuggle state up outside a function call and back into an entirely unrelated one at a later time. It's never used beyond that. I.e. it wasn't preserved between render passes.
Performance of printing enwik8.txt at the following block sizes:
4KiB (printf): 53MB/s -> 58MB/s
128KiB (cat): 170MB/s -> 235MB/s
This commit is imperfect. Support for more than one rendering
engine was "hacked" into
Renderer
and is not quite correct.As such, this commit cannot fix cursor invalidation correctly either,
and while some bugs are fixed (engines may see highly inconsistent
TextBuffer and Cursor states), it introduces others (an error in the
first engine may result in the second engine not executing).
Neither of those are good and the underlying issue remains to be fixed.
Validation Steps Performed