Add support for VT paging operations #16615
Conversation
Neat! I haven't finished reviewing it yet unfortunately, but here are a couple initial thoughts I had...
I think this is ready for review now. I've been using it for a while, and it works nicely for my use cases, although I'm not sure how much benefit it will be for anyone else. And I should warn you that it might have some impact on performance, but I'm hoping it isn't significant. I did a test run with the new benchmark tool, and the only tests that looked a lot worse were the 4Ki WriteConsole ones, but the times are in microseconds, so I'm not sure they're really as bad as they look.
Honestly, LGTM. I can confirm that there's a consistent performance regression, albeit not a large one. It can probably be fixed by making the code more "data oriented" and by avoiding large struct copies ("large" for MSVC = anything larger than a pointer, unfortunately). But I'm not particularly worried about this regression; there's plenty of much lower-hanging fruit we can tackle instead.
I'm currently mostly worried about the overall architecture of the terminal, since that has a much bigger impact on performance than any SIMD work etc. we may ever do, and may also help us keep the code lean.
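As an aside, here's a minimal sketch of the struct-copy cost being described. The `PageInfo` type and its members are hypothetical, invented purely for illustration; the point is just that returning a "large" struct by value copies it on every call, while handing out a reference to cached state does not:

```cpp
#include <array>

// Hypothetical stand-in for a "large" struct: anything bigger than a pointer
// is returned via a hidden out-parameter and copied on MSVC.
struct PageInfo
{
    std::array<char, 64> name{};
    int number = 0;
    int top = 0;
    int height = 0;
};

// Copies the whole struct on every call.
PageInfo GetActivePageByValue(const PageInfo& cached)
{
    return cached;
}

// Hands out a reference to already-cached state; no copy at all.
const PageInfo& GetActivePageByRef(const PageInfo& cached) noexcept
{
    return cached;
}
```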
```cpp
// If we're changing the visible page, what we do is swap out the current
// visible page into its backing buffer, and swap in the new page from the
// backing buffer to the main buffer. That way the rest of the system only
// ever has to deal with the main buffer.
if (makeVisible && _visiblePageNumber != newPageNumber)
{
    const auto& newBuffer = _getBuffer(newPageNumber, pageSize);
    auto& saveBuffer = _getBuffer(_visiblePageNumber, pageSize);
    for (auto i = 0; i < pageSize.height; i++)
    {
        saveBuffer.GetMutableRowByOffset(i).CopyFrom(visibleBuffer.GetRowByOffset(visibleTop + i));
    }
    for (auto i = 0; i < pageSize.height; i++)
    {
        visibleBuffer.GetMutableRowByOffset(visibleTop + i).CopyFrom(newBuffer.GetRowByOffset(i));
    }
    _visiblePageNumber = newPageNumber;
    redrawRequired = true;
}
```
More importantly, I'm a little worried about this change. Why don't we swap the visible buffer in the `_api` here? Is it because of some fundamental reason, or did you do it due to our current, rather "intertwined" architecture?
If it's the latter, I'm afraid this PR is making things worse. If I'm reading it correctly, the original DECs did not seem to have made a distinction between the "main visible" buffer and other pages. They were all pages, and one of them simply happened to be the visible one. Is my understanding correct? If so, do you think it's feasible to change the architecture in this PR? I'd completely understand if that was too difficult or time consuming.
> Why don't we swap the visible buffer in the `_api` here?

Well, the first problem is that anything in the `_api` needs to be written twice, because the buffer management in Windows Terminal is entirely separate from conhost. In an ideal world we'd have a unified buffer API that covered all the needs of the legacy console API, the XTerm alt buffers, and the VT paging, and which was shared between conhost and WT, but addressing that problem is way more work than I wanted to take on now.
And besides that, I'm not sure there'd be any benefit to it anyway. When swapping the visible page, it wouldn't just be a matter of pointing the renderer at the appropriate buffer - it's kind of a mix. The scrollback (the `S` in that diagram above) is shared between all pages. It's only the `P+V` part that changes when you switch the visible page.

> the original DECs did not seem to have made a distinction between the "main visible" buffer and other pages.

That is correct, but they didn't support scrollback and paging at the same time, so the `S` section in the diagram above would not have existed when paging was enabled. But modern terminals do tend to support both, so I thought it would be best if we could handle that too. And if you want to retain scrollback (which is stored in the main buffer) when switching the visible page, there has to be some form of buffer merging.
Given what you wrote, and speaking long term, would you be in favor of us splitting the storage of the viewport/page and scrollback into separate buffers? I've been thinking about whether this would allow us to use a more efficient compression scheme for the scrollback, while using a more performant architecture for the viewport (= no run-length compression of attributes, etc.). It may also simplify building an infinite scrollback feature. Now, reading your comment, it seems that this may also simplify the paging implementation...
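To make the idea concrete, here's a rough sketch of what such a split could look like. Every type name and member here is hypothetical and invented for illustration; this is not code from the repo, nor a proposal for its actual API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Placeholder row representations, purely for illustration.
struct CompressedRow { std::string packed; };  // e.g. run-length encoded text + attributes
struct LiveRow { std::string text; };          // flat, write-optimized representation

struct ScrollbackStore
{
    // Rows are immutable once they scroll off, so they can be packed tightly
    // (or streamed to disk for an "infinite" scrollback).
    std::vector<CompressedRow> rows;
};

struct PageStore
{
    int width = 0;                // may even differ from the scrollback width
    int height = 0;
    std::vector<LiveRow> rows;    // uncompressed for fast in-place edits
};

struct ScreenBuffer
{
    ScrollbackStore scrollback;   // shared by all pages
    std::vector<PageStore> pages; // pages 1..N
    std::size_t visiblePage = 0;  // index of the page currently on screen
};
```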
Yeah, that makes sense to me, at least from a VT point of view. And if that split also enabled the viewport/page storage to be a different width from the scrollback buffer, that might be a convenient way to support VT windowing (not an essential feature, but it's something which might be nice to have one day).
The things I think would be most problematic are the legacy console APIs and buffer resizing. But if you want to support infinite scrollback, those areas are likely to be complicated regardless of the architecture.
```cpp
{
    // Page buffers are created on demand, and are sized to match the active
    // page dimensions without any scrollback rows.
    buffer = std::make_unique<TextBuffer>(pageSize, TextAttribute{}, 0, false, _renderer);
```
(BTW, it'd be great if we could move to using `shared_ptr<TextBuffer>` over time, so that we can more safely share buffers with the rendering thread, or simply between components.)
OK. I just thought the `unique_ptr` was more efficient, but I'm happy to change that.
I'm fine with the `unique_ptr` here and I don't think you need to change it. If we were to change it, I think we should make it consistent across the code base.
The only real difference between `unique_ptr` and `shared_ptr` is that the latter allocates a small control block which contains the strong and weak reference counts. Reference counting itself only occurs on copies, so moving a `shared_ptr` has the same overhead as a `unique_ptr` (= none, apart from the struct copy). The separate allocation for the control block can be avoided by using `make_shared`, which places the object and the control block in a single allocation. I believe this makes `shared_ptr` practically as cost-free as `unique_ptr`.
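A minimal sketch of that point, using a stand-in type rather than the real `TextBuffer` (illustration only):

```cpp
#include <memory>
#include <utility>

struct BufferStub {};  // stand-in for TextBuffer, just for illustration

int main()
{
    // Two heap allocations: one for the object, one for the control block.
    std::shared_ptr<BufferStub> a{ new BufferStub{} };

    // One heap allocation: the object and the control block share a single block.
    auto b = std::make_shared<BufferStub>();

    // Moving a shared_ptr transfers ownership without touching the reference
    // counts; only copying increments them.
    auto c = std::move(b);
    return 0;
}
```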
```cpp
Page PageManager::ActivePage() const
{
    return Get(_activePageNumber);
}
```
You may be able to improve performance quite a bit here by caching the active page and handing out a mutable reference only. (Same for `VisiblePage`.) Constructing and returning large structs like this is expensive, particularly with MSVC. (`GetBufferAndViewport` has the same overhead as `Get`, and we may avoid both overheads this way.)
However, I believe this cannot be done right now, since the `visibleViewport` and `isMainBuffer` are managed by the `_api`. If the `PageManager` was in charge of the active and visible page, as well as the viewport¹, then it could cache that information in a robust manner, I believe. I suppose this ties into my other comment re: architecture.
Footnotes
¹ "viewports"? I'm not sure if viewport is a plural in the VT world.
> However, I believe this cannot be done right now, since the `visibleViewport` and `isMainBuffer` are managed by the `_api`.
Yeah, I had hoped that one day we might avoid this issue by having `AdaptDispatch` register some sort of callback with the api, so it can get notified of buffer and viewport changes, instead of having to query them every time. But we always have to deal with the fact that the legacy console APIs are in control of those aspects of the architecture, so that information has to be passed across the api boundary one way or the other. Again, not something I want to try and solve now.
However, if it's the copying of these page objects that is causing the performance regression, that might still be something I can improve. When I have more time I'll do some experimenting and see if I can find a way to make that any faster.
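For what it's worth, here's a rough sketch of the callback idea mentioned above. The interface, type names, and fields are all hypothetical, not the existing `ITerminalApi` surface:

```cpp
#include <functional>
#include <utility>

// Hypothetical notification payload.
struct ViewportChange
{
    int top = 0;
    int bottom = 0;
    bool isMainBuffer = true;
};

// Hypothetical api-side hook: the console host would call NotifyViewportChanged()
// whenever the legacy console APIs move the viewport or switch buffers, so
// AdaptDispatch no longer has to query that state on every operation.
class ConsoleApiSketch
{
public:
    void SubscribeToViewportChanges(std::function<void(const ViewportChange&)> callback)
    {
        _onViewportChange = std::move(callback);
    }

    void NotifyViewportChanged(const ViewportChange& change) const
    {
        if (_onViewportChange)
        {
            _onViewportChange(change);
        }
    }

private:
    std::function<void(const ViewportChange&)> _onViewportChange;
};
```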
FWIW I'm still somewhat worried about putting the pages in the adapter.
Yesterday I asked Leonard to explain this to me so I'd be able to review it for 1.21. I'm sorry we let you marinate for 3 months! I'm on this today, for release next week or so. Thanks so much, James, for everything.
@DHowett Honestly I don't mind if you want to leave this for a while. I was hoping there might be a way to improve the performance based on Leonard's suggestion about page caching, but I've been putting off working on that because I was having too much fun actually using the new functionality (e.g. see VT-Rex). That said, I also wouldn't mind if you chose to merge it as is, but I'm somewhat concerned this might be a feature that nobody else uses besides me, and so for everyone else it would just be an unwanted performance regression.
I don't think you need to worry about the performance too much - the regression is rather small. I forgot how much it was, but it was something like -5%. The recent cursor caching PR (if you've seen it) alone brought us +20%, and there are a hundred more places like that. IMO the best thing we can do for perf is to instead shrink our architecture. If we have fewer interfaces, fewer adapters, fewer classes, and so on, and more data-driven design overall, we'll probably straight up double our VT performance without even touching the underlying algorithms. In ConPTY for instance, if we ignore VtEngine, the actual text and attribute insertion, including width measurement, only consumes around 20% of CPU time in the worst case.

BTW, if anyone (or you) wants to work on perf, here's something that would probably net us another >10% uplift, with the potential to improve performance much more in the long run (= it's an unblocker for other optimizations):

```cpp
bool Cursor::DelayEOLWrap() const noexcept
{
    return _cPosition.x >= _parentBuffer._width;
}
```

That is, we represent a delayed cursor by putting it into a column past the right end of the buffer. We would need to modify `GetPosition()` accordingly:

```cpp
til::point Cursor::GetPosition() const noexcept
{
    const auto width = _parentBuffer._width;
    return { std::min(_cPosition.x, width - 1), _cPosition.y };
}
```

One major VT perf cost is that our parser simply isn't a traditional one: it should ideally be implemented as a state machine with a lookup table. Something like Alacritty's parser: https://github.com/alacritty/vte/blob/master/src/table.rs

P.S.: Another very difficult, but very impactful change would be calling […]. However, I think the other changes above are more important right now as they simplify our architecture.
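To illustrate the table-driven approach being referenced, here's a toy sketch. The states, events, and actions are invented for this example and are far simpler than a real VT parser:

```cpp
#include <cstdint>

enum State : uint8_t { Ground, Escape, Csi, StateCount };
enum Event : uint8_t { Printable, EscByte, CsiEntry, FinalByte, EventCount };

struct Transition
{
    State next;
    void (*action)(char);
};

void emit(char) {}     // print a glyph to the buffer
void collect(char) {}  // accumulate a parameter byte
void dispatch(char) {} // execute the completed sequence
void ignore(char) {}

// table[state][event] yields the next state and the action to run, so the
// per-byte hot loop is just two array lookups and one indirect call.
constexpr Transition table[StateCount][EventCount] = {
    /* Ground */ { { Ground, emit }, { Escape, ignore }, { Ground, emit }, { Ground, emit } },
    /* Escape */ { { Ground, ignore }, { Escape, ignore }, { Csi, ignore }, { Ground, dispatch } },
    /* Csi    */ { { Csi, collect }, { Escape, ignore }, { Csi, collect }, { Ground, dispatch } },
};

void step(State& state, Event event, char byte)
{
    const Transition& t = table[state][event];
    t.action(byte);
    state = t.next;
}
```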
Just on this point, I don't think the boolean flag is avoidable, because delayed wrapping is not actually limited to the final column of the display. With margins applied, you can have a delayed wrap occurring almost anywhere. And technically I think you can probably even set the flag manually.
FYI, I've had a chance to experiment with the page caching, but I couldn't get it to make much of a difference in terms of performance. So what's here now is as good as it's going to get from me (at least in the short term). If you all are happy with it as it is, it would be nice to have it merged, but I don't mind if you decide against it.
Ah, that's fine then. We're about to fork off release-1.21 now, and I think afterwards we can merge it (for version 1.22). I can't speak for them, but it may still take a little bit, because Dustin & Mike are quite busy right now with the release of not just this, but sudo, bug fixes, etc.
holy crap I reviewed this like a month ago and just never hit ✅. My bad. Well, let's get this in for 1.22 now, shall we?
Alright! Let's get this merge conflict unconflicted and we can land this thing. Thank you for your patience. James, generally speaking, even if you're the only person who would benefit from a particular aspect of VT compatibility, I'd still love for us to have it. 🙂
@DHowett In case you haven't seen, the merge conflicts have been resolved now.
Okay! I've read it all over, I think I understand it well enough, and I see the ways we can promote better buffer management across the rest of the codebase (which I think is one of Leonard's concerns about having Pages live in the Adapter).
Let's do it.
James, this feels like another tour-de-force. Thank you.
Thanks, it looks promising.
@alabuzhev If you're using the nightly/canary build it should be supported.
@j4james my bad, I was using an outdated one. Thanks. |
## Summary of the Pull Request
When we receive a stream of output at the bottom of the page that wraps over more than two lines, that is expected to pan the viewport down by multiple rows to accommodate all of the output. However, when the output was received in a single write, that did not work correctly. The problem was that we were reusing a `Page` instance across multiple `_DoLineFeed` calls, and the viewport cached in that `Page` wasn't valid after the first call. This PR fixes the issue by adjusting the cached viewport when we determine it has been moved by `_DoLineFeed`.
## References and Relevant Issues
The bug was introduced in PR #16615 when paging support was added.
## Validation Steps Performed
I've verified that the test case in #17351 is now working correctly, and have added a unit test covering this scenario.
## PR Checklist
- [x] Closes #17351
- [x] Tests added/passed
This PR adds support for multiple pages in the VT architecture, along with new operations for moving between those pages: `NP` (Next Page), `PP` (Preceding Page), `PPA` (Page Position Absolute), `PPR` (Page Position Relative), and `PPB` (Page Position Back). There's also a new mode, `DECPCCM` (Page Cursor Coupling Mode), which determines whether or not the active page is also the visible page, and a new query sequence, `DECRQDE` (Request Displayed Extent), which can be used to query the visible page.
References and Relevant Issues
When combined with `DECCRA` (Copy Rectangular Area), which can copy between pages, you can layer content on top of existing output, and still restore the original data afterwards. So this could serve as an alternative solution to #10810.
Detailed Description of the Pull Request / Additional comments
On the original DEC terminals that supported paging, you couldn't have both paging and scrollback at the same time - only one or the other. But modern terminals typically allow both, so we support that too. The way it works is that the currently visible page is attached to the scrollback, so any content that scrolls off the top will be saved. The background pages, however, have no scrollback, so their content is lost if it scrolls off the top.
And when the screen is resized, only the visible page will be reflowed.
Background pages are not affected by a resize until they become active.
At that point they just receive the traditional style of resize, where
the content is clipped or padded to match the new dimensions.
I'm not sure this is the best way to handle resizing, but we can always
consider other approaches once people have had a chance to try it out.
Validation Steps Performed
I've added some unit tests covering the new operations, and also done a
lot of manual testing.
Closes #13892
Tests added/passed