Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for VT paging operations #16615

Merged
merged 21 commits into from
May 17, 2024
Merged

Conversation

j4james
Copy link
Collaborator

@j4james j4james commented Jan 29, 2024

This PR adds support for multiples pages in the VT architecture, along
with new operations for moving between those pages: NP (Next Page),
PP (Preceding Page), PPA (Page Position Absolute), PPR (Page
Position Relative), and PPB (Page Position Back).

There's also a new mode, DECPCCM (Page Cursor Coupling Mode), which
determines whether or not the active page is also the visible page, and
a new query sequence, DECRQDE (Request Displayed Extent), which can be
used to query the visible page.

References and Relevant Issues

When combined with DECCRA (Copy Rectangular Area), which can copy
between pages, you can layer content on top of existing output, and
still restore the original data afterwards. So this could serve as an
alternative solution to #10810.

Detailed Description of the Pull Request / Additional comments

On the original DEC terminals that supported paging, you couldn't have
both paging and scrollback at the same time - only the one or the other.
But modern terminals typically allow both, so we support that too.

The way it works, the currently visible page will be attached to the
scrollback, and any content that scrolls off the top will thus be saved.
But the background pages will not have scrollback, so their content is
lost if it scrolls off the top.

And when the screen is resized, only the visible page will be reflowed.
Background pages are not affected by a resize until they become active.
At that point they just receive the traditional style of resize, where
the content is clipped or padded to match the new dimensions.

I'm not sure this is the best way to handle resizing, but we can always
consider other approaches once people have had a chance to try it out.

Validation Steps Performed

I've added some unit tests covering the new operations, and also done a
lot of manual testing.

Closes #13892
Tests added/passed

@microsoft-github-policy-service microsoft-github-policy-service bot added Issue-Task It's a feature request, but it doesn't really need a major design. Area-VT Virtual Terminal sequence support Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal. labels Jan 29, 2024
Copy link
Member

@lhecker lhecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! I haven't finished reviewing it yet unfortunately, but here are a couple initial thoughts I had...

src/terminal/adapter/PageManager.hpp Show resolved Hide resolved
src/terminal/adapter/PageManager.cpp Show resolved Hide resolved
src/terminal/adapter/PageManager.cpp Outdated Show resolved Hide resolved
src/cascadia/TerminalCore/Terminal.hpp Outdated Show resolved Hide resolved
@j4james j4james marked this pull request as ready for review March 11, 2024 00:14
@j4james
Copy link
Collaborator Author

j4james commented Mar 11, 2024

I think this is ready for review now. I've been using it for a while, and it works nicely for my use cases, although I'm not sure how much benefit it will be for anyone else.

And I should warn you that it might have some impact on performance, but I'm hoping it isn't significant. I did a test run with the new benchmark tool, and the only tests that looked a lot worse were the 4Ki WriteConsole ones, but the times are in microseconds, so I'm not sure they're really as bad as they look.

Copy link
Member

@lhecker lhecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, LGTM. I can confirm that there's a consistent performance regression, albeit not a large one. It can probably be fixed by making the code more "data oriented" and by avoiding large struct copies. ("large" for MSVC = anything larger than a pointer, unfortunately.) But I'm not particularly worried about this regression. We have a lot of way lower hanging fruits we can tackle instead.
I'm currently mostly worried about the overall architecture of the terminal, since that has a much bigger impact on performance than any SIMD, etc., we may ever do, and may also help us keep the code lean.

Comment on lines +162 to +180
// If we're changing the visible page, what we do is swap out the current
// visible page into its backing buffer, and swap in the new page from the
// backing buffer to the main buffer. That way the rest of the system only
// ever has to deal with the main buffer.
if (makeVisible && _visiblePageNumber != newPageNumber)
{
const auto& newBuffer = _getBuffer(newPageNumber, pageSize);
auto& saveBuffer = _getBuffer(_visiblePageNumber, pageSize);
for (auto i = 0; i < pageSize.height; i++)
{
saveBuffer.GetMutableRowByOffset(i).CopyFrom(visibleBuffer.GetRowByOffset(visibleTop + i));
}
for (auto i = 0; i < pageSize.height; i++)
{
visibleBuffer.GetMutableRowByOffset(visibleTop + i).CopyFrom(newBuffer.GetRowByOffset(i));
}
_visiblePageNumber = newPageNumber;
redrawRequired = true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More importantly, I'm a little worried about this change. Why don't we swap the visible buffer in the _api here? Is it because of some fundamental reason or did you do it due to our current, rather "intertwined" architecture?

If it's the latter I'm afraid this PR is making things worse. If I'm reading it correctly, the original DECs did not seem to have made a distinction between the "main visible" buffer and other pages. They were all pages, but one of them was simply the visible. Is my understanding correct? If so, do you think it's feasible to change the architecture in this PR? I'd completely understand if that was too difficult or time consuming.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we swap the visible buffer in the _api here?

Well the first problem is that anything in the _api needs to be written twice, because the buffer management in Windows Terminal is entirely separate from conhost. In an ideal world we'd have a unified buffer API that covered all the needs of the legacy console API, the XTerm alt buffers, and the VT paging, and which was shared between conhost and WT, but addressing that problem is way more work than I wanted to take on now.

And besides that, I'm not sure there'd be any benefit to it anyway. When swapping the visible page, it wouldn't just be a matter of pointing the renderer at the appropriate buffer - it's kind of mix. The scrollback (the S in that diagram above) is shared between all pages. It's only the P+V part that changes when you switch the visible page.

the original DECs did not seem to have made a distinction between the "main visible" buffer and other pages.

That is correct, but they didn't support scrollback and paging at the same time, so the S section in the diagram above would not have existed when paging was enabled. But modern terminals do tend to support both, so I thought it would be best if we could handle that too. And if you want to retain scrollback (which is stored in the main buffer) when switching the visible page, there's got to be some form of buffer merging.

Copy link
Member

@lhecker lhecker Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given what you wrote, and speaking long term, would you be in favor of us splitting the storage of the viewport/page and scrollback into separate buffers? I've been thinking about whether this would allow us to use a more efficient compression scheme for the scrollback, while using a more performant architecture for the viewport (= no run-length compression of attributes, etc.). It may also simplify building an infinite scrollback feature. Now, reading your comment, it seems that this may also simplify the paging implementation...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense to me, at least from a VT point of view. And if that split also enabled us to have the viewport/page storage being a different width to the scrollback buffer, that might be a convenient way to support VT windowing (not an essential feature, but it's something which might be nice to have one day).

The things I think would be most problematic would be the legacy console APIs, and buffer resizing. But if you want to support infinite scrollback, those areas are likely to be complicated regardless of the architecture.

{
// Page buffers are created on demand, and are sized to match the active
// page dimensions without any scrollback rows.
buffer = std::make_unique<TextBuffer>(pageSize, TextAttribute{}, 0, false, _renderer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BTW It'd be great if we could move to using shared_ptr<TextBuffer> over time so that we can more safely share buffers with the rendering thread, or simply between components.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I just though the unique ptr was more efficient, but I'm happy to change that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the unique_ptr here and I don't think you need to change it. If we were to change it, I think we should make it consistent across the code base.

The only difference between unique_ptr and shared_ptr is that the latter allocates an 8 byte large control block which contains the strong and weak reference counts. Reference counting itself only occurs on copies and so moving a shared_ptr has the same overhead as a unique_ptr (= none, apart from the struct copy). The allocation of the control block can be avoided by using make_shared. I believe this makes shared_ptr as cost-free as unique_ptr.


Page PageManager::ActivePage() const
{
return Get(_activePageNumber);
Copy link
Member

@lhecker lhecker Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be able to improve performance quite a bit here by caching the active page and handing out a mutable reference only. (Same for VisiblePage.) Constructing and returning large structs like this is expensive, particularly with MSVC. (GetBufferAndViewport has the same overhead as Get, and we may avoid both overheads this way.)

However, I believe this cannot be done right now since the visibleViewport and isMainBuffer are managed by the _api. If the PageManager was in charge of the active and visible page, as well as the viewport1 then it could cache information in a robust manner I believe. I suppose this ties into my other comment re: architecture.

Footnotes

  1. "viewports"? I'm sure if viewport is a plural in VT world.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I believe this cannot be done right now since the visibleViewport and isMainBuffer are managed by the _api.

Yeah, I had hoped that one day we might avoid this issue by having AdaptDispatch register some sort of callback with the api, so it can get notified of buffer and viewport changes, instead of having to query them every time. But we always have to deal with the fact that the legacy console APIs are in control of those aspects of the architecture, so that information has to be passed across the api boundary one way or the other. Again, not something I want to try and solve now.

However, if it's the copying of these page objects that is causing the performance regression, that might still be something I can improve. When I have more time I'll do some experimenting and see if I can find a way to make that any faster.

@lhecker
Copy link
Member

lhecker commented Mar 25, 2024

FWIW I'm still somewhat worried about putting the PageManager into the VT specific code. However, I don't think I could suggest an alternative.
As we've talked about many times before, I would personally greatly prefer if there would not be 2 terminal API implementations, but rather just a single Terminal class. If such a single Terminal class existed, I think the pages should be stored there. I believe that this may long term simplify certain operations that we can't foresee yet.

@DHowett
Copy link
Member

DHowett commented Apr 23, 2024

Yesterday I asked Leonard to explain this to me so I'd be able to review it for 1.21. I'm sorry we let you marinate for 3 months! I'm on this today, for release next week or so.

Thanks so much, James, for everything.

@j4james
Copy link
Collaborator Author

j4james commented Apr 23, 2024

@DHowett Honestly I don't mind if you want to leave this for a while. I was hoping there might be a way to improve the performance based on Leonard's suggestion about page caching, but I've been putting off working on that because I was having too much fun actually using the new functionality (e.g. see VT-Rex).

That said, I also wouldn't mind if you chose to merge it as is, but I'm somewhat concerned this might be a feature that nobody else uses besides me, and so for everyone else it would just be an unwanted performance regression.

@lhecker
Copy link
Member

lhecker commented Apr 23, 2024

I don't think you need to worry about the performance too much - The regression is rather small. I forgot how much it was, but it was like -5% or something. The recent cursor caching PR (if you've seen it) alone brought us +20%, and there's a hundred more places like that.

IMO the best thing we can do for perf. is to instead shrink our architecture. If we have less interfaces, less adapters, less classes, and so on, and more data driven design overall, we'll probably straight up double our VT performance without even touching the underlying algorithms. In ConPTY for instance, if we ignore VtEngine, the actual text and attribute insertion including width measurement only consumes around 20% of CPU time in the worst case.


BTW if anyone (or you) wants to work on perf., here's something that would probably net us another >10% uplift, with the potential to improve performance much more in the long run (= it's an unblocker for other optimizations): DelayEOLWrap() is currently a boolean flag and it and GetDelayedAtPosition() are somewhat costly. We could implement it this way for instance:

void Cursor::DelayEOLWrap() noexcept
{
    return _cPosition.x >= _parentBuffer._width;
}

That is, we represent a delayed cursor by putting it into a column past the right end of the buffer. We would need to modify GetPosition to be something like

til::point Cursor::GetPosition() const noexcept
{
    const auto width = _parentBuffer._width;
    return { std::min(_cPosition.x, width - 1), _cPosition.y };
}

One major VT perf. cost is that our parser simply isn't a traditional one: It should ideally be implemented as a state machine with lookup table. Something like Alacritty's parser: https://github.com/alacritty/vte/blob/master/src/table.rs
This is because a major cost resides in StateMachine::_ActionParam and friends, as all those branches (including those abstracted away in the STL, etc.) are fairly costly overall (around 1/3rd of our total CPU cost).

AdaptDispatch::_DoLineFeed is also rather slow. I'm not entirely yet why, but maybe it could be "math'ified" a bit more?


P.S.: Another very difficult, but very impactful change would be calling AccessibilityNotifier functions only after StateMachine::ProcessString returns in DoWriteConsole. This would net us +100% perf in conhost (and later in WT if we ever get a conhost.lib). It can be done by accumulating a single til::point_span that covers the entire damaged region, and a total cumulative scroll offset. It's fine if the span covers too much area (for instance because of CUP sequences jumping around and changing things in multiple places), because such apps aren't even accessible now either.

However, I think the other changes above are more important right now as they simplify our architecture.

@j4james
Copy link
Collaborator Author

j4james commented Apr 24, 2024

DelayEOLWrap() is currently a boolean flag and it and GetDelayedAtPosition() are somewhat costly.

Just on this point, I don't think the boolean flag is avoidable, because delayed wrapping is not actually limited to the final column of the display. With margins applied, you can have a delayed wrap occurring almost anywhere. And technically I think you can probably set the flag manually with DECCIR, without even being on the margin.

@j4james
Copy link
Collaborator Author

j4james commented May 2, 2024

FYI, I've had a chance to experiment with the page caching, but I couldn't get it to make much of a difference in terms of performance. So what's here now is as good as it's going to get from me (at least in the short term). If you all are happy with it as it is, it would be nice to have it merged, but I don't mind if you decide against it.

@lhecker
Copy link
Member

lhecker commented May 2, 2024

Ah that's fine then. We're about to fork off release-1.21 now and I think afterwards we can merge it (for version 1.22). I can't speak for them, but it may still take a little bit, because Dustin & Mike are quite busy right now with the release of not just this, but sudo, bug fixes, etc.

Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

holy crap I reviewed this like a month ago and just never hit ✅. My bad. Well, let's get this in for 1.22 now, shall we?

@DHowett
Copy link
Member

DHowett commented May 7, 2024

Alright! Let's get this merge conflict unconflicted and we can land this thing. Thank you for your patience.

James, generally speaking even if you're the only person who would benefit from a particular aspect of VT compatibility I'd still love for us to have it. 🙂

@j4james
Copy link
Collaborator Author

j4james commented May 10, 2024

@DHowett In case you haven't seen, the merge conflicts have been resolved now.

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! I've read it all over, I think I understand it well enough, and I see the ways we can promote better buffer management across the rest of the codebase (which I think is one of Leonard's concerns about having Pages live in the Adapter).

Let's do it.

James, this feels like another tour-de-force. Thank you.

@DHowett DHowett added this pull request to the merge queue May 17, 2024
Merged via the queue into microsoft:main with commit 4a243f0 May 17, 2024
15 checks passed
@j4james j4james deleted the feature-paging branch May 21, 2024 00:01
@alabuzhev
Copy link
Contributor

this could serve as an alternative solution to #10810

Thanks, it looks promising.
I couldn't make it work in the latest OpenConsole though, is it entirely unsupported there?

@j4james
Copy link
Collaborator Author

j4james commented Jun 3, 2024

@alabuzhev If you're using the nightly/canary build it should be supported.

@alabuzhev
Copy link
Contributor

@j4james my bad, I was using an outdated one. Thanks.

github-merge-queue bot pushed a commit that referenced this pull request Jun 3, 2024
## Summary of the Pull Request

When we receive a stream of output at the bottom of the page that wraps
over more than two lines, that is expected to pan the viewport down by
multiple rows to accommodate all of the output. However, when the output
is received in a single write, that did not work correctly.

The problem was that we were reusing a `Page` instance across multiple
`_DoLineFeed` calls, and the viewport cached in that `Page` wasn't valid
after the first call. This PR fixes the issue by adjusting the cached
viewport when we determine it has been moved by `_DoLineFeed`.

## References and Relevant Issues

The bug was introduced in PR #16615 when paging support was added.

## Validation Steps Performed

I've verified that the test case in #17351 is now working correctly, and
have added a unit test covering this scenario.

## PR Checklist
- [x] Closes #17351
- [x] Tests added/passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-VT Virtual Terminal sequence support Issue-Task It's a feature request, but it doesn't really need a major design. Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for the VT Paging APIs (NP, PP, PPA)
5 participants