
Proposal: revamp the refresh method for better rendering performance #150

Closed
jerch opened this issue Jun 28, 2016 · 13 comments
Comments

@jerch (Member) commented Jun 28, 2016

To start the discussion about the screen representation and rendering process here are some first ideas:

  • pre container instead of div
    A pre container can be rendered faster by most browser engines due to its reduced capabilities. Downside - reduced capabilities ;)
  • document fragments
    Although this needs more JS code with all the explicit node manipulation, it is super fast compared to innerHTML. IMHO a must-do ;)
  • clever caching
    The emulator currently caches real DOM nodes at line level. Maybe another caching strategy in combination with fragments might lift the burden to some degree.
  • use canvas instead of DOM nodes
    This is a hard one, not sure if it is worth the trouble at all. It would give us full control over what is shown. Downside - it needs tons of JS code to get the layouting done, and we lose all those nifty events we get from DOM nodes for free. No clue if it will show better performance in the end, since text rendering on canvas is not that good. Maybe someone has experience with that, therefore open for discussion...
  • spot and rearrange style-requesting calls
    Those calls almost always trigger an incremental synchronous re-layout (full stop until the reflow is done). Needs to be checked...
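The document-fragment point can be sketched roughly as follows. This is an illustrative sketch, not xterm.js code: `renderRows` and the injected `doc` are hypothetical names, with `doc` standing in for the browser's global `document` so the idea is visible outside a browser as well.

```javascript
// Sketch: batch all row updates into a DocumentFragment so the live DOM
// is written once per refresh instead of once per row (each live write
// can trigger layout work; off-DOM fragment appends cannot).
function renderRows(doc, container, rows) {
  const frag = doc.createDocumentFragment();
  for (const text of rows) {
    const row = doc.createElement('div');
    row.textContent = text;
    frag.appendChild(row);      // off-DOM: no reflow yet
  }
  while (container.firstChild) {
    container.removeChild(container.firstChild); // drop the old rows
  }
  container.appendChild(frag);  // one live-DOM write for all rows
}
```

In a browser this would be called as `renderRows(document, rowContainer, rowStrings)` on each refresh.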

Somewhat offtopic but still performance relevant:

  • scrolling
    What about native scrolling? Other ideas?
  • output alignment
    Some unicode chars will break the terminal grid. Needs some CSS rules, maybe even some kind of font glyph measuring tricks.
@Tyriar (Member) commented Jun 28, 2016

While canvas may be more performant, getting it to act like a text editor may be tricky (selection, context menus, IMEs, etc.). It's also a very radical change that makes the terminal behave significantly differently. It would be good if there was separation between the rendering and the terminal state; a renderer could then be swapped in or out, allowing for choice. Not really something that needs attention immediately though.

The first 2 points will have by far the biggest impact imo, that's where we should focus. Resolving point 2 should also fix the layout thrashing (point 5). I still plan on picking up the document fragment change when I get some time away from vscode stabilization (1-2 weeks).

Also on native scrolling, this is only possible if the entire buffer is created in the DOM. This would be an enormous memory footprint for little payoff.
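The renderer/state separation mentioned above could look roughly like this. All names here (`Terminal`, `RecordingRenderer`, the single `render(state)` method) are hypothetical, a minimal sketch of the pattern rather than the actual xterm.js API:

```javascript
// Sketch: terminal state is kept separate from rendering; any object with
// a render(state) method can be plugged in (DOM renderer, canvas renderer, ...).
class Terminal {
  constructor(renderer) {
    this.state = { rows: [] };      // pure data, no DOM references
    this.renderer = renderer;
  }
  write(line) {
    this.state.rows.push(line);     // mutate state only...
    this.renderer.render(this.state); // ...then delegate drawing
  }
  setRenderer(renderer) {           // swap implementations at runtime
    this.renderer = renderer;
  }
}

// Example renderer that merely records what it was asked to draw:
class RecordingRenderer {
  constructor() { this.frames = []; }
  render(state) { this.frames.push(state.rows.length); }
}
```

A canvas-based renderer would then be just another `render(state)` implementation, with no changes to the state handling.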

@parisk (Contributor) commented Jun 29, 2016

My two cents here:

  1. Let's leave the canvas approach out of the discussion for now. Since it would need significant effort to provide a proper UX, we should only consider it once we have clear pros and cons for both approaches and have actually reached a point where DOM text rendering has to be considered a dead end.
  2. As far as native scrolling goes, I am pretty sure we can do this without rendering the whole buffer. CodeMirror (https://github.com/codemirror/codemirror) does this as well, though I have not dug into its internals yet.

My proposal is to first try replacing divs with pres in a PR dedicated to this goal, see how that goes, and then move forward with further optimizations step by step.

@jerch (Member, Author) commented Jun 29, 2016

I second your remarks about the canvas idea. Too much of an effort for an uncertain result. Might be worth a look again once a bigger overhaul is needed, but not at the current state.

Imo missing from the list above: CSS optimization. Here is a nice overview of CSS concerns. Atm there are two selectors that might be performance critical - .terminal:not(:focus) .terminal-cursor and .terminal .xterm-rows > div.

Gonna go for some pre vs. div tests first.

@jerch (Member, Author) commented Jun 29, 2016

Native scrolling might be possible in combination with partial buffer rendering upon scrolling. Clusterize.js uses such a trick to show tons of data in a list.
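The Clusterize.js-style trick boils down to computing, from the native scroll position, which buffer rows actually need real DOM nodes. A minimal sketch (the function name and the `overscan` parameter are illustrative, not from xterm.js or Clusterize.js):

```javascript
// Sketch: given the container's scrollTop, render only the rows that are
// visible in the viewport, plus a small overscan margin above and below
// so fast scrolling doesn't flash empty rows.
function visibleRange(scrollTop, rowHeight, viewportHeight, totalRows, overscan = 5) {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const count = Math.ceil(viewportHeight / rowHeight) + 2 * overscan;
  const last = Math.min(totalRows, first + count);
  return { first, last }; // render only rows [first, last)
}
```

The rows outside the range are represented only by a spacer element sized to `totalRows * rowHeight`, so the native scrollbar still reflects the full buffer.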

@Tyriar (Member) commented Jun 29, 2016

@jerch I don't think those selectors should perform particularly badly. .terminal:not(:focus) shouldn't be bad as it's rooted on .terminal, so only a single element needs to be checked for focus (O(1)). A bare :not(:focus) would be bad, since every element would need to be checked for focus (O(n)).

Similar with .terminal .xterm-rows > div. If anything, the direct descendant selector improves performance by only applying to the children of .xterm-rows.

@jerch (Member, Author) commented Jun 30, 2016

As far as I understand the concerns, selectors are evaluated right to left, which makes them counterintuitive. .terminal .xterm-rows > div first selects all divs on the page, then filters for those whose parent has the class 'xterm-rows'. The descendant rule then filters for divs with an ancestor of class 'terminal'. If that is still the case, this will get really ugly within a more complex page.

But the referenced document is pretty old; I am not sure if it is still that bad (http://calendar.perfplanet.com/2011/css-selector-performance-has-changed-for-the-better/)

Since I don't know how to test the CSS stuff reliably, I would not spend too much time on it.

@jerch (Member, Author) commented Jul 1, 2016

Now I feel like Don Quixote tilting at windmills - ever wondered what this greyish pie in the dev tools timeline summary is? It is "Other". Hmm.

I ran several tests with different settings to get better output performance and was able to lower the total runtime of ls -lR /usr/lib from 6s to 5s here. Dev tools say the rendering now takes about 500ms and the JS scripting about 2000ms. Sweet. Mission accomplished.
Wait - 2.5s in dev tools, but the whole command takes about 5s. Isn't there something wrong? Oh yeah, there is this greyish pie. It was always there. Always at around 50%. Hmm. Must be god-given. Don't ask, even Google can't tell you. It is "Other", leave it be.

Dramatic prologue, short answer - websockets are slow. I switched off all the frontend handling (no parsing, no output) to see the raw performance. ls -lR /usr/lib still took around 3s to get into Chrome. Btw the command generates 5.3 MB of data on my system, which is a lousy throughput of 1.8 MB/s (local delivery - wth).

By playing with the server-side buffer size I was able to raise the throughput to 3.5 MB/s. Seems the websocket stack itself is too heavy to cope with many small writes and would benefit from buffering before sending.
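Such server-side buffering before the websocket send could be sketched like this. `SendBuffer` and its parameters are hypothetical names, and the thresholds are illustrative; the `send` callback would wrap something like `ws.send.bind(ws)` in the demo server:

```javascript
// Sketch: coalesce many small pty chunks into fewer, larger websocket
// frames. Data is flushed either when the buffer crosses a byte threshold
// or after a short timeout, trading a few ms of latency for throughput.
class SendBuffer {
  constructor(send, { maxBytes = 16384, delayMs = 5 } = {}) {
    this.send = send;        // e.g. ws.send.bind(ws)
    this.maxBytes = maxBytes;
    this.delayMs = delayMs;
    this.chunks = [];
    this.size = 0;
    this.timer = null;
  }
  push(data) {
    this.chunks.push(data);
    this.size += data.length;
    if (this.size >= this.maxBytes) {
      this.flush();          // big enough: send right away
    } else if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.delayMs);
    }
  }
  flush() {
    if (this.timer !== null) { clearTimeout(this.timer); this.timer = null; }
    if (this.size === 0) return;
    this.send(this.chunks.join(''));
    this.chunks = [];
    this.size = 0;
  }
}
```

The pty's data handler would then call `buffer.push(data)` instead of `ws.send(data)` directly.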

current master:

[timeline screenshot: master]

websocket buffered (plus some other rendering changes):

[timeline screenshot: buffered]

Change:
You can see and test it here.

@jerch (Member, Author) commented Jul 1, 2016

Did some tests with pre. It is on par or slightly worse. Imo not worth any further testing.

Another possible optimization - the terminal container .terminal is not explicitly styled atm in the demo. This triggers a full reflow up to document level in Chrome and adds ~1ms to a single layout and paint. In a more complex embedding page this is gonna hurt much more. It can be fixed by setting explicit values for width, height and overflow (info).
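That explicit-styling fix amounts to a few CSS rules on the container; the values below are purely illustrative, an embedding page would pick its own dimensions:

```css
/* Sketch: giving .terminal explicit dimensions and overflow lets the
   browser treat it as a self-contained layout boundary, so changes
   inside it don't force a reflow of the surrounding document. */
.terminal {
  width: 800px;
  height: 400px;
  overflow: hidden;
}
```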

@parisk (Contributor) commented Jul 1, 2016

Thanks a lot for all this information @jerch, it is really useful. I will take a deeper look later, because I couldn't find the time today. Could you please provide us with some more information about the websocket server you used and the settings you used to set up buffering?

@jerch (Member, Author) commented Jul 3, 2016

@parisk You can find the adjustments here. The websocket buffer is just a quick hack to your app.js to see the performance difference. The changes should be applicable to any websocket lib (in fact the buffering happens before the websocket invocation), though I'm not sure if it will show such an impact on a remote system - communication latency might just swallow the effect.
Now the total runtime is almost on par with xterm on my computer, but with less output rendering. The rendering still needs some attention, since a single rendering pass is still quite heavy (fragments with some caching tweaks will show some impact imo - I leave that to Tyriar to test).

Also the separation of the terminal's inner state handling from the rendering might be a goal (as Tyriar mentioned above). This would open up other optimization options, like using web workers for non-DOM-related work. I did a quick & dirty test with one worker thread for the websocket comm and state handling, and sole rendering in the main thread, with these results:

[timeline screenshot: worker]
As you can see, the main thread is now idle half of the time, giving back precious CPU time for other stuff. The striped yellow part is the worker "Scripting" (almost at 100%). With this approach even FF is able to show frames again. (FF currently just blocks until the command is done.)
Still much room for improvement...
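The worker split can be modeled as two halves of a simple message protocol. This is a heavily simplified sketch with hypothetical names: the real worker would run the escape-sequence parser, while here a naive line split stands in for it, and the browser wiring is only shown in comments:

```javascript
// Worker side: turn a raw data chunk into renderable row strings.
// (Stand-in for the real escape-sequence parser.)
function parseChunk(chunk) {
  return chunk.split('\n').filter(line => line.length > 0);
}

// Main-thread side: fold parsed rows into the screen model, keeping at
// most maxRows rows (older rows scroll out of the model).
function applyRows(screen, rows, maxRows) {
  const next = screen.concat(rows);
  return next.slice(Math.max(0, next.length - maxRows));
}

// Browser wiring (not runnable here):
//   // in the worker: socket.onmessage = e => postMessage(parseChunk(e.data));
//   // on the main thread:
//   //   worker.onmessage = e => {
//   //     screen = applyRows(screen, e.data, term.rows);
//   //     render(screen);  // only DOM work stays on the main thread
//   //   };
```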

@parisk (Contributor) commented Jul 11, 2016

@jerch thanks for providing all this data. After releasing 1.0.0 this week, I will get on top of this to set up a "roadmap" or course of action for tuning the rendering process.

@Tyriar (Member) commented Dec 31, 2016

The situation now is certainly better than before; ls -lR /usr/lib only takes around 2s for me.

[timeline screenshot]

@Tyriar (Member) commented Jan 1, 2017

Looking at this now, after the changes I've done as part of microsoft/vscode#17875, I don't think it's worth pursuing any of these points. While caching would be nice, it comes at the cost of potentially quite significant memory overhead, and performance is fine as is imo. Caching is also less likely to be needed now due to the queue change (#438), which will likely see the entire viewport change much more often than individual rows; in that case caching would only help scrollback, at the cost of lots of memory.
