Buffer performance improvements #791
Comments
Another approach that could be mixed in: use indexeddb, websql or filesystem api to page out inactive scrollback entries to disk 🤔
|
Great proposal. I agree that 3 is the best way to go for now, as it lets us save memory while supporting true color as well. If we get there and things continue to go well, we can then optimize as proposed in 5, or in any other way that comes to mind at that time and makes sense. 3 is great 👍. |
@mofux, while there is definitely a use case for using disk-storage-backed techniques to reduce the memory footprint, this can degrade the user experience of the library in browser environments that ask the user for permission for using disk storage. |
Regarding Supporting the future: |
+100 about WebWorker in the future, but I think we need to change the list of browser versions we support, because not all of them can use it... |
When I say @mofux good thinking with @AndrienkoAleksandr yeah if we wanted to use |
Wow nice list :) I also tend to lean towards 3. since it promises a big cut in memory consumption for over 90% of the typical terminal usage. Imho memory optimisation should be the main goal at this stage. Further optimization for specific use cases might be applicable on top of this (what comes to my mind: "canvas like apps" like ncurses and such will use tons of single cell updates and kinda degrade the @AndrienkoAleksandr yeah I like the webworker idea too since it could lift some burden from the main thread. The problem here (besides the fact that it might not be supported by all wanted target systems) is the same - the JS part is not such a big deal anymore with all the optimizations xterm.js has seen over time. The real issue performance-wise is the layouting/rendering of the browser... @mofux The paging thing to some "foreign memory" is a good idea, though it should be part of some higher abstraction and not of the "gimme an interactive terminal widget thing" that xterm.js is. This could be achieved by an addon imho. Offtopic: Did some tests with arrays vs. typedarrays vs. asm.js. All I can say - OMG, it is like |
@jerch to clarify, is arrays vs typedarrays 1:1 to 1:5? |
Woops nice catch with the comma - i meant |
@jerch cool, a 50% speed up from typedarrays in addition to better memory would definitely be worth investing in for now. |
Very fragile right now, only works for basic IO Part of xtermjs#791
Idea for memory saving - maybe we could get rid of the |
@jerch we need to access it quite a bit, and we can't lazy load it or anything because when reflow comes we will need the width of every character in the buffer. Even if it was fast we might still want to keep it around. Might be better to make it optional, assuming 1 if it's not specified:
type CharData = [string, number?]; // not sure if this is valid syntax
[
  // 'a'
  ['a'],
  // '文'
  ['文', 2],
  // after wide
  ['', 0],
  ...
] |
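A minimal sketch of how the optional width could be consumed, assuming the `[string, number?]` tuple from the comment above (optional tuple elements are valid TypeScript). The helpers `cellWidth` and `lineWidth` are hypothetical names used for illustration only, not xterm.js API.

```typescript
// A cell is [char, width?]; a missing width is assumed to be 1.
type CharData = [string, number?];

// Hypothetical helper: resolve the display width of a single cell.
// `??` is used instead of `||` so that an explicit width of 0 is preserved.
function cellWidth(cell: CharData): number {
  return cell[1] ?? 1;
}

// Hypothetical helper: number of columns a line occupies.
function lineWidth(line: CharData[]): number {
  return line.reduce((sum, cell) => sum + cellWidth(cell), 0);
}

const sampleLine: CharData[] = [
  ['a'],     // narrow character, width defaults to 1
  ['文', 2], // wide character
  ['', 0],   // placeholder cell after a wide character
];
```

The placeholder cell after a wide character keeps its explicit width of 0, so only the common single-width case saves the extra number.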
Some more flags we'll support in the future: #580 |
Another thought: Only the bottom portion of the terminal ( Managing the bottom dynamic portion of the terminal independently and pushing to scrollback when the line is pushed out could lead to some significant gains. For example, the bottom portion could be more verbose to favor modifying of attributes, faster access, etc. whereas the scrollback can be more of an archival format as proposed in the above. |
Another thought: it's probably a better idea to restrict eg: To the right of the red/green diffs are unstyled "blank" cells. |
Regarding a possible UTF8 input encoding and the internal buffer layout I did a rough test. To rule out the much higher impact of the canvas renderer on the total runtime I did it with the upcoming webgl renderer. With my
The playground branch does an early conversion from UTF8 to UTF32 before doing the parsing and storing (conversion adds ~ 30 ms). The speedup is mainly gained by the 2 hot functions during input flow, Summary: Conclusion: |
Buffer layout for the upcoming true color support: Currently the typed array based buffer layout is the following (one cell):
where Idea is to rearrange the bits to a better packrate to make room for the additional RGB values:
The upside of this approach is the relatively cheap access to every value by one index access and max. 2 bit operations (and/or + shift). The memory footprint is unchanged compared to the current variant, but still quite high with 12 bytes per cell. This could be further optimized by sacrificing some runtime with switching to UTF16 and an
Now we are down to 4 bytes per cell + some room for the attrs. Now the attrs could be recycled for other cells, too. Yay mission accomplished! - Ehm, one second... Comparing the memory footprint the second approach clearly wins. Not so for the runtime, there are three major factors that increase the runtime a lot:
The sexiness of the second approach is the additional memory saving. Therefore I tested it with the playground branch (see comment above) with a modified Yeah, we are kinda back to where we started from before changing to UTF8 + typed arrays in the parser. Memory usage dropped from ~ 1.5 MB to ~ 0.7 MB though (demo app with 87 cells and 1000 lines scrollback). From here on it's a matter of saving memory vs. speed. Since we already saved a lot of memory by switching from js arrays to typed arrays (dropped from ~ 5.6 MB to ~ 1.5 MB for the C++ heap, cutting off the toxic JS Heap behavior and GC) I think we should go here with the speedier variant. Once memory usage becomes a pressing issue again we can still switch over to a more compact buffer layout as described in the second approach here. |
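To make the "one index access and max. 2 bit operations" point concrete, here is a rough sketch of a flat layout with 12 bytes per cell. The exact bit assignment below is an assumption made for illustration (the layout tables from the comment are not reproduced here), and `FlatLine` is a hypothetical name.

```typescript
// Hypothetical layout (an assumption, not the layout from the thread):
//   slot 0: code point (bits 0-20) | width (bits 21-22)
//   slot 1: foreground RGB (bits 0-23)
//   slot 2: background RGB (bits 0-23)
const CELL_SIZE = 3; // 3 x 4 bytes = 12 bytes per cell
const CODEPOINT_MASK = 0x1fffff;
const WIDTH_SHIFT = 21;
const WIDTH_MASK = 0x3;
const RGB_MASK = 0xffffff;

class FlatLine {
  private readonly data: Uint32Array;

  constructor(public readonly cols: number) {
    this.data = new Uint32Array(cols * CELL_SIZE);
  }

  setCell(col: number, codePoint: number, width: number, fg: number, bg: number): void {
    const i = col * CELL_SIZE;
    this.data[i] = (codePoint & CODEPOINT_MASK) | ((width & WIDTH_MASK) << WIDTH_SHIFT);
    this.data[i + 1] = fg & RGB_MASK;
    this.data[i + 2] = bg & RGB_MASK;
  }

  // Each getter is one index access plus at most a shift and a mask.
  codePoint(col: number): number {
    return this.data[col * CELL_SIZE] & CODEPOINT_MASK;
  }
  width(col: number): number {
    return (this.data[col * CELL_SIZE] >>> WIDTH_SHIFT) & WIDTH_MASK;
  }
  fg(col: number): number {
    return this.data[col * CELL_SIZE + 1] & RGB_MASK;
  }
  bg(col: number): number {
    return this.data[col * CELL_SIZE + 2] & RGB_MASK;
  }

  get byteLength(): number {
    return this.data.byteLength;
  }
}
```

In this sketch no per-cell objects are allocated, which is the property that keeps the data off the JS heap and away from the garbage collector.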
I agree, let's optimise for speed as long as memory consumption is not a concern. I'd also like to avoid the indirection as far as possible because it makes the code harder to read and maintain. We already have quite a lot of concepts and tweaks in our codebase that make it hard for people (including me 😅) to follow the code flow - and bringing in more of these should always be justified by a very good reason. IMO saving another megabyte of memory doesn't justify it. Nevertheless, I'm really enjoying reading and learning from your exercises, thanks for sharing it in such a great detail! |
@mofux Yeah thats true - code complexity is much higher (UTF16 surrogate read ahead, intermediate codepoint calcs, tree container with ref counting on attr entries). |
There is one big advantage of indirecting to an attr object: It is much more extensible. You can add annotations, glyphs, or custom painting rules. You can store link information in a possibly-cleaner and more efficient way. Perhaps define an One idea is to use two arrays per BufferLine: a Uint32Array, and ICellPainter array, with one element each for each cell. The current ICellPainter is a property of the parser state, and so you just reuse the same ICellPainter as long as the color/attribute state doesn't change. If you need to add special properties to a cell, you first clone the ICellPainter (if it might be shared). You can pre-allocate ICellPainter for the most common color/attribute combinations - at the very least have a unique object corresponding to the default colors/attributes. Style changes (such as changing default foreground/background colors) can be implemented by just updating the corresponding ICellPainter instance(s), without having to update each cell. There are possible optimizations: For example use different ICellPainter instances for single-width and double-width characters (or zero-width characters). (That saves 2 bits in each Uint32Array element.) There are 11 available attribute bits in Uint32Array (more if we optimize for BMP characters). These can be used to encode the most common/useful color/attribute combinations, which can be used to index the most common ICellPainter instances. If so, the ICellPainter array can be allocated lazily - i.e. only if some cell in the line requires a "less-common" ICellPainter. One could also remove the _combined array for non-BMP characters, and store those in the ICellPainter. (That requires a unique ICellPainter for each non-BMP character, so there is a tradeoff here.) |
@PerBothner Yeah an indirection is more versatile and thus better suited for uncommon extras. But since they are uncommon I'd rather not optimize for them in the first place. A few notes on what I've tried in several testbeds:
Now this flat 32 bit layout turns out to be optimized for the common stuff and uncommon extras are not possible with it. True. Well we still have markers (not used to them so I cannot tell right now what they are capable of), and yepp - there are still free bits in the buffer (which is a good thing for future needs, e.g. we could use them as flags for special treatment and such). Tbh for me it's a pity that the 16 bit layout with attrs storage performs that badly, halving the memory usage is still a big deal (esp. when ppl start to use scroll lines >10k), but the runtime penalty and the code complexity outweigh the higher mem needs atm imho. Can you elaborate on the ICellPainter idea? Maybe I missed some crucial feature so far. |
My goal for DomTerm was to enable and encourage richer interaction than just what is enabled by a traditional terminal emulator. Using web technologies enables many interesting things, so it would be a shame to just focus on being a fast traditional terminal emulator. Especially since many use cases for xterm.js (such as REPLs for IDEs) can really benefit from going beyond simple text. Xterm.js does well on the speed side (is anyone complaining about speed?), but it does not do so well on features (people are complaining about missing truecolor and embedded graphics, for example). I think it may be worthwhile to focus a bit more on flexibility and slightly less on performance. "Can you elaborate on the ICellPainter idea?" In general, ICellPainter encapsulates all the per-cell data except the character code/value, which comes from the Uint32Array. That is for "normal" character cells - for embedded images and other "boxes" the character code/value may not make sense.
Mapping a cell to ICellPainter can be done various ways. The obvious is for each BufferLine to have a ICellPainter array, but that requires an 8-byte pointer (at least) per cell. One possibility is to combine the _combined array with the ICellPainter array: If the IS_COMBINED_BIT_MASK is set, then the ICellPainter also includes the combined string. Another possible optimization is to use the available bits in the Uint32Array as an index into an array: That adds some extra complication and indirection, but saves space. |
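The last optimization mentioned above (using spare bits of each `Uint32Array` element as an index into an array) could look roughly like the sketch below. Unicode code points need at most 21 bits, which leaves 11 bits per element. `CellStyle` is a simplified, hypothetical stand-in for ICellPainter, and `StyleTable`, `packCell` and the unpack helpers are illustrative names, not an existing API.

```typescript
// Hypothetical stand-in for ICellPainter: everything about a cell except its character.
interface CellStyle {
  fg: number;
  bg: number;
  bold: boolean;
}

const CP_BITS = 21;                     // code points up to U+10FFFF fit in 21 bits
const CP_MASK = (1 << CP_BITS) - 1;
const MAX_STYLES = 1 << (32 - CP_BITS); // 11 spare bits give 2048 table slots

class StyleTable {
  // Index 0 is reserved for the default style, so zeroed cells are valid.
  private readonly styles: CellStyle[] = [{ fg: 0xffffff, bg: 0x000000, bold: false }];

  add(style: CellStyle): number {
    if (this.styles.length >= MAX_STYLES) {
      throw new Error('style table full');
    }
    this.styles.push(style);
    return this.styles.length - 1;
  }

  get(index: number): CellStyle {
    return this.styles[index];
  }
}

// Pack a code point and a style index into one unsigned 32-bit value.
function packCell(codePoint: number, styleIndex: number): number {
  return ((styleIndex << CP_BITS) | (codePoint & CP_MASK)) >>> 0;
}

function unpackCodePoint(cell: number): number {
  return cell & CP_MASK;
}

function unpackStyleIndex(cell: number): number {
  return cell >>> CP_BITS;
}
```

The trade-off named in the comment shows up directly: the table is capped at 2048 entries per index space, and every style lookup goes through one extra array access.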
I'd like to encourage us to check if we can do it the way monaco-editor does it (I think they found a really smart and performant way). Instead of storing such information in the buffer, they allow you to create
// decorations are buffer-dependent (we need to know which buffer to decorate)
const decoration = buffer.createDecoration({
  type: 'link',
  data: 'https://www.google.com',
  range: { startRow: 2, startColumn: 5, endRow: 2, endColumn: 25 }
});
Later on a renderer could pick up those decorations and draw them. Please check out this small example that shows what the monaco-editor API looks like: For things like rendering pictures inside the terminal monaco uses a concept of view zones that can be seen (among other concepts) in an example here: |
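As a rough illustration of how a renderer might "pick up" such decorations, here is a minimal store with a position query. It reuses the option shape from the snippet above; `DecorationStore`, `decorationsAt` and the inclusive end column are assumptions made for this sketch and do not describe monaco-editor's or xterm.js's actual API.

```typescript
interface CellRange {
  startRow: number;
  startColumn: number;
  endRow: number;
  endColumn: number; // treated as inclusive in this sketch (an assumption)
}

interface Decoration {
  type: string;
  data: string;
  range: CellRange;
}

// Hypothetical per-buffer store; decorations live outside the cell data.
class DecorationStore {
  private readonly decorations: Decoration[] = [];

  createDecoration(decoration: Decoration): Decoration {
    this.decorations.push(decoration);
    return decoration;
  }

  // Linear scan for clarity; a real implementation would likely index by row.
  decorationsAt(row: number, column: number): Decoration[] {
    return this.decorations.filter(({ range }) => {
      const afterStart =
        row > range.startRow || (row === range.startRow && column >= range.startColumn);
      const beforeEnd =
        row < range.endRow || (row === range.endRow && column <= range.endColumn);
      return afterStart && beforeEnd;
    });
  }
}
```

Because nothing is stored per cell, a buffer with no decorations pays no memory cost for the feature.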
@PerBothner Thx for clarification and the sketchup. A few notes on that. We eventually plan to move the input chain + buffer into a webworker. Thus the buffer is meant to operate on an abstract level and we cannot use any render/representation related stuff there yet like pixel metrics or any DOM nodes. I see your needs for this due to DomTerm being highly customizable, but I think we should do that with an enhanced internal marker API and can learn here from monaco/vscode (thx for the pointers @mofux). I am still not satisfied with the outcome of the 16 bit layout test results. Since a final decision is not yet pressing (we won't see any of this before 3.11), I'm gonna keep testing it with a few changes (it's still the more intriguing solution for me than the 32 bit variant). |
I also think we should go with something close to this to start, we can explore other options later but this will probably be the easiest to get up and running. Attribute indirection definitely has promise IMO as there typically aren't that many distinct attributes in a terminal session.
Something like this is where I'd like to see things go. One idea I had along these lines was to allow embedders to attach DOM elements to ranges to enable custom things to be drawn. There are 3 things I can think of at the moment that I'd like to accomplish with this:
All of these could be achieved with an overlay and it's a pretty approachable type of API (exposing a DOM node) and can work regardless of renderer type. I'm not sure we want to get into the business of allowing embedders to change how background and foreground colors are drawn. @jerch I'll put this on the 3.11.0 milestone as I consider this issue finished when we remove the JS array implementation which is planned for then. #1796 is also planned to be merged then, but this issue was always meant to be about improving the buffer's memory layout. Also, a lot of this later discussion would probably be better had over at #484 and #1852 (created as there wasn't a decorations issue). |
@Tyriar Woot - finally closed 😅 |
🎉 🕺 🍾 |
Problem
Memory
Right now our buffer is taking up too much memory, particularly for an application that launches multiple terminals with large scrollbacks set. For example, the demo using a 160x24 terminal with 5000 scrollback filled takes around 34 MB of memory (see microsoft/vscode#29840 (comment)); remember that's just a single terminal, and 1080p monitors would likely use wider terminals. Also, in order to support truecolor (#484), each character will need to store 2 additional number types, which will almost double the current memory consumption of the buffer.
Slow fetching of a row's text
There is the other problem of needing to fetch the actual text of a line swiftly. The reason this is slow is due to the way that the data is laid out; a line contains an array of characters, each having a single character string. So we will construct the string and then it will be up for garbage collection immediately afterwards. Previously we didn't need to do this at all because the text is pulled from the line buffer (in order) and rendered to the DOM. However, this is becoming an increasingly useful thing to do as we improve xterm.js further; features like the selection and links both pull this data. Again using the 160x24/5000 scrollback example, it takes 30-60ms to copy the entire buffer on a Mid-2014 Macbook Pro.
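The cost described above can be seen in a small sketch. The cell shape `[attr, char, width]` is an assumption for illustration, and `lineToString` and `bufferToString` are hypothetical helpers rather than xterm.js functions.

```typescript
// Assumed cell shape for this sketch: [attribute, single-character string, width].
type CharCell = [number, string, number];

// Every call walks all cells and concatenates one tiny string per cell.
// The result is a fresh string that becomes garbage as soon as the caller is done.
function lineToString(line: CharCell[]): string {
  let text = '';
  for (const cell of line) {
    text += cell[1];
  }
  return text;
}

// Copying the whole buffer repeats that work for every row.
function bufferToString(lines: CharCell[][]): string {
  return lines.map(lineToString).join('\n');
}
```

For the 160x24 terminal with 5000 lines of scrollback, this touches 160 × 5024 = 803,840 cells per full copy.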
Supporting the future
Another potential problem in the future arises when we look at introducing some view model which may need to duplicate some or all of the data in the buffer. This sort of thing will be needed to implement reflow (#622) properly (#644 (comment)) and may also be needed to properly support screen readers (#731). It would certainly be good to have some wiggle room when it comes to memory.
This discussion started in #484; this issue goes into more detail and proposes some additional solutions.
I'm leaning towards solution 3 and moving towards solution 5 if there is time and it shows a marked improvement. Would love any feedback! /cc @jerch, @mofux, @rauchg, @parisk
1. Simple solution
This is basically what we're doing now, just with truecolor fg and bg added.
Pros
Cons
2. Pull text out of CharData
This would store the string against the line rather than against each character. This would probably see very large gains in selection and linkifying, and quick access to a line's entire string would become more useful as time goes on.
Pros
Int32Array
Cons
3. Store attributes in ranges
Pulling the attributes out and associating them with a range. Since there can never be overlapping attributes, this can be laid out sequentially.
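Since the ranges never overlap and are laid out sequentially, looking up the attributes for a column can be a binary search. The sketch below assumes half-open `[start, end)` column ranges and a fallback to default attributes when no range matches; `AttrRange` and `findAttr` are hypothetical names.

```typescript
// Assumed shape: attributes apply to columns start (inclusive) to end (exclusive).
interface AttrRange {
  start: number;
  end: number;
  flags: number;
  fg: number;
  bg: number;
}

// `ranges` must be sorted by `start` and non-overlapping.
// Returns undefined when the column uses the default attributes.
function findAttr(ranges: AttrRange[], col: number): AttrRange | undefined {
  let lo = 0;
  let hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const range = ranges[mid];
    if (col < range.start) {
      hi = mid - 1;
    } else if (col >= range.end) {
      lo = mid + 1;
    } else {
      return range;
    }
  }
  return undefined;
}
```

A line with no styling stores an empty array, which is where most of the memory saving for typical terminal output would come from.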
Pros
.flags instead of [0])
Cons
4. Put attributes in a cache
The idea here is to leverage the fact that there generally aren't that many styles in any one terminal session, so we should create as few as necessary and reuse them.
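One way to create as few attribute objects as necessary is to intern them by value. This is only a sketch: the `CharAttributes` fields and the `AttributeCache` class are assumptions for illustration, and eviction of unused entries is left out.

```typescript
interface CharAttributes {
  readonly flags: number;
  readonly fg: number;
  readonly bg: number;
}

// Hypothetical cache: identical style combinations share one frozen object.
class AttributeCache {
  private readonly entries: { [key: string]: CharAttributes } = {};
  private count = 0;

  intern(flags: number, fg: number, bg: number): CharAttributes {
    const key = `${flags}:${fg}:${bg}`;
    let attrs = this.entries[key];
    if (attrs === undefined) {
      // Frozen so a shared object cannot be mutated through one cell.
      attrs = Object.freeze({ flags, fg, bg });
      this.entries[key] = attrs;
      this.count++;
    }
    return attrs;
  }

  get size(): number {
    return this.count;
  }
}
```

Cells then hold a reference to a shared object, so the number of attribute objects grows with the number of distinct styles rather than with the number of cells.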
Pros
.flags instead of [0])
Cons
5. Hybrid of 3 & 4
Pros
.flags instead of [0])
Cons
CharAttributes per block?
CharAttributeEntry object
6. Hybrid of 2 & 3
This takes the solution of 3 but also adds a lazily evaluated text string for fast access to the line text. Since we're also storing the characters in CharData we can lazily evaluate it.
Pros
.flags instead of [0])
Cons
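The lazily evaluated line text from solution 6 could be sketched as a cached string that is rebuilt only after a write. `LazyTextLine` and its `buildCount` counter are hypothetical and exist only to make the caching behaviour visible.

```typescript
class LazyTextLine {
  private readonly chars: string[];
  private cachedText: string | undefined;
  // Counts how often the string was actually rebuilt (for illustration only).
  buildCount = 0;

  constructor(chars: string[]) {
    this.chars = chars.slice();
  }

  setChar(col: number, char: string): void {
    this.chars[col] = char;
    this.cachedText = undefined; // invalidate; rebuild on next read
  }

  get text(): string {
    if (this.cachedText === undefined) {
      this.cachedText = this.chars.join('');
      this.buildCount++;
    }
    return this.cachedText;
  }
}
```

Lines that are never read as text (most of the scrollback) never pay for the string, while repeated reads by selection or the linkifier reuse the cached value.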
Solutions that won't work
Int32Array will not work as it takes far too long to convert the int back to a character.