Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve redraw performance by making Clip be a Region instead of Rectangle #3413

Closed
tig opened this issue Apr 17, 2024 · 22 comments · Fixed by #3798
Closed

Improve redraw performance by making Clip be a Region instead of Rectangle #3413

tig opened this issue Apr 17, 2024 · 22 comments · Fixed by #3798

Comments

@tig
Copy link
Collaborator

tig commented Apr 17, 2024

TUI apps need to be performant under highly-constrained conditions, such as over SSH. Currently the Terminal.Gui system for drawing tries to optimize/reduce redraws via ConsoleDriver.Clip which is a rectangle. This works ok but a system based on an irregular Region could be far more efficient.

Ages ago I experimented with re-implementing Clip using a list of rectangles in this PR:

Note I somehow overwrote the goodness in that PR, but much of the work is in older commits, but the essence was:

		/// <summary>
		/// Gets or sets the clip rectangle that <see cref="AddRune(Rune)"/> and <see cref="AddStr(ustring)"/> are 
		/// subject to. Setting this property is equivalent to calling <see cref="ClearClipRegion()"/>
		/// and <see cref="AddToClipRegion(Rect)"/>.
		/// </summary>
		/// <value>The rectangle describing the bounds of <see cref="ClipRegion"/>.</value>
		public Rect Clip {
			get {
				if (ClipRegion.Count == 0) {
					return new Rect (0, 0, Cols, Rows);
				}

				int minX = ClipRegion.Min (rect => rect.X);
				int minY = ClipRegion.Min (rect => rect.Y);
				int maxX = ClipRegion.Max (rect => rect.X + rect.Width);
				int maxY = ClipRegion.Max (rect => rect.Y + rect.Height);

				return new Rect (minX, minY, maxX - minX, maxY - minY);
			}
			set {
				ClearClipRegion ();
				AddToClipRegion (value);
			}
		}

		List<Rect> _clipRegion = new List<Rect> ();

I abandoned that PR because, at the time, ConsoleDriver and all the implementations were too fragile with way too much duplicated code. I also realized my "list of rectangles" idea was too naive and I was re-inventing the wheel: System.Drawing.Region already exists.

At some point, tackling this PR will be worthy.

If we can't get to it before releasing v2, at the least we should ensure the current View.SetClip API can be forward-compatible so fixing this Issue is not a breaking change to v2.

Other Discussions

I don't think an array of Rectangles is the right approach. I think we want to build the next-gen clipping support on a Region class that is either directly this:

https://learn.microsoft.com/en-us/dotnet/api/system.drawing.region?view=dotnet-plat-ext-8.0

Or, is a clone of it.

So, unless you built your experiment on Region, no.

By chance I noticed the Region class but I didn't use it to avoid having to add another library. I created a RectangleExtensions class to manipulate its Rectangle methods with the array. But since it didn't solve the problem with flickering and I didn't use the Region class, I'm going to give up on continuing with my PR.

Originally posted by @BDisp in #3408 (comment)

@dodexahedron
Copy link
Collaborator

Rectangle already has that functionality built-in, no?

@BDisp
Copy link
Collaborator

BDisp commented Apr 18, 2024

Rectangle already has that functionality built-in, no?

I don't think so.

@BDisp
Copy link
Collaborator

BDisp commented Apr 18, 2024

The Rectangle class can only return the union, intersection of two rectangles, but not greater than two. This requires more work and manipulation.
But in reality Region is composed of rectangles in which RegionData encapsulates the data contained in that region. It's actually a much more elaborate model in that it customizes the entire process into classes instead just of creating a static RectangleExtensions class.

@BDisp
Copy link
Collaborator

BDisp commented Apr 18, 2024

But it is not possible to use the System.Drawing.Common package because it involves installing Microsoft.Win32.SystemEvents. Will it be compatible with other platforms? I don't think so.
A lot of CA1416: Validate platform compatibility messages appears.

@BDisp
Copy link
Collaborator

BDisp commented Apr 18, 2024

In any case, the Region class works differently in relation to the static RectangleExtensions class that I created. Region initializes with an area defined by the constructor and in the class I created I can use several arbitrary areas in any area of the screen. Therefore, as long as a cell is within the limits of a certain area, it will be drawn.

image

In this case, instead of creating a Rectangle array I have to create a Region array, which will have the same effect, with the advantage of containing the RegionData.
@tig I could start working on this but just in case you haven't started any PRs in this regard yet.

@tig
Copy link
Collaborator Author

tig commented Apr 18, 2024

In any case, the Region class works differently in relation to the static RectangleExtensions class that I created. Region initializes with an area defined by the constructor and in the class I created I can use several arbitrary areas in any area of the screen. Therefore, as long as a cell is within the limits of a certain area, it will be drawn.

image

In this case, instead of creating a Rectangle array I have to create a Region array, which will have the same effect, with the advantage of containing the RegionData. @tig I could start working on this but just in case you haven't started any PRs in this regard yet.

Go for it!

@dodexahedron
Copy link
Collaborator

Cache locality is important to maintain for something like this.

Be sure to allocate the arrays from Memory<T> and use the pool to avoid excessive heap activity and fragmentation.

@BDisp
Copy link
Collaborator

BDisp commented Apr 18, 2024

Cache locality is important to maintain for something like this.

Be sure to allocate the arrays from Memory<T> and use the pool to avoid excessive heap activity and fragmentation.

I'm using HashSet<T> to avoid duplicates Rectangle/RectangleF. Does Memory<T> avoids duplicates entries?

@dodexahedron
Copy link
Collaborator

If using HashSet, then just using the ArrayPool is fine.

Be sure to over-allocate up front to avoid re-allocation. Rectangles are cheap (16 bytes).

Example, whenever the HashSet is initially created:

HashSet<Rectangle> rectangleHashSet = [..ArrayPool<Rectangle>.Shared.Rent (256)];

This will occupy one page in memory (4K). Cheap, easy, unlikely to re-allocate, and very cache-friendly.

@dodexahedron
Copy link
Collaborator

But also, note that duplicates are really cheap and likely to cost less than advantages you can gain by being able to operate over Spans and slices of Memory and such, depending on what a duplicate means.

@dodexahedron
Copy link
Collaborator

Another thought:

Any time you have to enumerate the entire set, copy it to a span first and enumerate that. It makes a HUGE difference.

@BDisp
Copy link
Collaborator

BDisp commented Apr 18, 2024

I want to prevent entering repeated values to avoid unnecessary repetitive recalculations.
Can you explain to me what you said about ArrayPool<T>? Does that mean I have to use it or does HashSet<T> use it internally?

@dodexahedron
Copy link
Collaborator

ArrayPool exists for more efficient allocation of arrays and reduced garbage collector activity.

All you need to do, in this case, is exactly as shown, when constructing the HashSet. After that, you don't have further control over what the HashSet will do, internally (and that's another reason you want to over-allocate as shown).

HashSet deals with arrays, under the hood, like most built-in collections. And, like most built-in collections, you can tell it the array to back itself with, initially, in the constructor. It still only contains as many elements as the HashSet itself is aware of - you've just pre-allocated the storage in a contiguous block and avoided internal re-allocation unless the collection grows beyond the initial allocation.

So nope - no extra work required beyond that. There are additional things you can do, but this is a perfectly good starting point.

@dodexahedron
Copy link
Collaborator

dodexahedron commented Apr 18, 2024

As for repetitive calculations:

You can perform several hundred operations on the stack in the time just a single operation that involves a heap access takes.

The heap is EXPENSIVE. Avoiding it as much as possible in hot paths is very valuable and is a significant reason most of the changes I've made are faster than the originals.

@dodexahedron
Copy link
Collaborator

That 256 number was important, BTW.

  • Best to use powers of 2.
  • 256 is unlikely to be exceeded.
  • 256 * 16 (size of a Rectangle) = 4K, which is one memory page.

@tig
Copy link
Collaborator Author

tig commented Apr 18, 2024

Why not just steal the code from System.Drawing.Region???

@BDisp
Copy link
Collaborator

BDisp commented Apr 18, 2024

Why not just steal the code from System.Drawing.Region???

Because it has many references to specific types of Windows Forms and I preferred to start from scratch adapting it to Terminal.Gui.

@dodexahedron
Copy link
Collaborator

dodexahedron commented Apr 19, 2024

Have you looked at some of the other existing drawing/graphics-related libraries out there? There are some that are fantastic and highly optimized. And if only some functionality is needed, specific code can be lifted if the licenses are compatible, to avoid bloat. Or we can just publish single-file and trimmed for the same result.

Even if a gfx library isn't intended for text-mode applications, you can still use the data structures, too. Pretty easy to turn a bitmap in memory into unicode for the screen buffer on the fly.

@dodexahedron
Copy link
Collaborator

If implementing from scratch, here are a couple of ideas for what might represent pretty significant but easy-to-code optimizations at very low cost:

One is to keep track of at least some of the negative space, such as just one rectangle of it.

For example, if the region looks like a big L, there would be whatever rectangles make up the L, and one rectangle that is the pre-computed rectangle none of the others touch. Hit testing and drawing could both then pay the cost of one check to that rectangle first, before checking positive space, likely speeding up most operations, especially if the negative space is large.

The other idea is to explicitly cache combined areas into a bitmap or other sort of structure, so that the entire stack of rectangles doesn't have to be computed for each iteration of the loop. You'd add a subscription to any change events on Views to invalidate the cache and only re-compute the cached structure when it's not valid.

These can also be combined.

@dodexahedron
Copy link
Collaborator

Another idea:

Create two or more bitmaps/collections of values - One or more of which are metadata for each screen coordinate and one of which is explicit content of that screen coordinate, if any exists. Metadata could be things such as what needs to happen to that pixel (process, skip, composite, etc) or a reference to the top View that would actually be responsible for drawing that character, or anything else that can be used to speed things up.

Such things would also be very vectorizable for SIMD by the compiler without additional work.

@dodexahedron
Copy link
Collaborator

I empathize with the desire to avoid a seemingly more complex scenario.

However, both of those issues are performance killers all by themselves, which defeats the purpose.

I haven't looked more in-depth at the code than those comments and still need to take some time to sit down and look at it all in-situ.

If something is being redone completely, with a specific purpose, we gotta be sure we're actually achieving the desired goal and doing so in a good way.

dynamic is a non-starter, though. The entire DLR is incompatible with trimming.

Generics may be required, but there may very well be other alternatives.

Since @tig has a workaround for the v2 CI builds that works just fine for now, I'll switch focus to having a look at the whole implementation, here.

I'm sure the overall design is probably good. But things like this are significant performance costs that are avoidable.

@dodexahedron
Copy link
Collaborator

Is the current code on your branch the latest you have?

I'm going to clone that branch so I can look at it in context.

tig added a commit that referenced this issue Nov 10, 2024
@tig tig closed this as completed in #3798 Nov 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ✅ Done
Status: 🆕 Not Triaged
3 participants