@Beilinson Beilinson commented Sep 24, 2025

Description

The goal of this PR is to improve the performance of scene.drillPick without changing the algorithm significantly.

The first commit, which changes the internal data structure of _pickObjects to a Map, was inspired by the performance boosts seen in #12896 and led to a surprising 75% decrease in time.

I tried further reducing unneeded code by removing some funny byte-float-byte transformations and reusing a scratch array for pixels in Context.readPixels, which led to a further improvement of roughly 10% over the previous iteration.
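
As a rough illustration of the two ideas above (reusing a scratch pixel array and skipping a float round trip), here is a minimal sketch. The names `getScratchPixels` and `pickKeyAt` are hypothetical and are not taken from Cesium's `Context.readPixels`; the sketch only assumes that each pixel is four RGBA bytes.

```javascript
// Hypothetical helpers for illustration; not Cesium's actual implementation.

// One module-level scratch buffer, grown only when a larger read is requested.
let scratchPixels = new Uint8Array(0);

function getScratchPixels(width, height) {
  const needed = width * height * 4; // RGBA, one byte per channel
  if (scratchPixels.length < needed) {
    scratchPixels = new Uint8Array(needed);
  }
  // A view on the shared buffer; contents must be overwritten before use.
  return scratchPixels.subarray(0, needed);
}

// Pack the four bytes of pixel `i` into one unsigned 32-bit key,
// without converting to a float color in between.
function pickKeyAt(pixels, i) {
  const o = i * 4;
  return (
    ((pixels[o] << 24) |
      (pixels[o + 1] << 16) |
      (pixels[o + 2] << 8) |
      pixels[o + 3]) >>>
    0
  );
}
```

The `>>> 0` at the end keeps the key unsigned even when the red byte sets the top bit.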

Further, I realized that in sandcastles such as the one described in #9660, where many entities that don't fully overlap may be drawn and the drill pick rectangle may be particularly large, PickFramebuffer.end could keep collecting entities while iterating over the entire pickPolygon instead of exiting early after one entity is found. This led to the most significant performance boost: the attached sandcastle now takes ~30ms, down from ~3s on my machine. The slowest part of picking is currently the need to re-render for every single entity in the pick rectangle (i.e., 220 entities = 220 renders + 220 buffer reads + 220 iterations over bigger and bigger parts of the image).

Issue number and link

#9660

Testing plan

Sandcastle example modified from #9660:
local: (image)

main: (image)

Author checklist

  • I have submitted a Contributor License Agreement
  • I have added my name to CONTRIBUTORS.md
  • I have updated CHANGES.md with a short summary of my change
  • I have added or updated unit tests to ensure consistent code coverage
  • I have updated the inline documentation, and included code examples where relevant
  • I have performed a self-review of my code

Thank you for the pull request, @Beilinson!

✅ We can confirm we have a CLA on file for you.

@Beilinson Beilinson changed the title from "Depth picking performance" to "drill picking performance" Sep 25, 2025
Contributor

@mzschwartz5 mzschwartz5 left a comment

Hey @Beilinson

Thanks for the contribution! I left a number of comments, but they're mostly minor. The performance improvement seems huge. I was particularly surprised the Map change gave a big difference, since the plain object {} is also backed by a hash table. (In contrast to #12896, where we were iterating over an array. Map had a huge impact there).

I just have a few questions, but they're pretty important:

  1. Paraphrasing a comment I left on PickFramebuffer.js, can you elaborate on why not returning early from PickFramebuffer.end speeds up picking? Seems counterintuitive to me.
  2. Following from that, does this performance gain come at the cost of some other use case of picking? (like, does it speed up drill picking at the cost of slowing down regular picking?)
  3. In the example sandcastle you linked under the testing section, it only picks 40 pins (compared to the 220 in your measurements). What do I change to increase it to 220?

Comment on lines 738 to 739
while (defined(pickedResults) && pickedResults.length > 0 && !shouldBreak) {
for (const pickedResult of pickedResults) {
Contributor

Surely we don't need a for loop inside a while loop, right? I haven't fully processed this code, but what is the while loop doing, exactly? Can we just... remove it, and only use the for loop?

Contributor

Side note: if we really do need this while loop, I think I'd rather see the inner for loop as its own function (to be frank, I'd prefer that the existing code had been written as several smaller functions).

Then, you would no longer need shouldBreak - you could just say if !myForLoopFunction(...) break.

Author

TBH, I would've liked to use a label on the outer while loop rather than the shouldBreak flag, but I see your point. I'm not sure refactoring it out into a separate function would be great: this is the bread and butter of this method, and the function would need to take multiple parameters. Thoughts?

Author

Regarding the double loops:

  1. The while loop runs as long as there are entities left (and the limit hasn't been reached), and re-runs the pickCallback each time.
  2. The for loop actually iterates over all the picked entities, adding each one and breaking once the limit is hit.

Contributor

If you were to refactor the inner for loop into its own function, the args would be pickedResults, results, and pickedPrimitives - did I miss anything? That doesn't sound too bad to me.

Unrelated, is defined(pickedResults) a necessary while loop condition? Isn't pickedResults guaranteed to be defined, it just may be 0-length?

Author

@Beilinson Beilinson Sep 29, 2025

getRayIntersection in Picking.js returns object[] | undefined, and it is used as the drillPick callback in getRayIntersections. Would you prefer that I change it to return Frozen.EMPTY_ARRAY?
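
A small sketch of what that option could look like, under the assumption that the callback always returns an array. `EMPTY_ARRAY`, `getRayIntersectionSketch`, and `countLayers` are local, hypothetical stand-ins and not the functions in Picking.js.

```javascript
// Local stand-in for a shared frozen empty array (hypothetical, for illustration).
const EMPTY_ARRAY = Object.freeze([]);

// Always returns an array, so callers never need a `defined(...)` check.
function getRayIntersectionSketch(hit) {
  return hit === undefined ? EMPTY_ARRAY : [hit];
}

// With an array guaranteed, the loop condition reduces to a length test.
function countLayers(pickCallback) {
  let layers = 0;
  let picked = pickCallback();
  while (picked.length > 0) {
    layers++;
    picked = pickCallback();
  }
  return layers;
}
```

Because the shared array is frozen, callers cannot push into it by accident.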

Author

I refactored the for loop into an external function. It does take quite a few arguments, but overall it looks cleaner to me.
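
For readers following along, the shape of that refactor is roughly the following. This is a simplified sketch, not the code in the PR: `addPickedResults`, `drillPickLoop`, and the `hide` callback are hypothetical names, and the real method also tracks and restores the hidden primitives.

```javascript
// Hypothetical names throughout; a simplified stand-in for the drill-pick loop.

// Inner loop as its own function. Returns false once the limit is reached,
// so the caller can stop without a shouldBreak flag or a labeled break.
function addPickedResults(pickedResults, results, hide, limit) {
  for (const object of pickedResults) {
    results.push(object);
    hide(object); // hide it so the next pass can see what is underneath
    if (results.length >= limit) {
      return false;
    }
  }
  return true;
}

// Outer loop: re-run the pick callback until nothing is found or the limit is hit.
function drillPickLoop(pickCallback, hide, limit = Infinity) {
  const results = [];
  let pickedResults = pickCallback();
  while (pickedResults !== undefined && pickedResults.length > 0) {
    if (!addPickedResults(pickedResults, results, hide, limit)) {
      break;
    }
    pickedResults = pickCallback();
  }
  return results;
}
```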

@Beilinson
Author

Beilinson commented Sep 26, 2025

Hey @mzschwartz5 , thanks for the review!

I was particularly surprised the Map change gave a big difference, since the plain object {} is also backed by a hash table. (In contrast to #12896, where we were iterating over an array. Map had a huge impact there).

delete obj[key] on a plain object is very slow compared to map.delete(key) (about a 75% difference, apparently; see source). Because the picking is a depth-peeling algorithm, the entire _pickObjects map is reconstructed every iteration, which means deleting and reinserting all keys each time; that is where much of the slowdown came from.
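
To make the access pattern concrete, here is a minimal sketch of a Map-backed registry that gets emptied and refilled on each iteration. `PickObjectRegistry` and `rebuild` are hypothetical names; this is not Cesium's Context code, and it makes no claim about the actual timings.

```javascript
// Sketch only: a hypothetical registry, not Cesium's Context code.
// `key` stands in for whatever identifies a pick color.
class PickObjectRegistry {
  constructor() {
    this._pickObjects = new Map();
  }

  add(key, object) {
    this._pickObjects.set(key, object);
  }

  get(key) {
    return this._pickObjects.get(key);
  }

  remove(key) {
    // Map.prototype.delete returns true if the key existed.
    // The plain-object equivalent would be `delete this._pickObjects[key]`.
    return this._pickObjects.delete(key);
  }

  get size() {
    return this._pickObjects.size;
  }
}

// One simulated peeling iteration: every key is removed and then re-added.
function rebuild(registry, entries) {
  for (const [key] of entries) {
    registry.remove(key);
  }
  for (const [key, object] of entries) {
    registry.add(key, object);
  }
}
```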

1. Paraphrasing a comment I left on `PickFramebuffer.js`, can you elaborate on why _not_ returning early from `PickFramebuffer.end` speeds _up_ picking? Seems counterintuitive to me.

The drill picking implemented by Cesium is similar to depth peeling. Each iteration, entities are rendered to an offscreen buffer, that buffer is read into memory, and then we iterate over its pixels (spiraling out from the center; this is primarily useful for picking the entities "closest" to the pointer, and not really useful when picking an actual rectangle). The old algorithm would return the moment one primitive was found.

The sandcastle demonstrates a scenario where the pick covers a significant portion of the screen rather than a 3x3 pixel rectangle at the pointer. This is also a use case we have in our enterprise system, where we support "drag selecting" multiple entities.

The important thing is that depth peeling is critical when entities are rendered 100% directly on top of one another and you are picking a very small (3x3) area. In this use case, however, the entities don't overlap significantly, meaning each iteration could find more of the rendered entities; instead, we exit early to re-render everything and re-read everything.

By letting the PickFramebuffer keep searching for entities, we solve this: over a large rectangle at the given zoom in the sandcastle, all entities are found in one pass, and only one re-render is needed (to make sure that there are no entities under those ones). That's the significant performance boost.
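
A minimal sketch of the difference inside a single pass, assuming the pixels have already been reduced to one pick key per pixel. `collectPicked` is a hypothetical function, a key of 0 is assumed to mean "nothing drawn", and the scan is linear here, whereas the real code spirals out from the center.

```javascript
// Sketch only. `pixelKeys` holds one pick key per pixel (0 = nothing drawn);
// `pickObjects` is a Map from key to picked object.
function collectPicked(pixelKeys, pickObjects, limit = 1) {
  const found = new Set();
  for (const key of pixelKeys) {
    if (key === 0) {
      continue;
    }
    const object = pickObjects.get(key);
    if (object !== undefined && !found.has(object)) {
      found.add(object);
      if (found.size >= limit) {
        break; // limit = 1 reproduces the old early exit
      }
    }
  }
  return [...found];
}
```

With `limit = 1` the scan stops at the first hit, as before; with a larger limit a single pass can return every entity visible in the rectangle.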

2. Following from that, does this performance gain come at the cost of some other use case of picking? (like, does it speed up drill picking at the cost of slowing down regular picking?)

No, regular picking still picks a (3x3 by default) rectangle, and stops after finding the first entity (limit=1). Also I didn't remove the spiral code (https://github.com/CesiumGS/cesium/blob/ce6c3e28fc7e550b32a5c67335aa4490d0e84c34/packages/engine/Source/Scene/PickFramebuffer.js#L82C3-L86C83), so that also still works the same.

3. In the example sandcastle you linked under the testing section, it only picks 40 pins (compared to the 220 in your measurements). What do I change to increase it to 220?

It probably has to do with screen size. You could try resizing the viewer or altering the sandcastle so that the pickRectangle is the entire canvas, but that would be even more unfair.

@Beilinson
Author

Beilinson commented Sep 26, 2025

Essentially this PR has two separate (not necessarily exclusive) improvements:

  1. The ~75% speedup when drill picking a small area (3x3) with many overlapping entities (due to the switch to Map)
  2. A more significant speedup when picking a large area with many spread-out, non-overlapping entities (at best, with N = number of entities, from O(N) to O(1))

To be precise, improvement 2 goes from O(N) to O(N2), where N2 is the maximum number of fully overlapping entities at the given zoom and render resolution.
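
The pass counts behind that estimate can be checked with a toy model. This is a simulation under simplifying assumptions, not Cesium code: each entity is a hypothetical `{ id, depth, pixels }` record, a "render" keeps the nearest visible entity per pixel, and `perPassLimit = 1` stands in for the old early exit.

```javascript
// Toy model: entities are { id, depth, pixels }, smaller depth = closer to camera.
function renderPass(entities, hidden, pixelCount) {
  const frame = new Array(pixelCount).fill(null);
  for (const e of entities) {
    if (hidden.has(e.id)) {
      continue;
    }
    for (const p of e.pixels) {
      if (frame[p] === null || e.depth < frame[p].depth) {
        frame[p] = e;
      }
    }
  }
  return frame;
}

// Counts render passes until a pass finds nothing (that final pass is included).
function countPasses(entities, pixelCount, perPassLimit) {
  const hidden = new Set();
  let passes = 0;
  for (;;) {
    passes++;
    const frame = renderPass(entities, hidden, pixelCount);
    const found = new Set();
    for (const e of frame) {
      if (e !== null) {
        found.add(e.id);
        if (found.size >= perPassLimit) {
          break;
        }
      }
    }
    if (found.size === 0) {
      return passes;
    }
    for (const id of found) {
      hidden.add(id);
    }
  }
}
```

In this model, N spread-out entities need N + 1 passes with a per-pass limit of 1 but only 2 passes without it, while N fully overlapping entities need N + 1 passes either way.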

@mzschwartz5
Contributor

mzschwartz5 commented Sep 28, 2025

@Beilinson

No, regular picking still picks a (3x3 by default) rectangle, and stops after finding the first entity (limit=1).

Got it. So the behavior of regular picking is exactly the same as before this PR, because it uses limit=1. Perfect.

I guess the natural follow up is then, does this slow down the performance of regular 3x3 drill picking, where we most likely do want 1 result at a time? (I need to go over the PR again, and maybe I'll figure out the answer myself, but good to document it anyway).


Took another pass, still interested in your response to the above question, but otherwise I think this is looking pretty good barring those few style-related discussions.

@Beilinson
Author

Beilinson commented Sep 29, 2025

@mzschwartz5

I guess the natural follow up is then, does this slow down the performance of regular 3x3 drill picking, where we most likely do want 1 result at a time? (I need to go over the PR again, and maybe I'll figure out the answer myself, but good to document it anyway).

It's the exact same code flow as previously because it returns early after finding one result (limit <=0) break in PickFramebuffer.prototype.end, so should be the same performance as before

@mzschwartz5
Contributor

@Beilinson

It's the exact same code flow as previously because it returns early after finding one result (limit <=0) break in PickFramebuffer.prototype.end, so should be the same performance as before

No, I don't think this is true. Take this example: I'm doing a 10x10 drill pick, on a region of the screen where 5 entities are overlapped, and I've set the limit to 5. On each pick pass, now, even once we've found 1 entity, PickFramebuffer.prototype.end will continue to search through the whole 10x10 region because it hasn't hit the limit on this pass - even though there's nothing left to find on this layer. Does that sound right to you? (I'm no expert on picking but that's my understanding of how this works)

Essentially, this PR is a tradeoff - better performance for large pick rectangles when entities are non-overlapping, at the same depth, but worse performance when they are overlapping at various depths. Is the tradeoff worth it? We might need to discuss. The primary use-case of drill picking is to pick objects that are overlapping each other at various depth layers.

So if this has a significant performance impact on large-rectangle drill picks containing overlapping objects, we may want to consider other options (like maybe a new picking API specifically for this type of picking?).

@mzschwartz5
Contributor

mzschwartz5 commented Sep 29, 2025

Also, won't this change which entities get picked? The old method prioritizes entities closest to the mouse position, depth-first if you will. The new method prioritizes breadth-first.

This is probably more important of a question than the performance one above, as I assume that, overall, performance is probably better all-around given the other changes (map, rgb conversion)

@Beilinson
Author

Ah sorry I misread your question. Yes you're right, this PR causes breadth-first instead of depth-first search.

Regarding the fact that you iterate over empty pixels after already finding an entity rather than early return, I took the liberty to benchmark the iteration speed before and after my map+byte conversion improvements:

New sandbox

In the new example, I click directly on the entities, which are all 100% overlapping.

Main:
Picking between 469.0,210.0 with a 11x11 rectangle
Picked 400 pins in 471ms.

Color + Map improvements:
Picking between 451.2,186.6 with a 11x11 rectangle
Picked 400 pins in 358ms.

Color+Map+Iterate over all pixels:
Picking between 452.0,203.4 with a 11x11 rectangle
Picked 400 pins in 367ms.

I benchmarked all of these several times. The numbers obviously weren't 100% consistent, but this was the average first-click time.

As a user of both scenarios, I feel that a 10ms slowdown for a pick of 400 entities (which isn't felt, because this PR also gives a 100ms speedup), in exchange for an O(N) improvement on non-overlapping entities, is worth it. Iterating over 100 pixels really isn't all that slow when all the iteration does is check a map.

@Beilinson
Author

Also, won't this change which entities get picked? The old method prioritizes entities closest to the mouse position, depth-first if you will. The new method prioritizes breadth-first.

This is probably more important of a question than the performance one above, as I assume that, overall, performance is probably better all-around given the other changes (map, rgb conversion)

Agreed, this is 100% a more important question. On a rectangle of any size, given no limit, the result will be the same (in a different order, but for large picks that's an improvement). On small picks (3x3) it is almost guaranteed to be the same as before, because I still iterate from the center outwards, meaning that if any outer entities' pixels exist, they probably belong to entities hidden by the central entity.

Given a large rectangle, and a small limit, and a mix of overlapping and non-overlapping entities, then the results will be different - drillPick will now prioritize visible entities over completely hidden ones.
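
A toy example of that difference, using a simplified model rather than Cesium code. `drillPickSketch` and the `{ id, depth, pixels }` records are hypothetical; pixel 0 plays the role of the center of the pick rectangle, and `stopAtFirstHit = true` stands in for the old early exit.

```javascript
// Toy model: entities are { id, depth, pixels }, smaller depth = closer to camera.
function drillPickSketch(entities, pixelCount, limit, stopAtFirstHit) {
  const hidden = new Set();
  const results = [];
  while (results.length < limit) {
    // "Render": keep the nearest non-hidden entity per pixel.
    const frame = new Array(pixelCount).fill(null);
    for (const e of entities) {
      if (hidden.has(e.id)) {
        continue;
      }
      for (const p of e.pixels) {
        if (frame[p] === null || e.depth < frame[p].depth) {
          frame[p] = e;
        }
      }
    }
    // "Read back": scan pixels in order, collecting entities not yet picked.
    const before = results.length;
    for (const e of frame) {
      if (e === null || hidden.has(e.id)) {
        continue;
      }
      hidden.add(e.id);
      results.push(e.id);
      if (stopAtFirstHit || results.length >= limit) {
        break;
      }
    }
    if (results.length === before) {
      break; // nothing left to find
    }
  }
  return results;
}

// A covers B completely at the center pixel; C is visible off to the side.
const exampleEntities = [
  { id: "A", depth: 1, pixels: [0] },
  { id: "B", depth: 2, pixels: [0] },
  { id: "C", depth: 1, pixels: [1] },
];
```

With a limit of 2, the early-exit variant returns A and then the hidden B, while the full-scan variant returns the two visible entities A and C. With no limit, both return the same three entities in a different order.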

From my point of view this is a matter of intention: as a user, we use drillPick with the default 3x3 to pick things directly under the mouse. The use of drillPick with a large rectangle is to drag select, in which case if we have a limit (which we do) we prefer to highlight the visible entities over the hidden ones.

If there is a use-case for drill picking a large rectangle and still preferring a depth-first search, I am open to adding a flag which will early return after 1 entity is found. I do think breadth-first should be the default, because of its improved performance. If needed, I could add a warning in the changelog

@mzschwartz5
Contributor

That's a pretty fair assessment on both fronts. I agree about the performance changes being a non-issue.

I'm going to seek input from my teammates on the selection ordering; I think it could be OK, and may just warrant a note in the change log as you said.

@Beilinson
Author

Found #3018 while randomly looking through issues related to performance/memory leaks, and realized that this PR implements the transition to Map mentioned there, so I think this could close it.
