Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BatchedMesh: Add support for Instanced rendering with sorting, frustum culling #28404

Merged
merged 12 commits into from
May 22, 2024

Conversation

gkjohnson
Copy link
Collaborator

@gkjohnson gkjohnson commented May 17, 2024

Related issue: #25059, #28103

Description

This PR makes some changes and additions to the BatchedMesh class to support rendering multiple copies of the same geometry without redundant memory usage. This just uses the regular multiDraw functions so we retain support for object sorting and frustum culling. Sorting is done by sorting the draw ranges to upload as well as a uint id buffer that indexes into the matrix texture. A batchId geometry attribute is no longer needed.

You can see a live demo from my prototype here. With three unique geometries and rendering 20,000 items the geometry memory usage goes down from ~192MB to ~23KB.

This is a proposed API and to demonstrate how this can be achieved. Once an API is settled on I can update documentation & any loader changes needed.

Usage
const mesh = new BatchedMesh( 100, 300, 300 );

// add new geometry and one instance of a sphere and cube geometry each
const id0 = mesh.addGeometry( sphereGeometry );
mesh.setMatrixAt( id0, ... );

const id1 = mesh.addGeometry( boxGeometry );
mesh.setMatrixAt( id1, ... );

// add new instances based on the above items
const id2 = mesh.addInstance( id0 );
mesh.setMatrixAt( id2, ... );

const id3 = mesh.addInstance( id1 );
mesh.setMatrixAt( id3, ... );

// 4 items, 2 geometries
API Description

addGeometry( geometry )

Inserts new geometry and associated draw range data into the mesh in addition to a new draw instance so it will render. The id of the new draw instance is returned.

addInstance( id )

Takes the id of an item that has already been inserted into the mesh either via addGeometry or addInstance and adds a new draw instance using the same geometry.

setGeometryAt( id, geometry )

Sets the geometry associated with the given instance id. All of the instances that are rendering with the same referenced geometry will be changed, as well.

deleteInstance( id )

Removes the instance from being drawn. Just like the previous deleteGeometry, though, this doesn't free up space in the buffer geometry or remove instance objects due to how ids work, yet, so this should remain undocumented. The optimize function would need to be implemented.

Things to Note
  • Because we use instance ids to add new instances that use the same geometry - once all instances that reference a specific geometry have been deleted, it will not be possible to add new instances of the geometry even though the data is still in the batched mesh. The future optimize function would remove the geometry if it's no longer referenced by any draw instances.

  • Another API option might be to enabling managing of geometry vs instances separately but this makes usage a bit more verbose and less clear if you don't care to use instances. I waiver between which I prefer:

const mesh = new BatchedMesh( ... );

// initialize geometries
const geometryId0 = mesh.addGeometry( sphereGeometry );
const geometryId1 = mesh.addGeometry( boxGeometry );

// initialize instances
const instancedId0 = mesh.addInstance( geometryId0 );
mesh.setMatrixAt( instanceId0, ... );

const instancedId1 = mesh.addInstance( geometryId1 );
mesh.setMatrixAt( instanceId1, ... );

const instancedId2 = mesh.addInstance( geometryId0 );
mesh.setMatrixAt( instanceId2, ... );

const instancedId3 = mesh.addInstance( geometryId1 );
mesh.setMatrixAt( instanceId3, ... );
todo
  • fix fallback case

cc @CodyJasonBennett @RenaudRohlinger @donmccurdy - very interested in your guys' thoughts on this

Copy link

github-actions bot commented May 17, 2024

📦 Bundle size

Full ESM build, minified and gzipped.

Filesize dev Filesize PR Diff
679.1 kB (168.3 kB) 679.8 kB (168.5 kB) +787 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

Filesize dev Filesize PR Diff
457.1 kB (110.3 kB) 457.7 kB (110.5 kB) +612 B

@CodyJasonBennett
Copy link
Contributor

CodyJasonBennett commented May 17, 2024

I've only briefly looked through the code, mostly curious of how you accomplished this in the first place, but this seems to overcome the issues that was holding instanced multi-draw back prior. Namely sorting and material support. I'm not sure frustum culling is or even should be handled per instance without some drastic optimizations -- matrices and orientation are expensive.

For testing (I think you asked for this but I can't see your comment anymore), I had a model made using EXT_mesh_gpu_instancing which is a soup of instances. https://drive.google.com/file/d/1bCvYbtuwKANR1dOjs5GiYfNBjA9FvvL2/view?usp=sharing

https://gltf.report NVIDIA 2070 ANGLE/DirectX (similar results with OpenGL backend)
image

@RenaudRohlinger
Copy link
Collaborator

RenaudRohlinger commented May 17, 2024

Amazing work @gkjohnson! Looks good to me. I just played around with your code and, by adding about 80 lines, I have added support for drawInstances.

The FPS difference is quite noticeable in this example with multiDrawElementsInstancedWEBGL. The draw range has been reduced from almost a million for 20,000 instances to less than 1000.

The behaviors between non-instanced and instanced BatchMesh are quite different with regards to frustum culling, sort, and all. So I would recommend creating InstancedBatchedMesh in another PR if we want to add this support. Doing so will be quite time-consuming, but I'm willing to do it this year if I find the time and resources to push this change.

image

In the meantime I don't see any reason not to merge this PR in this state. 😊

@donmccurdy
Copy link
Collaborator

Awesome additions @gkjohnson! I feel good about the API you've suggested.

Rendering 20,000 items in the demo, my laptop spends about 1.5ms / frame iterating the draw list. That's far better than the overhead of 20,000 draw calls obviously, so no complaint. But if we did (at some point) want to have an optimized path that avoids iteration in the render loop, presumably without automatic frustum culling or sorting ... would we do anything differently in the exposed public API of this PR?

My hunch is no, this still works fine, but just floating the question. :)

@gkjohnson
Copy link
Collaborator Author

Cody -

I'm not sure frustum culling is or even should be handled per instance without some drastic optimizations -- matrices and orientation are expensive.

Sorting is generally more expensive than frustum culling in this case and when performing sorting, frustum culling is just one more operation per item since all the same calculations are required (transforming a sphere into camera space). Note that frustum culling is just using sphere bounds, in this case, not OBBs or anything.

Either way it's all disable-able depending on the use case.

For testing (I think you asked for this but I can't see your comment anymore),

Thanks! I posted that and then I realized I had a model I could use. I'll post another more practical demo later this week.

Renaud -

The FPS difference is quite noticeable in this example with multiDrawElementsInstancedWEBGL. The draw range has been reduced from almost a million for 20,000 instances to less than 1000.

I figured minimizing the amount of data uploaded was the only real benefit to the instanced function variant. I'd be interested to see the performance benefits but in my opinion it would be worth measuring the benefits in a less contrived example. Rendering 3 identical geometries over 6500 times each isn't the most common case 😅

Do you have a link showing the changes?

Don -

But if we did (at some point) want to have an optimized path that avoids iteration in the render loop, presumably without automatic frustum culling or sorting

Both sorting and per-object frustum culling can be disabled with the sortObjects and perObjectFrustumCulling flags just as they could in BatchedMesh. And if the fields are disabled then there's no iteration that happens. It all also occurs in the onBeforeRender function so people can remove override the behavior if they'd like

@gkjohnson
Copy link
Collaborator Author

gkjohnson commented May 20, 2024

Here's another more practical demo of the instanced batch rendering using this model which reuses the "tiles" model, for example, a ton of times. Here's the live demo.

Here's a visualization show every unique mesh with a different color (798 total):

image

And here is every visualization where every instance uses the same color (767 meshes share geometry with another mesh):

image

The last thing that I think needs to be decided is what to do when the the multi draw extension is not supported. Currently BatchedMesh falls back to draw all the geometry with normal drawcalls but now that gl_DrawID is used we can't do that as easily. Two options:

  • Remove fallback which will break support in Firefox.
  • Use a define to change gl_DrawID to a _gl_DrawID uniform and change the value every draw, which is a more invasive change to WebGLBufferRenderer.

cc @mrdoob @Mugen87 any opinions here?

@gkjohnson
Copy link
Collaborator Author

The last thing that I think needs to be decided is what to do when the the multi draw extension is not supported.
...

Here's a diff showing a hacky version of what's needed for the fallback. The uniforms of the currently rendering program are cached and passed into the multi draw functions so the id can be updated.

gkjohnson/three.js@instanced-batching-update...gkjohnson:three.js:instanced-batching-fallback

@RenaudRohlinger
Copy link
Collaborator

The last thing that I think needs to be decided is what to do when the the multi draw extension is not supported. Currently BatchedMesh falls back to draw all the geometry with normal drawcalls but now that gl_DrawID is used we can't do that as easily. Two options:

  • Remove fallback which will break support in Firefox.
  • Use a define to change gl_DrawID to a _gl_DrawID uniform and change the value every draw, which is a more invasive change to WebGLBufferRenderer.

cc @mrdoob @Mugen87 any opinions here?

For Firefox couldn't you simply add a uint batchID attribute to each initialized geometry and use this attribute instead of gl_DrawID in the shader?

@Mugen87
Copy link
Collaborator

Mugen87 commented May 20, 2024

Breaking support in Firefox isn't an option, tbh. A fallback should be in place in any event.

#28404 (comment) sounds best. TBH, not so happy with the required changes in gkjohnson/three.js@instanced-batching-update...gkjohnson:three.js:instanced-batching-fallback.

@gkjohnson
Copy link
Collaborator Author

gkjohnson commented May 20, 2024

For Firefox couldn't you simply add a uint batchID attribute to each initialized geometry and use this attribute instead of gl_DrawID in the shader?

This is what was done previously when each geometry was only rendered once but it's not enough for this case now that we can render the same geometry range multiple times. If we render a single geometry 1000 times, for example, gl_DrawID will increment from 0 to 999 in the shader for each instance. But if you just add a batchId attribute to the geometry it will be set to 0 for every draw which means we will not be able to index into the indirect id texture.

Fundamentally we need to know which "render" number this is in the set (which is what gl_DrawID provides). I don't see a way around updating a uniform between draws or updating the geometry's batchId attribute with the same id (which is a lot more data to upload).

TBH, not so happy with the required changes

I agree it's not great but functionally it's what needs to happen to support browsers without multidraw.

@gkjohnson
Copy link
Collaborator Author

@Mugen87 Here's another alternative which moves the fallback logic to WebGLRenderer rather than then Buffer Renderer classes:

gkjohnson/three.js@instanced-batching-update...gkjohnson:three.js:instanced-batching-fallback-2

If this still isn't acceptable I'll need to ask for some more guidance on how this can be handled.

@prideout
Copy link
Contributor

Super cool Garrett. Does it ever make sense to pass InstancedBufferGeometry into the addGeometry method? It's easy to get confused by different types of "instancing", so I wonder if some types of instancing should be deprecated in the future.

@ShakhriddinAx
Copy link

This is great, I hope it will be released in the nearest versions.
Is the “optimize” function in progress? If not, how can I implement it, I mean how is this function supposed to free up memory.
@gkjohnson

@Mugen87
Copy link
Collaborator

Mugen87 commented May 21, 2024

I think gkjohnson/three.js@instanced-batching-update...gkjohnson:three.js:instanced-batching-fallback-2 goes in the right direction.

@gkjohnson Would it be an option to introduce WebGLBatchedRenderer and use it instead of WebGLBufferRenderer and WebGLIndexedRenderer when an instance of BatchedMesh needs to be rendered? In this way, we wouldn't add more complexity to the the existing modules and could move all batched mesh related render commands into one place. Even if this means some code duplication in WebGLBatchedRenderer, it seems this approach is eventually easier to read and maintain.

@gkjohnson
Copy link
Collaborator Author

Would it be an option to introduce WebGLBatchedRenderer and use it instead of WebGLBufferRenderer and WebGLIndexedRenderer when an instance of BatchedMesh needs to be rendered? In this way, we wouldn't add more complexity to the the existing modules and could move all batched mesh related render commands into one place.

I'd prefer not to start refactoring other parts of the code since this PR is already fairly large and I think that change will inevitably raise more questions (for me, at least). If you already have an idea of how you think this should be organized it might be best to do this after this PR is merged.

I'll merge the fallback from the other branch so this PR is complete as-is. If everything else looks good API-wise I can add doc updates in this PR or another.

@Mugen87
Copy link
Collaborator

Mugen87 commented May 21, 2024

Agreed! A potential refactoring can be done at a later point as well.

@mrdoob mrdoob added this to the r165 milestone May 22, 2024
@mrdoob mrdoob merged commit 02ed248 into mrdoob:dev May 22, 2024
12 checks passed
@gkjohnson
Copy link
Collaborator Author

gkjohnson commented May 22, 2024

@mrdoob - can we revert this. I would like to resolve discussions raised in the docs PR #28456 before this makes it into a release.

@Mugen87
Copy link
Collaborator

Mugen87 commented May 22, 2024

@gkjohnson As a collaborator, you should have the rights to revert PRs. Do you see the "Revert" button next to the merge commit?

image

@gkjohnson
Copy link
Collaborator Author

I've reverted in #28461 and opened a PR for all the same changes in #28462 to continue any discussions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants