Move Camera Vertex Transform to GPU #219
In order to perform a full GPU transform of vertex data, the following setup would be required in the vertex shader:
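A rough sketch of what that could look like - all uniform and attribute names here are placeholders for illustration, not existing Duality builtins:

```glsl
#version 120

// Camera-local data: constant for the whole RenderPass
uniform vec3  cameraPos;       // world-space camera position
uniform float cameraRotation;  // camera rotation, in radians
uniform float cameraFocusDist; // distance at which parallax scale is 1.0

// Object-local data: changes on average every four vertices
uniform vec3  objPos;      // world-space object position
uniform float objRotation; // object rotation, in radians
uniform float objScale;    // uniform object scale

attribute vec3 vertexPos;  // object-local vertex position
attribute vec2 vertexUV;
attribute vec4 vertexColor;

varying vec2 uv;
varying vec4 color;

vec2 rotate(vec2 v, float angle)
{
	float s = sin(angle);
	float c = cos(angle);
	return vec2(c * v.x - s * v.y, s * v.x + c * v.y);
}

void main()
{
	// Object-local --> world: scale, rotate, translate
	vec3 world = objPos + vec3(rotate(vertexPos.xy * objScale, objRotation), vertexPos.z);

	// World --> view: translate by camera position, apply depth-dependent
	// parallax scale, then rotate into camera orientation
	vec3 view = world - cameraPos;
	view.xy = rotate(view.xy * (cameraFocusDist / view.z), -cameraRotation);

	gl_Position = gl_ProjectionMatrix * vec4(view, 1.0);
	uv = vertexUV;
	color = vertexColor;
}
```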
While Camera-local data is constant after setting up the Camera's RenderPass, object-specific data changes on average every four vertices. Without a very efficient way to store it, this will be the main culprit. Problems: updating per-object uniforms between draw calls would break batching into large vertex buffers, and there is currently no spot in the Duality rendering API to specify per-object shader data in a generic way.
These problems require further consideration before this issue can be solved. /cc @BraveSirAndrew with a vague feeling that he might have a solid opinion or experience with this kind of thing. |
One way to solve these problems would be to store all object-local data in vertex attributes. This would heavily increase data load, but at the same time solve the batching problem and circumvent the API problem. As an optimization, object-local rotation could be performed on the CPU like it is implemented now. The default vertex format here would then be:
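Sketched out (attribute names and component types assumed; object rotation and scale pre-applied on the CPU):

```glsl
attribute vec3 vertexPos;   // 12 bytes: object-local position, XYZ floats
attribute vec2 vertexUV;    //  8 bytes: texture coordinate, UV floats
attribute vec4 vertexColor; //  4 bytes: RGBA color, 4 x unsigned byte (normalized)
attribute vec3 objPos;      // 12 bytes: world-space object position, repeated per vertex
                            // ------- 36 bytes per vertex
```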
That would be 12 bytes larger than before, and 36 bytes is quite a bit for this kind of simple 2D data. Potentially, it could be optimized to this:
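One plausible compressed layout - assuming half float texture coordinates, which is speculative at this point:

```glsl
attribute vec3 vertexPos;   // 12 bytes: object-local position, XYZ floats
attribute vec2 vertexUV;    //  4 bytes: texture coordinate, 2 x half float
attribute vec4 vertexColor; //  4 bytes: RGBA color, 4 x unsigned byte (normalized)
attribute vec3 objPos;      // 12 bytes: world-space object position
                            // ------- 32 bytes per vertex
```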
|
Another problem with this approach, and especially the above vertex shader, is the fact that all existing transformations need to be expressible within its configuration. Both screen overlay and world rendering need to be able to take the same rendering path, because all Materials should be equally usable in both modes - without keeping every Material in two versions and picking "the right one". An updated version of the above shader (including the removed object rotation and the potential uniform-to-attribute change) could look like this:
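Again a sketch with assumed names; the `screenOverlay` switch is one possible way to express the configuration:

```glsl
#version 120

// Camera-local data; unused (and not set) in screen overlay mode
uniform vec3  cameraPos;
uniform float cameraRotation;
uniform float cameraFocusDist;

// Configuration: 1.0 for screen overlay rendering, 0.0 for world rendering
uniform float screenOverlay;

attribute vec3 vertexPos;  // object-local, already rotated / scaled on the CPU
attribute vec3 objPos;     // per-object position, now a vertex attribute
attribute vec2 vertexUV;
attribute vec4 vertexColor;

varying vec2 uv;
varying vec4 color;

vec2 rotate(vec2 v, float angle)
{
	float s = sin(angle);
	float c = cos(angle);
	return vec2(c * v.x - s * v.y, s * v.x + c * v.y);
}

void main()
{
	vec3 view = vertexPos + objPos;
	if (screenOverlay < 0.5)
	{
		// World rendering: camera translation, parallax scale, camera rotation
		view -= cameraPos;
		view.xy = rotate(view.xy * (cameraFocusDist / view.z), -cameraRotation);
	}
	// In screen overlay mode, coordinates pass through as screen space
	gl_Position = gl_ProjectionMatrix * vec4(view, 1.0);
	uv = vertexUV;
	color = vertexColor;
}
```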
Note that, in on-screen mode, none of the camera-related uniforms are used at all. |
Adding to the above (solved) problem, the same shader would also need to be configurable to support flat / non-parallax rendering:
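Extending the above sketch with one more assumed configuration uniform, `cameraParallax`:

```glsl
#version 120

uniform vec3  cameraPos;
uniform float cameraRotation;
uniform float cameraFocusDist;
uniform float screenOverlay;  // 1.0: screen overlay rendering, 0.0: world rendering
uniform float cameraParallax; // 1.0: parallax rendering, 0.0: flat rendering

attribute vec3 vertexPos;
attribute vec3 objPos;
attribute vec2 vertexUV;
attribute vec4 vertexColor;

varying vec2 uv;
varying vec4 color;

vec2 rotate(vec2 v, float angle)
{
	float s = sin(angle);
	float c = cos(angle);
	return vec2(c * v.x - s * v.y, s * v.x + c * v.y);
}

void main()
{
	vec3 view = vertexPos + objPos;
	if (screenOverlay < 0.5)
	{
		view -= cameraPos;
		// Depth-dependent scale with parallax, constant scale without
		float focusScale = 1.0;
		if (cameraParallax > 0.5) focusScale = cameraFocusDist / view.z;
		view.xy = rotate(view.xy * focusScale, -cameraRotation);
	}
	gl_Position = gl_ProjectionMatrix * vec4(view, 1.0);
	uv = vertexUV;
	color = vertexColor;
}
```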
In this setup, all projection / rendering modes are supported: screen overlay, flat world rendering and parallax world rendering.
So, up to this point, the main issue in both shader and Duality API considerations seems to be how to transfer object data to the shader in a generic, re-usable way - and how to do so most efficiently. |
With regard to the memory bandwidth issue when storing object data in vertex attributes, here's a comparison to put it into perspective. Duality 2D Vertex Format:
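(Sketch of the uncompressed draft from above, attribute names assumed:)

```glsl
attribute vec3 vertexPos;   // 12 bytes
attribute vec2 vertexUV;    //  8 bytes
attribute vec4 vertexColor; //  4 bytes
attribute vec3 objPos;      // 12 bytes
                            // ------- 36 bytes per vertex
```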
Somewhat Minimal 3D Game Vertex Format:
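(What a typical minimal 3D format might look like - an assumption for comparison, not taken from any specific engine:)

```glsl
attribute vec3 vertexPos;    // 12 bytes
attribute vec3 vertexNormal; // 12 bytes
attribute vec2 vertexUV;     //  8 bytes
                             // ------- 32 bytes per vertex, before adding
                             // tangents, vertex colors or a second UV set
```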
It doesn't seem like that big of a deal in comparison. This might be the way to go here. Edit: Assuming a game scene with 10000 visible sprites, that would be 40000 vertices per frame. Even when assuming the uncompressed variant with 36 bytes per vertex, that would be about 1.4 MB per frame - only around 82 MiB per second of bandwidth at 60 FPS. |
Hi Adam, I think that the correct way to handle the per-object data in this case would be to use separate streams of data. You could leave the existing vertex formats alone and add another stream of vertex data for object position, rotation, and scale. You can set a divisor on streams in OpenGL, so you could say that the GL should only update its index into this second stream for every four vertices processed. That way you're reducing the extra load to only (12 bytes for position + 4 bytes for rotation + 4 bytes for scale) * 10000 = 200k on top of your normal data for 10000 sprites, which is nothing at all! I wouldn't even worry about that on mobile platforms.
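Shader-side, such a second stream would simply appear as additional attributes; the divisor only changes how the buffer is advanced (attribute names assumed):

```glsl
// Stream 0: regular per-vertex data, advanced once per vertex
attribute vec3 vertexPos;
attribute vec2 vertexUV;
attribute vec4 vertexColor;

// Stream 1: per-object data; with glVertexAttribDivisor
// (GL 3.3 / ARB_instanced_arrays) the GL advances these only
// once per instance instead of once per vertex
attribute vec3  objPos;      // 12 bytes per object
attribute float objRotation; //  4 bytes per object
attribute float objScale;    //  4 bytes per object
```
|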
This is exactly the kind of thing that I was looking for - a hardcoded one-object-has-four-vertices solution probably won't suffice as a general-purpose method, but if there was a way to just specify an index per vertex, which could then be used to look up object data from a buffer, this would certainly reduce data load and provide an opportunity for specifying even more complex per-object data. I'm still doing some research on this, but do you happen to know what keyword I should be looking for?

Edit: Actually, when modifying this to provide "per-primitive data", telling OpenGL to update its index every X vertices would be kind of a general-purpose solution. All it would take would be to extend the …

Edit: After looking a bit into this, multiple sources tell me that specifying vertex data per-primitive, or specifying distinct index buffers for different attributes, isn't really possible unless using GL3.x buffer textures, with negative performance implications. If nothing else turns up, I guess I'm back at the initial solution of specifying object data per vertex. :|

Edit: Found the divisor command. It's only available in OpenGL 3.3 and ES 3.0. OpenGL 3.3 is fine for desktop machines, but ES 3.0 worries me a little. Using this as a base requirement, it would rule out most mobile devices.

Edit: It also seems that the divisor feature is only available when doing instanced rendering, not on a regular / continuous stream of vertices (?), which might be an issue. |
When taking into account advanced shaders such as lighting, they also require information about an object's local rotation, so they can interpret its normal map accordingly. In these cases, an advanced vertex format could be used - but incidentally, object rotation was also part of the initial vertex format draft. So, maybe it does have its place there, as would object-local rotation in the shader:
Updated shader:
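Again with assumed names; the rotation is also forwarded to the fragment stage, where a lighting shader would need it:

```glsl
#version 120

uniform vec3  cameraPos;
uniform float cameraRotation;
uniform float cameraFocusDist;

attribute vec3  vertexPos;   // object-local again: rotation moves back to the GPU
attribute vec3  objPos;
attribute float objRotation; // also needed by e.g. normal mapping shaders
attribute vec2  vertexUV;
attribute vec4  vertexColor;

varying vec2  uv;
varying vec4  color;
varying float worldRotation; // for interpreting normal maps in the fragment stage

vec2 rotate(vec2 v, float angle)
{
	float s = sin(angle);
	float c = cos(angle);
	return vec2(c * v.x - s * v.y, s * v.x + c * v.y);
}

void main()
{
	vec3 world = objPos + vec3(rotate(vertexPos.xy, objRotation), vertexPos.z);
	vec3 view = world - cameraPos;
	view.xy = rotate(view.xy * (cameraFocusDist / view.z), -cameraRotation);
	gl_Position = gl_ProjectionMatrix * vec4(view, 1.0);
	uv = vertexUV;
	color = vertexColor;
	worldRotation = objRotation;
}
```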
With the vertex format growing again despite compression efforts, storing per-object / per-primitive data beside vertex data like this should be considered really, really carefully. Continuing to look out for alternatives. |
Can a TexCoord really be compressed using half floats? If not, the only attribute left compressed is the object rotation, which only saves two bytes. Might as well use full precision and store rotations directly in radians then, with the added benefit of clarity, of not having to introduce Half Float types to DualityPrimitives, and of not requiring OpenGL support for them.
Maybe I've just grown accustomed to this data growth, but 36 bytes per vertex doesn't seem that bad at this point. Feedback by graphics programmers appreciated. |
All this vertex format extension stuff doesn't sound that great. Let's take a step back: if all object-local transforms (position, rotation, scale) simply stay on the CPU, as they are implemented now, only the camera / view transform needs to move to the GPU - and no per-object data has to reach the shader at all.
|
So, since additional information is no longer required, here's the updated shader:
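A sketch with assumed uniform names - the actual builtin names would be defined by Duality's shader preamble:

```glsl
#version 120

uniform vec3  cameraPos;       // world-space camera position
uniform float cameraRotation;  // camera rotation, in radians
uniform float cameraFocusDist; // distance at which parallax scale is 1.0
uniform float cameraParallax;  // 1.0: parallax rendering, 0.0: flat / overlay

attribute vec3  vertexPos;         // world space, object transform already applied on the CPU
attribute vec2  vertexUV;
attribute vec4  vertexColor;
attribute float vertexDepthOffset; // optional, defaults to zero

varying vec2 uv;
varying vec4 color;

vec2 rotate(vec2 v, float angle)
{
	float s = sin(angle);
	float c = cos(angle);
	return vec2(c * v.x - s * v.y, s * v.x + c * v.y);
}

void main()
{
	vec3 view = vertexPos - cameraPos;
	float focusScale = 1.0;
	if (cameraParallax > 0.5) focusScale = cameraFocusDist / view.z;
	view.xy = rotate(view.xy * focusScale, -cameraRotation);
	view.z += vertexDepthOffset;
	gl_Position = gl_ProjectionMatrix * vec4(view, 1.0);
	uv = vertexUV;
	color = vertexColor;
}
```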
The Z offset in the above shader would be an optional vertex attribute, so non-parallax depth sorting offsets can still be added. If not specified in the vertex stream, its value would naturally fall back to zero. The new default vertex format, which specifies it:
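(Sketch, names assumed; the depth offset being the optional part:)

```glsl
attribute vec3  vertexPos;         // 12 bytes: world-space position
attribute vec2  vertexUV;          //  8 bytes: texture coordinate
attribute vec4  vertexColor;       //  4 bytes: RGBA color
attribute float vertexDepthOffset; //  4 bytes: optional depth sorting offset
                                   // ------- 28 bytes with it, 24 bytes without
```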
As an additional improvement, Duality shaders could be updated to feature builtin functions (besides the already existing builtin uniforms), which could provide a standard vertex transformation. This would add some more flexibility to change the exact transformation code later while still keeping old shader code working.
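For example - `TransformVertexDefault` is a made-up name here, not an existing builtin:

```glsl
// Hypothetical builtin section, prepended to user shader code by Duality;
// function name, signature and uniform names are all assumptions.
uniform vec3  cameraPos;
uniform float cameraRotation;
uniform float cameraFocusDist;
uniform float cameraParallax;

vec4 TransformVertexDefault(vec3 worldPos, float depthOffset)
{
	vec3 view = worldPos - cameraPos;
	float focusScale = 1.0;
	if (cameraParallax > 0.5) focusScale = cameraFocusDist / view.z;
	float s = sin(-cameraRotation);
	float c = cos(-cameraRotation);
	view.xy = vec2(c * view.x - s * view.y, s * view.x + c * view.y) * focusScale;
	view.z += depthOffset;
	return gl_ProjectionMatrix * vec4(view, 1.0);
}

// User vertex shaders would then reduce to:
// void main()
// {
//     gl_Position = TransformVertexDefault(vertexPos, vertexDepthOffset);
// }
```

If the transformation code changes later, only the builtin needs an update, while existing user shaders keep working unchanged.
|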
Implications of the above draft:
- Usability++
|
Object transforms are still applied on the CPU, but parallax scale and view transformations are done on the GPU, based on notes from AdamsLair#219
It should be possible to test the new transform and shader as a heads-up without changing anything in the core:
|
Right now, parts of the vertex transformation in rendering happen on the CPU, using `PreprocessCoords` or manually. This approach has several problems:
But it also solves the following problem:
If there is a way to solve this using a GPU vertex transform approach, there's no reason not to move all vertex transform calculations to the GPU for better shader support and performance. Customized solutions could still be implemented using custom shaders.