Move Camera Vertex Transform to GPU #219

Closed · ilexp opened this issue Oct 9, 2015 · 30 comments
Labels: Backend (backend core / editor plugins), Breaking Change (breaks binary or source compatibility), Cleanup (improving form, keeping function), Core (Duality runtime or launcher), Feature (it doesn't exist yet, but I want it), Performance (runtime or editor performance), Rendering (rendering / graphics), Usability (API and UI usability)


ilexp commented Oct 9, 2015

Right now, parts of the vertex transformation in rendering happen on the CPU, either via PreprocessCoords or manually:

  • Transforming world coordinates to camera coordinates
  • Applying object-local scale depending on camera distance
  • Rotating the object locally around its center

This approach has several problems:

  • Typical GPU work is done on the CPU with poor performance.
  • Vertex Shaders never have access to the original world coordinates of a vertex and cannot adjust or react to them in a meaningful way.

But it also solves the following problem:

  • The fake perspective that is used in parallax 2D projection requires scaling objects around their local center / origin. No global transformation can properly transform all incoming vertices at once, so a per-object transformation is required. However, per-object uniform data would prevent efficient batch rendering and actually make performance worse.

If there is a way to solve this using a GPU vertex transform approach, there's no reason not to move all vertex transform calculations to the GPU for better shader support and performance. Customized solutions could still be implemented using custom shaders.

@ilexp ilexp added the Feature, Core, Usability, Performance and Arguable labels Oct 9, 2015
@ilexp ilexp added this to the General Improvements milestone Oct 9, 2015

ilexp commented Oct 9, 2015

In order to perform a full GPU transform of vertex data, the following setup would be required in the vertex shader:

// Camera-constant data
uniform    vec3  camPos;       // Position of the camera in world coordinates
uniform    float camFocusDist; // FocusDist of the camera.
uniform    mat2  camRotation;  // Transformation matrix of the camera's Z rotation

// Object-local data
attribute  vec3  vertexLocal;  // Object-local vertex position
uniform    vec3  objPos;       // Position of the object in world coordinates
uniform    float objRotation;  // Object-local rotation

// Draft of the main operations to perform
void main()
{
    // Determine object scale based on camera properties and relative object position
    float objScale = camFocusDist / (objPos.z - camPos.z);

    // Transform local vertex coords to include rotation and scale
    float rotateSin = sin(objRotation);
    float rotateCos = cos(objRotation);
    vec3 localPos = vec3(
        (vertexLocal.x * rotateCos - vertexLocal.y * rotateSin) * objScale,
        (vertexLocal.x * rotateSin + vertexLocal.y * rotateCos) * objScale,
        vertexLocal.z);

    // Determine vertex world position
    vec3 worldPos = localPos + objPos;

    // Transform vertex to view coordinates and account for camera rotation
    vec3 viewPos = worldPos - camPos;
    viewPos = vec3(camRotation * viewPos.xy, viewPos.z);

    // Do OpenGL ortho projection
    gl_Position = gl_ProjectionMatrix * vec4(viewPos, 1.0);
}

While Camera-constant data stays fixed after setting up the Camera's RenderPass, object-specific data changes on average every four vertices. Without a very efficient way to store it, this will be the main bottleneck.

Problems:

  1. Calling glUniform a few times after every four vertices absolutely kills batching.
    • Investigate OpenGL Uniform buffers and similar concepts. If possible, limit this to OpenGL ES 2.0 supported features.
    • According to docs.gl, Uniform buffers are unavailable in OpenGL 2.1 and ES 2.0 and are first supported in OpenGL 3.0 and ES 3.0. Not supporting OpenGL 2.1 might not be a problem given its age, but OpenGL ES 3.0 seems like a "big" requirement...?
  2. Duality currently isn't very efficient in storing uniform data material-wise, especially not the kind of uniform data that changes per-object. Creating a new BatchInfo for every object is not a viable option. There needs to be a way to specify "temporary" uniform data per AddVertices call.
    • It needs to be super-fast. Seriously. If this should have a chance to become the new default for sprites (99% of objects), this needs to be lightspeed.

These problems require further consideration before this issue can be solved.

/cc @BraveSirAndrew with a vague feeling that he might have a solid opinion or experience with this kind of thing.


ilexp commented Oct 9, 2015

  1. Calling glUniform a few times after every four vertices absolutely kills batching.
    • Investigate OpenGL Uniform buffers and similar concepts. If possible, limit this to OpenGL ES 2.0 supported features.
    • According to docs.gl, Uniform buffers are unavailable in OpenGL 2.1 and ES 2.0 and are first supported in OpenGL 3.0 and ES 3.0. Not supporting OpenGL 2.1 might not be a problem given its age, but OpenGL ES 3.0 seems like a "big" requirement...?
  2. Duality currently isn't very efficient in storing uniform data material-wise, especially not the kind of uniform data that changes per-object. Creating a new BatchInfo for every object is not a viable option. There needs to be a way to specify "temporary" uniform data per AddVertices call.
    • It needs to be super-fast. Seriously. If this should have a chance to become the new default for sprites (99% of objects), this needs to be lightspeed.

One way to solve these problems would be to store all object-local data in vertex attributes. This would heavily increase data load, but at the same time solve the batching problem and circumvent the API problem. As an optimization, object-local rotation could be performed on the CPU like it is implemented now.

The default vertex format here would then be:

Vector3 LocalPosition; // 12 bytes
Vector3 ObjPosition;   // 12 bytes
Vector2 TexCoord;      // 8 bytes
ColorRgba Color;       // 4 bytes

// Total:  36 bytes per vertex
// Before: 24 bytes per vertex

That's 12 bytes more per vertex than before, and 36 bytes is quite a bit for this kind of simple 2D data. Potentially, it could be optimized to this:

Vector2 LocalPosition; // 8 bytes
Vector3 ObjPosition;   // 12 bytes
Vector2h TexCoord;     // 4 bytes
ColorRgba Color;       // 4 bytes

// Total:      36 bytes per vertex
// Compressed: 28 bytes per vertex
// Before:     24 bytes per vertex
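For illustration, here's a minimal sketch of what the compressed variant could look like as a C# vertex struct. All names here are hypothetical, namespaces are approximate, and Vector2h stands in for a half-precision vector type that DualityPrimitives would first need to provide:

using System.Runtime.InteropServices;
using Duality;
using Duality.Drawing;

// Hypothetical compressed vertex struct; sequential layout without padding,
// so the GPU sees the fields exactly in declaration order.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct VertexSpriteCompressed
{
    public Vector2   LocalPosition; // 8 bytes
    public Vector3   ObjPosition;   // 12 bytes
    public Vector2h  TexCoord;      // 4 bytes (half precision, hypothetical type)
    public ColorRgba Color;         // 4 bytes
    // Total: 28 bytes per vertex
}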


ilexp commented Oct 9, 2015

Another problem with this approach, and especially the above vertex shader, is the fact that all existing transformations need to be expressible within its configuration. Both screen overlay and world rendering need to be able to take the same rendering path, because all Materials should be equally usable in both modes - without maintaining two versions of everything and picking "the right one".

An updated version of the above shader (including the removed object rotation and the potential uniform to attribute change) could look like this:

// Camera-constant data
uniform    vec3  camPos;       // Position of the camera in world coordinates
uniform    float camFocusDist; // FocusDist of the camera.
uniform    mat2  camRotation;  // Transformation matrix of the camera's Z rotation
uniform    bool  camOnScreen;  // If true, screen transformation is used

// Object-local data
attribute  vec3  vertexLocal;  // Object-local vertex position
attribute  vec3  objPos;       // Position of the object in world coordinates

// Draft of the main operations to perform
void main()
{
    vec3 viewPos;

    if (!camOnScreen)
    {
        // Determine object scale based on camera properties and relative object position
        float objScale = camFocusDist / (objPos.z - camPos.z);

        // Transform local vertex coords to include local scale
        vec3 localPos = vec3(
            vertexLocal.xy * objScale,
            vertexLocal.z);

        // Determine vertex world position
        vec3 worldPos = localPos + objPos;

        // Transform vertex to view coordinates and account for camera rotation
        viewPos = worldPos - camPos;
        viewPos = vec3(camRotation * viewPos.xy, viewPos.z);
    }
    else
    {
        // In on-screen mode, just forward the raw positions into view space
        viewPos = objPos + vertexLocal;
    }

    // Do OpenGL ortho projection
    gl_Position = gl_ProjectionMatrix * vec4(viewPos, 1.0);
}

Note that, in on-screen mode, none of the camera-related uniforms are used at all.


ilexp commented Oct 9, 2015

Adding to the above (solved) problem, the same shader would also need to be configurable to support flat / non-parallax rendering:

// Camera-constant data
uniform    vec3  camPos;       // Position of the camera in world coordinates
uniform    float camFocusDist; // FocusDist of the camera.
uniform    mat2  camRotation;  // Transformation matrix of the camera's Z rotation
uniform    bool  camParallax;  // If true, 2D parallax projection is applied by the camera

// Object-local data
attribute  vec3  objPos;       // Position of the object in world coordinates

// Vertex-local data
attribute  vec3  vertexLocal;  // Object-local vertex position

// Draft of the main operations to perform
void main()
{
    vec3 localPos = vertexLocal;

    // Apply parallax 2D projection
    if (camParallax)
    {
        // Determine object scale based on camera properties and relative object position
        float objScale = camFocusDist / (objPos.z - camPos.z);

        // Transform local vertex coords to include local scale
        localPos.xy *= objScale;
    }

    // Determine vertex world position
    vec3 worldPos = localPos + objPos;

    // Transform vertex to view coordinates and account for camera rotation
    vec3 viewPos = worldPos - camPos;
    viewPos = vec3(camRotation * viewPos.xy, viewPos.z);

    // Do OpenGL ortho projection
    gl_Position = gl_ProjectionMatrix * vec4(viewPos, 1.0);
}

In this setup, all projection / rendering modes are supported:

  • "World-space" parallax 2D rendering is active by default.
  • "World-space" non-parallax / flat rendering is active when setting the camParallax uniform to false.
  • "Screen-space" rendering is active when setting the camParallax uniform to false and specifying camPos to be (0, 0, 0).
  • Also note that the above shader code, in the context of the Duality rendering setup, can be configured to be 100% equivalent to the current minimal ftransform shader:
    • Camera Rotation can be applied (optional)
    • Projection is applied
    • All else can be disabled via parameter

So, up to this point, the main issue of both shader- and Duality API considerations seems to be how to transfer object data to the shader in a generic, re-usable way, and how to do so most efficiently.

@ilexp ilexp removed the Arguable label Oct 9, 2015

ilexp commented Oct 9, 2015

With regard to the memory bandwidth issue when storing object data in vertex attributes, here's a comparison to put it into perspective:

Duality 2D Vertex Format:

Vector2 LocalPosition; // 8 bytes
Vector3 ObjPosition;   // 12 bytes
Vector2h TexCoord;     // 4 bytes
ColorRgba Color;       // 4 bytes

// Total:      36 bytes per vertex
// Compressed: 28 bytes per vertex
// Before:     24 bytes per vertex

Somewhat Minimal 3D Game Vertex Format:

Vector3 Position;      // 12 bytes
Vector3h Normal;       // 6 bytes
Vector3h Tangent;      // 6 bytes
Vector2h TexCoord;     // 4 bytes

// Total:      44 bytes per vertex
// Compressed: 28 bytes per vertex

It doesn't seem like that big of a deal in comparison. This might be the way to go here.


Edit: Assuming a game scene with 10000 visible sprites, that would be 40000 vertices per frame. Even when assuming the uncompressed variant with 36 bytes per vertex, that would only amount to around 82 MB per second of bandwidth.
However, even the old PCI Express 2.0 has a total bandwidth between 500 MB (single lane) and 8 GB (x16) per second. The above vertex size seems to be totally manageable. Not sure about mobile platforms though - any insight appreciated.
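To make that arithmetic explicit - a back-of-the-envelope check in C#, assuming 60 FPS and a full re-upload of all vertex data every frame:

// 10000 sprites * 4 vertices * 36 bytes, re-uploaded 60 times per second.
long bytesPerFrame  = 10000L * 4 * 36;                    // 1,440,000 bytes
long bytesPerSecond = bytesPerFrame * 60;                 // 86,400,000 bytes
double mibPerSecond = bytesPerSecond / (1024.0 * 1024.0); // ≈ 82.4 MiB/s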

@ilexp ilexp modified the milestones: The Future Is Now, General Improvements Oct 9, 2015
@ilexp ilexp changed the title from "Move Camera Vertex Transform to GPU?" to "Move Camera Vertex Transform to GPU" Oct 9, 2015
BraveSirAndrew (Contributor) commented

Hi Adam

I think that the correct way to handle the per-object data in this case would be to use separate streams of data. You could leave the existing vertex formats alone and add another stream of vertex data for object position, rotation, and scale. You can set a divisor on streams in OpenGL so you could say that the GL should only update its index into this second stream for every four vertices processed. That way you're reducing the extra load to only (12 bytes for position + 4 bytes for rotation + 4 bytes for scale) * 10000 = 200k on top of your normal data for 10000 sprites, which is nothing at all! I wouldn't even worry about that on mobile platforms.
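A rough sketch of that setup through OpenTK, for illustration only - the buffer handle and attribute locations are placeholders, and note that the divisor needs GL 3.3 / ARB_instanced_arrays and, as it turns out further down this thread, only advances the stream per instance, not per N vertices of a regular draw call:

using OpenTK.Graphics.OpenGL;

static void DrawSpritesInstanced(int perObjectBuffer, int objPosAttrib,
    int objRotAttrib, int objScaleAttrib, int spriteCount)
{
    const int stride = 20; // 12 bytes position + 4 bytes rotation + 4 bytes scale

    // Second vertex stream: one (position, rotation, scale) record per sprite.
    GL.BindBuffer(BufferTarget.ArrayBuffer, perObjectBuffer);
    GL.EnableVertexAttribArray(objPosAttrib);
    GL.VertexAttribPointer(objPosAttrib, 3, VertexAttribPointerType.Float, false, stride, 0);
    GL.EnableVertexAttribArray(objRotAttrib);
    GL.VertexAttribPointer(objRotAttrib, 1, VertexAttribPointerType.Float, false, stride, 12);
    GL.EnableVertexAttribArray(objScaleAttrib);
    GL.VertexAttribPointer(objScaleAttrib, 1, VertexAttribPointerType.Float, false, stride, 16);

    // Advance this stream once per instance instead of once per vertex.
    GL.VertexAttribDivisor(objPosAttrib, 1);
    GL.VertexAttribDivisor(objRotAttrib, 1);
    GL.VertexAttribDivisor(objScaleAttrib, 1);

    // One four-vertex quad per sprite, spriteCount instances total.
    GL.DrawArraysInstanced(PrimitiveType.TriangleFan, 0, 4, spriteCount);
}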


ilexp commented Oct 10, 2015

You can set a divisor on streams in OpenGL so you could say that the GL should only update its index into this second stream for every four vertices processed.

This is exactly the kind of thing that I was looking for - a hardcoded one-object-has-four-vertices solution probably won't suffice as a general-purpose method, but if there was a way to just specify an index per vertex, which could then be used to look up some object data from a buffer, this would certainly reduce data load and provide an opportunity for specifying even more complex per-object data.

I'm still doing some research on this, but do you happen to know what keyword I should be looking for?


Edit: Actually, when modifying this to provide "per-primitive data", telling OpenGL to update its index every X vertices would be kind of a general-purpose solution. All it would take would be to extend the AddVertices and IDrawBatch / internal DrawBatch<T> API to include a second per-primitive stream - all the rest could be done by the backend, as sketched below.
Not sure how that would affect vertex upload performance though, since every batch would then require a binding swap and two consecutive uploads - I suppose this shouldn't have a noticeable effect.
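Purely to illustrate that API direction - every name below is hypothetical, not actual Duality API:

using Duality;
using Duality.Drawing;

// Hypothetical per-primitive record; the backend would advance it once per
// primitive, either via a stream divisor where available or by expanding
// the data per vertex on the CPU as a fallback.
public struct PerPrimitiveData
{
    public Vector3 ObjPos;      // Object position in world coordinates
    public float   ObjRotation; // Object-local rotation in radians
}

// Hypothetical AddVertices overload taking a second per-primitive stream.
public interface IDrawDeviceWithPrimitiveData
{
    void AddVertices<T>(BatchInfo material, VertexMode vertexMode,
        T[] vertices, PerPrimitiveData[] perPrimitive) where T : struct;
}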

Edit: After looking a bit into this, multiple sources tell me that specifying vertex data per-primitive or specifying distinct index buffers for different attributes isn't really possible unless using GL3.x buffer textures with negative performance implications. If nothing else turns up, I guess I'm back at the initial solution of specifying object data per vertex. :|

Edit: Found the divisor command (glVertexAttribDivisor). It's only available in OpenGL 3.3 and ES 3.0. OpenGL 3.3 is fine for desktop machines, but ES 3.0 worries me a little. Using this as a base requirement would rule out most mobile devices.
Fallback code in the backend could upload the same vertex data N times, but I'm not sure if it's a great idea to spam OpenGL calls like that, so that fallback probably isn't that good. Another one might be to expand that vertex data on the CPU before submitting it, which isn't that great either, especially when this is only done on devices that aren't very powerful in the first place.

Edit: It also seems that the divisor feature is only available when doing instanced rendering, not on a regular / continuous stream of vertices (?), which might be an issue.



ilexp commented Oct 10, 2015

When taking advanced shaders such as lighting into account, they also require information about an object's local rotation, so they can interpret its normal map accordingly.

In these cases, an advanced vertex format could be used, but incidentally, object rotation was also part of the initial vertex format draft. So, maybe it does have its place there, as would object-local rotation in the shader:

// Per-Object / Per-Primitive data
Vector3 ObjPosition;   // 12 bytes
Half ObjRotation;      // 2 bytes

// Actual Per-Vertex data
Vector2 LocalPosition; // 8 bytes
Vector2h TexCoord;     // 4 bytes
ColorRgba Color;       // 4 bytes

// Total:      40 bytes per vertex
// Compressed: 30 bytes per vertex
// Before:     24 bytes per vertex

Updated shader:

// Camera-constant data
uniform    vec3  camPos;       // Position of the camera in world coordinates
uniform    float camFocusDist; // FocusDist of the camera.
uniform    mat2  camRotation;  // Transformation matrix of the camera's Z rotation
uniform    bool  camParallax;  // If true, 2D parallax projection is applied by the camera

// Object-local data
attribute  vec3  objPos;       // Position of the object in world coordinates
attribute  float objRot;       // Rotation of the object in degrees (to better use Half Float precision)

// Vertex-local data
attribute  vec3  vertexLocal;  // Object-local vertex position

// Draft of the main operations to perform
void main()
{
    vec3 localPos = vertexLocal;

    // Apply parallax 2D projection
    if (camParallax)
    {
        // Determine object scale based on camera properties and relative object position
        float objScale = camFocusDist / (objPos.z - camPos.z);

        // Transform local vertex coords according to parallax scale
        localPos.xy *= objScale;
    }

    // Apply local object rotation to vertex coords 
    float objRotRadians = radians(objRot);
    float rotSin = sin(objRotRadians);
    float rotCos = cos(objRotRadians);
    localPos = vec3(
        localPos.x * rotCos - localPos.y * rotSin,
        localPos.x * rotSin + localPos.y * rotCos,
        localPos.z);

    // Determine vertex world position
    vec3 worldPos = localPos + objPos;

    // Transform vertex to view coordinates and account for camera rotation
    vec3 viewPos = worldPos - camPos;
    viewPos = vec3(camRotation * viewPos.xy, viewPos.z);

    // Do OpenGL ortho projection
    gl_Position = gl_ProjectionMatrix * vec4(viewPos, 1.0);
}

With the vertex format growing again despite compression efforts, storing per-object / per-primitive data beside vertex data like this should be considered really, really carefully. Continuing to look out for alternatives.


ilexp commented Oct 10, 2015

Can a TexCoord really be compressed using half floats?
A Half Float has a precision of about three decimal digits between zero and one. However, when assuming a sprite sheet larger than 1024², the precision required to address each texel is clearly higher than that. In 2D games, some of which will require pixel-perfect rendering, this is not viable. Therefore, TexCoord needs to use a higher precision.
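Concretely, a quick check assuming IEEE half floats with a 10-bit mantissa:

// In the range [0.5, 1.0], the smallest half float step is 2^-11 ≈ 0.000488.
// One texel on a 1024-texel axis spans 1/1024 ≈ 0.000977 of UV space, so only
// about two representable values land on each texel - and at 2048 texels the
// step equals the texel size, leaving no sub-texel precision at all.
double halfStep  = System.Math.Pow(2.0, -11); // ≈ 0.000488
double texel1024 = 1.0 / 1024.0;              // ≈ 0.000977
double texel2048 = 1.0 / 2048.0;              // ≈ 0.000488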

With that change, the only attribute left compressed is the object rotation, which only saves two bytes. Might as well use full precision and store rotations directly in radians then, with the added benefit of clarity, and without requiring the introduction of Half Float types to DualityPrimitives or OpenGL support for them.

// Per-Object / Per-Primitive data
Vector3 ObjPosition;   // 12 bytes
float ObjRotation;     // 4 bytes

// Actual Per-Vertex data
Vector2 LocalPosition; // 8 bytes
Vector2 TexCoord;      // 8 bytes
ColorRgba Color;       // 4 bytes

// Total:      36 bytes per vertex
// Before:     24 bytes per vertex

Maybe I've just grown accustomed to this data growth, but 36 bytes per vertex doesn't seem that bad at this point. Feedback by graphics programmers appreciated.

@ilexp ilexp added the Backend label Oct 11, 2015

ilexp commented Oct 14, 2015

All this vertex format extension stuff doesn't sound that great. Let's take a step back:

  • When applying object-local rotation and scale in software in the ICmpRenderer, all that's left to transform is everything relative to the Camera: Position, parallax scale and rotation.
  • All of this can be applied in the vertex shader without additional information. So why not do it this way: keep object-local transforms on the software side, but do all Camera- and perspective-related work in hardware.
  • This sounds much better. Leave the old vertex format alone.


ilexp commented Oct 15, 2015

So, since additional information is no longer required, here's the updated shader:

// Camera-constant data
uniform    vec3  camPos;         // Position of the camera in world coordinates
uniform    float camZoom;        // Zoom factor of the camera.
uniform    mat2  camRotation;    // Transformation matrix of the camera's Z rotation
uniform    bool  camParallax;    // If true, 2D parallax projection is applied by the camera

// Vertex data
attribute  vec3  vertexWorldPos; // The world position of the vertex
attribute  float vertexZOffset;  // Optional: The (sorting) Z offset that shouldn't affect parallax scale

// Draft of the main operations to perform
void main()
{
    vec3 viewPos;
    // This could be moved to a Duality-builtin vertex shader function which
    // transforms a world coordinate into a view coordinate.
    {
        // Apply parallax 2D projection
        float parallaxScale;
        if (camParallax)
        {
            // Determine object scale based on camera properties and relative vertex position
            parallaxScale = camZoom / (vertexWorldPos.z - camPos.z);
        }
        else
        {
            // Apply a global scale factor
            parallaxScale = camZoom;
        }

        // Transform vertex to view coordinates and account for parallax scale, 
        // camera rotation and Z-offset
        viewPos = vertexWorldPos - camPos;
        viewPos.xy *= parallaxScale;
        viewPos = vec3(camRotation * viewPos.xy, viewPos.z + vertexZOffset);
    }

    // Do OpenGL ortho projection
    gl_Position = gl_ProjectionMatrix * vec4(viewPos, 1.0);
}

The Z offset in the above shader would be an optional vertex attribute, so non-parallax depth sorting offsets can still be added. If not specified in the vertex stream, its value would naturally fall back to zero.

Note that IVertexData and DrawBatch<T> will need to be adjusted to account for the fact that Z offset is now a distinct attribute, and no longer included in the Pos.Z coordinate. The Canvas class might need to be adjusted as well.

The new default vertex format, including the optional offset:

Vector3   Position; // 12 bytes
Vector2   TexCoord; // 8 bytes
ColorRgba Color;    // 4 bytes
float     Offset;   // 4 bytes  [Optional]

// Total:  28 bytes per vertex
// Before: 24 bytes per vertex

As an additional improvement, Duality shaders could be updated to feature builtin functions (besides the already existing builtin uniforms), which could provide a standard vertex transformation. This would add some more flexibility to change the exact transformation code later while still keeping old shader code working.


ilexp commented Oct 15, 2015

Implications of the above draft:

  • No more PreprocessCoords: Improved usability. Just specify an object's vertices in world coordinates and be done with it.
  • Shader access to world coordinates: Will make a lot of (vertex) shader operations easier to use and more intuitive.
  • Improved performance: Less work done on the CPU in ICmpRenderer Components, more work done by the GPU, which doesn't really mind anyway here.
  • The transformation function in the vertex shader decides how exactly coordinate transformation is done. More flexibility!

Usability++
Performance++
Cleanliness+

@ilexp ilexp added the Breaking Change label Nov 12, 2015
BraveSirAndrew added a commit to batbuild/duality that referenced this issue Nov 29, 2015
are still applied on the CPU but parallax scale and view transformations
are done on the GPU, based on notes from AdamsLair#219
@ilexp ilexp modified the milestones: The Future Is Now, v3.0 Feb 13, 2016

ilexp commented Sep 17, 2017

It should be possible to test the new transform and shader as a heads-up without changing anything in the core:

  • Define a custom ICmpRenderer that submits vertices in world space. Don't do any software transform.
  • Use a special material that has all the required uniforms and set them in the renderer.
  • Use a special shader that does GPU-side vertex transform.


ilexp commented Sep 22, 2017

Progress

Immediate ToDo


ilexp commented Nov 26, 2017

Progress

Immediate ToDo

  • Investigate this issue again with the updated codebase in order to identify all the things that will have to be done.


ilexp commented Dec 9, 2017

Immediate ToDo

  • Adjust the default vertex formats to split position and offset.
    • Adjust all code interacting with vertices directly to mind that split and add offsets to the offset field, not position.
  • Extend builtin shader variables to include all that are required for shader vertex transform.
  • Adjust vertex shaders as described above
    • Investigate whether there is a good way to not copy-paste the default transform into every vertex shader and instead have it as some sort of a builtin function they can call. Take a brief look at Refactor Shader / DrawTechnique Resources #489 and make sure we're heading roughly in the same general direction. No need to implement the issue yet.
    • Minimal vertex shader.
    • All other vertex shaders in Duality and samples.
  • Remove all usages of PreprocessCoords and submit vertices in world space instead.
    • Special consideration might be required for view-dependent renderers such as tilemap renderers or lighting enabled renderers.
    • Remove PreprocessCoords method from API, investigate whether other methods are now no longer necessary as well.
  • Run benchmarks and compare before / after results.
  • Investigate potential optimizations on the CPU (code) and GPU (shader) side.
    • Check whether all transformation can be expressed as a regular matrix multiplication and implement it as such, if possible.
    • See if this knowledge can be used to improve / unify Camera or DrawDevice visibility check and transformation API.

@ilexp ilexp self-assigned this Dec 10, 2017

ilexp commented Dec 13, 2017

Progress

  • Created a new develop-3.0-cam-vertex-transform branch to work on this.
  • Added initial support for multi-shader programs where any number of shader parts are linked into a single program, rather than just one vertex and one fragment shader.
  • Introduced builtin shader functions that are made available by linking an additional, common shader with every loaded shader program.
  • Builtin vertex transform functions are now doing all vertex transformation instead of the old ftransform(). They can be replaced with the new transformation code over the course of implementing this issue.

Immediate ToDo

  • Adjust the default vertex formats to split position and offset.
    • Adjust all code interacting with vertices directly to mind that split and add offsets to the offset field, not position.
  • Adjust the shared vertex transform functions as described above.
  • Remove all usages of PreprocessCoords and submit vertices in world space instead.
    • Special consideration might be required for view-dependent renderers such as tilemap renderers or lighting enabled renderers.
    • Remove PreprocessCoords method from API, investigate whether other methods are now no longer necessary as well.
  • Run benchmarks and compare before / after results.
  • Investigate potential optimizations on the CPU (code) and GPU (shader) side.
    • Check whether all transformation can be expressed as a regular matrix multiplication and implement it as such, if possible.
    • See if this knowledge can be used to improve / unify Camera or DrawDevice visibility check and transformation API.


ilexp commented Dec 13, 2017

Progress

Immediate ToDo

  • Replace previously introduced multi-shader support.
    • Prepare AbstractShader for shader source preprocessing.
    • Consider introducing a utility class that allows merging a primary source file with multiple others (see the sketch after this list).
      • Extend the existing source at the top (added declarations) and the bottom (added implementation).
      • Keep existing #version directive at the top.
      • Avoid variable declaration collisions.
      • Use #line directives to keep compiler error messages useful.
      • Add unit tests for this where possible.
    • Use that utility class to merge builtin with actual shader source as part of preprocessing, or do so directly in AbstractShader.
    • Remove embedded vertex and fragment shader resources for the builtin shader functions source, and keep it stored internally as a string instead.
    • Remove external shader function declarations from Minimal and sample shaders, as they will no longer be needed and in fact now cause errors.
    • Throw an exception when attempting to compile a shader program that has multiple shader parts of the same type, so behavior will be equal on GL and GLES.
  • Adjust the default vertex formats to split position and offset.
    • Adjust all code interacting with vertices directly to mind that split and add offsets to the offset field, not position.
  • Adjust the shared vertex transform functions as described above.
  • Remove all usages of PreprocessCoords and submit vertices in world space instead.
    • Special consideration might be required for view-dependent renderers such as tilemap renderers or lighting enabled renderers.
    • Remove PreprocessCoords method from API, investigate whether other methods are now no longer necessary as well.
  • Run benchmarks and compare before / after results.
  • Investigate potential optimizations on the CPU (code) and GPU (shader) side.
    • Check whether all transformation can be expressed as a regular matrix multiplication and implement it as such, if possible.
    • See if this knowledge can be used to improve / unify Camera or DrawDevice visibility check and transformation API.
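A minimal sketch of the merge step described in the list above, assuming all it has to do is keep the #version directive on top and emit #line directives so compiler errors still point into the original sources - the actual ShaderSourceBuilder API may well end up looking different:

using System;
using System.Text;

public static class ShaderSourceMergeSketch
{
    public static string Merge(string mainSource, string sharedSource)
    {
        StringBuilder merged = new StringBuilder();

        // Keep an existing #version directive at the very top.
        string versionLine = null;
        string mainBody = mainSource;
        if (mainSource.StartsWith("#version", StringComparison.Ordinal))
        {
            int firstBreak = mainSource.IndexOf('\n');
            if (firstBreak < 0) firstBreak = mainSource.Length - 1;
            versionLine = mainSource.Substring(0, firstBreak + 1);
            mainBody = mainSource.Substring(firstBreak + 1);
            merged.Append(versionLine);
        }

        // Shared declarations first, tagged as source string 1 for error reports.
        merged.AppendLine("#line 1 1");
        merged.AppendLine(sharedSource);

        // Continue with the main source, restoring its original line numbers
        // (its body starts at line 2 if a #version line was consumed).
        merged.AppendLine(versionLine != null ? "#line 2 0" : "#line 1 0");
        merged.Append(mainBody);

        return merged.ToString();
    }
}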


ilexp commented Dec 14, 2017

Progress

  • Outlined API, implementation and a first test case of the new ShaderSourceBuilder utility class, which will be used to merge shader source with various chunks of shared code.
  • Smaller tweaks and fixes.

Immediate ToDo

  • Replace previously introduced multi-shader support.
    • Implement and test the new ShaderSourceBuilder utility class.
      • Avoid variable declaration collisions by commenting out subsequent declarations.
      • Keep existing #version directive at the top.
      • Add more unit tests.
    • Use ShaderSourceBuilder to merge builtin with actual shader source as part of preprocessing, or do so directly in AbstractShader.
    • Remove embedded vertex and fragment shader resources for the builtin shader functions source, and keep it stored internally as a string instead.
    • Remove external shader function declarations from Minimal and sample shaders, as they will no longer be needed and in fact now cause errors.
    • Throw an exception when attempting to compile a shader program that has multiple shader parts of the same type, so behavior will be equal on GL and GLES.
  • Adjust the default vertex formats to split position and offset.
    • Adjust all code interacting with vertices directly to mind that split and add offsets to the offset field, not position.
  • Adjust the shared vertex transform functions as described above.
  • Remove all usages of PreprocessCoords and submit vertices in world space instead.
    • Special consideration might be required for view-dependent renderers such as tilemap renderers or lighting enabled renderers.
    • Remove PreprocessCoords method from API, investigate whether other methods are now no longer necessary as well.
  • Run benchmarks and compare before / after results.
  • Investigate potential optimizations on the CPU (code) and GPU (shader) side.
    • Check whether all transformation can be expressed as a regular matrix multiplication and implement it as such, if possible.
    • See if this knowledge can be used to improve / unify Camera or DrawDevice visibility check and transformation API.


ilexp commented Dec 15, 2017

Progress

  • Implemented ShaderSourceBuilder in first iteration.
  • Added tests regarding comment and version directive handling.

Immediate ToDo

  • Replace previously introduced multi-shader support.
    • Use ShaderSourceBuilder to merge builtin with actual shader source as part of preprocessing.
    • Remove external shader function declarations from Minimal and sample shaders, as they will no longer be needed and in fact now cause errors.
    • Throw an exception when attempting to compile a shader program that has multiple shader parts of the same type, so behavior will be equal on GL and GLES.
  • Adjust the default vertex formats to split position and offset.
    • Adjust all code interacting with vertices directly to mind that split and add offsets to the offset field, not position.
  • Adjust the shared vertex transform functions as described above.
  • Remove all usages of PreprocessCoords and submit vertices in world space instead.
    • Special consideration might be required for view-dependent renderers such as tilemap renderers or lighting enabled renderers.
    • Remove PreprocessCoords method from API, investigate whether other methods are now no longer necessary as well.
  • Run benchmarks and compare before / after results.
  • Investigate potential optimizations on the CPU (code) and GPU (shader) side.
    • Check whether all transformation can be expressed as a regular matrix multiplication and implement it as such, if possible.
    • See if this knowledge can be used to improve / unify Camera or DrawDevice visibility check and transformation API.


ilexp commented Dec 16, 2017

Progress

  • Replaced previous multi-shader support with the new shader merge preprocessing.
  • Added shader count validation in the OpenGL backend to enforce the stricter GL ES rules on desktop as well.
  • Refactored how vertex elements are mapped to shader fields.
  • Introduced name-based vertex element to shader field mapping.

Immediate ToDo

  • Adjust the default vertex formats to split position and offset.
    • Adjust all code interacting with vertices directly to mind that split and add offsets to the offset field, not position.
  • Adjust the shared vertex transform functions as described above.
  • Remove all usages of PreprocessCoords and submit vertices in world space instead.
    • Special consideration might be required for view-dependent renderers such as tilemap renderers or lighting enabled renderers.
    • Remove PreprocessCoords method from API, investigate whether other methods are now no longer necessary as well.
  • Run benchmarks and compare before / after results.
  • Investigate potential optimizations on the CPU (code) and GPU (shader) side.
    • Check whether all transformation can be expressed as a regular matrix multiplication and implement it as such, if possible.
    • See if this knowledge can be used to improve / unify Camera or DrawDevice visibility check and transformation API.


ilexp commented Dec 16, 2017

Progress

  • Added a DepthOffset vertex attribute to all vertex formats in Duality and sample projects.
  • Added a builtin "default vertex transform" shader function, which is now used in all vertex shaders and supplied with position and depth offset.
  • Adjusted all renderers to use the new DepthOffset attribute instead of adding it to their position. Also adjusted Canvas accordingly.

Immediate ToDo

  • Get a clear picture on what (OpenGL) matrix does what and how each uniform parameter is used.
    • Investigate how exactly screen space rendering would be handled.
    • Also check whether all transformation can be expressed as a regular matrix multiplication and implement it as such, if possible.
    • See if this knowledge can be used to improve / unify Camera or DrawDevice visibility check and transformation API.
  • Adjust all renderers and rendering classes to make sure they submit vertices in world space coordinates.
    • Special consideration might be required for view-dependent renderers such as tilemap renderers or lighting enabled renderers.
    • Remove PreprocessCoords method from API, investigate whether other methods are now no longer necessary as well.
  • Adjust the shared vertex transform functions as described above.
    • Introduce builtin uniforms where required.
    • Use existing matrices where it makes sense.
  • Run benchmarks and compare before / after results.


ilexp commented Dec 17, 2017

Progress

  • Renamed ModelViewMatrix to ViewMatrix on most occurrences to reflect what Duality is actually doing.
  • Investigated OpenGL view and projection matrices in detail and found a way to describe the entire transformation shader code above in terms of a view and a projection matrix, plus a manually added depth offset post-projection (see the sketch after this list).
  • Created a first, very WiP implementation of vertex shader transformation. Parallax and flat projection behave as would be expected. There are severe glitches and issues so far, but the general direction doesn't seem to be too bad.
  • Sidenote: Test code for custom projection matrices.
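For reference, the parallax scale from the shader drafts above is exactly what a standard perspective divide produces, which is presumably what makes the pure matrix formulation possible. A sketch under that assumption, with camera position c, Z rotation matrix R, and focal length f = camFocusDist:

$$
p_{\mathrm{view}} = \begin{pmatrix} R\,(p_{\mathrm{world},xy} - c_{xy}) \\ p_{\mathrm{world},z} - c_z \end{pmatrix},
\qquad
p_{\mathrm{ndc},xy} = \frac{f \cdot p_{\mathrm{view},xy}}{p_{\mathrm{view},z}}
= \underbrace{\frac{f}{p_{\mathrm{world},z} - c_z}}_{\mathrm{parallaxScale}} \; p_{\mathrm{view},xy}
$$

In other words, a view matrix (translate, then rotate) followed by a perspective projection whose w component carries the view-space z reproduces the per-vertex parallaxScale from the shader drafts, with the depth offset then added after projection as noted above.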

Immediate ToDo

  • Investigate and fix all the rendering glitches.
    • Editor grid rendering
    • Weird disappearance of objects way sooner than expected
    • Offset in editor selection markers.
    • Double-check the *= clampedNear / focusDist equation whether the near dist really needs to be part of that, or is actually destructive in cases where it's not 1.0f. Tests seem to indicate that it's correct, but need to verify.
    • Whatever else comes up.
  • Fix DrawDevice visibility check and transformation methods.
    • Maybe actually write some unit tests for them.
  • Implement the depth offset on the shader side and test it once the most glitches are fixed.
  • Take a brief look at depth buffer precision in a common arrangement of objects (background, foreground, playground)
  • Adjust all renderers and rendering classes to make sure they submit vertices in world space coordinates.
    • Special consideration might be required for view-dependent renderers such as tilemap renderers or lighting enabled renderers.
    • Remove PreprocessCoords method from API, investigate whether other methods are now no longer necessary as well.
  • Adjust samples to match the new world space vertex input.
    • Shaders (vertex shader one relying on world position)
    • DynamicLighting (reverse engineering world position)
  • Run benchmarks and compare before / after results.


ilexp commented Dec 18, 2017

Progress

  • Fixed editor grid rendering by emitting world space vertices as expected by the new transformation setup.

Immediate ToDo

  • Investigate and fix all the rendering glitches.
    • Editor overlays not showing up at all. Drawcalls look okay, and the overlays did show up in some earlier iteration. Double-check vertex transformation in orthographic projection mode, render some test rects.
    • Weird disappearance of objects way sooner than expected. Probably the now faulty visibility check methods.
    • Offset in editor selection markers.
    • Double-check the *= clampedNear / focusDist equation whether the near dist really needs to be part of that, or is actually destructive in cases where it's not 1.0f. Tests seem to indicate that it's correct, but need to verify.
    • Whatever else comes up.
  • Fix DrawDevice visibility check and transformation methods.
    • Maybe actually write some unit tests for them.
  • Implement the depth offset on the shader side and test it once the most glitches are fixed.
  • Take a brief look at depth buffer precision in a common arrangement of objects (background, foreground, playground)
  • Adjust all renderers and rendering classes to make sure they submit vertices in world space coordinates.
    • Special consideration might be required for view-dependent renderers such as tilemap renderers or lighting enabled renderers.
    • Remove PreprocessCoords method from API, investigate whether other methods are now no longer necessary as well.
  • Adjust samples to match the new world space vertex input.
    • Shaders (vertex shader one relying on world position)
    • DynamicLighting (reverse engineering world position)
  • Run benchmarks and compare before / after results.


ilexp commented Dec 19, 2017

Progress

  • Fixed editor overlays not showing up.
  • Renamed DrawDevice IsCoordInView to IsSphereVisible and added a draft implementation - doesn't seem to work yet though.
  • Renamed RenderMatrix to RenderMode and its fields to World and Screen.
  • Renamed PerspectiveMode to ProjectionMode and its fields to Orthographic and Perspective.
  • Implemented orthographic projection matrix generation.
  • Removed PreprocessCoords from DrawDevice API entirely.
  • Simplified Canvas code a bit.

Immediate ToDo

  • Fix IsSphereVisible method. It doesn't seem to do proper culling so far. Optimize as soon as it works.
    • The clip space radius calculation is wrong. Might need to be split up into a Vector2 and somehow adjusted with projection matrix.
    • Also seems to be slower.
      • Consider re-implementing it with a specialized if-else and exploit knowledge about each case for optimizations. Google frustum sphere intersection.
      • Consider providing a batch-processed variant of the method and see if that can speed things up.
  • Update matrices when changing reference position or angle.
  • Write unit tests for DrawDevice transformation and visibility check methods.
  • Fix the slight selection marker offset (zoom in on selected objects center)
  • Fix failing Canvas rendering tests. Seems to be related to text rendering.
  • Investigate and fix all upcoming rendering glitches.
    • Test SmoothAnimation, DynamicLighting, Tilemaps, DualStickSpaceShooter
  • Fix remaining DrawDevice transformation methods.
    • GetScaleAtZ
    • GetSpaceCoord
    • GetScreenCoord
  • Implement the depth offset on the shader side.
  • Take a brief look at depth buffer precision in a common arrangement of objects (background, foreground, playground)
  • Adjust samples to match the new world space vertex input.
    • Shaders (vertex shader one relying on world position)
    • DynamicLighting (reverse engineering world position)
  • Run benchmarks and compare before / after results.


ilexp commented Dec 20, 2017

Progress

  • Tweaked DrawDevice API.
  • Added unit tests for IsSphereInView method in all projection and render modes.
  • Fixed IsSphereInView implementation.
  • DrawDevice now auto-updates its internal matrices whenever related properties change.
  • Fixed failing Canvas rendering tests.

Immediate ToDo

  • Fix the slight selection marker offset (zoom in on selected objects center)
  • Write unit tests for DrawDevice transformation methods.
  • Fix and potentially rename remaining DrawDevice transformation methods.
    • GetScaleAtZ
    • GetSpaceCoord
    • GetScreenCoord
  • Implement the depth offset on the shader side.
  • Take a brief look at depth buffer precision in a common arrangement of objects (background, foreground, playground)
  • Adjust samples to match the new world space vertex input.
    • Shaders (vertex shader one relying on world position)
    • DynamicLighting (reverse engineering world position)
  • Investigate and fix all upcoming rendering glitches.
    • Test SmoothAnimation, DynamicLighting, Tilemaps, DualStickSpaceShooter
  • Run benchmarks and compare before / after results.
  • Optimize IsSphereInView / object culling if necessary.
    • Consider re-implementing it with a specialized if-else and exploit knowledge about each case for optimizations. Google frustum sphere intersection.
    • Consider providing a batch-processed variant of the method and see if that can speed things up.


ilexp commented Dec 21, 2017

Progress

  • Wrote unit tests for DrawDevice transformation methods.
  • Renamed DrawDevice transformation and related methods to get rid of the old "coord" terminology.
  • Re-implemented the DrawDevice transformation methods using the new matrix stack.
  • Implemented depth offset in the builtin vertex transform shader function.

Immediate ToDo

  • Fix depth sorting for alpha materials to take depth offset values into account.
  • Fix the slight selection marker offset (zoom in on selected objects center)
  • Take a brief look at depth buffer precision in a common arrangement of objects (background, foreground, playground)
  • Adjust samples to match the new world space vertex input.
    • Shaders (vertex shader one relying on world position)
    • DynamicLighting (reverse engineering world position)
  • Investigate and fix all upcoming rendering glitches.
    • Test SmoothAnimation, DynamicLighting, Tilemaps, DualStickSpaceShooter
    • Rotate editor camera 90° and check if all still works, especially culling.
  • Run benchmarks and compare before / after results.
  • Optimize IsSphereInView / object culling if necessary.
    • Consider re-implementing it with a specialized if-else and exploit knowledge about each case for optimizations. Google frustum sphere intersection.
    • Consider providing a batch-processed variant of the method and see if that can speed things up.


ilexp commented Dec 22, 2017

Progress

  • Fixed depth sorting for alpha materials taking depth offset values into account.
  • Fixed various minor editor selection marker rendering offset issues. Might actually have been there all along.
  • Investigated depth precision, seems to be alright. Z fighting doesn't seem to be an issue in any case, since all the sprites are perfectly aligned, lying flat on the XY plane.
  • Fixed custom rendering setup resources and shaders.
  • Updated shader sample and dynamic lighting sample to use the new world space vertex input instead of trying to reverse-engineer it.
  • Checked editor camera rotation, works as expected.
  • Checked SmoothAnimation and DualStickSpaceShooter samples, work as expected.

Immediate ToDo

  • Fix Grid overlay in editor printing screen space mouse coordinates, rather than world space coordinates.
    • This is due to the drawing device being in screen mode when they are rendered.
    • Separate camera transform methods from drawdevice rendering transform.
  • Fix TilemapRenderer not using depth offsets.
  • Fix Tilemaps sample CameraController not being limited to map extents.
  • Run benchmarks and compare before / after results.
  • Optimize IsSphereInView / object culling if necessary.
    • Consider re-implementing it with a specialized if-else and exploit knowledge about each case for optimizations. Google frustum sphere intersection.
    • Consider providing a batch-processed variant of the method and see if that can speed things up.


ilexp commented Dec 22, 2017

Progress

  • Checked all sample projects for errors and things that need to be updated to the new setup.
  • Cameras now have two internal DrawDevice instances: One for rendering and one for coordinate transformation / projection calculations only. This way the current rendering state of a camera does not affect the results of any of its transformation methods.
  • Fixed grid overlay coordinate display.
  • Updated TilemapRenderer and ActorRenderer to use depth offsets.
  • Updated TilemapCulling to operate using the new world space vertex setup and simplified its output struct, which no longer needs any view space data.
  • Updated NearZ values in all samples to use the new default of 50.
  • Fixed CamView object visibility being ignored during editor picking operations.
  • Camera ShaderParameters property is now hidden in the Object Inspector.

Immediate ToDo

  • Run benchmarks and compare before / after results.
  • Optimize IsSphereInView / object culling if necessary.
    • Consider re-implementing it with a specialized if-else and exploit knowledge about each case for optimizations. Google frustum sphere intersection.
    • Consider providing a batch-processed variant of the method and see if that can speed things up.


ilexp commented Dec 22, 2017

Progress

  • Ran benchmarks and found the application to be slightly faster overall, with the impact of IsSphereInView being somewhat negligible compared to other factors.
  • Benchmark results:
    • Pre-V3
    • Pre-Issue
    • Pre-Issue 2, re-done due to my machine somehow being 1ms slower on the same version from the older benchmark.
    • New Results
    • Slightly reduced render time for large numbers of renderers.
  • Merged feature branch into develop-3.0.

@ilexp ilexp closed this as completed Dec 22, 2017