Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a DirectX 11 renderer #675

Open
wants to merge 127 commits into
base: master
Choose a base branch
from

Conversation

hyblocker
Copy link

@hyblocker hyblocker commented Sep 23, 2024

I've implemented a DirectX 11 backend for RmlUI. Currently this has feature parity with the SDL renderer, but I plan on extending it to match the GL3 renderer in the near future (hence I'm marking this as a draft PR).

As for backends, currently this only has an implementation with the Win32 backend. I'll also be providing a GLFW backend at a later point.

I've introduced a pair of functions pointers only in the DX11 RenderInterface for this backend to make it more flexible and easily consumable by end users. As the current backends only support uncompressed targa files for images, and in practice one would likely want to connect it to a texture loader or something more sophisticated such as stb_image, I defined the following functions:

// Use these typedefs to overload the default implementation with another image loader so that you can handle more than uncompressed TGA if you wish.
// Must define both the load and free functions. See RenderInterface_DX11::
typedef void (*pfnLoadTextureRaw)(const Rml::String& filename, int* pWidth, int* pHeight, uint8_t** pData, size_t* pDataSize);
typedef void (*pfnFreeTextureRaw)(uint8_t* pData);

Where a user may override these as follows for example:

// Initialize the window but don't show it yet.
HWND window_handle = InitializeWindow(data->instance_handle, name, width, height, allow_resize);
if (!window_handle)
    return false;

if (!CreateDeviceD3D(window_handle)) {
    ::CloseWindow(window_handle);
    return false;
}

// Assign STB image loader to render interface
data->render_interface.LoadTextureFromFileRaw = &LoadTexture;
data->render_interface.FreeTextureFromFileRaw = &FreeTexture;
stbi_set_unpremultiply_on_load(true);
stbi_convert_iphone_png_to_rgb(true);

data->render_interface.Init(data->d3d_resources.pd3dDevice, data->d3d_resources.pd3dDeviceContext);

data->window_handle = window_handle;
data->system_interface.SetWindow(window_handle);

// Now we are ready to show the window.
::ShowWindow(window_handle, SW_SHOW);
::SetForegroundWindow(window_handle);
::SetFocus(window_handle);

The renderer will fallback to the Targa image loader if the user doesn't override the loader with their own one.

For completeness, below is the entire STB image loader I've been testing with. I have converted the sample assets to PNG to verify that it works correctly.

// Add somewhere at the top of the backend
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#include <RmlUi/Core/FileInterface.h>

// STB image based loaders
void LoadTexture(const Rml::String& filename, int* pWidth, int* pHeight, uint8_t** pData, size_t* pDataSize);
void FreeTexture(uint8_t* pData);

// STB loader implementation

void LoadTexture(const Rml::String& filename, int* pWidth, int* pHeight, uint8_t** pData, size_t* pDataSize)
{
    Rml::FileInterface* file_interface = Rml::GetFileInterface();
    Rml::FileHandle file_handle = file_interface->Open(filename);
    if (!file_handle)
    {
        *pData = nullptr;
        return;
    }

    int texture_width, texture_height, num_channels;
    *pData = stbi_load_from_file((FILE*)file_handle, &texture_width, &texture_height, &num_channels, 0);

    if (*pData != nullptr)
    {
        *pWidth = texture_width;
        *pHeight = texture_height;

        // Compute number of elements in texture
        *pDataSize = texture_width * texture_height * num_channels;

        uint8_t* dataView = *pData;

        // Pre-multiply the data, otherwise it will be loaded incorrectly
        for (int i = 0; i < *pDataSize; i += 4)
        {
            dataView[i + 0] = (uint8_t)((dataView[i + 0] * dataView[i + 3]) / 255);
            dataView[i + 1] = (uint8_t)((dataView[i + 1] * dataView[i + 3]) / 255);
            dataView[i + 2] = (uint8_t)((dataView[i + 2] * dataView[i + 3]) / 255);
        }
    }
}

void FreeTexture(uint8_t* pData)
{
    stbi_image_free(pData);
}

Not usable in current state.
This allows users to render to a render target which may either be the swapchain (in most cases it will be) or a render texture which may then be used elsewhere (eg in a 3D scene).
We should provide a image loader (in this case the uncompressed TGA loader), but should also allow users using this renderer backend to overload it with their own one. E.g. a user may have libpng in their project and they want to hook libpng into the load image function to load PNGs.
@igorsegallafa
Copy link
Contributor

Ohh nice PR @hyblocker, I was also trying to implement DX11 renderer by myself.
Do you know exactly what is missing to finish this PR or is there anything I could help with? Thank you!

@hyblocker
Copy link
Author

The PR is practically ready for merging, I'm waiting for the weekend to find the time to implement the GLFW backend and then I'll move this from draft status.

Transient buffers should still be investigated if possible but my schedule has been getting busier lately, so I think it makes more sense to merge it as a separate PR later.

@hyblocker hyblocker marked this pull request as ready for review October 29, 2024 18:05
@hyblocker
Copy link
Author

@mikke89 The DirectX11 renderer is ready for review.

@mikke89
Copy link
Owner

mikke89 commented Oct 30, 2024

Awesome, that's great to hear! Very impressive, I'll try to get around to it soon.

Would you consider it feature complete compared to the OpenGL 3 renderer? Would be nice to add an entry into the feature table of the renderers in the readme.

@hyblocker
Copy link
Author

The DirectX 11 renderer has feature parity with the OpenGL 3 renderer, including all effects, shaders, etc. (the effects sample appears to function identically between the two). I've already added the entry to the readme in the PR (see 8ed4753). It also comes with a GLFW and Win32 backend.

I've been using the DirectX11 renderer for my own project for around a month and it appears to be stable and function well.

@wh1t3lord
Copy link
Contributor

I will say some words about this implementation and here they're:

  1. There's no point in m_geometry_cache and you can just use some struct for defining your GeometryHandle and have fields for VB and IB native pointers of DirectX 11 types, you just allocate it with new operator.

It is a very bad practice using .find on render callings. Searching through hash map container without any specific purpose is still slow operation, on real-time rendering applications hash-maps used not often by frequency calling otherwise it will reduce your performance on much complex scenes immediately.

You won't even get invalid buffer if you get just don't render it but instead you do .find operation every frame, why? All information about buffers stored and provided by Rml.

See: https://github.com/mikke89/RmlUi/blob/master/Backends/RmlUi_Renderer_GL3.cpp#L1023

  1. You have really naive implementation of buffer allocation. Why do not to use one buffer for vertices & indices but sub-allocating them with some offset-allocator? Some people in this thread suggesting some transient buffers (for what?). In terms of handling resources for GPU it is better to create something once big and re-use it not even deleting (deletion when RmlUI is going to shutdown) in your case it is important to bind one buffer for vertices and indices but specify offset where your data located for current GeometryHandle.

So you allocate single buffer on init stage say like 1 Mb/512 Kb buffer that's for VB and IB and try to use offset-allocator like this https://github.com/sebbbi/OffsetAllocator (I suggest it because I will use it in my implementations)

If you want to write your own you can try why not or you can implement Transient buffers, it will improve performance but at some point using one buffer that stores all things could be a little better I didn't compare a such implementation, but I think it could be just equal.

As I understood you're a beginner and it is fine that you have some mistakes 😃, but I expect that you will fix them. Your implementation for many people exists as 'reference' implementation and it is supposed (people expect that at some point) the implementation is done well but in reality if we accept it now after some time some people from community will ask these questions what I state here and will ask: "why did you A instead of B, B is effective and please fix it". I am pretty sure that it might happen so I don't have any choice for staying in shadows and ignoring this review.

For you it is improving your understanding of real-time graphics :D, for me I did a review and I do them not often tbh.

If you still have some questions or you disagree with my opinions, feel free to write your thoughts.

@hyblocker
Copy link
Author

I've removed m_geometry_cache, I'll try improving the buffer allocation later when I have some more free time.

Copy link
Owner

@mikke89 mikke89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, I am really impressed. The results look very solid overall, and it's no small feat to get all of this working, especially with all of our new renderer features too.

I haven't taken a detailed dive into the implementation itself, I want to leave you some ownership over this code as well. It would be great if you can help maintain it a bit occasionally in the future, but no obligations required. Looks like it follows the overall structure of the GL3 renderer, which is very nice and helps maintain them together.

I did some quick benchmarks on our benchmark sample:

Renderer Idle FPS Active FPS
GL2 1400 83
GL3 800 84
DX11 1500 82
VK 1200 134

Overall, it's on par with our other renderers. Performance-wise, it is not problematic in my view. It would be nice with some improved allocation techniques as mentioned above, I think that could help with the "active FPS" here. While I would encourage that, it's not a blocker for a first iteration in my view. And we probably want to improve that for our GL3 renderer too.

I implemented the screenshot feature locally and ran a comparison against the GL3 renderer on all our visual tests. It was pixel-perfect equal in 81/102 tests. Most of the remaining ones are only slightly pixel-different along rounded edges and for color interpolation, things that one would expect to be different between renderers. So that's a solid job!

There are a few tests I would like to highlight:

# similarity scores (%) max pixel difference filename Comment
13 30.5% 86 filter_blur_orb.rml Looks good statically, but there's quite a lot of flickering when moving the blur orb. This is very understandable, and quite tricky to get stable. I remember spending a lot of effort with this on the GL3 renderer.
45 29.3% 238 ink_overflow_bounding_box.rml Colors are slightly more saturated for some reason?
58 44.9% 136 mask_image.rml The heart image is slightly more blurry, as if the vertex positions or UV coords are just slightly off.
84 13.5% 528 shader_conic_gradient.rml Looks like the directions are rotated, see the screenshot below.

Out of these, I would only consider the conic gradient as an actual issue. The other ones are reasonable in a first iteration, not considered blockers, but if you feel like taking a closer look then that would be nice.

Could you format the files according to our .clang-format and .editorconfig? It looks kind of close to our clang-format, but also slightly off, and the indentations are not correct.

Again, great effort! I'm very happy to see some DirectX love, so thank you for that. Let me know if there is anything I can clarify or help out with.


image


#include <RmlUi/Core/Platform.h>

#if defined RMLUI_PLATFORM_WIN32
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping everything in a big #ifdef seems unnecessary and error prone to me, and it also messes a bit with formatting. As the renderer is only intended for Win32, I suggest instead that we make it an #error if that's not the case.

float4x4 m_transform;
float2 m_translate;

// One to one translation of the OpenGL uniforms results in a LOT of wasted space due to CBuffer alignment rules.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :)

Comment on lines +50 to +56
// Helper for a common cleanup pattern in DX11
#define DX_CLEANUP_RESOURCE_IF_CREATED(resource) \
if (resource) \
{ \
resource->Release(); \
resource = nullptr; \
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer to replace this with a templated function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are okay with including other headers in this header file, I would suggest using CComPtr<> from atlbase.h; smart pointers over raw pointers would definitely be an improvement to avoid any possible memory leakage.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the headers are commonly available, that should be fine. This is not one of those microsoft language extensions is it, does it require any special compiler flags? Is it compatible with MinGW & co?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, MinGW might become a problem. I just feel like such a smart pointer (or any way to clean up the resources, really) would be a neater way to handle them, given how many times the helper is used.

Comment on lines +44 to +45
// Uncomment the following line to enable additional DirectX debugging.
#define RMLUI_DX_DEBUG
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this supposed to be commented out?


#include <RmlUi/Core/Platform.h>

#if defined RMLUI_PLATFORM_WIN32
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I would remove this #if wrapper (or make it an error).

Backends/RmlUi_Renderer_DX11.h Show resolved Hide resolved
Comment on lines +1368 to +1380
if (window_flipped.Top() == window_flipped.Bottom() || window_flipped.Left() == window_flipped.Right())
{
// In some cases the bounds are 0 width, so early exit since there's nothing to blur
return;
}

int pass_level = 0;
SigmaToParameters(sigma, pass_level, sigma);
if (sigma == 0)
{
// If sigma is zero we don't have anything to blur, so early exit since blurring will have no effect on the final image
return;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I made sure that the method still works even if these conditions come through here. And if this is a question of optimization, I feel like all of these conditions should normally be handled at a higher level, so based on that I would rather not add more conditions here.

Comment on lines +474 to +477
#ifdef RMLUI_DEBUG
// Enable debug layer
createDeviceFlags |= D3D11_CREATE_DEVICE_DEBUG;
#endif
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Should this be RMLUI_DX_DEBUG?

I guess if the performance implication of the debug device is trivial, then we can leave it as is. Otherwise, it's nice if performance is reasonable also in debug mode, so we can make this opt-in with that flag.


enum class ShaderGradientFunction { Linear, Radial, Conic, RepeatingLinear, RepeatingRadial, RepeatingConic }; // Must match shader definitions below.

// We need to round up at compile-time so that we can embed the
Copy link
Owner

@mikke89 mikke89 Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very suspenseful end to that sentence! ;)

| SFML | ✔️ | | | |
| GLFW | ✔️ | ✔️ | ✔️ | |
| SDL¹ | ✔️ | ✔️² | ✔️ | ✔️ |
| Platform \ Renderer | OpengGL 2 | OpengGL 3 | Vulkan | SDLrenderer | DirectX11 |
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this readme: We also have a table over here with all the renderers and a feature matrix, so it would be great if you could add it there as well.

@mikke89
Copy link
Owner

mikke89 commented Nov 2, 2024

I managed to get these error messages somehow:

image

Not sure exactly how, and haven't been able to reproduce it yet. I was messing around with this sample and its settings, and doing some window resizing, minimizations and such.

@hyblocker
Copy link
Author

I'll tackle the issues you've found throughout the week. If I have the time I'll also investigate the tests with poor accuracy to see if I could improve them.

@mikke89
Copy link
Owner

mikke89 commented Dec 12, 2024

Hey, just checking in, how is it going with this feature? I'd love to see this one merged into the library. Let me know if there is anything I can help out with :)

@hyblocker
Copy link
Author

Hey, I have been a little busy with some IRL things and not had the time to address the issues that were brought up. In around a month I think I should have some time to address them.

@mikke89
Copy link
Owner

mikke89 commented Dec 12, 2024

No worries, I understand. Glad to hear you're planning to continue it, cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backends Platforms and renderers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants