The NVRHI programming model is mostly a blend of DX11 and DX12, with a flavor of Vulkan. Unlike the modern GAPIs, the library tracks the resources created by the application, where they are used, when the GPU work using the resources finishes, and when it's safe to release the resources. The library also implements resource state tracking and automatic barrier placement, although it needs some hints to do so while still supporting multiple command lists that might use the same resource. Finally, the library implements automatic handling of GPU data uploads and ray tracing acceleration structure build scratch buffers.
Compared to DX11, the API is more coarse-grained, more like DX12 or Vulkan: there are pipeline state objects (PSOs), graphics state structures, and binding layouts and sets. Issuing some draws or dispatches takes fewer API calls than it does on any of the three supported GAPIs, and more structure filling. This allows the library to implement efficient state caching and reduce the number of calls to the underlying GAPIs.
All resources (pipelines, textures, etc.) provided by NVRHI are descendants of the IResource class, which implements the AddRef and Release methods, following the COM model used in DX11/12. Resources implement reference counting and are destroyed when the reference count reaches zero. Note that there are often internal references from the Device or the CommandList to the resources: these references are used to defer destruction of the resources until they are no longer used by the GPU. Resources that were still in use at the time the application released its final reference are actually destroyed in IDevice::runGarbageCollection(), which should be called at least once per frame.
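The deferred destruction described above can be pictured with a small standalone model. This is only a sketch under stated assumptions: DeferredReleaseQueue, its methods, and the idea of numbering GPU submissions are hypothetical illustrations and are not part of the NVRHI API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of deferred destruction; not NVRHI's implementation.
class DeferredReleaseQueue {
public:
    // Called when the application drops its last reference to a resource
    // whose most recent GPU use was in submission `lastUsedSubmission`.
    void release(std::string name, std::uint64_t lastUsedSubmission) {
        pending_.push_back({std::move(name), lastUsedSubmission});
    }

    // Plays the role of a once-per-frame garbage collection pass: destroys
    // every pending resource whose last GPU use has completed, and returns
    // how many resources were destroyed.
    std::size_t collect(std::uint64_t completedSubmission) {
        std::size_t destroyed = 0;
        std::vector<Entry> stillInUse;
        for (auto& entry : pending_) {
            if (entry.lastUsed <= completedSubmission)
                ++destroyed;  // the GPU is done with it, safe to destroy
            else
                stillInUse.push_back(std::move(entry));
        }
        pending_ = std::move(stillInUse);
        return destroyed;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    struct Entry {
        std::string name;
        std::uint64_t lastUsed;
    };
    std::vector<Entry> pending_;
};
```

In this model, skipping the collection pass does not break correctness; it only lets released resources pile up, which is one plausible reason for the once-per-frame recommendation.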
To automate the process of reference counting, NVRHI provides a template class RefCountPtr<T>, same as ComPtr provided by WRL (Windows Runtime C++ Template Library). All resource types have "handles" defined as reference counting pointers to those types, such as typedef RefCountPtr<ITexture> TextureHandle.
As a consequence of this model, any function that accepts a resource pointer is able to convert that pointer into a handle and keep a strong reference to the resource. This is in contrast with std::shared_ptr, std::weak_ptr, or raw pointers, where a function needs to accept a shared_ptr to keep a strong reference.
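A minimal sketch of why intrusive counting makes this possible, using hypothetical stand-ins: Resource, Handle<T>, and RenderPass below are illustrative types written for this example and are not NVRHI's IResource or RefCountPtr<T>. The counter here is also not thread-safe, which a real implementation would need to address.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a reference-counted base class.
// The count lives inside the object, as in the COM model.
class Resource {
public:
    virtual ~Resource() = default;
    unsigned long AddRef() { return ++refCount_; }
    unsigned long Release() {
        unsigned long remaining = --refCount_;
        if (remaining == 0) delete this;
        return remaining;
    }
    unsigned long refCount() const { return refCount_; }

private:
    unsigned long refCount_ = 0;  // not atomic: single-threaded sketch only
};

// Hypothetical smart pointer that calls AddRef/Release automatically.
template <typename T>
class Handle {
public:
    Handle() = default;
    Handle(T* ptr) : ptr_(ptr) { if (ptr_) ptr_->AddRef(); }
    Handle(const Handle& other) : Handle(other.ptr_) {}
    Handle(Handle&& other) noexcept : ptr_(std::exchange(other.ptr_, nullptr)) {}
    Handle& operator=(Handle other) noexcept {
        std::swap(ptr_, other.ptr_);
        return *this;
    }
    ~Handle() { if (ptr_) ptr_->Release(); }

    T* get() const { return ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_ = nullptr;
};

// A pass that is handed only a raw pointer can still retain the resource,
// because constructing a Handle from the pointer bumps the embedded count.
class RenderPass {
public:
    explicit RenderPass(Resource* texture) : texture_(texture) {}
    unsigned long textureRefCount() const { return texture_->refCount(); }

private:
    Handle<Resource> texture_;
};
```

With std::shared_ptr the control block is separate from the object, so a raw pointer alone is not enough to create a new owner; that is the contrast the paragraph above draws.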
Combined with automatic resource lifetime tracking, the typical usage for resource handles is to make them members of a class that represents a render pass or a similar entity. When the pass is no longer needed, it's destroyed by the application, automatically releasing all resources that it owns, and these resources are destroyed by the library at a later time. Similarly, NVRHI supports a "fire and forget" model: when some render pass only needs to happen once, it is valid to create resources and even pipelines in local scope, record the draw commands into a command list, maybe execute that command list, and just exit the scope.
Another important method of IResource is getNativeObject. This method returns any underlying GAPI pointer or handle for the specified resource and object type, if it's applicable and available. For example, getNativeObject can be used to get ID3D12Device from IDevice, or VkImage from ITexture.
The IDevice interface provides methods that operate on the underlying device or command queues. This includes things like creating textures, buffers, pipelines and binding sets, executing command lists, and performing timer and event queries. There are no draw or dispatch methods in IDevice; those are all in ICommandList. Also note that there are no methods to destroy resources: as explained above, resources are destroyed when their reference count (both internal and external) reaches zero.
NVRHI does not provide any functionality to create the underlying GAPI device(s); that is the responsibility of the application. To create an IDevice interface over the existing device, use the GAPI-specific functions provided by the backends:
- nvrhi::d3d11::createDevice, defined in <nvrhi/d3d11.h>
- nvrhi::d3d12::createDevice, defined in <nvrhi/d3d12.h>
- nvrhi::vulkan::createDevice, defined in <nvrhi/vulkan.h>
- nvrhi::validation::createDevice, defined in <nvrhi/validation.h>
As there is no separate abstraction for command queues, the queues are provided at the time of IDevice creation on DX12 or Vulkan. Up to 3 queues can be provided: the graphics, compute, and copy queues. The graphics queue is required, the rest are optional.
The validation device is a wrapper that can be created around another IDevice instance. It implements the same interface and essentially intercepts and validates all NVRHI API calls before executing them. The validation device will also wrap every command list that the application creates with a similar validation layer. All messages from the validation layer are passed to the messageCallback interface provided to the underlying device at the time of its creation.
The ICommandList interface provides methods that go directly into a command list, such as state manipulation and draw or dispatch commands. Command lists are created using IDevice::createCommandList and executed using IDevice::executeCommandList. The command list must be opened with open() before recording any commands, and closed with close() before being executed. It is valid (though not really tested) to close the command list and then open it again without executing. It is also valid to record multiple command lists concurrently and then execute them in any order.
Since this model is not well supported on DX11, there is a special kind of command list called "immediate" that maps to the immediate command list on DX11. The application may create multiple immediate command lists, but only one of them may be open at a time. The immediate command lists still need to be explicitly executed to function on DX12 and Vulkan.
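The open, record, close, execute ordering described above can be summarized as a tiny state machine. The sketch below is a hypothetical model (CommandListModel and execute are names invented for this example, not NVRHI calls), and it assumes that re-opening a list discards what was recorded before, which the text does not specify.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of the open/record/close/execute rules.
class CommandListModel {
public:
    void open() {
        if (open_) throw std::logic_error("command list is already open");
        open_ = true;
        recorded_ = 0;  // assumption of this sketch: re-opening starts fresh
    }

    // Stands in for any recorded command (draw, dispatch, copy, ...).
    void draw() {
        if (!open_) throw std::logic_error("recording requires an open command list");
        ++recorded_;
    }

    void close() {
        if (!open_) throw std::logic_error("close() without a matching open()");
        open_ = false;
    }

    bool isOpen() const { return open_; }
    int recordedCommands() const { return recorded_; }

private:
    bool open_ = false;
    int recorded_ = 0;
};

// Executing is only valid once the command list has been closed.
// Returns the number of commands that would be submitted.
inline int execute(const CommandListModel& commandList) {
    if (commandList.isOpen())
        throw std::logic_error("command list must be closed before execution");
    return commandList.recordedCommands();
}
```

A validation layer could enforce rules of this kind before forwarding calls to the real backend, which is roughly the role the validation device plays for the actual API.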
On DX12 and Vulkan, NVRHI command lists do not map to GAPI command lists 1:1; they aggregate more resources in order to make the programming model easier to use. One command list will typically keep multiple GAPI command lists and use them in a round-robin fashion if the previously recorded instance of the command list is still being executed when the command list is re-opened. Therefore, it is valid to record and execute a command list, then immediately open it again and start recording new commands. Additionally, the command lists handle texture and buffer writes: the writeTexture and writeBuffer methods behave similarly to DX11's UpdateSubresource through an upload manager that keeps a set of upload buffers and tracks their usage. In a similar fashion, the command lists also manage scratch buffers for ray tracing acceleration structure builds. Note that these upload and scratch managers never shrink their working set, so if it's necessary to release the memory after uploading a large set of textures or building many BLASes, the only option is to release the command list that was used for that activity, and create a new one.
NVRHI command lists implement resource state tracking and barrier placement. Since a command list may be recorded in parallel with other command lists, and then executed out of order, tracking resource states across command list boundaries at record time is impossible. The command list must know which state each referenced resource is in when it enters the command list, and which state to leave each resource in when exiting the command list. There are 3 ways to achieve that:
- Use the beginTrackingTexture/BufferState methods to provide the prior state information to the command list per-resource explicitly, and setTexture/BufferState at the end of the command list to transition the resources to the desired states.
- Create textures and buffers with the keepInitialState descriptor field set to true. For such resources, a command list will assume that the resource enters the command list in its initialState, and will transition the resource back to its initialState at the end.
- Make some resources' states permanent by calling setPermanentTexture/BufferState. This is useful for static resources like material textures and vertex buffers: after initialization, their contents never change, and they can be kept in a ShaderResource or similar state without ever being transitioned. Permanent resources do not require any state tracking and are therefore cheaper on the CPU side.
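The three options above can be contrasted with a small standalone model of a per-command-list tracker. Everything here is hypothetical: StateTracker, its method names, and the State enum are simplified illustrations chosen for this sketch and only loosely mirror the NVRHI methods named in the list.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class State { Common, ShaderResource, RenderTarget, UnorderedAccess };

struct Barrier {
    std::string resource;
    State before;
    State after;
};

// Hypothetical per-command-list state tracker.
class StateTracker {
public:
    // Option 1: the application tells the list which state a resource enters in.
    void beginTracking(const std::string& res, State entryState) {
        current_[res] = entryState;
    }

    // Option 2: a "keep initial state" resource enters in its initial state
    // and is transitioned back to it when the list is closed.
    void trackKeepInitial(const std::string& res, State initialState) {
        current_[res] = initialState;
        restore_[res] = initialState;
    }

    // Option 3: permanent resources never transition and need no tracking.
    void markPermanent(const std::string& res) { permanent_.insert(res); }

    // Records a barrier only when the required state differs from the
    // tracked one. Throws std::out_of_range for a resource whose entry
    // state is unknown, mirroring the "must know" requirement in the text.
    void require(const std::string& res, State needed) {
        if (permanent_.count(res)) return;
        State& tracked = current_.at(res);
        if (tracked != needed) {
            barriers_.push_back({res, tracked, needed});
            tracked = needed;
        }
    }

    // On close, restore every "keep initial state" resource.
    void close() {
        for (const auto& entry : restore_) require(entry.first, entry.second);
    }

    const std::vector<Barrier>& barriers() const { return barriers_; }

private:
    std::map<std::string, State> current_;
    std::map<std::string, State> restore_;
    std::set<std::string> permanent_;
    std::vector<Barrier> barriers_;
};
```

The permanent path returns before touching any map entry for the resource's state, which is the CPU-side saving the last bullet refers to, at least in this simplified picture.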
As part of state tracking, NVRHI will place UAV barriers between successive uses of the same resource in UnorderedAccess state. That might not always be desired: for example, some rendering methods address the same texture as a UAV from the pixel shader, and do not care about ordering of accesses for different meshes. For such use cases, the command list provides the setEnableUavBarriersForTexture/Buffer(bool enable) methods that can be used to temporarily remove such UAV barriers. On DX11, these methods map to NVAPI_D3D11_Begin/EndUAVOverlap calls. Conversely, it is sometimes necessary to place UAV barriers more often than NVRHI does automatically, which is at every setGraphicsState or similar call. For example, there may be a sequence of compute passes operating on a buffer that use the same shader but different constants. As updating constants does not require a call to one of those state setting functions, an automatic barrier will not be placed. To place a UAV barrier manually, use the nvrhi::utils::texture/bufferUavBarrier functions.
Additionally, automatic barrier placement can be disabled completely using the setEnableAutomaticBarriers method. Command lists are initialized with automatic barriers enabled, and the state set using this method is kept when the command list is closed and re-opened. So, applications may disable automatic barriers in a command list upon creation and for its lifetime, or disable them temporarily for performance-critical sections. When running in this "manual" mode, all resource state manipulation must be performed by the application. The basic functions provided for that are setTexture/BufferState. NVRHI also provides some convenience functions: setResourceStatesForBindingSet and setResourceStatesForFramebuffer. Note that all these functions only place barriers into an internal accumulator; in order to push them onto the GAPI command list, call the commitBarriers method.
Although NVRHI manages resource lifetime and reuse hazards, sometimes it is necessary to explicitly synchronize the CPU with the GPU, or synchronize GPU workloads on different queues. There are a few synchronization features available:
- Event queries - these are objects created using IDevice::createEventQuery. Event queries can be set on a command queue using the IDevice::setEventQuery method, and then waited for on the CPU using the IDevice::waitEventQuery method.
- Wait for idle. Use the IDevice::waitForIdle method, which waits for all queues to finish executing their commands.
- Inter-queue synchronization, which is provided using the IDevice::queueWaitForCommandList method. That method accepts an "instance" parameter, which should receive the value previously returned by IDevice::executeCommandList.
There are two kinds of buffers, both represented by the same IBuffer interface: regular buffers and volatile constant buffers. These are differentiated by the isVolatile flag in the BufferDesc structure. All buffers are created using the IDevice::createBuffer method. To use a buffer created outside NVRHI, call IDevice::createHandleForNativeBuffer.
Regular buffers directly map to GAPI buffer objects, and they can be CPU-accessible too. Such buffers can be used as shader resources, UAVs, index or vertex buffers, etc., can be copied to and from, and written into from the CPU using the ICommandList::writeBuffer method. Normally buffers are created as committed resources (using the DX12 terminology), but they can also be placed into a heap if created with the isVirtual flag. In the latter case, the application must call IDevice::bindBufferMemory before using the buffer in any way.
Volatile constant buffers (VCB) are a special feature of NVRHI. They provide an easy and lightweight interface to upload constant buffers that are used for one or a few draw calls per frame. Semantically, a VCB is considered non-existent until the first call to writeBuffer is made for this VCB; then it exists until the command list is closed, and a new instance of the command list needs to write into the VCB again. A command list may write into the same VCB multiple times, and the writes will update the VCB contents visible to shaders in API order. Volatile constant buffers can only be bound to binding layout items of the same kind, i.e. ResourceType::VolatileConstantBuffer.
The implementation of VCBs is different for each GAPI:
- On DX11, VCBs directly map to buffers with the D3D11_USAGE_DYNAMIC usage. This means the contents of a VCB will persist over command list instances, but clearing them would be doing extra work.
- On DX12, VCB instances are sub-allocated from the command list upload buffer on each call to writeBuffer, and bound to the shaders using root constant buffer views.
- On Vulkan, VCBs are regular buffers that keep multiple versions of the data, and therefore are multiple times larger than the buffer size declared in BufferDesc. On each call to writeBuffer, a new version is written; the use of each version by the GPU is tracked, and old versions are reused automatically. Since the buffer is statically sized, the number of versions to use must be predicted by the application as numberOfUsesPerFrame * numberOfFramesInFlight and provided to NVRHI through BufferDesc::maxVersions at buffer creation time. VCBs with insufficient version counts will generate a runtime error. VCBs are bound to the shaders using dynamic descriptors with offsets.
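The version sizing rule for Vulkan can be illustrated with a simplified model of a fixed pool of versions. This is a sketch only: requiredVersions encodes the formula quoted above, while VersionRing and its submission numbering are hypothetical and do not describe how NVRHI actually tracks version usage.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sizing rule quoted in the text for BufferDesc::maxVersions.
constexpr std::uint32_t requiredVersions(std::uint32_t usesPerFrame,
                                         std::uint32_t framesInFlight) {
    return usesPerFrame * framesInFlight;
}

// Hypothetical fixed pool of buffer versions. Each slot remembers the
// submission that last used it; 0 means the slot has never been used.
class VersionRing {
public:
    explicit VersionRing(std::uint32_t maxVersions) : lastUse_(maxVersions, 0) {}

    // Picks a version for a write recorded in `submission` (numbered from 1).
    // A version is reusable once the GPU has completed the submission that
    // last used it, i.e. lastUse <= completed.
    std::uint32_t acquire(std::uint64_t submission, std::uint64_t completed) {
        assert(submission > completed);  // a new write is never already finished
        for (std::uint32_t v = 0; v < lastUse_.size(); ++v) {
            if (lastUse_[v] <= completed) {
                lastUse_[v] = submission;
                return v;
            }
        }
        // Corresponds to the runtime error for an insufficient version count.
        throw std::runtime_error("volatile constant buffer ran out of versions");
    }

private:
    std::vector<std::uint64_t> lastUse_;
};
```

For example, a buffer written 3 times per frame with 2 frames in flight would need requiredVersions(3, 2), that is 6 versions, under the rule above.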
Push constants are a more lightweight version of volatile constant buffers that is useful for very small amounts of data. They have a size limit of 128 bytes, and no resource object. To use push constants, the application must declare them both in the binding layout and the binding set, and then call ICommandList::setPushConstants after setting the graphics or compute state.
Push constants are implemented as a single small dynamic buffer on DX11, as root constants on DX12, and as push constants on Vulkan.
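Because the 128-byte limit is fixed, it can be checked at compile time for the structure an application intends to push. The sketch below is one possible way to do that; kMaxPushConstantBytes, fitsInPushConstants, and the two example structs are hypothetical names, and the trivially-copyable requirement is an assumption of this example rather than a rule stated by NVRHI.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Size limit stated in the text above.
constexpr std::size_t kMaxPushConstantBytes = 128;

// True when T is plain data (assumption of this sketch) and small enough
// to be passed as push constants.
template <typename T>
constexpr bool fitsInPushConstants() {
    return std::is_trivially_copyable<T>::value &&
           sizeof(T) <= kMaxPushConstantBytes;
}

// Hypothetical per-draw constants that fit within the limit.
struct DrawConstants {
    float worldMatrix[16];
    float tint[4];
    unsigned materialIndex;
};

// Hypothetical structure that is too large; it would belong in a
// volatile constant buffer instead.
struct SkinningPalette {
    float boneMatrices[64];
};

static_assert(fitsInPushConstants<DrawConstants>(),
              "DrawConstants must fit in push constants");
static_assert(!fitsInPushConstants<SkinningPalette>(),
              "SkinningPalette exceeds the push constant limit");
```

The same size would also be the value declared in the binding layout item for the push constants, so keeping both tied to sizeof of one struct avoids mismatches.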
NVRHI supports textures of any dimension, i.e. 1D, 1DArray, 2D, 2DArray, 2DMS, Cube, CubeArray, 3D - which is specified using the TextureDesc::dimension field. Textures are GPU-only: they cannot be read or written by the CPU; for that purpose, there is a special object called a "staging texture". All regular (non-staging) textures are created using the IDevice::createTexture method. To use a texture created outside NVRHI, call IDevice::createHandleForNativeTexture.
Like buffers, textures can be committed or virtual. Virtual textures are created without backing memory and must be bound to a region of a heap using the IDevice::bindTextureMemory method before they are used in any way. Tiled or sparse textures are not supported.
Textures can be written into without the use of a staging texture, using the ICommandList::writeTexture method, which behaves similarly to DX11's UpdateSubresource.
A staging texture is a special primitive that allows uploading texture data to the GPU and downloading it back. It can be created using the IDevice::createStagingTexture method. Staging textures cannot be used by shaders in any capacity; they can only be copied to and from. On DX11, staging textures are implemented as regular textures with the right usage mode. On DX12 and Vulkan, staging textures are implemented as mappable buffers.
The NVRHI resource binding system is unlike any of the GAPIs. It has two symmetrical parts: layouts and sets. Binding layouts declare which binding slots will be used by the shaders, and for what resource type. Binding sets are created using an existing layout and provide actual resources to be used by shaders.
Each binding layout consists of a number of items, instances of the BindingLayoutItem structure. Each layout item specifies the resource type, such as Texture_SRV, TypedBuffer_UAV, or VolatileConstantBuffer; and the binding slot. The binding slot directly maps to the slot used in HLSL, such as t# slots for SRV type bindings, u# slots for UAV type bindings, and so on. For push constants, the layout item must also specify the size of the constant data. Besides the layout items, the binding layout also specifies the shader visibility mask, which is shared for all resources in the layout, and the register space (only valid on DX12).
Binding layouts are used to create pipelines - graphics, compute, meshlet, or ray tracing. When creating a pipeline, a few binding layouts can be specified as part of its descriptor. The layouts are used to build the root signature on DX12 or the pipeline layout on Vulkan. For ray tracing pipelines on DXR, a single local binding layout may also be specified for each generic shader or hit group, and it will translate to the local root signature.
Each binding set consists of the same number of items as the corresponding layout, in the same order. These items are instances of the BindingSetItem structure. They also specify the resource type and binding slot - which may seem redundant because this information is already present in the layout, but it improves code readability and bug detection. Besides the binding information, set items also provide pointers to the resources that need to be bound, and additional parameters, depending on the binding type: subresource sets and formats. For example, a Texture_UAV type binding may bind only a subset of array slices at a single mip level, and apply a format specification if the texture is typeless. The application does not need to create any view objects for such subresource bindings; that is handled automatically by NVRHI.
Binding sets are used to issue draw and dispatch commands. Before any draw calls can be made, the application must call ICommandList::setGraphicsState, ICommandList::setComputeState etc. with a state description structure. That description structure includes references to a few binding sets. These binding sets must be in the same order as the layouts that were used to create the pipeline. Binding the resources to the GAPI is very cheap with this model, because all descriptors are pre-filled at the time each binding set is created.
Both binding layouts and sets are immutable, i.e. cannot be modified once they are created.
Binding sets keep strong references to all resources that are used in them. When a binding set is used in a command list, the command list will by default keep a strong reference to the binding set (which transitively means referencing all resources in the set, but cheaply) until that instance of the command list has finished executing on the GPU, which makes sure that the resources are not destroyed while the GPU is using them. If the application knows that a certain binding set will not be released for a long time, it may set BindingSetDesc::trackLiveness to false, which makes the command lists not keep any strong references to such binding sets, improving CPU performance. In this case, the application must take care to wait until all GPU commands referencing the binding set are finished before releasing the set.
Modern rendering techniques, most notably ones using ray tracing, need to address resources using dynamic indices, without any static binding layout known beforehand. To support bindless rendering, NVRHI implements bindless layouts and descriptor tables.
Bindless layouts implement the IBindingLayout interface, just like regular binding layouts, and can be used in their place when creating pipelines. A bindless layout specifies the shader visibility mask, the register spaces that will be used to bind the same array of the resources (on DX12), and the maximum capacity of the bound descriptor table (on Vulkan). To create a bindless layout, use the IDevice::createBindlessLayout method.
Instead of "bindless sets", the runtime counterpart of a bindless layout is a descriptor table. A descriptor table is an untyped array of resource bindings, which has variable size (on DX12) and can be modified after creation. To write a binding into a descriptor table, use the IDevice::writeDescriptorTable method; to erase a binding, use the same method with the resource type set to None.
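The write and erase behavior of a descriptor table can be modeled as a plain array of untyped slots. The sketch below is hypothetical (DescriptorTableModel, Binding, and ResourceKind are illustrative names, not NVRHI types) and uses a fixed capacity for simplicity; the resource pointers are deliberately non-owning to reflect that no lifetime tracking takes place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ResourceKind { None, TextureSRV, BufferSRV, BufferUAV };

// One untyped slot. The pointer is non-owning on purpose: in this model,
// as in the text, the table does not keep its resources alive.
struct Binding {
    ResourceKind kind = ResourceKind::None;
    const void* resource = nullptr;
};

// Hypothetical model of a mutable descriptor table.
class DescriptorTableModel {
public:
    explicit DescriptorTableModel(std::size_t capacity) : slots_(capacity) {}

    // Writes a binding at `index`. Writing a binding of kind None erases
    // the slot. Returns false if the index is out of range.
    bool write(std::size_t index, Binding binding) {
        if (index >= slots_.size()) return false;
        if (binding.kind == ResourceKind::None) binding.resource = nullptr;
        slots_[index] = binding;
        return true;
    }

    // Number of slots that currently hold a resource.
    std::size_t liveCount() const {
        std::size_t count = 0;
        for (const Binding& slot : slots_)
            if (slot.kind != ResourceKind::None) ++count;
        return count;
    }

private:
    std::vector<Binding> slots_;
};
```

Shaders would index into such a table dynamically, so the application is responsible for never leaving a slot that the GPU may read pointing at a released resource.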
Descriptor tables also implement the IBindingSet interface and can be used in place of regular binding sets when setting state.
Note: descriptor tables do not keep strong references to their resources, and therefore provide no resource lifetime tracking or automatic barrier placement. Applications must take care to synchronize descriptor table writes with GPU work and to ensure the correct state of each referenced resource - most likely, by using only permanent resources in descriptor tables.
To enable using the same HLSL shaders on both DX and Vulkan, applications typically cross-compile the shaders into SPIR-V using DXC and apply some offsets to the binding slots, depending on the resource type. The reason to apply these offsets is that Vulkan does not have separate namespaces for different resource types, such as SRVs or UAVs.
NVRHI provides a way to automatically apply these binding offsets when creating the binding layout, using the VulkanBindingOffsets structure. The offsets are shared between all shader stages within the layout, but if different offsets are desired per-stage, multiple layouts with disjoint shader visibility masks can be used.
The default offsets used in this structure match the offsets used by the NVRHI shader compiler when targeting SPIR-V.
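The offset scheme can be sketched as a simple mapping from an HLSL register to a Vulkan binding index. The names BindingOffsets, RegisterKind, and toVulkanBinding are hypothetical, and the numeric offsets are illustrative values chosen for this example; they should not be read as the defaults of VulkanBindingOffsets.

```cpp
#include <cassert>
#include <cstdint>

// HLSL register classes: t# (SRV), s# (sampler), b# (constant buffer), u# (UAV).
enum class RegisterKind { ShaderResource, Sampler, ConstantBuffer, UnorderedAccess };

// Illustrative offsets only; any scheme works as long as the ranges do not
// overlap and the shader compiler applies the same values.
struct BindingOffsets {
    std::uint32_t shaderResource = 0;
    std::uint32_t sampler = 128;
    std::uint32_t constantBuffer = 256;
    std::uint32_t unorderedAccess = 384;
};

// Maps an HLSL slot to a Vulkan binding index. Vulkan has one binding
// namespace per descriptor set, so each register class gets its own range.
inline std::uint32_t toVulkanBinding(RegisterKind kind, std::uint32_t hlslSlot,
                                     const BindingOffsets& offsets) {
    switch (kind) {
        case RegisterKind::ShaderResource:  return offsets.shaderResource + hlslSlot;
        case RegisterKind::Sampler:         return offsets.sampler + hlslSlot;
        case RegisterKind::ConstantBuffer:  return offsets.constantBuffer + hlslSlot;
        case RegisterKind::UnorderedAccess: return offsets.unorderedAccess + hlslSlot;
    }
    return hlslSlot;  // unreachable for valid enum values
}
```

With these example values, t3 and u3 end up at bindings 3 and 387 instead of colliding at binding 3, which is the problem the offsets exist to solve.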
Like DX12 and Vulkan, NVRHI requires that applications create pipeline state objects that include all shaders, binding layouts, and some other bits of rendering state, such as rasterizer and ROP settings. There are 4 kinds of pipelines supported by NVRHI, ordered by increasing complexity:
- Compute (IComputePipeline). Created with IDevice::createComputePipeline, includes a compute shader and binding layouts.
- Meshlet (IMeshletPipeline). Created with IDevice::createMeshletPipeline, includes up to 3 shaders - amplification, mesh, and pixel, and binding layouts. Also includes the rasterizer state (minus the viewports and stencil), depth-stencil state, and blend state.
- Graphics (IGraphicsPipeline). Created with IDevice::createGraphicsPipeline, includes up to 5 shaders - vertex, hull, domain, geometry, pixel; binding layouts, and the same rendering state as a meshlet pipeline.
- Ray tracing (rt::IPipeline). Created with IDevice::createRayTracingPipeline, includes many shaders and shader groups, global and local binding layouts, and pipeline settings like maximum recursion depth.
Once a pipeline is created, it is immutable. It can only be used to set the rendering state on a command list and issue rendering commands:
- ICommandList::setComputeState, followed by dispatch or dispatchIndirect.
- ICommandList::setMeshletState, followed by dispatchMesh.
- ICommandList::setGraphicsState, followed by draw, drawIndexed, drawIndirect.
- ICommandList::setRayTracingState, followed by dispatchRays.
Note that setting the state of one kind invalidates all other kinds of state, e.g. setComputeState invalidates the previously set graphics, meshlet, or ray tracing state. The only commands that are safe to use on the command list between state setting and draw or dispatch are writeBuffer on volatile constant buffers and setPushConstants. Also note that VCBs must be written before they are used in any of the setState calls, and writing them after setting the state has an extra cost; in contrast, push constants can only be set after the setState call.
Following the Vulkan API for creating graphics pipelines, NVRHI has a concept of a framebuffer. A framebuffer is a collection of render targets, up to 8, and a depth target, each with its subresource set. Framebuffers hold strong references to their textures and are immutable.
A valid framebuffer is necessary to create a graphics or meshlet pipeline. A pipeline created with a certain framebuffer can then be used with the same framebuffer, or with any other framebuffer which is compatible. Two framebuffers are considered compatible when they have the same number and formats of render targets, and the same render target width and height. These parameters are grouped into the FramebufferInfo structure, which is accessible through the IFramebuffer::getFramebufferInfo method. If two FramebufferInfo structures are equal, their framebuffers are compatible.
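The compatibility rule amounts to a field-by-field equality check. The sketch below models it with a hypothetical FramebufferInfoModel that contains only the parameters named in the paragraph above; the real FramebufferInfo structure may carry additional fields, and the Format enum here is a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder format enum for this sketch.
enum class Format { Unknown, RGBA8_UNORM, RGBA16_FLOAT };

// Hypothetical reduced model holding only the parameters named in the text:
// number and formats of the render targets, plus their width and height.
struct FramebufferInfoModel {
    std::vector<Format> colorFormats;
    std::uint32_t width = 0;
    std::uint32_t height = 0;

    bool operator==(const FramebufferInfoModel& other) const {
        // Vector equality compares both the count and each format.
        return colorFormats == other.colorFormats &&
               width == other.width &&
               height == other.height;
    }
    bool operator!=(const FramebufferInfoModel& other) const {
        return !(*this == other);
    }
};

// A pipeline created against `created` may be used with `used` when the
// two descriptions are equal.
inline bool isCompatible(const FramebufferInfoModel& created,
                         const FramebufferInfoModel& used) {
    return created == used;
}
```

A practical consequence in this model is that swap chain framebuffers of the same size and format can share one pipeline, while a resize or format change calls for a new one.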
NVRHI implements support for DXR 1.0 pipelines and acceleration structure handling, and supports DXR 1.1 ray queries through binding acceleration structures to any shader stage. It also implements KHR_acceleration_structure and KHR_ray_tracing_pipeline support on Vulkan, and supports KHR_ray_query in the same way. The API coverage is not 100% at this time: most notably, acceleration structure compaction, growing shader objects, indirect dispatch, and Vulkan shader record buffer are not supported. It should also be noted that local binding layouts and sets are not supported on Vulkan because Vulkan does not have a concept of local bindings, besides the generic SRB.
Acceleration structures (AS) are represented by IAccelStruct and created using the IDevice::createAccelStruct method. The type of the AS - top or bottom - and its capacity are specified at creation time by providing a descriptor that lists the geometries that will be placed into the AS. Acceleration structure objects hold a data buffer that stores the AS itself, and use a scratch buffer managed by the command list to build. Acceleration structure buffers can be committed or virtual, just like regular buffers, and that is specified using the isVirtual flag in the descriptor. Virtual AS have no backing memory initially, and have to be bound to a heap region by calling IDevice::bindAccelStructMemory before they can be used.
Since the acceleration structures have an associated buffer whose size is fixed, they cannot be rebuilt to a size that exceeds the limits specified at AS creation; NVRHI will issue an error if such a build is attempted. They can, however, be rebuilt or updated in place, with updates requiring that the AllowUpdate build flag is specified at AS creation. The NVRHI validation layer checks this and several other usage constraints, some of which would be missed by the DX12 or Vulkan validation layers and whose violation would lead to GPU page faults.
Acceleration structures are bound to shaders using the same binding layout and set concept as any other resource type, using ResourceType::AccelStruct.
Ray tracing pipelines correspond directly to DXR or Vulkan RT pipelines. They consist of a number of generic shaders, each of them named; and a number of hit groups, each of them also named and consisting of any or all of: closest hit shader, any hit shader, and intersection shader. Each generic shader and hit group can have its own (single) local binding layout on DX12; attempting to use local bindings on Vulkan will result in an error. The shaders used in RT pipelines should come from shader libraries, which are created using IDevice::createShaderLibrary and then specialized to a particular entry using IShaderLibrary::getShader. On Vulkan, shader specializations are also supported.
Once a pipeline is created, a shader table also needs to be created through rt::IPipeline::createShaderTable in order to use the pipeline. The shader table object is used to build the SBT at dispatch time, in every instance of the command list that uses the shader table. Unlike the GAPI interfaces that use shader handles or pointers, NVRHI makes the shader table reference shaders and hit groups by their names - which is admittedly slower but is sufficient for the typical use case. The shader table is mutable and versioned, so it is valid to use one version of the shader table in a dispatch command, then modify the shader table, and use it again in the same command list.
To trace some rays using the pipeline method, use ICommandList::setRayTracingState, which includes a reference to a shader table; and then use ICommandList::dispatchRays.