Copyright (c) 2015-2024 The Brenwill Workshop Ltd.
MoltenVK provides the ability to configure and optimize MoltenVK for your particular application runtime requirements and development-time needs.
At runtime, configuration can be helpful in situations where Metal behavior differs from Vulkan behavior, and the results or performance you receive can depend on how MoltenVK works around those differences, which, in turn, may depend on how you are using Vulkan. Different apps might benefit differently from this handling.
Additional configuration parameters can be helpful at development time by providing you with additional tracing, debugging, and performance measuring capabilities.
Each configuration parameter has a name and value, and can be passed to MoltenVK via any of the following mechanisms:
- The standard Vulkan VK_EXT_layer_settings extension.
- Application runtime environment variables.
- Build settings at MoltenVK build time.
Parameter values configured by build settings at MoltenVK build time can be overridden by values set by environment variables, which, in turn, can be overridden during VkInstance creation via the Vulkan VK_EXT_layer_settings extension.
Using the VK_EXT_layer_settings extension is the preferred mechanism, as it is a standard Vulkan extension, and is supported by the Vulkan loader and layers. When using the VK_EXT_layer_settings extension, set VkLayerSettingEXT::pLayerName to the value of kMVKMoltenVKDriverLayerName found in the mvk_vulkan.h header (or simply to "MoltenVK").
Using environment variables can be a convenient mechanism to modify configuration parameters during runtime debugging in the field (if the settings are not overridden during VkInstance creation via the Vulkan VK_EXT_layer_settings extension).
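As a rough illustration of the environment-variable mechanism, the sketch below sets three of the parameters named in this document from inside the process. The helper names `mvk_set_config_env` and `mvk_enable_perf_logging` are hypothetical, the frame count of 300 is an arbitrary example value, and the sketch assumes the variables are set before the app creates its VkInstance.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper (not part of MoltenVK): sets one configuration
 * parameter as an environment variable, overwriting any existing value.
 * Returns 0 on success, -1 on failure, as setenv() does. */
int mvk_set_config_env(const char *name, const char *value) {
    return setenv(name, value, 1);
}

/* Enables performance tracking and asks for a log every 300 frames.
 * Assumption: this runs before the app creates its VkInstance, so that
 * the values are visible when MoltenVK reads its configuration. */
int mvk_enable_perf_logging(void) {
    if (mvk_set_config_env("MVK_CONFIG_PERFORMANCE_TRACKING", "1") != 0) return -1;
    if (mvk_set_config_env("MVK_CONFIG_ACTIVITY_PERFORMANCE_LOGGING_STYLE", "0") != 0) return -1;
    return mvk_set_config_env("MVK_CONFIG_PERFORMANCE_LOGGING_FRAME_COUNT", "300");
}
```

Setting the same variables in the shell or in an Xcode scheme before launching the app has the same effect. Any value supplied through VK_EXT_layer_settings at VkInstance creation would still take precedence.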
- 0: Log repeatedly every number of frames configured by the MVK_CONFIG_PERFORMANCE_LOGGING_FRAME_COUNT parameter.
- 1: Log immediately after each performance measurement.
- 2: Log at the end of the VkDevice lifetime. This is useful for one-shot apps such as testing frameworks.
- 3: Log at the end of the VkDevice lifetime, but continue to accumulate across multiple VkDevices throughout the app process. This is useful for testing frameworks that create many VkDevices serially.
If the MVK_CONFIG_PERFORMANCE_TRACKING parameter is enabled, this parameter controls when MoltenVK should log activity performance events.
Controls which extensions MoltenVK should advertise it supports in vkEnumerateInstanceExtensionProperties() and vkEnumerateDeviceExtensionProperties(). This can be useful when testing MoltenVK against specific limited functionality. The value of this parameter is a bitwise OR of the following values:
- 1: All supported extensions.
- 2: WSI extensions supported on the platform.
- 4: Vulkan Portability Subset extensions.
Any prerequisite extensions are also advertised. If bit 1 is included, all supported extensions will be advertised. A value of zero means no extensions will be advertised.
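The bitwise OR described above can be sketched as follows. The enumerator and function names are hypothetical labels for the three documented bit values; they are not identifiers from the MoltenVK headers.

```c
#include <stdint.h>

/* Hypothetical names for the documented bit values. */
enum {
    ADVERTISE_ALL         = 1, /* all supported extensions */
    ADVERTISE_WSI         = 2, /* WSI extensions supported on the platform */
    ADVERTISE_PORTABILITY = 4  /* Vulkan Portability Subset extensions */
};

/* Combines the selected categories with bitwise OR to form the
 * parameter value. Passing 0 for both yields 0 (nothing advertised). */
uint32_t advertise_mask(int wsi, int portability) {
    uint32_t mask = 0;
    if (wsi)         mask |= ADVERTISE_WSI;
    if (portability) mask |= ADVERTISE_PORTABILITY;
    return mask;
}

/* Returns nonzero if the value includes bit 1, in which case all
 * supported extensions are advertised regardless of the other bits. */
int advertises_all(uint32_t mask) {
    return (mask & ADVERTISE_ALL) != 0;
}
```

For example, advertising only the WSI and Portability Subset extensions corresponds to the value 6.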
Controls the Vulkan API version that MoltenVK should advertise in vkEnumerateInstanceVersion(), after MoltenVK adds the VK_HEADER_VERSION component.
Set this value to one of:
- 4202496 (decimal number for VK_API_VERSION_1_2)
- 4198400 (decimal number for VK_API_VERSION_1_1)
- 4194304 (decimal number for VK_API_VERSION_1_0)
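The decimal values listed above follow Vulkan's version packing, with the major version shifted left by 22 bits and the minor version by 12 bits (variant and patch left at zero). A minimal sketch that reproduces them, using a hypothetical function name:

```c
#include <stdint.h>

/* Packs a major and minor version the way Vulkan API version numbers are
 * packed, leaving the variant and patch fields at zero. The function name
 * is illustrative only. */
uint32_t api_version_value(uint32_t major, uint32_t minor) {
    return (major << 22) | (minor << 12);
}
```

This makes it easier to check a value before passing it as the parameter, since the parameter itself is given as a plain decimal number.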
(The default value is an empty string).
If MVK_CONFIG_AUTO_GPU_CAPTURE_SCOPE is any value other than 0, this is the path to a file where the automatic GPU capture will be saved. If this parameter is an empty string (the default), automatic GPU capture will be handled by the Xcode user interface.
If this parameter is set to a valid file path, the Xcode scheme need not have Metal GPU capture enabled, and in fact the app need not be run under Xcode's control at all. This is useful in case the app cannot be run under Xcode's control. A path starting with '~' can be used to place it in a user's home directory. This feature requires Metal 2.2 (macOS 10.15+, iOS/tvOS 13+).
- 0: No automatic GPU capture.
- 1: Automatically capture all GPU activity during the lifetime of a VkDevice.
- 2: Automatically capture all GPU activity during the rendering and presentation of the first frame.
- 3: Automatically capture all GPU activity while signaled on a temporary named pipe. Automatically begins recording whenever the pipe is not empty, and records as many frames as the pipe contains bytes.
Controls whether Metal should run an automatic GPU capture without the user having to trigger it manually via the Xcode user interface, and controls the scope under which that GPU capture will occur. This is useful when trying to capture a one-shot GPU trace, such as when running a Vulkan CTS test case, or for triggering the capture via an IPC on a temporary named pipe.
For values 2 and 3, the queue for which the frames are captured is identified by the values of the MVK_CONFIG_DEFAULT_GPU_CAPTURE_SCOPE_QUEUE_FAMILY_INDEX and MVK_CONFIG_DEFAULT_GPU_CAPTURE_SCOPE_QUEUE_INDEX configuration parameters.
For the automatic GPU capture to occur, the environment variable MTL_CAPTURE_ENABLED must be enabled, or, if running the app from Xcode, the GPU Frame Capture option can be set to Metal.
To manually trigger a GPU capture via the Xcode user interface, leave this parameter at 0.
(The default value is 1 if MoltenVK was built in Debug mode).
If enabled, debugging capabilities will be enabled, including logging shader code during runtime shader conversion.
The index of the queue family whose presentation submissions will be used as the default GPU Capture Scope, when GPU Capture is active.
The index of the queue, within the queue family identified by the MVK_CONFIG_DEFAULT_GPU_CAPTURE_SCOPE_QUEUE_FAMILY_INDEX parameter, whose presentation submissions will be used as the default GPU Capture Scope, when GPU Capture is active.
If enabled, a MoltenVK logo watermark will be rendered on top of the scene. This can be enabled for publicity during demos.
- 0: Metal shaders will never be compiled with the fast math option.
- 1: Metal shaders will always be compiled with the fast math option.
- 2: Metal shaders will be compiled with the fast math option, unless the shader includes execution capabilities, such as SignedZeroInfNanPreserve, that require it to be compiled without fast math.
Identifies when Metal shaders will be compiled with the Metal fast math option enabled.
Shaders compiled with the Metal fast math option enabled perform floating point math significantly faster, but may optimize floating point operations in ways that violate the IEEE 754 standard.
Enabling Metal fast math can dramatically improve shader performance, and has little practical effect on the numerical accuracy of most shaders. As such, disabling fast math should be done carefully and deliberately. For most applications, always enabling fast math is the preferred choice.
Apps that have specific accuracy and handling needs for particular shaders may elect to set the value of this parameter to 2, so that fast math will be disabled when compiling shaders that request specific math accuracy and precision capabilities, such as SignedZeroInfNanPreserve.
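The three values can be read as a small decision rule. The sketch below is only an illustration of that rule, not MoltenVK's implementation. The enumerator names are hypothetical, `shader_requires_strict_math` stands for "the shader declares an execution capability such as SignedZeroInfNanPreserve", and the handling of values outside 0 to 2 is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical names for the documented values 0, 1 and 2. */
enum fast_math_mode {
    FAST_MATH_NEVER     = 0,
    FAST_MATH_ALWAYS    = 1,
    FAST_MATH_ON_DEMAND = 2
};

/* Returns true if a shader would be compiled with fast math enabled. */
bool use_fast_math(int mode, bool shader_requires_strict_math) {
    switch (mode) {
        case FAST_MATH_NEVER:     return false;
        case FAST_MATH_ALWAYS:    return true;
        case FAST_MATH_ON_DEMAND: return !shader_requires_strict_math;
        default:                  return false; /* assumption: unknown values disable fast math */
    }
}
```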
Forces MoltenVK to only advertise the low-power GPUs, if available on the device.
If Metal supports native per-texture swizzling (macOS 10.15+ with Mac 2 GPU, iOS/tvOS 13+), this parameter is ignored.
When running on an older version of Metal that does not support native per-texture swizzling, if this parameter is enabled, VkImageView swizzling is automatically performed in the converted Metal shader code during all texture sampling and reading operations. This occurs regardless of whether a swizzle is required for the VkImageView associated with the Metal texture, which may result in reduced performance.
If disabled, and native Metal per-texture swizzling is not available on the platform, the following very limited set of VkImageView component swizzles is supported via format substitutions:
Texture format                 Swizzle
--------------                 -------
VK_FORMAT_R8_UNORM             ZERO, ANY, ANY, RED
VK_FORMAT_A8_UNORM             ALPHA, ANY, ANY, ZERO
VK_FORMAT_R8G8B8A8_UNORM       BLUE, GREEN, RED, ALPHA
VK_FORMAT_R8G8B8A8_SRGB        BLUE, GREEN, RED, ALPHA
VK_FORMAT_B8G8R8A8_UNORM       BLUE, GREEN, RED, ALPHA
VK_FORMAT_B8G8R8A8_SRGB        BLUE, GREEN, RED, ALPHA
VK_FORMAT_D32_SFLOAT_S8_UINT   RED, ANY, ANY, ANY (stencil only)
VK_FORMAT_D24_UNORM_S8_UINT    RED, ANY, ANY, ANY (stencil only)
If native per-texture swizzling is not available, and this feature is not enabled, an error is logged and returned in the following situations:
- VkImageView creation if that VkImageView requires full image view swizzling.
- A pipeline that was not compiled with full image view swizzling uses a VkImageView that is expecting a swizzle.
- VkPhysicalDeviceImageFormatInfo2KHR is passed in a call to vkGetPhysicalDeviceImageFormatProperties2KHR() to query for a VkImageView format that will require full swizzling.
- 0: No logging.
- 1: Log errors only.
- 2: Log errors and warning messages.
- 3: Log errors, warnings and informational messages.
- 4: Log errors, warnings, infos and debug messages.
Controls the level of logging performed by MoltenVK.
The maximum number of Metal command buffers that can be concurrently active per Vulkan queue. The number of active Metal command buffers required depends on the MVK_CONFIG_PREFILL_METAL_COMMAND_BUFFERS parameter.
If MVK_CONFIG_PREFILL_METAL_COMMAND_BUFFERS is set to anything other than 0, one Metal command buffer is required per Vulkan command buffer. Otherwise, one Metal command buffer is required per command buffer queue submission, which will typically be significantly fewer than the number of Vulkan command buffers.
The maximum amount of time, in nanoseconds, to wait for a Metal library, function, or pipeline state object to be compiled and created by the Metal compiler. An internal error within the Metal compiler may stall the thread for up to 30 seconds. Setting this value limits that delay to a specified amount of time, allowing shader compilations to fail fast.
If the MVK_CONFIG_PERFORMANCE_TRACKING parameter is enabled, and this parameter is non-zero, performance and frame-based statistics will be logged, on a repeating cycle, once per this many frames.
If this parameter is zero, or the MVK_CONFIG_PERFORMANCE_TRACKING parameter is disabled, no frame-based performance statistics will be logged.
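The repeating cycle described above can be pictured with a small predicate. This is a sketch of the documented conditions only. The function name is hypothetical, and the choice to count frames from 1 and log on exact multiples is an assumption, since the text does not say where in the cycle the log falls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if frame-based statistics would be logged after the given
 * frame. Nothing is logged if tracking is disabled or the frame count
 * parameter is zero. */
bool should_log_frame(bool tracking_enabled,
                      uint32_t frame_count_param,
                      uint64_t frame_number) {
    if (!tracking_enabled || frame_count_param == 0) {
        return false;
    }
    return frame_number % frame_count_param == 0;
}
```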
If enabled, performance statistics, as defined by the MVKPerformanceStatistics structure, are collected, and can be retrieved via the private-API vkGetPerformanceStatisticsMVK() function.
You can also use the MVK_CONFIG_ACTIVITY_PERFORMANCE_LOGGING_STYLE and MVK_CONFIG_PERFORMANCE_LOGGING_FRAME_COUNT parameters to configure when to log the performance statistics collected by this parameter.
Controls whether MoltenVK should preallocate memory in each VkDescriptorPool according to the values of the VkDescriptorPoolSize parameters. Doing so may improve descriptor set allocation performance and memory stability at a cost of preallocated application memory. If this setting is disabled, the descriptors required for a descriptor set will be individually dynamically allocated in application memory when the descriptor set itself is allocated.
- 0: During Vulkan command buffer filling, do not prefill a Metal command buffer for each Vulkan command buffer. A single Metal command buffer will be created and encoded for all the Vulkan command buffers included when vkQueueSubmit() is called. MoltenVK automatically creates and drains a single Metal object autorelease pool when vkQueueSubmit() is called. This is the fastest option, but potentially has the largest memory footprint.
- 1: During Vulkan command buffer filling, encode to the Metal command buffer when vkEndCommandBuffer() is called. MoltenVK automatically creates and drains a single Metal object autorelease pool when vkEndCommandBuffer() is called. This option has the fastest performance, and the largest memory footprint, of the prefilling options using autorelease pools.
- 2: During Vulkan command buffer filling, as each command is submitted to the Vulkan command buffer, immediately encode it to the Metal command buffer, and do not retain any command content in the Vulkan command buffer. MoltenVK automatically creates and drains a Metal object autorelease pool for each and every command added to the Vulkan command buffer. This option has the smallest memory footprint, and the slowest performance, of the prefilling options using autorelease pools.
- 3: During Vulkan command buffer filling, as each command is submitted to the Vulkan command buffer, immediately encode it to the Metal command buffer, do not retain any command content in the Vulkan command buffer, and assume the app will ensure that each thread that fills commands into a Vulkan command buffer has a Metal autorelease pool. MoltenVK will not create and drain any autorelease pools during encoding. This is the fastest prefilling option, and generally has a small memory footprint, depending on when the app-provided autorelease pool drains.
For any value other than 0, be aware of the following:
- One Metal command buffer is required for each Vulkan command buffer. Depending on the number of command buffers that you use, you may also need to change the value of the MVK_CONFIG_MAX_ACTIVE_METAL_COMMAND_BUFFERS_PER_QUEUE parameter.
- Prefilling of a Metal command buffer will not occur during the filling of secondary command buffers (VK_COMMAND_BUFFER_LEVEL_SECONDARY), or for primary command buffers that are intended to be submitted to multiple queues concurrently (VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT).
- For primary command buffers that are intended to be reused (VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT is not set), prefilling will only apply to the first submission. Later submissions of the same command buffer will behave as if this configuration parameter is set to 0.
- If you have recorded commands to a Vulkan command buffer, and then choose to reset that command buffer instead of submitting it, the corresponding prefilled Metal command buffer will still be submitted. This is because Metal command buffers do not support the concept of being reset after being filled. Depending on when and how often you do this, it may cause unexpected visual artifacts and unnecessary GPU load.
- This configuration is incompatible with updating descriptors after binding. If any of the UpdateAfterBind feature flags of VkPhysicalDeviceDescriptorIndexingFeatures or VkPhysicalDeviceInlineUniformBlockFeatures have been enabled, the value of this parameter will be ignored and treated as if it is 0.
Controls whether MoltenVK should treat a lost VkDevice as resumable, unless the corresponding VkPhysicalDevice has also been lost. The VK_ERROR_DEVICE_LOST error has a broad definitional range, and can mean anything from a GPU hiccup on the current command buffer submission, to a physically removed GPU. In the case where this error does not impact the VkPhysicalDevice, Vulkan requires that the app destroy and re-create a new VkDevice. However, not all apps (including CTS) respect that requirement, leading to what might be a transient command submission failure causing an unexpected catastrophic app failure.
If this parameter is enabled, in the case of a VK_ERROR_DEVICE_LOST error that does NOT impact the VkPhysicalDevice, MoltenVK will log the error, but will not mark the VkDevice as lost, allowing the VkDevice to continue to be used. If this parameter is disabled, MoltenVK will mark the VkDevice as lost, and subsequent use of that VkDevice will be reduced or prohibited.
- 0: No compression.
- 1: LZFSE: Apple proprietary. Good balance of high performance and small compression size, particularly for larger data content.
- 2: ZLib: Open cross-platform format. For smaller data content, has better performance and smaller size than LZFSE.
- 3: LZ4: Fastest performance. Largest compression size.
- 4: LZMA: Slowest performance. Smallest compression size, particularly with larger content.
Pipeline cache compression is available for macOS 10.15+, and iOS/tvOS 13.0+.
Controls the type of compression to use on the MSL source code that is stored in memory for use in a pipeline cache.
After being converted from SPIR-V, or loaded directly into a VkShaderModule, and then compiled into a MTLLibrary, the MSL source code is no longer needed for operation, but it is retained so it can be written out as part of a pipeline cache export. When a large number of shaders are loaded, this can consume significant memory. In such a case, this parameter can be used to compress the MSL source code that is awaiting export as part of a pipeline cache.
If enabled, MSL vertex shader code created during runtime shader conversion will flip the Y-axis of each vertex, as the Vulkan Y-axis is the inverse of OpenGL.
An alternate way to reverse the Y-axis is to employ a negative Y-axis value on the viewport, in which case this parameter can be disabled.
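The negative-viewport alternative can be sketched as below. `Viewport` is a stand-in struct with the same fields as VkViewport, declared here only so the sketch compiles without the Vulkan headers, and `flipped_viewport` is a hypothetical helper. Negative viewport heights are part of VK_KHR_maintenance1 and core Vulkan 1.1.

```c
/* Stand-in with the same fields as VkViewport, so the sketch does not
 * depend on the Vulkan headers. */
typedef struct {
    float x, y, width, height, minDepth, maxDepth;
} Viewport;

/* Returns a viewport covering the same rectangle with the Y-axis
 * reversed: the origin moves to the bottom edge of the rectangle and
 * the height becomes negative. */
Viewport flipped_viewport(Viewport vp) {
    Viewport out = vp;
    out.y = vp.y + vp.height;
    out.height = -vp.height;
    return out;
}
```

An app that flips the viewport this way would typically leave the shader-side Y-flip parameter disabled, so that the axis is not reversed twice.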
Maximizes the number of concurrently executing compilation tasks.
To have effect, this parameter requires macOS 13.3+, and has no effect on iOS or tvOS.
Metal does not distinguish functionality between queues, which would normally mean only a single general-purpose queue family with multiple queues is needed. However, Vulkan associates command buffers with a queue family, whereas Metal associates command buffers with a specific Metal queue. In order to allow a Metal command buffer to be prefilled before it is formally submitted to a Vulkan queue, each Vulkan queue family can support only a single Metal queue. As a result, in order to provide parallel queue operations, MoltenVK provides multiple queue families, each with a single queue.
If this parameter is disabled, all queue families will be advertised as having general-purpose graphics + compute + transfer functionality, which is how the actual Metal queues behave.
If this parameter is enabled, one queue family will be advertised as having general-purpose graphics + compute + transfer functionality, and the remaining queue families will be advertised as having specialized graphics or compute or transfer functionality, to make it easier for some apps to select a queue family with the appropriate requirements.
Depending on the GPU, Metal allows 8,192 or 32,768 occlusion queries per MTLBuffer.
If enabled, MoltenVK allocates a MTLBuffer for each query pool, allowing each query pool to support that permitted number of queries. This may slow performance or cause unexpected behaviour if the query pool is not established prior to a Metal renderpass, or if the query pool is changed within a renderpass. If disabled, one MTLBuffer will be shared by all query pools, which improves performance, but limits the total device queries to the permitted number.
If enabled, swapchain images will use simple Nearest sampling when minifying or magnifying the swapchain image to fit a physical display surface. If disabled, swapchain images will use Linear sampling when magnifying the swapchain image to fit a physical display surface. Enabling this setting avoids smearing effects when swapchain images are simple integer multiples of display pixels (e.g. macOS Retina, and typical of graphics apps and games), but may cause aliasing effects when using non-integer display scaling.
If enabled, when the app creates a VkDevice from a VkPhysicalDevice (GPU) that is neither headless nor low-power, and is different than the GPU used by the windowing system, the windowing system will be forced to switch to use the GPU selected by the Vulkan app. When the Vulkan app is ended, the windowing system will automatically switch back to using the previous GPU, depending on the usage requirements of other running apps.
If disabled, the Vulkan app will render using its selected GPU, and if the windowing system uses a different GPU, the windowing system compositor will automatically copy framebuffer content from the app GPU to the windowing system GPU.
The value of this parameter has no effect on systems with a single GPU, or when the Vulkan app creates a VkDevice from a low-power or headless VkPhysicalDevice (GPU).
Switching the windowing system GPU to match the Vulkan app GPU maximizes app performance, because it spares the windowing system compositor from having to copy framebuffer content between GPUs on each rendered frame. However, doing so potentially forces the entire system to switch to a GPU that may consume more power while the app is running.
Some Vulkan apps may want to render using a high-power GPU, but leave it up to the system window compositor to determine how best to blend content with the windowing system, and as a result, may want to disable this parameter.
(The default value is 0 for OS versions prior to macOS 10.14/iOS 12).
If enabled, queue command submissions vkQueueSubmit() and vkQueuePresentKHR() will be processed on the thread that called the submission function. If disabled, processing will be dispatched to a GCD dispatch_queue whose priority is determined by VkDeviceQueueCreateInfo::pQueuePriorities during vkCreateDevice().
Controls whether MoltenVK should use a Metal 2D texture with a height of 1 for a Vulkan 1D image, or use a native Metal 1D texture. Metal imposes significant restrictions on native 1D textures, including not being renderable, clearable, or permitting mipmaps. Using a Metal 2D texture allows Vulkan 1D textures to support this additional functionality.
This parameter is ignored on Apple Silicon (Apple GPU) devices.
Non-Apple GPUs can have a dynamic timestamp period, which varies over time according to GPU workload. Depending on how often the app samples the VkPhysicalDeviceLimits::timestampPeriod value using vkGetPhysicalDeviceProperties(), the app may want up-to-date, but potentially volatile values, or it may find average values more useful.
The value of this parameter sets the alpha (A) value of a simple lowpass filter on the timestampPeriod value, of the form:
TPout = ((1 - A) * TPout) + (A * TPin)
The alpha value can be set to a float between 0.0 and 1.0. Values of alpha closer to 0.0 cause the value of timestampPeriod to vary slowly over time and be less volatile, and values of alpha closer to 1.0 cause the value of timestampPeriod to vary quickly and be more volatile.
Apps that query the timestampPeriod value infrequently will prefer low volatility, whereas apps that query frequently may prefer higher volatility, to track more recent changes.
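One step of the lowpass filter given above can be written directly from the formula. The function name is hypothetical, and the sketch says nothing about how often MoltenVK actually samples the GPU.

```c
/* One update of the filter TPout = (1 - A) * TPout + A * TPin.
 * tp_out_prev is the previous filtered value, tp_in the newly measured
 * period, and alpha the parameter value, expected to lie in [0.0, 1.0]. */
double filter_timestamp_period(double tp_out_prev, double tp_in, double alpha) {
    return (1.0 - alpha) * tp_out_prev + alpha * tp_in;
}
```

With alpha at 1.0 the output simply follows the latest measurement, and with alpha at 0.0 it never changes. Values in between blend the two, which is the volatility trade-off described above.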
- 0: No Vulkan call logging.
- 1: Log the name of each Vulkan call when the call is entered.
- 2: Log the name and thread ID of each Vulkan call when the call is entered.
- 3: Log the name of each Vulkan call when the call is entered and exited. This effectively brackets any other logging activity within the scope of the Vulkan call.
- 4: Log the name and thread ID of each Vulkan call when the call is entered, and name when exited. This effectively brackets any other logging activity within the scope of the Vulkan call.
- 5: Same as 3, plus logs the time spent inside the Vulkan function.
- 6: Same as 4, plus logs the time spent inside the Vulkan function.
Controls the information MoltenVK logs for each Vulkan call made by the application.
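The seven levels combine four independent pieces of information, which the sketch below makes explicit. The struct and function names are hypothetical, and treating out-of-range values as "no logging" is an assumption of this sketch, not documented behavior.

```c
#include <stdbool.h>

/* What gets logged for each Vulkan call at a given level. */
typedef struct {
    bool log_entry;     /* call name logged when the call is entered */
    bool log_thread_id; /* thread ID logged together with the entry */
    bool log_exit;      /* call name logged when the call exits */
    bool log_duration;  /* time spent inside the call is logged */
} CallLogging;

/* Decodes a level from 0 to 6 into its components. */
CallLogging decode_call_logging(int level) {
    CallLogging c = { false, false, false, false };
    if (level < 1 || level > 6) {
        return c; /* 0, or an out-of-range value: nothing is logged */
    }
    c.log_entry     = true;
    c.log_thread_id = (level == 2 || level == 4 || level == 6);
    c.log_exit      = (level >= 3);
    c.log_duration  = (level >= 5);
    return c;
}
```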
Controls whether MoltenVK should use pools to manage memory used when adding commands to command buffers.
If this setting is enabled, MoltenVK will use a pool to hold command resources for reuse during command execution.
If this setting is disabled, command memory is allocated and destroyed each time a command is executed.
This is a classic time-space trade-off. When command pooling is active, the memory in the pool can be cleared via a call to the vkTrimCommandPoolKHR() command.
Controls whether MoltenVK should use Metal argument buffers for resources defined in descriptor sets, if Metal argument buffers are supported on the platform. Using Metal argument buffers dramatically increases the number of buffers, textures and samplers that can be bound to a pipeline shader, and in most cases improves performance.
Controls whether MoltenVK should use MTLHeaps for allocating textures and buffers from device memory.
If this setting is enabled, and placement MTLHeaps are available on the platform, MoltenVK will allocate a placement MTLHeap for each VkDeviceMemory instance, and allocate textures and buffers from that placement heap.
If this parameter is disabled, MoltenVK will allocate textures and buffers from general device memory.
Apple recommends that MTLHeaps should only be used for specific requirements such as aliasing or hazard tracking, and MoltenVK testing has shown that allocating multiple textures of different types or usages from one MTLHeap can occasionally cause corruption issues under certain circumstances.
- 0: Limit Vulkan to a single queue, with no explicit semaphore synchronization, and use Metal's implicit guarantees that all operations submitted to a queue will give the same result as if they had been run in submission order.
- 1: Use Metal events (MTLEvent) when available on the platform, and where safe. This will revert to the same as 0 on some NVIDIA GPUs and Rosetta2, due to potential challenges with MTLEvents on those platforms, or in older environments where MTLEvents are not supported.
- 2: Always use Metal events (MTLEvent) when available on the platform. This will revert to the same as 0 in older environments where MTLEvents are not supported.
- 3: Use CPU callbacks upon GPU submission completion. This is the slowest technique, but allows multiple queues, compared to 0.
Determines the style used to implement Vulkan semaphore (VkSemaphore) functionality in Metal.
In the special case of VK_SEMAPHORE_TYPE_TIMELINE semaphores, MoltenVK will always use MTLSharedEvent if it is available on the platform, regardless of the value of this parameter.
If enabled, MoltenVK will use private interfaces exposed by Metal to implement Vulkan features that are difficult to support otherwise.
Unlike MVK_USE_METAL_PRIVATE_API, this setting may be overridden at run time.
This option is not available unless MoltenVK was built with MVK_USE_METAL_PRIVATE_API set to 1.
(The default value is an empty string).
If not empty, MoltenVK will dump all SPIR-V shaders, compiled MSL shaders, and pipeline shader lists to the given directory. The directory will be non-recursively created if it doesn't already exist.