
Accurate event for when a swapchain image is visible on screen #370

Closed
haasn opened this issue Sep 18, 2016 · 24 comments

@haasn

haasn commented Sep 18, 2016

I see no way currently to figure out when a swapchain image is actually visible on the screen.

Imagine an application which needs 4ms to execute a draw call and is running on a 16ms vsync display. Here's what a timeline could look like (correct me if I'm wrong), supposing that we start the application immediately after a vsync has already happened. (A rough C sketch of the setup steps follows the list.)

  1. t=0ms: Swapchain is created and all command buffers drawing to its images are recorded. Each image is guarded by its own semaphore.
  2. t=0ms: The application acquires the next image for use, which will signal the semaphore and (optionally) a fence
  3. t=0ms: The application submits a draw command which will wait for the signal, remove it, and resignal it once done
  4. t=0ms: The application queues up the image for presentation, which will wait for signal and remove it again
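
For concreteness, here's a rough C sketch of steps 2-4, using the two-semaphore pattern clarified later in this thread (assuming pre-created semaphores acquired and done, a pre-recorded command buffer per image in cmdbuf[], and handles dev, queue, swapchain; error handling omitted):

    uint32_t i;
    vkAcquireNextImageKHR(dev, swapchain, UINT64_MAX, acquired,
                          VK_NULL_HANDLE /* or a fence */, &i);     /* step 2 */

    VkPipelineStageFlags stage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount = 1,   .pWaitSemaphores = &acquired,
        .pWaitDstStageMask = &stage,
        .commandBufferCount = 1,   .pCommandBuffers = &cmdbuf[i],
        .signalSemaphoreCount = 1, .pSignalSemaphores = &done,
    };
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);               /* step 3 */

    VkPresentInfoKHR present = {
        .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
        .waitSemaphoreCount = 1, .pWaitSemaphores = &done,
        .swapchainCount = 1, .pSwapchains = &swapchain,
        .pImageIndices = &i,
    };
    vkQueuePresentKHR(queue, &present);                             /* step 4 */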

After this batch of setup, the following things happen:
3. t=0ms: The semaphore is signalled right away, and (optionally) the fence is triggered, indicating that the image acquired in step 2 is available for use. The semaphore being signalled allows the draw command to start
4. t=4ms: The draw command finishes, and signals the semaphore again. This allows the GPU to start using the image for presentation (removing the signal). But it is not visible yet, because the next page flip has not yet occurred
5. t=16ms: The GPU flips pages and actually starts displaying the image on screen.
... at this point it is assumed that the application also does whatever is necessary for drawing the next frame
6. t=32ms: The GPU flips pages again and stops using the surface (signalling the semaphore). Assume it takes 1ms for the image to get freed up and be reusable again
7. t=33ms: The application would be able to acquire the image again (i.e. triggering the fence)

To summarize: on the CPU side of things, I can get accurate information about the following points in time:

  1. The image is ready for use (t=0ms, t=33ms)
  2. The draw command completes, but the image is not yet visible (e.g. by triggering an event from the end of the command buffer)

But I can't seem to get any reliable information about t=16ms, i.e. when the frame I just submitted is actually visible. This is important to me because I need to measure display latency and effective refresh rate accurately.

The problem gets worse if I use a large swapchain. For example, suppose my swapchain is size 4.

In the first world, i.e. where I wait on the fence indicating that the image is ready for use again, I would measure differences in frame times something like this:

  1. t=0ms -> delta = 0ms
  2. t=0ms -> delta = 0ms
  3. t=0ms -> delta = 0ms
  4. t=0ms -> delta = 0ms
  5. t=17ms -> delta = 17ms
  6. t=33ms -> delta = 16ms
  7. t=49ms -> delta = 16ms
    ...

In the second world, i.e. where I trigger an event once I've finished rendering and wait on that to complete, I would measure frame times like this:

  1. t=4ms -> delta = 4ms
  2. t=8ms -> delta = 4ms
  3. t=12ms -> delta = 4ms
  4. t=16ms -> delta = 4ms
  5. t=20ms -> delta = 4ms
  6. t=36ms -> delta = 16ms
  7. t=52ms -> delta = 16ms
    ...

Basically, they all converge to the true vsync timing (16ms) in the limit, but the measurements at the start will always be off since the GPU can already acquire the next image and/or render to it well in advance of when it will actually be used.

How do you advise accomplishing what I want? (Measuring the real delay between submitting a frame and it being visible on screen)

@ianelliottus

Hi @haasn,

What you are asking for is very reasonable. Unfortunately, we don't have a solution for you at this time. Khronos is working on it, but I'm sorry to say that there's no estimated time for when we'll be shipping a solution.

I'd like to understand what you want, to compare it with other requests we've received. Is your goal to call vkQueuePresentKHR() and then be able to find out when the image(s) are actually presented? Something different? Something additional?

To clarify, your description makes it sound like you are using the same semaphore for multiple purposes, which is not correct. Was that just to make it easier for you to describe?

Thanks for your input/feedback!
Ian Elliott

@haasn
Author

haasn commented Sep 21, 2016

Is your goal to call vkQueuePresentKHR() and then be able to find out when the image(s) are actually presented? Something different? Something additional?

My ultimate goal is to keep audio and video playback synchronized while minimizing glitches due to repeated or dropped frames, which requires measuring 1. the display refresh rate, and 2. frame skips.

In the case of 1., I do not want to rely on the EDID information or “reported” display refresh rate alone; I want to measure it in realtime, since both can be inconsistent and subtly different from the true rate.

In the case of 2., I need to know when I've dropped a vsync due to rendering too slowly. For example, imagine a program which uses a swapchain of size 4, acquires 4 images, submits 4 draw calls, and makes 4 vkQueuePresentKHR calls at t=0ms. Depending on how long the draw calls actually took to execute, the semaphore guarding each rendered image might not be triggered in time for the corresponding page flip, so it might be the case that frame 2 was displayed for 32ms instead of 16ms. In this case, I need to know about it, so I can resynchronize the video and audio.

It's worth noting that the approaches I have already outlined (waiting on an event which the end of my rendering command emits, and waiting on a fence signalling the next image was acquired) both cover my needs already, in the limit. The only complaint I have about them is that the timing gets thrown off near the beginning of playback, which I'm trying to minimize since it can throw off stuff like averaging filters for the duration of the averaging window.

If I had to design an API for this myself, I would loosely suggest the following:

  1. Add a ‘fence’ parameter to vkQueuePresentKHR, which gets signalled once the queued image is made visible. This works fine for FIFO-like presentation modes, but it won't necessarily work reliably for mailbox swapchains, because a queued image might never become visible (e.g. with triple buffering). Personally, I have no use for mailbox mode, but it might still be worth thinking about.
  2. Add an API call to block until the next page flip, and perhaps also report the index of the swapchain image that got made visible in this page flip, or -1 if none.

Of the two, option 2 is probably the more powerful approach, since it solves a number of problems:

  1. Lets me explicitly detect missed vsyncs without having to rely on comparing the actual and expected timing myself
  2. Lets me know in retrospect how long it took for a given frame to become visible (by recording the timestamps and indices returned)
  3. Lets me easily wait however long it takes until a given image is visible (by looping)
  4. Lets me synchronize the start of playback to a vsync boundary (which can solve a few edge cases)
  5. Works well even with mailbox-style swapchains

It also requires no changes to existing API calls. So all things considered, that's the approach I'd be happiest with, I think. (A hypothetical signature for option 2 is sketched below.)
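
Purely for illustration, here is a hypothetical C signature for option 2 - to be clear, nothing like this exists in any Vulkan spec or extension:

    /* HYPOTHETICAL -- a sketch of option 2, not a real Vulkan entry point. */
    VkResult vkWaitForNextPageFlipEXT(
        VkDevice       device,
        VkSwapchainKHR swapchain,
        uint64_t       timeout,       /* nanoseconds, as for vkWaitForFences */
        uint32_t      *pImageIndex);  /* image made visible by this flip,
                                         or UINT32_MAX if none */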

To clarify, your description makes it sound like you are using the same semaphore for multiple purposes, which is not correct. Was that just to make it easier for you to describe?

I wrote this post before having a solid understanding of the rules for semaphore use and ordering. You're right in that you'd usually use a pair of semaphores for each image in a swapchain. That said, I think you could re-use the same semaphore for both directions as long as it's done on the same VkQueue. Either way, I don't think the distinction is meaningful for this problem.

@ianelliottus

@haasn, thanks for your input! It makes sense and will help us design a good solution.

To clarify, your description makes it sound like you are using the same semaphore for multiple purposes, which is not correct. Was that just to make it easier for you to describe?
I wrote this post before having a solid understanding of the rules for semaphore use and ordering. ... Either way, I don't think the distinction is meaningful for this problem.

Yes, it was orthogonal to the main topic (just an FYI).

@ghost

ghost commented May 18, 2017

Without knowing much about Vulkan, I think such an API should provide the following mechanisms:

  • retrieve the current refresh rate
  • retrieve the time of the most recent swap event
    • a vsync counter, which is incremented on each "physical" screen refresh
    • or by real time (would be useful for gsync/freesync)
    • possibly with a flag that determines whether an image, whose targeted display time is in the past, should be displayed anyway for at least 1 refresh, or should be dropped
  • possibly retrieve or set the current display latency (if that even makes sense on Vulkan's level, I don't know) (the MS DXGI API has this)
  • possibly allow very quick refresh rate changing (useful for displays and connects which support it, and could be emulated by gsync/freesync) (the MS DXGI API has this)
  • set explicitly when a queued image should be displayed
    • by a vsync number (for example for intentionally skipping a number of refresh cycles)
    • or by real time (would be useful for gsync/freesync)
  • feedback when a previously queued image was displayed
    • reporting the time (again by vsync time or real time (or both))
    • there should be some sort of event mechanism, that avoids the need to block in the caller
    • of course it must be possible to associate this feedback with input images (some APIs report feedback for a single past frame, and make it surprisingly tricky to associate it with a user's swap call)

Here are some links to other display APIs, which try to deal with this, for better or worse, in no specific order:
https://cgit.freedesktop.org/wayland/wayland-protocols/tree/stable/presentation-time/presentation-time.xml
https://www.khronos.org/registry/OpenGL/extensions/OML/GLX_OML_sync_control.txt
https://msdn.microsoft.com/en-us/library/windows/desktop/bb173060.aspx (and others)
http://http.download.nvidia.com/XFree86/vdpau/doxygen/html/group___vdp_presentation_queue.html

@ratchetfreak

VK_GOOGLE_display_timing covers nearly all of that already...

@haasn
Author

haasn commented May 20, 2017

It's worth pointing out that VK_KHX_display_control also covers some of this, notably vkRegisterDisplayEventEXT (allows signalling a VkFence when a frame becomes visible) and vkGetSwapchainCounterEXT (allows counting the number of vblanks on a display).

More interestingly, nvidia has added support for VK_KHX_display_control in their new 381.22 drivers, while there's no support for VK_GOOGLE_display_timing (yet).
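
A minimal sketch of how those two calls combine, using the VK_EXT_display_control names (this assumes the extension is enabled, a VkDisplayKHR is already known, the swapchain was created with the vblank surface counter enabled, and the function pointers have been loaded):

    /* Register a one-shot fence that signals at the next first-pixel-out: */
    VkDisplayEventInfoEXT event_info = {
        .sType = VK_STRUCTURE_TYPE_DISPLAY_EVENT_INFO_EXT,
        .displayEvent = VK_DISPLAY_EVENT_TYPE_FIRST_PIXEL_OUT_EXT,
    };
    VkFence vblank_fence;
    vkRegisterDisplayEventEXT(device, display, &event_info, NULL, &vblank_fence);
    vkWaitForFences(device, 1, &vblank_fence, VK_TRUE, UINT64_MAX);
    vkDestroyFence(device, vblank_fence, NULL);  /* not reusable; see below */

    /* Query how many vblanks have elapsed on the swapchain's surface: */
    uint64_t vblanks;
    vkGetSwapchainCounterEXT(device, swapchain,
                             VK_SURFACE_COUNTER_VBLANK_BIT_EXT, &vblanks);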

@haasn
Author

haasn commented Aug 29, 2017

Upon re-approaching this problem, I noticed that this is not just a requirement for “accurate” vsync timing the way mpv does it - it is in fact a basic requirement for simply metering rendering to the display rate at all (i.e. implementing vsync).

The vulkan samples I'm looking at (e.g. cube.c from LunarG/VulkanSamples) seem to essentially do this:

  1. while (outstanding_fences == max_frame_latency) { wait(fences); }
  2. vkAcquireNextImageKHR(signal=acquired, fence=NULL);
  3. vkQueueSubmit(wait=acquired, signal=done, fence=fences[i]);
  4. vkQueuePresentKHR(wait=done, index=i);

But this appears to have a rather serious bug: It only waits on the vkQueueSubmit to complete, not on the actual vkQueuePresentKHR. So if you imagine a GPU that renders a cube at 1000 fps, the fences would all fire 1ms after the corresponding vkQueueSubmit, and thus the only factor metering rendering speed here is the implicit assumption that vkAcquireNextImageKHR will block if the available swapchain images are all stuck in the presentation queue (due to the vkQueuePresentKHR calls). But the spec explicitly states that you cannot rely on vkAcquireNextImageKHR blocking to meter rendering speed, because implementations could have an arbitrary upper bound (or even no upper bound whatsoever) on the size of the swapchain. After all, the only thing the application can do is set the minimum swapchain size, not the maximum.

If even demo applications like cube.c seem to get this wrong, then I'm at a complete loss for what Khronos expects the correct behavior to look like. It seems like fixing this at the source would require vkQueuePresentKHR to signal a fence either once the frame leaves the presentation queue (and becomes the active front buffer), or once it's done being presented (and the contents have effectively been fully sent to the display). Alternatively, vkQueuePresentKHR could be redesigned to be part of a command buffer - so you could call vkCmdQueuePresentKHR(cmdbuf, image, swapchain);, and this command would be marked as “pending” until the image is no longer in use.

@haasn
Author

haasn commented Aug 29, 2017

It also seems like VK_EXT_display_control may not be as good a solution for this problem as I had originally anticipated: It requires a VkDisplayKHR, which I can't necessarily easily figure out. (Shouldn't the VkSurface have this information?) It also has a very, very awkward design. (For some reason it seems to violate vulkan API conventions by requiring that the pAllocator be non-NULL. I don't have a custom allocator though; can't it just use malloc like everything else? Or is that because it expects me to allocate a new fence for every vsync? Why can't it just re-use the same fence like literally every other command?)

@cubanismo

@haasn, I agree the first pixel event in VK_EXT_display_control is not a good solution for this problem. It simply generates a signal when the next vblank occurs. That doesn't necessarily correspond to when any prior-submitted presentation command completes. I agree the Google display timing spec is a closer match to your needs, but it is unlikely we will ever implement it outside of Android. Its semantics don't align with the capabilities available to us across other operating systems. We'll continue to work on a general solution for this problem within the Khronos working groups. As @ianelliottus mentioned, we're aware it's a sorely needed bit of functionality missing from the current specs.

There's nothing special about the allocator requirements of the functions in VK_EXT_display_control. They will fall back to the system allocator if pAllocator is NULL. If you're seeing issues with that, let me know, and ideally provide some code snippets illustrating the problem. This would be a bug.

Yes, the notifications in VK_EXT_display_control require using VK_KHR_display. Note that doesn't mean you need to be using a swapchain that presents to a VK_KHR_display. VK_KHR_display just allows enumerating displays, and VK_EXT_display_control lets you wait for events on those displays. You'd still need some way to figure out which display your window system is using for a given swapchain, though, in order to correlate the events back to your presentation commands. You could do this on X11 with the RANDR correlation function provided in VK_EXT_acquire_xlib_display. I'm not aware of definitive solutions available for other platforms at the moment, but you could compare display names with some native API to make an educated guess.
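
For X11, a rough sketch of that RANDR correlation, assuming an instance created with VK_EXT_acquire_xlib_display (and its dependencies) enabled, and an RROutput already identified via XRRGetScreenResources/XRRGetOutputInfo:

    #include <X11/extensions/Xrandr.h>
    #define VK_USE_PLATFORM_XLIB_XRANDR_EXT
    #include <vulkan/vulkan.h>

    static VkDisplayKHR display_for_output(VkInstance instance,
                                           VkPhysicalDevice gpu,
                                           Display *dpy, RROutput output)
    {
        PFN_vkGetRandROutputDisplayEXT get_display =
            (PFN_vkGetRandROutputDisplayEXT)vkGetInstanceProcAddr(
                instance, "vkGetRandROutputDisplayEXT");
        VkDisplayKHR display = VK_NULL_HANDLE;
        if (get_display && get_display(gpu, dpy, output, &display) == VK_SUCCESS)
            return display;  /* correlate events on this display with presents */
        return VK_NULL_HANDLE;
    }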

Yes a new fence does need to be allocated for each vblank. This design choice of creating a fence when requesting the events was made because these fences were different enough from regular fences that we would essentially have to do the equivalent of re-creating the fence anyway within the driver to convert an existing fence into a vblank event, and I needed to ensure they were not shareable using the new fence export extensions. The need to create a new fence every time was a side effect of that. In retrospect, I wish I'd created a new object type entirely to handle these notifications, and allowed them to be reusable. If there's ever a KHR version of this functionality, that's likely the direction I'll recommend.

@haasn
Author

haasn commented Aug 29, 2017

There's nothing special about the allocator requirements of the functions in VK_EXT_display_control. They will fall back to the system allocator if pAllocator is NULL. If you're seeing issues with that, let me know, and ideally provide some code snippets illustrating the problem. This would be a bug.

From the spec:

pAllocator must be a pointer to a valid VkAllocationCallbacks structure

This goes against the convention of most other pAllocator functions, which all state:

If pAllocator is not NULL, pAllocator must be a pointer to a valid VkAllocationCallbacks structure

So it's actually a spec-documented deviation, not an implementation bug. The validation layers also confirm this:

vk [ParameterValidation] 4: vkRegisterDisplayEventEXT: required parameter pAllocator specified as NULL (obj 0x0 (unknown object), loc 0xb3)

(But perhaps it's a bug in the specification)

@cubanismo

That is indeed a bug in the spec. Thanks for pointing it out. I'll get it fixed.

haasn added a commit to haasn/mp that referenced this issue Aug 30, 2017
This time based on RA. 2017 is the year of the vulkan desktop!

Current problems / limitations / improvement opportunities:

1. The entire thing depends on VK_NV_glsl_shader, which is a god-awful
   nvidia-exclusive hack that barely works and is held together with
   duct tape and prayers. Long-term, we really, REALLY need to figure
   out a way to use a GLSL->SPIR-V middleware like glslang. The problem
   with glslang in particular is that it's a gigantic pile of awful, but
   maybe time will help here..

2. We don't use async transfer at all. This is very difficult, but
   doable in theory with the newer design. Would require refactoring
   vk_cmdpool slightly, and also expanding ra_vk.active_cmd to include
   commands on the async queue as well. Also, async compute is pretty
   much impossible to benefit from because we need to pingpong with
   serial dependencies anyway. (Sorry AMD users, you fell for the async
   compute meme)

3. Lots of resource deallocation callbacks are thread-safe (because
   the vulkan device itself is, and once we've added a free callback
   we're pretty much guaranteed to never use that resource again from
   within mpv). As such, we could call those cleanup callbacks from a
   different thread. This would make stuff slightly more responsive when
   deallocating lots of resources at once. (e.g. resizing swapchain)

4. The custom memory allocator is pretty naive. It's prone to
   under-allocating memory, allocation thrashing, freeing slabs too
   aggressively, and general slowness due to allocating from the same
   thread. In addition to making it smarter, we should also make it
   multi-threaded: ideally it would free slabs from a different thread,
   and also pre-allocate slabs from a different thread if it reaches
   some critical "low" threshold on the amount of available bytes.
   (Perhaps relative to the current heap size). These limitations
   manifest themselves as occasional choppy performance when changing
   the window size.

5. The swapchain code and ANGLE's swapchain code could share common
   options somehow. Left out for now because I don't want to deal with
   that headache for the time being.

6. The swapchain/flipping code violates the vulkan spec, by assuming
   that the presentation queue will be bounded (in cases where rendering
   is significantly faster than vsync). But apparently, there's simply
   no better way to do this right now, to the point where even the
   stupid cube.c examples from LunarG etc. do it wrong.
   (cf. KhronosGroup/Vulkan-Docs#370)
haasn added a commit to mpv-player/mpv that referenced this issue Sep 26, 2017
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!

Current problems / limitations / improvement opportunities:

1. The swapchain/flipping code violates the vulkan spec, by assuming
   that the presentation queue will be bounded (in cases where rendering
   is significantly faster than vsync). But apparently, there's simply
   no better way to do this right now, to the point where even the
   stupid cube.c examples from LunarG etc. do it wrong.
   (cf. KhronosGroup/Vulkan-Docs#370)

2. The memory allocator could be improved. (This is a universal
   constant)

3. Could explore using push descriptors instead of descriptor sets,
   especially since we expect to switch descriptors semi-often for some
   passes (like interpolation). Probably won't make a difference, but
   the synchronization overhead might be a factor. Who knows.

4. Parallelism across frames / async transfer is not well-defined, we
   either need to use a better semaphore / command buffer strategy or a
   resource pooling layer to safely handle cross-frame parallelism.
   (That said, I gave resource pooling a try and was not happy with the
   result at all - so I'm still exploring the semaphore strategy)

5. We aggressively use pipeline barriers where events would offer a much
   more fine-grained synchronization mechanism. As a result of this, we
   might be suffering from GPU bubbles due to too-short dependencies on
   objects. (That said, I'm also exploring the use of semaphores as an
   ordering tactic which would allow cross-frame time slicing in theory)

Some minor changes to the vo_gpu and infrastructure, but nothing
consequential.

NOTE: For safety, all use of asynchronous commands / multiple command
pools is currently disabled completely. There are some left-over relics
of this in the code (e.g. the distinction between dev_poll and
pool_poll), but that is kept in place mostly because this will be
re-extended in the future (vulkan rev 2).

The queue count is also currently capped to 1, because the lack of
cross-frame semaphores means we need the implicit synchronization from
the same-queue semantics to guarantee a correct result.
@haasn
Author

haasn commented Nov 23, 2018

It's been over a year. Has there been any progress on this issue? (Also, sorry for the commit spam.)

One thing I tried doing to solve this bug in practice (if not in theory) is to use the time that vkAcquireNextImageKHR takes to block as a sort of “wait for vblank” substitute, but that has the side effect of actually acquiring the next image, which is not always what I want to do when waiting for the swap_buffers to complete. (In particular, it breaks horribly if there's a window resize operation in between this swapchain acquisition and the rendering of the next frame.) Without a way to "un-acquire" images, it therefore does not solve my problem even in practice.

I had a closer look at the (now renamed) VK_EXT_display_control, but it still doesn't map very well to my use case:

  1. If there's a delay between submitting a frame and swap_buffers, the swap_buffers call should immediately return. If I implement the swap_buffers call as "wait for next vblank", it will always block, even if it shouldn't have to.

  2. There's still no clear way to associate a VkSwapchainKHR with a VkDisplayKHR.

That said, in theory it might be possible to combine the vblank event with the swapchain counters, in the following manner (a rough sketch in code follows the list):

  • when the queue submission succeeds, record the current vblank counter
  • when calling swap_buffers(), first check whether the vblank counter has increased enough relative to the recorded value that we can assume the image must already have been made visible, and if so, return
  • otherwise, wait for the next vblank
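
A rough sketch of that heuristic (device, swapchain and display are assumed to be at hand; submit_vblank is the counter value recorded when the present was queued, and wait_for_next_vblank() is a hypothetical wrapper around vkRegisterDisplayEventEXT + vkWaitForFences):

    void swap_buffers(void)
    {
        uint64_t now;
        vkGetSwapchainCounterEXT(device, swapchain,
                                 VK_SURFACE_COUNTER_VBLANK_BIT_EXT, &now);
        if (now > submit_vblank)
            return;  /* enough vblanks have passed; image presumably visible */

        /* Otherwise, block until the next vblank (hypothetical wrapper): */
        wait_for_next_vblank(device, display);
    }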

But this does not seem like a clean solution, nor do I know how well it extends to e.g. mailbox-style swapchains. Things that could help me include:

  1. A way to map a VkSurfaceKHR or VkSwapchainKHR to a VkDisplayKHR, or alternatively something equivalent to VkDisplayKHR's vblank event that works for a VkSwapchainKHR. For example, something like a fence that gets signalled when the internal state of a VkSwapchainKHR advances as a result of a vsync (i.e. when an image gets dequeued from the swapchain).

  2. Alternatively, a way to query the internal status of a swapchain: how many images are queued? How many are available? That way I could make a more informed decision about whether to resort to waiting until the vblank counter advances.

  3. A fence that gets signalled when the status of a surface counter changes (e.g. the vblank counter). Right now the only way to wait until that changes is a busy wait loop.

@ianelliottus

We are working on this in Khronos, and hope to have a new extension that solves this in the early part of next year.

@singron

singron commented Mar 17, 2019

Any progress to report on this?

@singron

singron commented Mar 18, 2019

It looks like the newly announced 0.9 provisional OpenXR spec has the required features for proper frame timing. It seems a little odd that vulkan applications without AR/VR have to integrate with OpenXR, but whatever.

  1. xrWaitFrame blocks to synchronize the render thread with the swapchain, and returns the predicted display time of the next frame.
  2. xrBeginFrame (Not sure why this exists)
  3. xrEndFrame submits the frame to be rendered for a given display time (I'm guessing you mostly use the predicted display time you got from xrWaitFrame). A minimal sketch of this loop follows.
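
A minimal sketch of that loop against the provisional OpenXR C API (assuming an already-created XrSession; names may shift before the spec is finalized):

    XrFrameWaitInfo wait_info = { .type = XR_TYPE_FRAME_WAIT_INFO };
    XrFrameState frame_state = { .type = XR_TYPE_FRAME_STATE };
    xrWaitFrame(session, &wait_info, &frame_state);  /* blocks; yields
                                                        predictedDisplayTime */

    XrFrameBeginInfo begin_info = { .type = XR_TYPE_FRAME_BEGIN_INFO };
    xrBeginFrame(session, &begin_info);

    /* ... render content targeting frame_state.predictedDisplayTime ... */

    XrFrameEndInfo end_info = {
        .type = XR_TYPE_FRAME_END_INFO,
        .displayTime = frame_state.predictedDisplayTime,
        .environmentBlendMode = XR_ENVIRONMENT_BLEND_MODE_OPAQUE,
        /* .layerCount and .layers omitted for brevity */
    };
    xrEndFrame(session, &end_info);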

Since OpenXR seems to have endorsements from AMD, NVIDIA, Intel, and Microsoft, it seems to be the most likely way forward that will actually be implemented, unless Khronos is planning to announce a Vulkan-specific extension to duplicate this functionality.

@ghost

ghost commented Jan 23, 2021

3-4 years later, Vulkan present types are still a joke, VSync timing isn't a thing, and the only way to mitigate this is by abusing mailbox plus overpowered hardware resources to bruteforce the timing. What the hell is going on?

@osor-io

osor-io commented May 24, 2021

Has there been any more progress on this @ianelliottus @cubanismo? Or is there a different recommended way to get the functionality of VK_GOOGLE_display_timing?

I've seen the need for this pop up and be mentioned for quite a while now, but it doesn't seem like the needle has moved, sadly 😢

@krOoze
Contributor

krOoze commented May 24, 2021

@osor-io WIP: #1364

@stonesthrow
Contributor

Yes, please refer to #1364 for a solution; this thread is dead and can be closed.

@stonesthrow
Contributor

please refer to #1364 for solution

@Triang3l

Triang3l commented Jul 13, 2021

please refer to #1364 for solution

Would it help with safely destroying the semaphore awaited by vkQueuePresentKHR (in a situation when a full vkQueueWaitIdle or vkDeviceWaitIdle is overkill), or is it limited to just querying time intervals?

@krOoze
Contributor

krOoze commented Jul 13, 2021

@Triang3l It is broken. As per #152, not even vk*WaitIdle might be enough.

The extension does add another way to infer the semaphore state. Nevertheless, this is something that should be fixed in core 1.0, not by usage of extensions. Besides, busy-waiting on vkGetPastPresentationTimingEXT might be no better than vk*WaitIdle.

It would also be treading on thin ice a bit. vkDestroySemaphore says the whole batch must be finished before destroying the semaphore, whatever that means in the case of a present op.

@stonesthrow
Contributor

There is an update, long in coming, to address these scenarios. Specifically semaphore states for one. Its priority

@Triang3l

There is an update, long in coming, to address these scenarios. Specifically semaphore states for one. Its priority

Oh, nice, thank you for helping in resolving this confusing part! What is the current "industry standard" solution to this issue, by the way? Acquiring all images (not sure if that's possible for the mailbox mode) and awaiting all fences? Full WaitIdle? Or would just destroying the swapchain before the semaphores be enough (or is vkDestroySwapchainKHR also affected by this lack of a fence, and doesn't have implicit lifetime tracking)?
