Crash with partially bound texture arrays with variable descriptor count #2206
Comments
Thinking about this more, I think that when an MVK resource is destroyed, it needs to know all of the partially bound descriptors that reference it and clear out those references. Otherwise, dangling references to resources can occur when converting the MVK state into Metal calls.
Perhaps I'm missing something here.
What behaviour are you seeing on other Vulkan platforms? What kind of crash are you seeing here? Do you have a call trace, or a small app that can demonstrate the issue you're encountering?
This is existing behaviour. The resources are retained until they are removed from the descriptors. However, Vulkan does require that an app not destroy resources while they are in-flight in an executing command buffer, and that a descriptor holding a destroyed resource must not be used further while it holds that resource.
It works fine on Windows for all GPUs there. I've seen two different crashes so far.
Here is the call stack from the second. This one is much rarer. In this one the MVKImageViewPlane::_imageView is NULL.
I also notice that if I use MVK_CONFIG_PREFILL_METAL_COMMAND_BUFFERS_STYLE_IMMEDIATE_ENCODING, or ensure all my resource destructions happen in my main thread (instead of a worker thread), I can't make the crash occur. So that's another hint it's a race condition. What I think is happening is that the worker thread is cleaning up resources at the same time the descriptor binding operations are occurring inside of MoltenVK. In the first case, the MVKGraphicsResourcesCommandEncoderState was given a MTLTexture that has since been deleted. In the second case, the MVKImageView had its _planes object cleaned up between the start of getMTLTexture and the end of it. It shouldn't be able to get into MVKImageViewPlane::getMTLTexture() when _planes.size() == 0, yet that is the size I see when I inspect the stack at this point.
So the question is: is it legal to destroy resources pointed to by a descriptor while that descriptor is part of a descriptor set that is being used, even though that particular descriptor isn't referenced by the shader?
As I mention above, Vulkan does require that an app not destroy resources while they are in-flight in an executing command buffer, and that a descriptor holding a destroyed resource must not be used further while it holds that destroyed resource:
It's possible that you may not be encountering this on other platforms, because the representation within the platform command encoder may not depend on the Vulkan resource object the way it does in MoltenVK and Metal. To improve performance, MoltenVK deliberately uses a Having said that, MoltenVK does retain the resource objects within descriptors, but if you also destroy the descriptor set on your worker thread, then it might all disappear too early.
Thanks for your continued attention to this. The spec also states:
This clarification was only added recently, in spec v1.3.210. There is also this validation layer issue that mentions a similar workflow: I do understand why this would be problematic for MoltenVK though, as the translation layer between it and Metal has to access resources on the CPU during translation, and other implementations would not access those resources at all. Is my understanding correct? If so, I can try to think of a solution.
@mbechard Have you found a solution yet? If so, can you share it here?
What I ended up doing is, for each descriptor, keeping track of which elements have been bound. When I update the array for a new usage, I make sure I unbind (or rebind to known valid entries) any elements that were previously bound. So even if your array allocation is large, you can still leave most of its partially-bound entries untouched, and only maintain the ones you are actually making use of.
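The bookkeeping described in that comment could look roughly like the sketch below. It is not the poster's actual code: `ViewHandle`, `SlotWrite`, `BoundSlotTracker` and `planUpdate` are hypothetical names invented for illustration, and a plain integer stands in for a real image view. The idea is only that every element bound on an earlier update, and not wanted now, gets rewritten to a long-lived fallback view.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for an image view handle.
using ViewHandle = std::uint64_t;

// One pending descriptor write: array element index plus the view to bind.
struct SlotWrite {
    std::uint32_t element;
    ViewHandle view;
};

// Tracks which elements of one descriptor array currently hold app resources.
class BoundSlotTracker {
public:
    explicit BoundSlotTracker(ViewHandle fallbackView) : fallback_(fallbackView) {}

    // Returns the writes to submit for the next use of the array: the wanted
    // writes first, then a fallback write for every element that was bound on
    // an earlier update but is not wanted now (in ascending element order).
    std::vector<SlotWrite> planUpdate(const std::vector<SlotWrite>& wanted) {
        std::set<std::uint32_t> now;
        std::vector<SlotWrite> writes;
        for (const SlotWrite& w : wanted) {
            now.insert(w.element);
            writes.push_back(w);
        }
        for (std::uint32_t element : bound_) {
            if (now.count(element) == 0) {
                writes.push_back({element, fallback_});
            }
        }
        // Stale elements now hold the fallback view, so only the wanted
        // elements still need tracking.
        bound_ = now;
        return writes;
    }

private:
    ViewHandle fallback_;
    std::set<std::uint32_t> bound_;
};
```

In a real application each returned `SlotWrite` would presumably be turned into a `VkWriteDescriptorSet` entry and passed to vkUpdateDescriptorSets(), so untouched elements of a large array cost nothing.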
PR #2320 should fix this. Please retest with latest MoltenVK and close this issue if the problem is fixed.
It's been a month. Closing now.
Sorry, this fell through the cracks. Honestly I feel like the cure is worse than the symptom on this one. I think in both cases you are stuck doing non-Vulkan spec behavior. With this new change, GPU resources aren't freed until descriptor sets stop referencing them. So the choices are: Dealing with (2) seems much more difficult than (1), and I worry that people who aren't running into (1) are going to be hit by (2) without realizing it. (Sorry about the errant references to issues 1 and 2 from this repo in my original post in this message)
I've created the separate enhancement issue #2359 to deal with the unreleased memory consumed by the dead resources that are still retained by a descriptor.
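The retention behaviour discussed in the last few comments can be modelled in a few lines. This is only an illustration, assuming a reference-counted resource: `std::shared_ptr` stands in for whatever retain/release mechanism MoltenVK actually uses, and `FakeTexture` and `FakeDescriptorSet` are hypothetical types. It shows why a resource the app has already let go of keeps consuming memory until the descriptor slot is overwritten or cleared.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical GPU resource that counts how many instances are still alive.
struct FakeTexture {
    static int& liveCount() {
        static int count = 0;
        return count;
    }
    FakeTexture() { ++liveCount(); }
    ~FakeTexture() { --liveCount(); }
    FakeTexture(const FakeTexture&) = delete;
    FakeTexture& operator=(const FakeTexture&) = delete;
};

// Hypothetical descriptor set whose slots retain the textures written to them.
struct FakeDescriptorSet {
    std::vector<std::shared_ptr<FakeTexture>> slots;

    explicit FakeDescriptorSet(std::size_t capacity) : slots(capacity) {}

    // Writing a slot retains the texture (and releases the previous one).
    void write(std::size_t index, std::shared_ptr<FakeTexture> texture) {
        slots.at(index) = std::move(texture);
    }

    // Clearing a slot releases the texture it was holding.
    void clear(std::size_t index) { slots.at(index).reset(); }
};
```

If the app writes a texture to a slot and then drops its own reference, `FakeTexture::liveCount()` stays at 1 until `clear()` is called on that slot, which is the memory cost the enhancement issue refers to.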
If I have a texture array that has been sized 10, but I'm only providing 8 elements in a call to vkUpdateDescriptorSets(), it doesn't seem like the other elements in the array in MVKDescriptorSet get cleared out. This can cause a crash with deleted resources, since bindings for elements 8 and 9 will still try to get bound in MVKDescriptorSetLayoutBinding::bind. That function loops over the entire length of the variableDescriptorCount, and doesn't seem to account for partially bound sizes.
Happy to do a PR, just looking for confirmation my assumptions are correct.
I would think the solution is to call MVKImageDescriptor::reset() on every element of the array that isn't set in the Update. Is that correct?
Thanks
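The fix proposed in this post can be sketched with a toy model of the array. This is not MoltenVK code: `DescriptorArray`, `updateAndResetRest` and `liveTextures` are hypothetical names, and a handle value of 0 stands in for a descriptor that has been reset. The sketch writes the supplied elements and resets every other element, so a later bind loop over the full array length never sees a stale entry.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a texture handle; 0 means "reset, references nothing".
using TextureHandle = std::uint64_t;
const TextureHandle kNoTexture = 0;

// Hypothetical model of one variable-count descriptor array.
struct DescriptorArray {
    std::vector<TextureHandle> slots;

    explicit DescriptorArray(std::size_t capacity) : slots(capacity, kNoTexture) {}

    // Writes `textures` starting at element `first`, then resets every element
    // outside the written range. Writes past the end of the array are dropped.
    void updateAndResetRest(std::size_t first, const std::vector<TextureHandle>& textures) {
        const std::size_t begin = std::min(first, slots.size());
        const std::size_t end = std::min(slots.size(), begin + textures.size());
        for (std::size_t i = 0; i < slots.size(); ++i) {
            if (i >= begin && i < end) {
                slots[i] = textures[i - begin];
            } else {
                slots[i] = kNoTexture;
            }
        }
    }

    // Collects the elements a bind loop would still have to touch.
    std::vector<TextureHandle> liveTextures() const {
        std::vector<TextureHandle> live;
        for (TextureHandle handle : slots) {
            if (handle != kNoTexture) {
                live.push_back(handle);
            }
        }
        return live;
    }
};
```

With an array of 10 that is first fully written and then updated with only 8 textures, elements 8 and 9 end up reset instead of still pointing at the earlier resources.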