`StdDescriptorPool` rewamp #1943

    Self::new_with_pool(layout, 0, &mut pool, descriptor_writes)
    layout
        .device()
        .clone()

I notice that here and in a few other places you clone the device `Arc`. But since `with_standard_descriptor_pool` takes a reference, the clone isn't needed?

Unfortunately it is, because the closure takes ownership of `layout`, and without cloning the device the layout would still be borrowed. I thought about this, and it is possible to fix it by changing the argument to `&Arc<DescriptorSetLayout>`; the only reason I did not do this is that I didn't want to break the API. But indeed I would like this change.
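
To illustrate the borrow conflict described above, here is a minimal sketch. It uses hypothetical stand-in types (`Device`, `Layout`, `Pool`) and hypothetical functions (`with_pool`, `new_set`), not vulkano's actual definitions; the only assumption carried over from the discussion is that the pool accessor takes `&self` and that the closure consumes the layout.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the real types, for illustration only.
pub struct Device {
    pub id: u32,
}
pub struct Pool;
pub struct Layout {
    pub device: Arc<Device>,
}

impl Layout {
    pub fn device(&self) -> &Arc<Device> {
        &self.device
    }
}

impl Device {
    // Takes `&self`, like `with_standard_descriptor_pool` in the discussion.
    pub fn with_pool<T>(&self, f: impl FnOnce(&mut Pool) -> T) -> T {
        let mut pool = Pool;
        f(&mut pool)
    }
}

pub fn new_set(layout: Arc<Layout>) -> Arc<Layout> {
    // Writing `layout.device().with_pool(|_pool| layout)` is rejected (E0505):
    // the receiver keeps `layout` borrowed for the whole call, while the
    // closure has to move `layout` into itself.
    // Cloning the device `Arc` first ends the borrow of `layout`.
    let device = layout.device().clone();
    device.with_pool(move |_pool| layout)
}
```

In this sketch the closure captures `layout` by value whether or not it is marked `move`, so the keyword alone does not remove the conflict. Taking `&Arc<Layout>` as the argument instead would avoid the clone, at the cost of a signature change.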

Would adding `move` to the closure work?

No, sadly.
The

Oh wow, thank you for catching that! What's really weird is that this worked fine on my system even with this huge issue.

Forgot to mention that the issue has been fixed.

That's the nasty thing about undefined behaviour. Sometimes it just works, and you never notice!

Changelog:

Seeing as `StdDescriptorPool` was originally using 3 layers of `Mutex`, and the allocation algorithm wasn't very optimized, I felt the need to address these and some other issues.

- […] `StandardCommandPool` lockless #1939.
- `DescriptorPool` being implemented on `Arc<StdDescriptorPool>` now made little sense, since the `StdDescriptorPool`s are all thread local and `DescriptorPool` takes `&mut self`. The trait is now implemented on `StdDescriptorPool` directly, and `Device::standard_descriptor_pool` has been removed, since it handed out `Arc` clones previously, which would now be redundant overhead. Instead, `Device::with_standard_descriptor_pool` can be used, which still gives the user full access to the `StdDescriptorPool` without the need for these `Arc` clones. I will be working on consistency with `StandardCommandPool` in an upcoming PR.
- […] `SingleLayoutDescSetPool` would perform the best, but I tried some other ideas only to conclude that is the case.
- `SingleLayoutDescSetPool` took 58ns on my machine. That is in contrast to the current strategy employed in `StdDescriptorPool`, which takes 3800ns with no contention. There were just some small details I thought could maximize the performance of `SingleLayoutDescSetPool`, which reduced the time to 38ns. Not a big difference, but that's what happens when there's not much to improve in the first place!
- […] `SingleLayoutDescSetPool` does and reusing those, I believe the next-best performing strategy is to reuse the Vulkan pools. Allocate from a pool until it's full, then yeet it and grab a new one, then once the pool is no longer needed, reset it as a whole and put it back in the queue. This of course has the advantage that the Vulkan implementation can use a simple bump allocator, and no pool fragmentation can occur. Also, this saves some cycles since pool creation is expensive. So this is the approach I took for variable descriptor counts, but I think we could absolutely have one pool per count per layout for the performance. If someone wants to do that I'm all for it. This approach clocks in at 104ns for me.
- […] `StdDescriptorPool` was now as easy as using the now added `SingleLayoutVariableDescSetPool` together with `SingleLayoutDescSetPool` to support all of the `DescriptorPool` API. The result is that allocating a fixed count descriptor set takes 104ns, and variable count takes 171ns for me. Meaning that the new `StdDescriptorPool` adds about 66ns of overhead on top. I don't like this one bit and I have a good idea where this overhead is coming from, so expect future PRs.
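
The pool-reuse strategy from the changelog (allocate until the pool is full, retire it, reset it as a whole once nothing uses it, then queue it for reuse) can be sketched roughly as follows. This is not the PR's implementation: `RecyclingAllocator`, `RawPool` and `SetHandle` are hypothetical names, the "pool" is a plain counter instead of a Vulkan object, and liveness is tracked with a simple per-pool count in a single thread.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for a Vulkan descriptor pool: a bump allocator
// that can only be reset as a whole.
struct RawPool {
    used: u32, // slots handed out since the last reset
    live: u32, // sets allocated from this pool that are still in use
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SetHandle {
    pub pool: usize,
    pub slot: u32,
}

pub struct RecyclingAllocator {
    pools: Vec<RawPool>,      // every pool ever created, indexed by id
    current: usize,           // the pool currently being bumped
    reserve: VecDeque<usize>, // reset pools waiting to be reused
    capacity: u32,            // sets per pool
}

impl RecyclingAllocator {
    pub fn new(capacity: u32) -> Self {
        assert!(capacity > 0);
        Self {
            pools: vec![RawPool { used: 0, live: 0 }],
            current: 0,
            reserve: VecDeque::new(),
            capacity,
        }
    }

    pub fn allocate(&mut self) -> SetHandle {
        if self.pools[self.current].used == self.capacity {
            if self.pools[self.current].live == 0 {
                // Full but unused: reset in place instead of retiring it.
                self.pools[self.current].used = 0;
            } else {
                // Retire the full pool; prefer a recycled one over creating anew.
                self.current = match self.reserve.pop_front() {
                    Some(id) => id,
                    None => {
                        self.pools.push(RawPool { used: 0, live: 0 });
                        self.pools.len() - 1
                    }
                };
            }
        }
        let pool = &mut self.pools[self.current];
        let slot = pool.used;
        pool.used += 1;
        pool.live += 1;
        SetHandle { pool: self.current, slot }
    }

    // Freeing the same handle twice is a caller bug in this sketch.
    pub fn free(&mut self, set: SetHandle) {
        let pool = &mut self.pools[set.pool];
        pool.live -= 1;
        // A retired pool with no live sets is reset as a whole and queued.
        if pool.live == 0 && set.pool != self.current {
            pool.used = 0;
            self.reserve.push_back(set.pool);
        }
    }

    pub fn pools_created(&self) -> usize {
        self.pools.len()
    }
}
```

Individual sets are never returned to a pool one by one here, which is what lets the underlying allocator stay a simple bump allocator without fragmentation.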
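
As a rough illustration of why a closure-based accessor can replace handing out `Arc` clones when the pools are thread local, here is a minimal sketch using only the standard library. `Pool`, `POOL` and `with_pool` are hypothetical names, and unlike `Device::with_standard_descriptor_pool` this sketch keeps a single global thread-local pool rather than one per device.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for the pool; the real one wraps Vulkan objects.
#[derive(Default)]
pub struct Pool {
    allocated: u32,
}

impl Pool {
    // Takes `&mut self`, like the `DescriptorPool` methods mentioned above.
    pub fn allocate(&mut self) -> u32 {
        self.allocated += 1;
        self.allocated
    }
}

thread_local! {
    // One pool per thread, so no `Mutex` and no shared ownership is needed.
    static POOL: RefCell<Pool> = RefCell::new(Pool::default());
}

// Lends the calling thread's pool to `f` for the duration of the call.
// Calling `with_pool` again from inside `f` would panic on the `RefCell`.
pub fn with_pool<T>(f: impl FnOnce(&mut Pool) -> T) -> T {
    POOL.with(|pool| f(&mut *pool.borrow_mut()))
}
```

Each thread sees its own counter, and the caller gets full `&mut` access inside the closure without any reference counting.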