Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VK_MSFT_layered_driver #2121

Merged
merged 1 commit into from
Sep 8, 2023

Conversation

jenatali
Copy link
Contributor

@jenatali jenatali commented May 8, 2023

This is my first attempt at a Vulkan extension. I chose MSFT for the vendor bits, but perhaps EXT would be more appropriate if MoltenVK has interest? Not sure if I'm supposed to also go ahead and update XML as part of this proposal or if that would come later.

This is needed to address KhronosGroup/Vulkan-Loader#981. Without this, or something like it, VulkanOn12's physical devices can't be enumerated by LUID, or else they run the risk of being enumerated before a native driver for the same GPU.

@CLAassistant
Copy link

CLAassistant commented May 8, 2023

CLA assistant check
All committers have signed the CLA.

@oddhack
Copy link
Contributor

oddhack commented May 8, 2023

Generally we defer the XML / spec work until the proposal is agreed on by all the stakeholders - apparently that includes Microsoft and perhaps MoltenVK, but would this touch on LunarG and the SDK work they do as well? We are having the F2F meeting this week and I'll try to make sure this gets brought up to people's attention.

@jenatali
Copy link
Contributor Author

jenatali commented May 9, 2023

would this touch on LunarG and the SDK work they do as well?

I had planned to submit the changes to the Vulkan-Loader repo to make use of this extension once there was consensus that it was a good idea. That should be the only impact to LunarG as far as I'm aware.

@oddhack
Copy link
Contributor

oddhack commented May 10, 2023

would this touch on LunarG and the SDK work they do as well?

I had planned to submit the changes to the Vulkan-Loader repo to make use of this extension once there was consensus that it was a good idea. That should be the only impact to LunarG as far as I'm aware.

Thanks. Is there anyone else in particular that should feed into consensus? I put this on the F2F agenda for this evening (Osaka physical meeting) to raise awareness, but not sure who in particular you would like input from.

@jenatali
Copy link
Contributor Author

Good question. I'm honestly not sure. I/we are new at this so I'm not really sure what to expect or who needs to be aware of it.

@linyaa-kiwi
Copy link
Contributor

linyaa-kiwi commented May 10, 2023

jenatali, thanks for the proposal. We just now briefly discussed it at the Khronos F2F. I will put it on the agenda for a Khronos-internal meeting on Wed 24 May, where we can get realtime feedback from a Vulkan loader dev.

But please also continue the discussion here in GitHub. The public and internal discussions can happen in parallel, and I will cross-post updates as needed.

@linyaa-kiwi
Copy link
Contributor

@charles-lunarg Do you have any feedback for this proposal? It will also be on the SI agenda for Wed 24 May.

@pdaniell-nv
Copy link

@charles-lunarg Do you have any feedback for this proposal? It will also be on the SI agenda for Wed 24 May.

It looks like @charles-lunarg already discussed this in KhronosGroup/Vulkan-Loader#981 and recommended doing this extension. FWIW I have no objection to this extension and think it's the right solution.

@charles-lunarg
Copy link

I confirm, this extension matches the discussion. Happy to put it up to further discussion/review in the SI meeting.

@oddhack
Copy link
Contributor

oddhack commented May 18, 2023

@jenatali since this is your first extension, be aware you need to reserve an extension number in main branch before creating the PR with the actual extension XML / spec language. Something like #2128 but using the next free extension number at the time you propose the PR, and the MSFT (or EXT) author ID. This is so enum values remain stable during development of the extension, since each extension gets its own numeric range to play in.

@jenatali
Copy link
Contributor Author

@oddhack Thanks for pointing that out. I've created #2129.

@krOoze
Copy link
Contributor

krOoze commented May 19, 2023

If we suspect those drivers are redundant and inferior in every way, perhaps they should not be exposed by default to the users. Add something like a ENUMERATE_EXTRANEOUS_DRIVERS flag similar to portability flag?

On point of nomenclature, is "layered" the term we want to go with? It is somewhat vague, and also has overloaded meaning with Vulkan Layers.

Permiting duplicate drivers is in of itself bit abstraction breaking. It takes away the "physical" from physical device. If there are to be competing drivers for the same device, it should be more neatly exposed.

@jenatali
Copy link
Contributor Author

If we suspect those drivers are redundant and inferior in every way, perhaps they should not be exposed by default to the users.

As it stands today, I don't expect a layered driver to be able to match a native driver for performance or functionality, though that may change in the future (e.g. I've heard that OpenGLOn12 can surpass some vendor's native GL drivers for performance, and D3D11On12 can surpass native D3D11 drivers for some hardware). That said, there's 2 cases where it's interesting to expose by default:

  • Devices where there is no native driver. Windows ARM devices do not have a Qualcomm Vulkan driver, and the D3D software rasterizer can also benefit from exposing Vulkan support without having a full native Vulkan frontend.
  • There may be a select subset of extensions or functionality that could be exposed from a layered driver ahead of a native implementation, or interop scenarios where choosing a layered implementation makes sense.

All that said, the specific loader policy is a bit outside the scope of this extension definition. The purpose here is to enable the loader to make any kind of choice for which drivers to expose for a given device and how to sort them.

On point of nomenclature, is "layered" the term we want to go with? It is somewhat vague, and also has overloaded meaning with Vulkan Layers.

I'm building on top of the nomenclature that Khronos has used in the past in official documentation. For example, in the Khronos conformance process documents, section A9 part (ii) has specific instructions for "layered implementations." That said, I have no specific attachment to the term.

@krOoze
Copy link
Contributor

krOoze commented May 19, 2023

As it stands today, I don't expect a layered driver to be able to match a native driver for performance or functionality, though that may change in the future

Ok, but in that case, what does the layered bool even tells the application. Makes the whole information irrelevant to anything.

Devices where there is no native driver.

Certainly, I meant where there are duplicate drivers, loader could eliminate the inferior ones (non-conformant\portability, and "layered" ones).

There may be a select subset of extensions or functionality that could be exposed from a layered driver

That sounds like something useful to a developer, and not an end user, and as such should be protected away behind a flag?

All that said, the specific loader policy is a bit outside the scope of this extension definition.

I try to think how it fits in the overal picture. The question should always be "why do we need to frankenstein this onto the Vulkan API", and not a cat&dog bakes a cake situation. Dog thinks having a bone in would be neat. Cat thinks some mice might be nice. But who will want to consume that resulting cake?

This proposes an extension for a certain situation. Should we contemplate the extension, without first contemplating whether we even should be getting ourself into that situation? Before we contemplate sorting, should there even be duplicate inferior drivers included there to be sorted in the first place?

I mean what does an end user even get out of this, except more potential for things to explode? I mean, old apps won't even be aware of existence of this proposed extension. And as you seem to imply, the information that the driver is "layered" can theoretically imply anything, and so doesn't tell the app much actionable information in the first place that can be relied upon for anything.

@jenatali
Copy link
Contributor Author

Just to be clear, the goal of this extension is for the implementation to provide information to the loader about what kind of driver it is. Providing this information to applications is much, much less interesting. If the loader then uses that info to hide the devices reported by that driver when there's already a (non-layered/native) driver that enumerated the same physical device, that sounds fine. Personally I'd like a way to opt out of that, but I'm fine with an environment variable.

If there's an alternative mechanism to provide this information to the loader beyond extending the information reported from the driver, I'm all ears, but this was the suggested mechanism from the loader maintainers.

@krOoze
Copy link
Contributor

krOoze commented May 19, 2023

the goal of this extension is for the implementation to provide information to the loader

Curious. That would imply modifying Loader-ICD interface, rather than the Vulkan API, no?

PS: In this case it feels the direction of information should be reverse. The loader should be giving enough information to the emulated driver, so it can make an informed decision whether it should be intervening and offering itself as a legitimate choice to the users.

@jenatali
Copy link
Contributor Author

An extension seemed like the right fit because then the information can be queried by applications, if they'd have a use for it, similar to the portability subset, but yes a viable alternative would be to revise the Loader-ICD interface.

I strongly disagree about the loader giving information to the driver to put policy about what should be enumerated in the driver. The loader should be the arbitrator of policy.

@charles-lunarg
Copy link

If there's an alternative mechanism to provide this information to the loader beyond extending the information reported from the driver, I'm all ears, but this was the suggested mechanism from the loader maintainers.

I initially suggested creating an extension because I thought it would be useful for both the loader and application developers to 'know' what kind of driver they are using. Granted, there already are driver information extensions & some properties in base vulkan, but none of them tell the application what the underlying API is or any other info specific to 'driver implemented on another driver'.

Modifying the Loader-ICD interface by adding optional fields to the driver manifest is another strategy to get the same information into the loader. This is how portability drivers are declared, so there is precedent.

The loader should be giving enough information to the emulated driver, so it can make an informed decision whether it should be intervening and offering itself as a legitimate choice to the users.

I also disagree strongly for the same reasons (loader should define policy) but also because there is no clean way to implement that. Drivers and loaders do talk, but the flow of communication is "loader asks driver for some info, driver returns said info". The loader doesn't tell the driver to do anything, rather it only relays what the application is asking for.

@krOoze
Copy link
Contributor

krOoze commented May 19, 2023

IDK, Loader is the one component that has least amount of information and context to decide anything. Its purpose is to mediate between Apps, Drivers, and Layers, nothing more. This sounds like something two drivers want to talk through between themselves. It feels the Emulator should be the one to decide whether competing with a native driver is its core mission, or not. To do so, it must somehow be made to know there is a native driver in the first place.

link #2130

@r-potter
Copy link
Contributor

IMO this extension is the right way to do this. Applications are the ultimate arbiter of what implementations they want to run on top of. All this extension is doing is:

1.) Providing the loader enough information to make a best effort at a sensible default ordering
2.) Providing applications some information to distinguish between two implementations more explicitly than trying to parse driver name strings

Existing examples of API layering, like ANGLE, suggest that describing the layered implementation as "inferior" is likely to be overly simplistic. It's entirely likely that a layered implementation will outperform a native driver in some situations, and underperform it in others. It's also entirely plausible for a layered implementation to end up exposing features or extensions that the native driver didn't, which may also make the layered implementation the preferred choice in some situations.

Sorting native drivers first is the best default behaviour we can expect to come up with. If an application developer discovers through benchmarking that actually the layered implementation is better for their specific application, then with this extension they still get the option to it make use of that. If we start hiding (layered) implementations then we we lose that option.

@krOoze
Copy link
Contributor

krOoze commented May 19, 2023

@r-potter I am not so sure. Drivers want to offload responsibilities to the Loader (which knows nothing) and exports it to the App which will do nothing else than try and fail using increasingly flimsy (or as you say "simplistic") information, and in the end export the responsibilities to the unsuspecting end-users, which perhaps do not even know what a "driver" is.

  1. Best effort is not good enough. Separation of concerns so each party only does what he is most competent at is the way. The Emulator driver is the one who knows what it wants to achieve. E.g. MoltenVK driver wants to achieve bringing Vulkan to Apple stuff. If a native driver shows up, that precondition is invalidated. We can either communicate ever increasing information to the Loader on all the nuanced ways whether device should show up, and at what position. OR, we can let driver decide on things that increasingly start looking more and more like driver responsibility to me. Who the hell else is supposed to know whether the Layered Implementation wants to co-exist with a native driver or not?

  2. Old apps would not have access to those informations, therefore might explode. Additionally, not really being apps concern in the first place, it does not have much to decide on (as you said, describing the X implementation as "inferior" based on Y flag is likely to be overly simplistic, and apps can only do deterministic choices, not some kind of "moral judgements"). It is the "layered implementation" choice\mission whether it wants to outperform a native driver, or whether its mission is to offer a driver where there was none before. What seems to be the case, we want to strip out this context the Driver knows about itself, and dump this responsibility on someone else down the road.

If we start hiding (layered) implementations then we we lose that option.

Options can always be re-exposed. But on the other side, it is hard to put a djinn back into the bottle. It is trivial to permit a developer to force-expose a particular driver for his niche developer needs. It is nontrivial to keep ever increasing corpus of mainstream apps (of various age) running as-intended™ if we start throwing curveballs through the enumeration API. Even the mention of "sorting" is kinda scary in the first place, because Physical Devices are not specified as sorted.

@charles-lunarg
Copy link

@krOoze What are you suggesting we do then?

My view of the problem is that "layered implementations exist, but there may be native drivers available. The loader's response to this situation is to simply order the list of VkPhysicalDevices such that the native one is ahead of the layered one." Do you intent to suggest an alternate solution to 'there may exist drivers that are layered ontop of native ones while native drivers are also present'?

@krOoze
Copy link
Contributor

krOoze commented May 19, 2023

@charles-lunarg Think, measure, look at things from distance, and only then act (or don't). Not just add and add because the narrow situation seems to call for it.

Anyway, it is your responsibility to answer the basic questions, not me. Let's even ignore there are old unupdatable apps that can't act on it even if they wanted, and all they know is all the physical devices in the array are equally legitimate (and not sorted in any way). Answer me even this basic (use-case\scenario) question: I am an app, and I see isLayered == true flag. What should I do with it that works well even 10 years from now? What should happen next with this information? Will you write a guidance what the best practice is here that will stand the test of time? Or will this be extended again 4 months from now to add more muddy info to it or requiring to do something slighly different again?

PS: I made one suggestion in the first comment.

@MarkY-LunarG
Copy link
Contributor

I personally think that an enum for various APIs would be better than a string. If we have strings, there may be different variations for the same underlying API if the Vulkan layer support comes from different developers.

@nanokatze
Copy link
Contributor

Has a case of two "native" drivers been considered/in scope for this ext? An example of this could be RADV and AMDVLK

@jenatali
Copy link
Contributor Author

After a few iterations of fighting with CI (I didn't try to set up a local spec build) I have something that mostly works, though it does depend on ash-rs/ash#762 before the rust bindings will work.

@MarijnS95
Copy link
Contributor

Re-kicked the CI because it was designed to pick up the master branch of Ash.

@jenatali
Copy link
Contributor Author

Re-kicked the CI because it was designed to pick up the master branch of Ash.

Thanks, looks like that's worked and CI is happy.

I downloaded the artifacts as well and I'm content with the results.

@MarijnS95
Copy link
Contributor

MarijnS95 commented Jun 22, 2023

@jenatali I almost agree, except that it doesn't extend any structure so you're not allowed to push it into any structs' pNext member. Perhaps the struct is missing structextends="VkPhysicalDeviceFeatures2,VkDeviceCreateInfo"?

@jenatali
Copy link
Contributor Author

Good catch.

xml/vk.xml Outdated Show resolved Hide resolved
Copy link
Contributor

@linyaa-kiwi linyaa-kiwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than my review comments in this batch, the extension MR looks good. After they're addressed, I believe this MR will be ready for approval.

xml/vk.xml Outdated Show resolved Hide resolved
proposals/VK_MSFT_layered_driver.adoc Outdated Show resolved Hide resolved
proposals/VK_MSFT_layered_driver.adoc Outdated Show resolved Hide resolved
@jenatali
Copy link
Contributor Author

Thanks, I've addressed those comments, and rebased and resolved the conflicts as well.

@linyaa-kiwi linyaa-kiwi changed the title Add a layered driver extension proposal Add VK_MSFT_layered_driver Jul 19, 2023
@linyaa-kiwi
Copy link
Contributor

@jenatali Since nothing in this extension is related to Microsoft, and is equally useful on non-Microsoft platforms, would you be interested in changing it to EXT? Several Khronos members have expressed an interest in the rename.

Changing it to EXT would add some extra overhead, however.

  • You would also need to submit CLs to the CTS. For this extension, only light testing would be required.
  • A delay of approx 8 weeks between approval and publication, due to additional Khronos-internal process. This process will be automatic from your perspective.

It is your call to make. Keeping it as MSFT is a valid choice if the additional delay is problematic for you.

@jenatali
Copy link
Contributor Author

I'm open to it. The only real complicating factor is that I'm about to take parental leave, so I'm not sure if/when I'd be able to add the relevant CTS tests. Though if the delay lines up with the period while I'm out, that'd be convenient, since I'd be able to start leveraging it once I'm back.

@r-potter
Copy link
Contributor

A delay of approx 8 weeks between approval and publication, due to additional Khronos-internal process. This process will be automatic from your perspective.

@versalinyaa Was this time estimate intended to cover the new EXT ratification process? My intuition is that probably isn't applicable here.

@oddhack
Copy link
Contributor

oddhack commented Jul 20, 2023

A delay of approx 8 weeks between approval and publication, due to additional Khronos-internal process. This process will be automatic from your perspective.

@versalinyaa Was this time estimate intended to cover the new EXT ratification process? My intuition is that probably isn't applicable here.

Agreed. As long as there are multiple vendors intending to ship an EXT it can be published. Ratification need not happen first (or at all, though at this point we'd like to ratify new EXTs).

@linyaa-kiwi
Copy link
Contributor

@jenatali If you're not on parental leave yet, and are available to answer questions...

Do you want us to merge this branch as-is, with the MSFT prefix? If yes, I can drive that while you're on parental leave.

If you are interested in renaming it to EXT, then there should preferably be a commitment from at least one additional vendor to implement. MoltenVK seems like the obvious choice.

@jenatali
Copy link
Contributor Author

jenatali commented Aug 2, 2023

My gut says to merge as-is. I haven't heard interest from another vendor, though I haven't tried to seek it out yet.

@oddhack
Copy link
Contributor

oddhack commented Aug 3, 2023

My gut says to merge as-is. I haven't heard interest from another vendor, though I haven't tried to seek it out yet.

@versalinyaa maybe it's worth your pinging the WG mailing list as a reminder that this potentially could be EXT (and ties into the maintenance6 agenda)? If no interest then you drive publication as a vendor extension. If there is interest, would @jenatali be willing to let us promote the proposal to EXT status prior to publication, to move this along?

@linyaa-kiwi
Copy link
Contributor

@versalinyaa maybe it's worth your pinging the WG mailing list as a reminder that this potentially could be EXT (and ties into the maintenance6 agenda)? If no interest then you drive publication as a vendor extension.

In SI TSG discussions, there were strong opinions to do this differently for EXT. For example, (a) expose more info about the layered driver, and (b) return an array of layered driver info to support the case when the layering is more than 2-deep due to VMs.

Let's proceed with publishing as a vendor MSFT extension.

@linyaa-kiwi linyaa-kiwi added this to the Signed-off to Merge milestone Aug 30, 2023
@oddhack
Copy link
Contributor

oddhack commented Sep 2, 2023

@jenatali can you resolve the conflicts? I don't know if you're still away from work.

(sorry, missed one character in your handle and tagged someone else initially).

@jenatali
Copy link
Contributor Author

jenatali commented Sep 2, 2023

I'll be back in November.

@oddhack
Copy link
Contributor

oddhack commented Sep 6, 2023

@jenatali I cannot push to your fork but I can fix the trivial conflicts in my local clone and merge that way. Given discussion above I'm going to do that in order to get this published, unless you or @versalinyaa objects by Thursday evening.

(Also I appear unable to type your handle correctly the first time, again).

@jenatali
Copy link
Contributor Author

jenatali commented Sep 6, 2023

@jenatali I cannot push to your fork but I can fix the trivial conflicts in my local clone and merge that way. Given discussion above I'm going to do that in order to get this published, unless you or @versalinyaa objects by Thursday evening.

(Also I appear unable to type your handle correctly the first time, again).

Works for me. This PR is configured to allow edits by project maintainers though so it should be possible to push.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.