Handle virtual instruction traps for hypervisor CSRs during KVM initialization #102
Can we tell KVM not to read them instead? Providing the MMU mode and VMID bits in a "get_info" call would be more consistent with how other TSM-related things happen.
I would prefer this too, but if we decide that KVM should be able to support multiple "modes" at runtime (e.g. in a TSM-in-HS deployment model one would imagine that KVM could support legacy VMs with native virtualization plus TVMs using the AP-TEE interface) you'd have separate initialization paths where either or both would need to run and discover the capabilities of that particular virtualization mode. That said, this is what pKVM appears to be doing and we'll have to bifurcate most of the other KVM code paths for TEE vs non-TEE VMs anyway. Also, IMO, for PoC purposes it doesn't seem horrible to force KVM into a particular mode at startup. @atishp04 thoughts?
Yes, that was my concern: having multiple diverged paths. We have an opportunity to align TEE and nested on the same path. Here are the different paths from which this can be invoked.
It would be good to minimize the divergence between 2, 3 & 4. Except for #1, everything else can just rely on the shared-memory approach if it is available; otherwise, we create 3 different paths. For #1 (direct H CSRs), from a PoC standpoint KVM can just ignore the access. No issues with that. I am trying to understand what the final code flow should look like across all the cases.
Another idea: could we just treat […]? That still leaves […]. Speaking of, we'll need some way for KVM to detect that it's running in a nested mode and shouldn't try to access the H* CSRs. (For AP-TEE we can just probe the SBI extension.)
That would work, but is there any benefit to doing it this way compared to saving everything in the shared-memory state? I feel the KVM code would be much cleaner that way.
Probing the SBI extension will work for case #3 (in my previous comment). For #4, technically that's incorrect. For the TVM-specific path, we have the VM type anyway.
What's the "shared-memory state" in this context? The shared-memory region is per-vCPU; there are no vCPUs yet at KVM initialization. Unless you're proposing the shared-memory region is per physical CPU? (Or both per-physical-CPU and per-vCPU shared memory?) It might seem cleaner based on the current KVM implementation, but what about the host hypervisor / TSM side? Whatever CSRs it gets blasted with through this other interface (i.e. outside of running a vCPU) it's just going to stash away (or completely discard) until the next time the guest hypervisor goes to run a vCPU. In other words, this just seems like an interface to accommodate the current KVM initialization flow, rather than something that immediately results in CSRs being updated.
That implies we're trapping-and-emulating […]. How about this for determining available virtualization modes: […]
We then discover the capabilities of each mode through the appropriate means: CSR accesses for native virtualization, and SBI calls for AP-TEE / nested virtualization.
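To illustrate the split being proposed, here is a minimal sketch of mode selection at KVM init time. Everything here is hypothetical (the enum, function, and parameter names are invented for illustration); the actual probing would be CSR access for the native case and SBI extension probing for the AP-TEE / nested cases:

```rust
// Illustrative sketch only: choosing a virtualization mode at init, given
// probe results. Names are invented; this is not the actual KVM interface.
#[derive(Debug, PartialEq)]
enum VirtMode {
    Native,  // H-extension CSRs are directly accessible
    Nested,  // SBI-accelerated nested virtualization is available
    TeeOnly, // only the AP-TEE interface (TVMs) is available
    None,    // no virtualization support at all
}

fn detect_virt_mode(h_csrs_accessible: bool, sbi_virt: bool, ap_tee: bool) -> VirtMode {
    if h_csrs_accessible {
        VirtMode::Native
    } else if sbi_virt {
        VirtMode::Nested
    } else if ap_tee {
        VirtMode::TeeOnly
    } else {
        VirtMode::None
    }
}
```

The point of the sketch is that detection happens once, up front, and capability discovery then proceeds per-mode rather than being interleaved with CSR trap handling.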
I like the direction this is going. What if […]? Certainly have different paths in the kernel KVM code, but only two: one for native, and one for NESTED/TVM. Then do detection similar to abrestic's […].
During the initialization phase, it will be per physical CPU. At runtime, it will be per vCPU. However, I agree it would be overkill to define the shared memory just for the initialization phase. It's just one time per KVM initialization.
There are also htimedelta and hvip, which are accessed in the vcpu_load/put path. Then there are the H-mode CSRs accessed for in-kernel AIA emulation.
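For CSRs touched in the vcpu_load/put path, the shared-memory approach under discussion might look like the following sketch. All names and fields are assumptions for illustration: the guest hypervisor writes htimedelta/hvip into a per-vCPU shared area instead of trapping, and the host/TSM applies them on the next vCPU entry:

```rust
// Hypothetical per-vCPU shared-memory CSR area (names invented). Writes land
// here instead of causing virtual instruction traps; the host/TSM loads the
// values into the real CSRs the next time the vCPU runs.
#[derive(Default)]
struct VcpuSharedCsrs {
    htimedelta: u64,
    hvip: u64,
    dirty: bool, // set when the guest hypervisor has updated either field
}

impl VcpuSharedCsrs {
    fn set_htimedelta(&mut self, v: u64) {
        self.htimedelta = v;
        self.dirty = true;
    }
    fn set_hvip(&mut self, v: u64) {
        self.hvip = v;
        self.dirty = true;
    }
    // Called by the host/TSM on vCPU entry: returns the values to load into
    // the real CSRs if anything changed, clearing the dirty flag.
    fn flush(&mut self) -> Option<(u64, u64)> {
        if self.dirty {
            self.dirty = false;
            Some((self.htimedelta, self.hvip))
        } else {
            None
        }
    }
}
```

This only covers vcpu_load/put state; the module-load-time accesses discussed above happen before any vCPU (or its shared region) exists, which is the crux of the per-pCPU vs per-vCPU question.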
Sounds good to me.
AP-TEE & SBI_VIRT need to be independent of each other to support model 4 (host running in HS mode). Correct?
Maybe you meant in addition to regular AP-TEE detection (which will be used for the host in HS mode), we provide another sub-feature in SBI_VIRT?
It's a little more complicated than just confidential memory as we also require the VMM to declare which parts of the address space are used for which purpose (confidential vs shared vs MMIO), which I assume is something we don't want to impose on the (non-confidential) nested virtualization case. The device assignment and AIA flows will have some differences as well. But otherwise I agree, there's a lot of overlap.
Ok, so thinking more about it, there are a couple of "interesting" cases revolving around interrupts: […]
Case (1) gets kinda complicated -- the host/TSM would basically have to virtualize Sstc for the guest hypervisor. Fortunately KVM doesn't rely on this, though it's something we might want to solve for. Or maybe we just say that VS-level interrupts always get delegated to the nested guest. Case (2) is something KVM actually uses, though it seems like it should be easier to virtualize. Also cc @rsahita
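For the "VS-level interrupts always get delegated" option, the host/TSM would still need to emulate hvip so the guest hypervisor can inject interrupts into its nested guest. A minimal sketch, assuming only the standard writable VS-level bits take effect (bit positions are from the RISC-V privileged spec; the function name is invented):

```rust
// Sketch of WARL-style emulation of a guest hypervisor's hvip write: only the
// VS-level software/timer/external injection bits are writable, everything
// else reads back as zero, mirroring the real CSR's behavior.
const VSSIP: u64 = 1 << 2;  // VS-level software interrupt pending
const VSTIP: u64 = 1 << 6;  // VS-level timer interrupt pending
const VSEIP: u64 = 1 << 10; // VS-level external interrupt pending
const HVIP_WRITABLE: u64 = VSSIP | VSTIP | VSEIP;

fn emulate_hvip_write(v: u64) -> u64 {
    v & HVIP_WRITABLE
}
```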
For shared vs confidential => default shared. And for MMIO, the hypervisor already has to have those ranges figured out at some point (and to run a TVM it will have to have them figured out early), so I don't see that as a big imposition. In general, the cost imposed on the hypervisor/VMM by TVM requirements will have to be paid. Leveraging it for nested seems like good reuse, even if it's slightly more than nested needs.
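The "default shared" idea could be sketched as a simple range map where the VMM only declares the ranges it cares about and anything undeclared classifies as shared. Everything below is hypothetical (the types and methods are invented to illustrate the default, not the actual TSM interface):

```rust
// Illustrative address-space declaration map with atishp04's suggested
// default: any range the VMM hasn't declared is treated as shared.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RangeUse {
    Confidential,
    Shared,
    Mmio,
}

struct AddressSpaceMap {
    // (start, len, use); a real implementation would reject overlaps.
    ranges: Vec<(u64, u64, RangeUse)>,
}

impl AddressSpaceMap {
    fn new() -> Self {
        Self { ranges: Vec::new() }
    }
    fn declare(&mut self, start: u64, len: u64, usage: RangeUse) {
        self.ranges.push((start, len, usage));
    }
    fn classify(&self, addr: u64) -> RangeUse {
        for &(start, len, usage) in &self.ranges {
            if addr >= start && addr - start < len {
                return usage;
            }
        }
        RangeUse::Shared // undeclared ranges default to shared
    }
}
```

Under this default, a non-confidential nested guest that never calls `declare` behaves exactly as if the declaration interface didn't exist.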
Is it likely that this would ever be needed for a TVM? For nested I'm OK with not supporting that directly. Optimized nested is going to be great for most use cases, but there still might be some non-KVM ones that want full trap-and-emulate with features such as the above.
Guest external interrupt traps seem like something we should handle for TVMs too, right?
Yeah, to be clear this doesn't apply to TVMs and it's not something that KVM makes use of presently. I agree that it's something we can say is unsupported for nested virtualization. We aren't advertising the H-extension to the guest hypervisor; we're providing a way to accelerate virtualization via calls to the underlying execution environment, which itself may or may not be making use of the H-extensions. Host hypervisors can instead do trap & emulate if they want to support genuine nested virtualization. Other than SGEIs (mentioned below) and the H{L,S}V instructions (which we'd accelerate via an SBI call anyway), I'm having a hard time coming up with other examples where a hypervisor would want to modify its own execution environment using the H-extension CSRs.
Yes, we for sure need to virtualize SGEIs. We could trap & emulate […].
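A trap-and-emulate approach to SGEIs might keep shadow copies of the guest hypervisor's enable and pending bits and report an SGEI whenever they intersect. This is only a sketch under that assumption (struct and method names are invented):

```rust
// Hypothetical shadow state for virtualizing SGEIs: the host/TSM maintains
// the guest hypervisor's view of hgeie (enables) and hgeip (pending), and an
// SGEI is pending for the guest hypervisor when any enabled bit is pending.
struct SgeiState {
    hgeie: u64, // shadowed enable bits written by the guest hypervisor
    hgeip: u64, // pending guest external interrupts, set by the host/TSM
}

impl SgeiState {
    fn sgei_pending(&self) -> bool {
        self.hgeie & self.hgeip != 0
    }
}
```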
Also, for posterity, pasting the TODO I left in […]:
KVM will expect to be able to access a few of the hypervisor CSRs (`hideleg`, `hgatp`, etc.) at module load, i.e. not as part of the run loop (which will use shared memory for TVM CSR access). Salus needs to handle the resulting virtual instruction traps for these without blowing up the host VM. Most of these accesses we can ignore, since we're not doing nested virtualization of legacy VMs and a TSM (regardless of deployment model) will overwrite these CSRs anyway. `hgatp` is the main exception, as we need to allow KVM to discover the MMU mode and VMID bits.

cc @atishp04