PVM TODO #12

Open
laijs opened this issue Sep 10, 2024 · 0 comments

Specification

  • Update and refine the PVM specification.
  • Document the TSC ABI.
    • Add documentation for TSC ABI in PVM.
  • Emulate TSC offset (is it possible?).
    • Use MSR_IA32_TSC_ADJUST to emulate the TSC offset.
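
The TSC offset idea above can be modelled in a few lines. This is only a sketch under stated assumptions: `ModelCpu`, `enter_guest`, and `exit_guest` are hypothetical names, the model treats RDTSC as the free-running count plus IA32_TSC_ADJUST, and it ignores the cost of the extra WRMSR on every switch. Whether this is practical in the switcher is exactly the open question of this item.

```python
MASK64 = (1 << 64) - 1
MSR_IA32_TSC_ADJUST = 0x3B  # architectural MSR index


class ModelCpu:
    """Toy CPU model: RDTSC returns the free-running count plus IA32_TSC_ADJUST."""

    def __init__(self, raw_count=0):
        self.raw_count = raw_count
        self.msrs = {MSR_IA32_TSC_ADJUST: 0}

    def tick(self, cycles):
        self.raw_count = (self.raw_count + cycles) & MASK64

    def wrmsr(self, index, value):
        self.msrs[index] = value & MASK64

    def rdmsr(self, index):
        return self.msrs[index]

    def rdtsc(self):
        return (self.raw_count + self.msrs[MSR_IA32_TSC_ADJUST]) & MASK64


def enter_guest(cpu, guest_tsc_offset):
    """Hypothetical switcher step: fold the guest offset into TSC_ADJUST."""
    host_adjust = cpu.rdmsr(MSR_IA32_TSC_ADJUST)
    cpu.wrmsr(MSR_IA32_TSC_ADJUST, host_adjust + guest_tsc_offset)
    return host_adjust  # caller keeps this to restore on exit


def exit_guest(cpu, host_adjust):
    """Hypothetical switcher step: restore the host's TSC_ADJUST value."""
    cpu.wrmsr(MSR_IA32_TSC_ADJUST, host_adjust)
```

With this model, a guest entered with an offset of -300 on a CPU whose TSC reads 1000 sees 700 from a plain RDTSC, and the host sees its own time base again after `exit_guest`.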

Guest

  • KASLR
    • Perform randomization within the allowed range.
  • Shadow Stack
  • PKRU
    • Implement PKRU on shadow paging.
  • SMAP Emulation
    • Enhance protection of the guest kernel from guest userspace.
    • Use PKU to emulate SMAP, depending on the PKRU implementation for shadow paging.
  • Analyze and mitigate kernel/user side-channel attacks.
  • Support for KASAN.
  • Support for PIE.
  • Upstream the not-so-fixed fixed mapping.
  • #VC/#VE Handling
    • Support PVM in TDX guest. ✅
    • Support PVM in SEV guest.
  • Switch between 5-level and 4-level page tables when booting in uncompressed mode.
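
For the KASLR item, "randomization within the allowed range" can be sketched as picking a random aligned slot that keeps the whole image inside the range. The function name, the 2 MiB alignment, and the example addresses are illustrative assumptions, not values taken from the PVM specification.

```python
import secrets


def pick_kaslr_base(range_start, range_end, image_size, align=2 << 20):
    """Pick a random aligned base so [base, base + image_size) fits in the range.

    range_end is exclusive. The 2 MiB default alignment is an assumption.
    """
    first = -(-range_start // align) * align           # round start up
    last = (range_end - image_size) // align * align   # round last base down
    if last < first:
        raise ValueError("image does not fit in the allowed range")
    slots = (last - first) // align + 1
    return first + secrets.randbelow(slots) * align
```

For example, `pick_kaslr_base(0x1000_0000, 0x2000_0000, 64 << 20)` returns some 2 MiB-aligned base that leaves the 64 MiB image entirely inside the range.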

Host Kernel

  • Integral Entry
    • Atomic stack switching for IST. See reference.
  • Host KPTI
  • Host LASS
    • Relocate guest kernel to userspace address range.
  • PVM switcher with FRED.

Host KVM

  • Multi-KVM (reference): Allow PVM to coexist with VMX/SVM.
    • User-return MSR.
    • Perf callback.
    • VPID (per pCPU VPID).
  • ASI: Handle PVM VMExit events without switching hardware CR3 to kernel CR3.
  • KVM shared TLB for different vCPUs in the same VM & pCPU (context_id & generation).
  • Permission clone on non-final SPTEs when guest CR0.WP is set (allow the PVM hypervisor to share shadowed user page tables for guest kernel/user CR3).
  • PMU Refactor
    • Create separate kernel modules for pmu_intel.c and pmu_amd.c. This allows PVM to select the appropriate module based on the vendor.

PVM Hypervisor

  • Analyze and mitigate hypervisor/guest side-channel attacks.
  • Direct #PF Injection
    • Inject #PF into the guest from the switcher, based on the error code. Fetch all GPTEs during SP allocation and mark the SPTE as reserved if the associated GPTE is present. NP (Not Present) faults can then be injected directly into the guest. See commit c7addb902054195b995114df154e061c7d604f69. Note: This method may fail under PV MMU, because no TLB flush occurs when the guest modifies a GPTE.
  • Posted Interrupt Emulation
    • Inject hardware interrupts directly into the guest. Emulating full APIC logic in the switcher might be complex. Consider allowing the guest kernel to handle passthrough hardware interrupts directly using user interrupts.
  • Per-VM Mapping Space for PVCS (PGD Granularity)
    • The current PVCS implementation relies on the PFN cache (KVM_GUEST_USES_PFN), which has been dropped from mainline. Consider dynamically mapping guest vCPU PVCS pages to a VA range under a fixed, unused PGD.
  • PMU
    • PMU virtualization should support separate statistics for kernel and user modes, as both the guest kernel and userspace run in hardware CPL3.
  • Remove non-PVM mode when VMM/guest supports 64-bit-only.
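
The routing decision behind Direct #PF Injection can be sketched from the x86 page-fault error code alone. The bit positions are architectural; the function name and the returned labels are hypothetical, and the sketch assumes the scheme described in that item (an SPTE is marked reserved whenever its GPTE is present, so a not-present hardware fault implies a not-present GPTE).

```python
# x86 page-fault error code bits (architectural).
PFERR_PRESENT = 1 << 0  # 0: not-present fault, 1: protection violation
PFERR_WRITE = 1 << 1
PFERR_USER = 1 << 2
PFERR_RSVD = 1 << 3     # a reserved bit was set in a paging-structure entry
PFERR_FETCH = 1 << 4


def route_page_fault(error_code):
    """Decide who handles a #PF taken while the guest was running (sketch)."""
    if error_code & PFERR_RSVD:
        # GPTE is present but the SPTE is only a reserved placeholder:
        # the hypervisor has to build the real shadow entry.
        return "hypervisor"
    if not (error_code & PFERR_PRESENT):
        # Not-present fault: under the assumed scheme the GPTE is not
        # present either, so the fault belongs to the guest.
        return "inject_to_guest"
    # Protection fault on a real SPTE (e.g. write-protected shadow page).
    return "hypervisor"
```

The PV MMU caveat from the item still applies: if the guest changes a GPTE without a TLB flush, the not-present assumption in the second branch may no longer hold.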

PV Optimization

  • SMP PV Booting (In progress)

    • Allow the secondary CPU to boot directly into 64-bit mode. This might also eliminate the need for non-PVM mode (requires specification change).
      Note:
      We have a functional internal implementation, but we would like a more general implementation, possibly in accordance with the TDX booting specification. We need to discuss this.
  • PV MMIO Write

    • Use a hypercall to emulate MMIO write directly (may require extra PVOPS).

MMU Optimization

  • PV MMU (In progress)
    • Remove write protection from guest page tables. Allow the guest to notify the hypervisor of GPTE changes using PVOPS to synchronize SPTE. Full design details are in PV MMU Design #13.
      Note:
      We have a functional internal implementation, but we would like a more general implementation that is also available for nested TDP MMU. We need to discuss this.
  • Direct Page Table (Xen-like)?
  • Finer-Grained TLB Flushing
  • Move TLB flushing outside of MMU lock (TLB flushing delay).
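
The PV MMU flow (unprotected guest page tables, with explicit notifications to keep SPTEs in sync) can be illustrated with a toy model. Everything here is hypothetical: the class, the method names, and the choice to batch updates and zap stale SPTEs are assumptions for illustration, not the design from #13.

```python
class PvMmuModel:
    """Toy model: guest page table, shadow page table, and a pending-update queue."""

    def __init__(self):
        self.gpt = {}      # guest VA -> GPTE value, written freely by the guest
        self.spt = {}      # guest VA -> SPTE value, owned by the hypervisor
        self.pending = []  # guest VAs whose GPTE changed since the last flush

    def guest_set_pte(self, gva, gpte):
        """Guest-side PVOPS hook: write the GPTE and record the change."""
        self.gpt[gva] = gpte
        self.pending.append(gva)

    def guest_flush(self):
        """Guest-side PVOPS hook: hand the batched changes to the hypervisor."""
        batch, self.pending = self.pending, []
        self.hypercall_sync(batch)

    def hypercall_sync(self, gvas):
        """Hypervisor side: drop stale SPTEs; they are rebuilt on the next fault."""
        for gva in gvas:
            self.spt.pop(gva, None)
```

Between `guest_set_pte()` and `guest_flush()` the shadow entry is stale, which is the window a real design has to reason about.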

VMM

  • QEMU boots guest directly into 64-bit mode.
  • QEMU/FC supports saving/restoring PVM virtual MSRs (for migration). ✅

Migration

  • Host kernel parameters and PVM module parameters should reserve a fixed address range for guests to facilitate migration between hosts.
  • Support migration of a guest from a 5-level paging mode host to a 4-level paging mode host and vice versa.
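
One reason the paging mode matters for migration is that 4-level paging uses 48-bit virtual addresses while 5-level paging uses 57-bit ones, so an address range that is canonical on a 5-level host may not be canonical on a 4-level host. A minimal check, with hypothetical function names and example ranges chosen only for illustration:

```python
VA_BITS = {4: 48, 5: 57}  # virtual address width per paging level


def is_canonical(addr, paging_levels):
    """True if the bits above the VA width are copies of the top VA bit."""
    va_bits = VA_BITS[paging_levels]
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (64 - va_bits + 1)) - 1


def range_usable_on(start, end, paging_levels):
    """True if [start, end) is canonical and stays within one address-space half."""
    last = end - 1
    same_half = (start >> 63) == (last >> 63)
    return (same_half
            and is_canonical(start, paging_levels)
            and is_canonical(last, paging_levels))
```

A guest range such as `0xff40_0000_0000_0000` to `0xff48_0000_0000_0000` passes for 5-level paging but fails for 4-level paging, whereas a range that is valid under 4-level paging is also canonical under 5-level paging.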

Testing

  • KVM Selftests/KVM Unit Test
    • Implement PVM guest detection/setup and PVM-related tests.
  • PVM ABI Selftests
    • Add self-tests to ensure ABI stability.

Debug Tools

  • Enhance pvm_get_exit_info().
    • Break down exit reasons in finer detail, e.g., export the syscall number for tracing guest syscalls and the hypercall number for tracing guest hypercalls.
  • Perf-KVM
    • Allow perf-kvm to analyze PVM exit reasons.
  • Crash
    • Adapt crash analysis to examine the PVM guest kernel address space layout, as the CPU_ENTRY_AREA is no longer fixed.