iximeow commented Sep 6, 2025

This is kind of three parts:

  • teach cpuid-utils about Intel-only leaves 4 and 18h
  • teach propolis' CPUID specializer about specializing leaf 4 (at least as it relates to core/thread counts)
  • include with_cpu_topo() so we update to-be-specialized topo leaves instead of just discarding them

this fixes #921 and is part of fixing #940 (the other part being https://www.illumos.org/issues/17529 or getting leaf B from an instance spec)

I've now run Linux and OmniOS guests on a 2697v4 and a Xeon D-1713NTE. Neither guest directly complained about the bogus leaf 18h entries, but the incorrect leaf 4 entries had caused Linux to spin fairly early in boot.

On AMD, specializing leaf B fixes #940, but I've deferred doing that in propolis-server until at least landing oxidecomputer/omicron#8728. Since nothing is obviously wrong about behavior with the silly "multi-socket" VM topology, I'd rather not change it until it's part of a Milan V2 CPU platform. That can come with a constraint that vCPU counts have to be consistent with SMT, and nudge all of that forward more gracefully too.

// indicate that SMT is enabled, then vCPUs are presented as pairs
// of sibling threads on vproc-many processors.
let num_vproc = if self.has_smt {
// TODO: What if num_vcpu is odd?!

Member Author

well, on Linux if you're indicating SMT and then have an odd number of cores, at least with the RFD 314-like CPUID leaves, the OS decides that your topology is nonsense and asserts it's all one big core:

root@debian:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0-2
root@debian:~# cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list
0-2

where the guest did in fact get reasonable leaf B bits:

root@debian:~# cpuid -r | grep 0000000b
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
   0x0000000b 0x01: eax=0x00000008 ebx=0x00000003 ecx=0x00000201 edx=0x00000000
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000001
   0x0000000b 0x01: eax=0x00000008 ebx=0x00000003 ecx=0x00000201 edx=0x00000001
   0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000002
   0x0000000b 0x01: eax=0x00000008 ebx=0x00000003 ecx=0x00000201 edx=0x00000002

if you have an even number of CPUs then Linux sees them all paired up (see the new test in cpuid.rs). it'd be kinda nice to just not allow odd vCPU-count guests until there's a reason to permit them. for the time being I think it's time to let this sit, do virtual platforms, and have reasonable topologies in the non-initial platform.
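
For readers decoding the `cpuid -r` dump above by hand, here is a minimal sketch of how the leaf Bh registers break down. The field layout follows the documented leaf Bh format (EAX[4:0] shift, EBX[15:0] logical processor count, ECX[7:0] level number, ECX[15:8] level type, EDX x2APIC ID). The type and function names (`LeafBLevel`, `decode_leaf_b`, `core_and_thread`) are made up for illustration and are not from cpuid-utils or propolis.

```rust
/// Decoded view of one CPUID leaf Bh subleaf (illustrative type, not from the PR).
#[derive(Debug, PartialEq, Eq)]
pub struct LeafBLevel {
    /// Number of bits to shift the x2APIC ID right to reach the next level's ID.
    pub shift: u32,
    /// Logical processors reported at this level.
    pub logical_procs: u32,
    /// Level number (echoes the subleaf index).
    pub level_number: u32,
    /// Level type: 1 is SMT, 2 is Core.
    pub level_type: u32,
    /// x2APIC ID of the logical processor that executed CPUID.
    pub x2apic_id: u32,
}

pub const LEVEL_TYPE_SMT: u32 = 1;
pub const LEVEL_TYPE_CORE: u32 = 2;

/// Split the four registers of a leaf Bh subleaf into named fields.
pub fn decode_leaf_b(eax: u32, ebx: u32, ecx: u32, edx: u32) -> LeafBLevel {
    LeafBLevel {
        shift: eax & 0x1f,
        logical_procs: ebx & 0xffff,
        level_number: ecx & 0xff,
        level_type: (ecx >> 8) & 0xff,
        x2apic_id: edx,
    }
}

/// Split an x2APIC ID into (core id, thread id) using the SMT-level shift.
pub fn core_and_thread(x2apic_id: u32, smt_shift: u32) -> (u32, u32) {
    let thread_mask = (1u32 << smt_shift) - 1;
    (x2apic_id >> smt_shift, x2apic_id & thread_mask)
}
```

Applied to the dump above, subleaf 0 decodes as an SMT level with a shift of 1 and subleaf 1 as a Core level with a shift of 8, so x2APIC IDs 0 and 1 share core 0 while ID 2 sits alone on core 1. That lone thread is the unpaired vCPU in the three-vCPU guest.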

Member Author

actually the guest might be in a better state if the non-paired core does not have hyperthreading indicated in its CPUID. that makes me a little less worried about potential guest issues but is more work than this merits right now.
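
As a sketch of the "just don't allow odd vCPU-count guests" idea discussed above (not what this PR does), a check could look roughly like the following. `VprocError` and `num_vproc` are hypothetical names, and the only behavior assumed from the diff is that vCPUs are presented as pairs of sibling threads when SMT is indicated.

```rust
/// Hypothetical error type for this sketch; not part of propolis.
#[derive(Debug, PartialEq, Eq)]
pub enum VprocError {
    /// SMT was requested but the vCPUs cannot be paired up evenly.
    OddVcpuCountWithSmt { num_vcpu: u32 },
}

/// Compute how many virtual processors (cores) back `num_vcpu` vCPUs.
///
/// With SMT, vCPUs are presented as pairs of sibling threads, so an odd
/// count is rejected here instead of producing a half-populated core.
pub fn num_vproc(num_vcpu: u32, has_smt: bool) -> Result<u32, VprocError> {
    if !has_smt {
        return Ok(num_vcpu);
    }
    if num_vcpu % 2 != 0 {
        return Err(VprocError::OddVcpuCountWithSmt { num_vcpu });
    }
    Ok(num_vcpu / 2)
}
```

Returning an error keeps the policy decision visible to the caller; the alternative raised above, leaving hyperthreading unindicated on the unpaired core, would need per-vCPU CPUID differences that this sketch does not attempt.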

iximeow changed the title from "Specialize CPUID leaves 4 and Bh for CPU topology" to "Handle Intel CPUID leaves 4 and 18h, specialize CPUID for VM shape" on Sep 15, 2025
iximeow marked this pull request as ready for review on September 15, 2025 23:19
iximeow requested a review from pfmooney on September 15, 2025 23:27

// Cache types come in any order, but type 0 means there are
// no more caches, so iterate and adjust as needed until we
// see that.
const MAX_REASONABLE_LEVEL: u32 = 0x20;

Collaborator

Does it make sense to hoist this to the top of the cpuid-utils, considering there are several places where such a heuristic limit is appropriate?
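
To illustrate the heuristic limit under discussion, here is a minimal sketch of a bounded walk over leaf 4 subleaves that stops at the first cache type 0. `MAX_REASONABLE_LEVEL` and its value come from the diff; `count_cache_subleaves` and the closure-based subleaf reader are assumptions made so the example stands alone.

```rust
/// Upper bound on subleaves to probe, as in the diff under review.
pub const MAX_REASONABLE_LEVEL: u32 = 0x20;

/// Cache type lives in EAX[4:0] of each leaf 4 subleaf.
const CACHE_TYPE_MASK: u32 = 0b1_1111;

/// Count leaf 4 cache subleaves by probing until cache type 0 ("no more caches").
///
/// `read_eax` is a stand-in for however the caller fetches EAX for a given
/// subleaf. Returns `None` if no terminator shows up within the bound, which
/// suggests the CPUID data is not sensible.
pub fn count_cache_subleaves(mut read_eax: impl FnMut(u32) -> u32) -> Option<u32> {
    for subleaf in 0..MAX_REASONABLE_LEVEL {
        if (read_eax(subleaf) & CACHE_TYPE_MASK) == 0 {
            return Some(subleaf);
        }
    }
    None
}
```

The bound matters because a malformed or hostile CPUID source could otherwise keep reporting non-zero cache types forever.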

Comment on lines 307 to 308
let ty = leaf.eax & 0b11111;
let level = (leaf.eax >> 5) & 0b111;

Collaborator

Other bits get named-constant masks. Should these?

Member Author

i was on the fence because names here are a bit silly ( .. yes, let ty = leaf.eax & LEAF4_EAX_CACHE_TYPE; ..), but it's kind of nice to line up the masks as much as rustfmt permits to see they don't overlap.
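
A sketch of what the named-mask version might look like, assuming the leaf 4 EAX layout of cache type in bits 4:0 and cache level in bits 7:5. Only `LEAF4_EAX_CACHE_TYPE` is named in the comment above; the other constant names and `cache_type_and_level` are hypothetical. The compile-time assertion is one way to turn "see they don't overlap" into something the compiler checks.

```rust
// Masks lined up so overlaps are easy to spot by eye.
pub const LEAF4_EAX_CACHE_TYPE: u32 = 0b0001_1111;
pub const LEAF4_EAX_CACHE_LEVEL: u32 = 0b1110_0000;
pub const LEAF4_EAX_CACHE_LEVEL_SHIFT: u32 = 5;

// Compile-time check that the two fields do not share any bits.
const _: () = assert!(LEAF4_EAX_CACHE_TYPE & LEAF4_EAX_CACHE_LEVEL == 0);

/// Extract (cache type, cache level) from a leaf 4 EAX value.
pub fn cache_type_and_level(eax: u32) -> (u32, u32) {
    let ty = eax & LEAF4_EAX_CACHE_TYPE;
    let level = (eax & LEAF4_EAX_CACHE_LEVEL) >> LEAF4_EAX_CACHE_LEVEL_SHIFT;
    (ty, level)
}
```

This computes the same values as the two lines under review (`leaf.eax & 0b11111` and `(leaf.eax >> 5) & 0b111`), just with the level mask expressed in place rather than post-shift.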

// that point.
if num_vcpu >= 256 {
return Err(SpecializeError::IncompatibleTopology {
leaf: 4,

Collaborator

leaf B, not 4

Should TopoKind have some accessor like leaf() which emits a u32, so it could be passed in here, rather than typed in by hand?

Member Author

doubly awkward: leaf, which is used at various points in the match, is the u32. the enum is also #[repr(u32)] which is why that value comes from a simple *topo as u32. i've struck the explicit numbers in these IncompatibleTopology errors in favor of just leaf,.
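
For context, a minimal sketch of the pattern described here: a `#[repr(u32)]` enum whose discriminant is the leaf number, so the error can carry `leaf` without a hand-typed literal. The variant names, the `num_vcpu` error field, and `check_vcpu_limit` are assumptions for illustration; only `TopoKind`, `SpecializeError::IncompatibleTopology`, its `leaf` field, and the `>= 256` check appear in the diff.

```rust
/// Topology leaf kinds; variant names here are illustrative.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum TopoKind {
    Std4 = 0x4,
    StdB = 0xb,
}

impl TopoKind {
    /// The reviewer's suggested accessor: the CPUID leaf this kind refers to.
    pub fn leaf(self) -> u32 {
        self as u32
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum SpecializeError {
    /// `num_vcpu` is an assumed extra field for this sketch.
    IncompatibleTopology { leaf: u32, num_vcpu: u32 },
}

/// Reject vCPU counts that the topology leaf cannot describe.
pub fn check_vcpu_limit(topo: &TopoKind, num_vcpu: u32) -> Result<(), SpecializeError> {
    // The enum is #[repr(u32)], so the cast yields the leaf number directly.
    let leaf = *topo as u32;
    if num_vcpu >= 256 {
        // Field shorthand (`leaf,`) avoids a mismatched literal like `leaf: 4`.
        return Err(SpecializeError::IncompatibleTopology { leaf, num_vcpu });
    }
    Ok(())
}
```

Deriving the number from the enum means a copy-pasted error arm cannot report the wrong leaf, which is the mistake the reviewer caught.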

iximeow commented Sep 16, 2025

on the D-1713NTE with this change we mostly just see that migration doesn't work on Intel:

failures:
    phd_tests::hyperv::hyperv_reference_tsc_elapsed_time_test
    phd_tests::hyperv::hyperv_reference_tsc_clocksource_test
    phd_tests::hyperv::hyperv_lifecycle_test
    phd_tests::disk::in_memory_backend_migration_test
    phd_tests::crucible::smoke::shutdown_persistence_test
    phd_tests::crucible::smoke::guest_reboot_test
    phd_tests::crucible::smoke::api_reboot_test
    phd_tests::crucible::smoke::boot_test
    phd_tests::migrate::running_process::export_failure
    phd_tests::migrate::running_process::import_failure
    phd_tests::migrate::running_process::migrate_running_process
    phd_tests::stats::instance_vcpu_stats
    phd_tests::cpuid::cpuid_migrate_smoke_test
    phd_tests::cpuid::cpuid_boot_test
    phd_tests::migrate::vm_reaches_destroyed_after_migration_out
    phd_tests::migrate::migration_ensures_instance_metadata
    phd_tests::migrate::multiple_migrations
    phd_tests::migrate::serial_history
    phd_tests::migrate::smoke_test
    phd_tests::crucible::migrate::load_test
    phd_tests::crucible::migrate::smoke_test

test result: FAILED. 21 passed; 21 failed; 0 skipped; 0 not run; finished in 959.52s

the cpuid failures were because i'd used a phd-runner built without this change: it built a profile with inappropriate leaf 4 subleaves which hung those guests on boot. building phd-runner with this diff, the cpuid_*_test tests pass.

the hyperv failures are because those tests involve migration, and trip over the broader "can't migrate on Intel". other tests that pass do include guests that boot, so some passes are legitimate exercises of functionality that now works.

iximeow commented Sep 17, 2025

phd_tests::migrate::from_base::migration_from_base_and_back ... FAILED: building VM spec from VmConfig

Caused by:
    0: getting guest OS kind for boot disk
    1: Failed to download alpine.iso from any remote URI

=| interesting failure mode after 417 minutes.

iximeow merged commit db57d92 into master on Sep 17, 2025 (11 checks passed)
iximeow deleted the ixi/cpuid-cpu-topo-spec branch on September 17, 2025 19:23
Development

Successfully merging this pull request may close these issues.

  • VMs should not have vCPU-many sockets
  • Guest OS SMT Panic with Intel(r) Xeon(r) CPU E5-2660

2 participants