Skip to content

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Oct 10, 2025

If leaf 8000_001E is to be discarded, ensure we clear TopologyExtensions so that guests are not told the now-zeroed leaf is valid

If leaf 8000_001E is to be kept, specialize it for each vCPU. The code that was present is fine, we just need to include it in the list of leaves that Propolis supports specializing.

The initial CPUID profile must have TopologyExtensions set if guests are expected to use leaf 8000_001E.

This leaf has spooky action properties with Windows guests. To reiterate https://github.com/oxidecomputer/omicron/pull/9043/files#r2418399831: if PerfCtrExtCore is set and TopologyExtensions is not, Windows Server 2022 has been seen to enter an infinite loop at boot. If both bits are set, neither bit is set, or TopologyExtensions is set without PerfCtrExtCore, Windows Server 2022 appears to be fine.

This (minorly) changes CPUID profiles that had included leaf 8000_001E. There is no provision for the (incorrect) CPUID profile that propolis-{server,standalone} could provide before, where the initial APIC ID was read as zero on all cores while having the TopologyExtensions bit set.

@iximeow iximeow requested a review from pfmooney October 10, 2025 18:56
@iximeow
Copy link
Member Author

iximeow commented Oct 10, 2025

discovered by a bit of a fluke: if I use propolis-server with this patch, and propolis-cli sends no CPUID profile.. the profile from my 7950x via query_complete, then specialized with leaf 8000_001E, produces something that panics EDK2 at start. figuring that out before merging, just to be sure about what's happening there.

If leaf 8000_001E is to be discarded, ensure we clear
`TopologyExtensions` so that guests are not told the now-zeroed leaf is
valid

If leaf 8000_001E is to be kept, specialize it for each vCPU. The code
that was present is mostly fine for this purpose. It is not in the list
of leaves Propolis supports specializing because `propolis-server` and
`propolis-standalone` unconditionally set `has_smt == true` (even for
single-vCPU VMs!), which means a specialized leaf 8000_001E will fault
in EDK2 on boot.

The initial CPUID profile must have `TopologyExtensions` set if guests
are expected to use leaf 8000_001E.

This leaf has spooky action properties with Windows guests. To reiterate
https://github.com/oxidecomputer/omicron/pull/9043/files#r2418399831:
if `PerfCtrExtCore` is set and `TopologyExtensions` is not, Windows
Server 2022 has been seen to enter an infinite loop at boot. If both
bits are set, neither bit is set, or `TopologyExtensions` is set without
`PerfCtrExtCore`, Windows Server 2022 appears to be fine.

This (minorly) changes CPUID profiles that had included leaf 8000_001E.
There is no provision for the (incorrect) CPUID profile that
`propolis-{server,standalone}` could provide before, where the initial
APIC ID was read as zero on all cores while having the
`TopologyExtensions` bit set.
@iximeow
Copy link
Member Author

iximeow commented Oct 11, 2025

OK. I'd gotten the fluke backwards: propolis-server/propolis-cli are fine, using propolis-standalone is where this patch runs into trouble (very literally had left and right in diff wrong...). The immediate issue is that in EDK2 this division yields zero if ThreadsPerCore + 1 is greater than MaxLogicProcessorsPerPackage. This is the denominator on line 1213, we die to division by zero.

ThreadsPerCore is in leaf 8000001E ebx, where we either set it to 0 or 1 (for 1 thread or 2 threads per core) based on has_smt. Before, we'd discard leaf 80000001E, but still allege that that it was a valid leaf. So EDK2 would get here, read leaf 80000008 to see ApicIdCoreIdSize == 0 (also an issue...), then read ThreadsPerCore == 0 as if there was no SMT, compute MaxCoresPerPackage == 1/1, and on we go with the goofy many-socket topology. I confirmed this is the path we've been taking in the product with a bit of dtrace, it's... rather embarrassing.

With propolis-standalone my host offers up leaf 8000_001E. Now it's Supported so we give the guest a leaf with ThreadsPerCore set to 1 instead of 0, and everything goes sideways. propolis-server doesn't hit this EDK2 behavior because I'd intentionally not used all supported topology leaves, so we always just clip out leaf 8000_001E and hide the topology extension bit.

I'm going to remove Ext1E from the supported topo leaves list for now, figure out the relationship between has_smt, #940, and ApicIdCoreIdSize separately..

@iximeow iximeow merged commit ff52055 into oxidecomputer:master Oct 11, 2025
11 checks passed
iximeow added a commit to oxidecomputer/omicron that referenced this pull request Oct 18, 2025
Propolis changes:

* oxidecomputer/propolis#950
* oxidecomputer/propolis#952
* oxidecomputer/propolis#951
* oxidecomputer/propolis#954
* oxidecomputer/propolis#957
* oxidecomputer/propolis#960
* oxidecomputer/propolis#961
* oxidecomputer/propolis#955

Crucible changes:

* oxidecomputer/crucible#1773
* oxidecomputer/crucible#1774
* oxidecomputer/crucible#1780
* oxidecomputer/crucible#1778

Crucible shouldn't have functional changes here, Propolis' big ones are
@sunshowers' work moving Propolis to versioned APIs, plus propolis#960
turning the crank on MAXCPU.

propolis#961 changes the initial Milan CPU profile one last time before
the release in service of propolis#959. Propolis will clear [this
bit](https://github.com/oxidecomputer/omicron/blob/d74f5e3f1ae0a378dcdb9795a0ada2426702b046/nexus/src/app/instance_platform/cpu_platform.rs#L423).
Later we want to actually set up leaf 8000_001E, so after this merges
I'll have a followup to remove that leaf from the inital Milan
definition to keep the profile constant when `propolis-server` is
smarter about the leaf.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants