add CPU platforms to instances #8728
RFD 505 proposes that instances should be able to set a "minimum hardware platform" or "minimum CPU platform" that allows users to constrain an instance to run on sleds that have a specific set of CPU features available. This allows a user to opt a VM into advanced hardware features (e.g. AVX-512 support) by constraining it to run only on sleds that support those features. For this to work, Nexus needs to understand what CPUs are present in which sleds. Have sled-agent query CPUID to get CPU vendor and family information and report this to Nexus as part of the sled hardware manifest.
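A sketch of the kind of decoding this implies, assuming the AMD APM's family/model encoding in CPUID leaf 1 EAX (the function and struct names here are illustrative, not the actual sled-agent code):

```rust
// Illustrative decode of CPUID leaf 1 EAX into family/model/stepping,
// per the AMD APM. Not the actual sled-agent code.
#[derive(Debug, PartialEq)]
struct DecodedFamily {
    family: u32,
    model: u32,
    stepping: u32,
}

fn decode_leaf1_eax(eax: u32) -> DecodedFamily {
    let stepping = eax & 0xF;
    let base_model = (eax >> 4) & 0xF;
    let base_family = (eax >> 8) & 0xF;
    let ext_model = (eax >> 16) & 0xF;
    let ext_family = (eax >> 20) & 0xFF;
    // Per the APM, the extended fields only apply when the base family
    // is 0xF - which it always is on Zen-era AMD parts.
    let (family, model) = if base_family == 0xF {
        (base_family + ext_family, (ext_model << 4) | base_model)
    } else {
        (base_family, base_model)
    };
    DecodedFamily { family, model, stepping }
}

fn main() {
    // 0x00A00F11 is a leaf-1 EAX as reported by a Milan (Zen 3) part:
    // family 0x19, model 0x01, stepping 1.
    let d = decode_leaf1_eax(0x00A00F11);
    assert_eq!(d, DecodedFamily { family: 0x19, model: 0x01, stepping: 0x1 });
    println!("{:x?}", d);
}
```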
fb1e0a7 to 620d7c9
the existing plumbing was sufficient for sled-agent to report the CPU family at startup, but did not provide the CPU family when Nexus calls later for inventory collections. when you've upgraded to this version, the database migration sets the sled CPU family to `unknown`, expecting that the next inventory collection will figure things out. this doesn't happen, and the initial check-in doesn't update the CPU type either (presumably because the sled is already known and initialized from the control plane's perspective?) this does... most of the plumbing to report a sled's CPU family for inventory collection, but it doesn't actually work.

`SledCpuFamily` being in both `omicron-common` and `nexus-client` is kind of unworkable. probably need a `ConvertInto` or something to transform the shared type into the `nexus-client` one when needed..? i've been trying to figure out what exactly is necessary and what is just building a mess for myself for two hours, and this feels like it's going nowhere.
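For what it's worth, the `ConvertInto` idea could look something like a plain `From` impl; the enum definitions below are hypothetical stand-ins for the real types in `omicron-common` and the progenitor-generated `nexus-client`:

```rust
// Hypothetical sketch of the conversion the comment is reaching for:
// a `From` impl between the shared type and the generated client type.
// Both modules here are stand-ins; the real definitions live in their
// respective crates.
mod omicron_common {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum SledCpuFamily {
        Unknown,
        AmdMilan,
        AmdTurin,
    }
}

mod nexus_client {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum SledCpuFamily {
        Unknown,
        AmdMilan,
        AmdTurin,
    }
}

impl From<omicron_common::SledCpuFamily> for nexus_client::SledCpuFamily {
    fn from(f: omicron_common::SledCpuFamily) -> Self {
        use omicron_common::SledCpuFamily as Shared;
        match f {
            Shared::Unknown => Self::Unknown,
            Shared::AmdMilan => Self::AmdMilan,
            Shared::AmdTurin => Self::AmdTurin,
        }
    }
}

fn main() {
    let shared = omicron_common::SledCpuFamily::AmdMilan;
    let client: nexus_client::SledCpuFamily = shared.into();
    assert_eq!(client, nexus_client::SledCpuFamily::AmdMilan);
}
```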
iximeow left a comment
finally got to taking a fine-tooth comb through the CPUID bits here and the differences between hardware and what guests currently see. for the most part this is in line with what guests already get from bhyve defaults, but i've noticed a few typos that unsurprisingly do not pose an issue booting at least Alpine guests. i'll clean that up and update 314 appropriately tomorrow.
nexus/src/app/instance_platform.rs (outdated)
```rust
// See [RFD 314](https://314.rfd.oxide.computer/) section 6 for all the
// gnarly details.
const MILAN_CPUID: [CpuidEntry; 32] = [
    cpuid_leaf!(0x0, 0x0000000D, 0x68747541, 0x444D4163, 0x69746E65),
```
a guest currently sees eax=0x10 here, where leaves 0xe, 0xf, and 0x10 are all zeroes. 0xe is reserved and zero on the host. 0xf and 0x10 are zeroed because of the default bhyve masking behavior. setting eax=0xd is just more precise: leaves 0xf and 0x10 being "present" but zero does not communicate any feature presence.
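For reference, the other registers in that leaf-0 entry spell out the vendor string (a quick sketch; the helper name is mine):

```rust
// The CPUID leaf-0 vendor string is laid out across EBX, EDX, ECX
// (in that order), little-endian. Helper name is illustrative.
fn vendor_string(ebx: u32, ecx: u32, edx: u32) -> String {
    let mut bytes = Vec::with_capacity(12);
    for reg in [ebx, edx, ecx] {
        bytes.extend_from_slice(&reg.to_le_bytes());
    }
    String::from_utf8(bytes).unwrap()
}

fn main() {
    // The register values from the MILAN_CPUID leaf-0 entry quoted above.
    let s = vendor_string(0x68747541, 0x444D4163, 0x69746E65);
    assert_eq!(s, "AuthenticAMD");
    // ...and EAX = 0x0000000D caps the highest standard leaf at 0xD.
}
```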
i know we reference RFD 314 but it kinda seems like it would be nice to have the comment explaining this here?
i went a bit hard on the comment and now there's like 600 more lines of more-a-code-than-a-comment (https://github.com/oxidecomputer/omicron/pull/8728/files#diff-0c754f739cd972f46b539d2a2e5a6220cd0b72e8c9fe7f20a2013fab3f28aa21R167)
nexus/src/app/instance_platform.rs (outdated)
```rust
        0xB, 0x0, 0x00000001, 0x00000002, 0x00000100, 0x00000000
    ),
    cpuid_subleaf!(
        0xB, 0x1, 0x00000000, 0x00000000, 0x00000201, 0x00000000
```
this is the same as what RFD 314 says, but eax ought to be different in the RFD. Propolis fixes up B.1.EBX here to the number of vCPUs, but an eax of 0 implies that the subleaf is actually invalid (from the APM: "If this function is executed with an unimplemented level (passed in ECX), the instruction returns all zeroes in the EAX register."). also, taken faithfully, this implies the VM's topology is vCPU-many sockets with SMT pairs across each socket pair. oops.
we should probably update the RFD to reflect that? also lol Oxide is the ONLY platform that lets you make a 64-socket VM...
well, it turns out the RFD is fine here, there's a later sentence that says "Index 1's eax and ebx depend on the VM shape". But Propolis is wrong here: oxidecomputer/propolis#939
So I'm going to hide this leaf in the proposed Milan v1 platform, fix the Propolis bug, and we can expose leaf Bh in a v2.
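To make the subleaf encoding above concrete, here's a small decode of the quoted values, assuming the standard leaf-0xB layout (EAX[4:0] = x2APIC ID shift to the next level, EBX[15:0] = logical processors at this level, ECX[15:8] = level type, where 1 = SMT and 2 = core); the helper is illustrative:

```rust
// Illustrative decode of CPUID leaf 0xB subleaves:
// returns (x2apic_id_shift, logical_processors, level_type).
fn decode_leaf_b(eax: u32, ebx: u32, ecx: u32) -> (u32, u32, u8) {
    (eax & 0x1F, ebx & 0xFFFF, ((ecx >> 8) & 0xFF) as u8)
}

fn main() {
    // Subleaf 0 from the quoted table: shift 1, 2 threads, type 1 (SMT).
    assert_eq!(decode_leaf_b(0x00000001, 0x00000002, 0x00000100), (1, 2, 1));
    // Subleaf 1: type 2 (core), but EAX = 0 is what the APM reserves for
    // an *unimplemented* subleaf - the bug discussed above.
    assert_eq!(decode_leaf_b(0x00000000, 0x00000000, 0x00000201), (0, 0, 2));
}
```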
RFD 505 proposes that instances should be able to set a "minimum hardware platform" or "minimum CPU platform" that allows users to constrain an instance to run on sleds that have a specific set of CPU features available. Previously, actually-available CPU information was plumbed from sleds to Nexus. This actually adds a `min_cpu_platform` setting for instance creation and uses it to drive selection of guest CPUID leaves.

As-is, this moves VMs on Gimlets away from the bhyve-default CPUID leaves (which are effectively "host CPUID information, but features that are not or cannot be virtualized are masked out") and instead uses the specific CPUID information set out in RFD 505. There is no provision for Turin yet, which instead gets CPUID leaves that look like Milan. Adding a set of CPUID information to advertise for an `amd_turin` CPU platform, from here, is fairly straightforward.

This does not have a mechanism to enforce specific CPU platform use or disuse, either in a silo or rack-wide. One could imagine a simple system oriented around "this silo is permitted to specify these minimum CPU platforms", but that leaves uncomfortable questions. For example: if silo A permits only Milan, silo B permits Milan and Turin, all Milan CPUs are already allocated, and someone is attempting to create a new Milan-based VM in silo A, should that succeed by using Turin CPUs, potentially starving silo B?
…platform as defined in Nexus
620d7c9 to 5cf7b9c
For a reason I can't figure out, when this was

```sql
AND sled.cpu_family IN (...)
```

binding a parameter made that query end up like

```sql
AND sled.cpu_family IN ($1,)
```

which then expects a single `sled_cpu_family` enum for later binding. Binding an array then yielded a type error like

```
DatabaseError(Unknown, "invalid cast: sled_cpu_family[] -> sled_cpu_family")
```

... but writing it like `AND sled.cpu_family = ANY (...)` does not get the extra comma and instead renders like `ANY ($1)`, which happily takes a `sled_cpu_family[]`.
hawkw left a comment
I'm happy to leave my approval on this now --- I like the use of raw-cpuid. I will freely admit, however, that my disposition towards reviewing all the CPUID bits that get constructed here is basically "take ixi's word for it". I think it could be worthwhile to get @pfmooney to take a look at this change as well?
```diff
-// TODO(gjc): eww. the correct way to do this is to write this as
-//
-//     "AND sled.cpu_family = ANY ("
-//
-// and then just have one `param` which can be bound to a
-// `sql_types::Array<SledCpuFamilyEnum>`
-if let Some(families) = sled_families {
-    query.sql(" AND sled.cpu_family IN (");
-    for i in 0..families.len() {
-        if i > 0 {
-            query.sql(", ");
-        }
-        query.param();
-    }
-    query.sql(")");
+if sled_families.is_some() {
+    query.sql(" AND sled.cpu_family = ANY (");
+    query.param();
+    query.sql(") ");
```
you love to see it
```rust
/// feature-compatible ahead of time. Instead, this is to check details like
/// "`clflush` operates on the same number of words".
#[allow(dead_code)]
pub fn functionally_same(base: CpuIdDump, target: CpuIdDump) -> bool {
```
Broadly: i wonder if we want logging or to return a Result or something here, so that we can explain why two CPUIDs are not "functionally_same". At least we could have it return a Result<(), &'static str> or something that says "clflush operates on different sized cache lines" or whatever
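A sketch of what that could look like, with `CpuIdDump` stubbed as a plain map and a single clflush-line-size check standing in for the real comparisons (all names and checks here are illustrative, not the Omicron code):

```rust
// Sketch of hawkw's suggestion: return Result<(), String> naming the
// first mismatch. `CpuIdDump` is stubbed as a map from (leaf, subleaf)
// to [eax, ebx, ecx, edx]; the real type lives in instance_platform.rs.
use std::collections::BTreeMap;

type CpuIdDump = BTreeMap<(u32, u32), [u32; 4]>;

// CLFLUSH line size is leaf 1 EBX[15:8], in units of 8 bytes.
fn clflush_size(dump: &CpuIdDump) -> Option<u32> {
    dump.get(&(0x1, 0x0)).map(|regs| ((regs[1] >> 8) & 0xFF) * 8)
}

fn functionally_same(base: &CpuIdDump, target: &CpuIdDump) -> Result<(), String> {
    let (b, t) = (clflush_size(base), clflush_size(target));
    if b != t {
        return Err(format!(
            "clflush operates on different sized cache lines: {:?} vs {:?}",
            b, t
        ));
    }
    // ... further checks (TSC invariance, address sizes, etc.) would go here ...
    Ok(())
}

fn main() {
    let mut milan = CpuIdDump::new();
    // EBX[15:8] = 8 quadwords, i.e. a 64-byte clflush line.
    milan.insert((0x1, 0x0), [0x00A00F11, 0x0000_0800, 0, 0]);
    let mut other = milan.clone();
    // EBX[15:8] = 4 quadwords: a 32-byte line, so not functionally the same.
    other.insert((0x1, 0x0), [0x00A00F11, 0x0000_0400, 0, 0]);
    assert!(functionally_same(&milan, &milan).is_ok());
    assert!(functionally_same(&milan, &other).is_err());
}
```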
```rust
/// The Platonic ideal Milan. This is what we would "like" to define as "The
/// Milan vCPU platform" absent any other constraints. This is a slightly
/// slimmer version of the Milan platform defined in RFD 314, with
/// justifications there.
```
World's Normalest Milan
```rust
/// where possible. This CPUID configuration as-is is untested; guests may not
/// boot, this may be too reductive, etc.
fn milan_ideal() -> CpuIdDump {
    let mut cpuid = CpuId::with_cpuid_reader(CpuIdDump::new());
```
because this whole function is just constructing a pile of bits, it really feels like it should be able to be done in const-eval, but it looks like the raw-cpuid API won't let you do that. ah well.
yeah, i think it'd be reasonable to flip this and have the const arrays be the source of truth as-is. but it feels like we're closer than not to needing to poke a bit here and there based on some VM configuration so that'd have limited utility.
on the upside, i think most of the bit twiddling here gets mushed together when optimized, so this is something like 20 allocs, ~800 stores, and maybe 30 map inserts? i haven't really looked, though.
```rust
// Extended APIC space support was originally provided to guests because the
// host supports it and it was passed through. The extended space is not
// supported in Bhyve, but we leave it set here to not change it from under
// guests.
//
// Bhyve now supports all six performance counters, so we could set the perf
// counter extension bit here, but again it is left as-is to not change
// CPUID from under a guest.
//
// RDTSCP requires some Bhyve and Propolis work to support, so it is masked
// off for now.
```
extended APIC support? that's a quirk. performance counter extension? quirk. RDTSCP? believe it or not, straight to quirks.
```rust
// Entry order does not actually matter. Sort here because it's fast (~30-35
// leaves) and looking at the vec in logs or on the wire is *so* much nicer.
```
👍
```rust
let migrate_url =
    format!("/instances/{}/migrate", &instance_id.to_string());
let instance = NexusRequest::new(
    RequestBuilder::new(internal_client, Method::POST, &migrate_url)
        .body(Some(&InstanceMigrateRequest {
            dst_sled_id: dst_sled_id.into_untyped_uuid(),
        }))
        .expect_status(Some(StatusCode::OK)),
)
.authn_as(AuthnMode::PrivilegedUser)
.execute()
.await
.unwrap()
.parsed_body::<Instance>()
.unwrap();
```
considered not important: there's enough of these now that i kinda wonder if we should make a test helper for it.
yeah, we've got like half an informal API client in instances.rs too. i kinda wanna take a pass over the whole file and split it up a bit..
```rust
// If the Turin-requiring instance isn't on the Turin sled, either the
// default simulated sled is now Turin-compatible or something is very
// wrong.
```
could we perhaps construct a default simulated sled and assert it isn't Turin? so that we know if the test failed for that reason?
that'd involve either skipping nexus_test here (really don't want to do that for this test; it'll definitely end up with weird test-environment skew that way) or plumbing parameters through nexus_test (which i'm lukewarm on if we'd do it in exactly one test.. i'm not sure if it's useful anywhere else). so that's how i ended up at the assert to point future-us the right way :(
FFXSR is used by Windows Server 2022 if it is advertised, and causes it to die very early in boot: the feature is not supported by illumos bhyve, and we return a #GP if the guest sets the EFER bit. WBNOINVD is also masked by default bhyve CPUID masking, though this one is probably fine to pass through. Either way, because it is typically masked, guests using this bit are basically untested (though as some data points, Windows/Linux/illumos seem fine on bhyve with it passed through..)
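For illustration, masking those two bits would look something like the following, assuming the APM bit positions (FFXSR in Fn8000_0001 EDX bit 25, WBNOINVD in Fn8000_0008 EBX bit 9); the function name is hypothetical:

```rust
// Illustrative masking of the two bits discussed. Bit positions are my
// reading of the AMD APM; the function name is hypothetical.
const FFXSR_BIT: u32 = 1 << 25; // Fn8000_0001 EDX[25]
const WBNOINVD_BIT: u32 = 1 << 9; // Fn8000_0008 EBX[9]

fn mask_unsupported(leaf: u32, regs: [u32; 4]) -> [u32; 4] {
    let [eax, ebx, ecx, edx] = regs;
    match leaf {
        // Hide FFXSR: illumos bhyve #GPs if the guest sets the EFER bit.
        0x8000_0001 => [eax, ebx, ecx, edx & !FFXSR_BIT],
        // Hide WBNOINVD to match default bhyve masking.
        0x8000_0008 => [eax, ebx & !WBNOINVD_BIT, ecx, edx],
        _ => [eax, ebx, ecx, edx],
    }
}

fn main() {
    let masked = mask_unsupported(0x8000_0001, [0, 0, 0, u32::MAX]);
    assert_eq!(masked[3] & FFXSR_BIT, 0);
}
```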
I should mention here: I've taken the [...] and they've all at least come up and tolerated me puttering around in a shell.
when run on its own (at least on my workstation), the test instance is always placed on the first test sled with a known CPU type. then, migration fails because there is no available sled with a compatible CPU, thus 507. when run with other tests (such as... in CI), the test instance may be placed on the unknown-CPU test sled, in which case it is unmigratable regardless of capacity by virtue of its VMM having an unknown CPU profile.
this materializes RFD 314 and in some respects, 505.

builds on #8725 for CPU family information, which is a stand-in for the notion of sled families and generations described in RFD 314. There are a few important details here where CPU platforms differ from the sled CPU family, and I've refreshed RFD 314 (and 505) as appropriate.

We can (and do, in RFD 314) define Milan restrictively enough that we can present Turin (and probably later!) CPUs to guests "as if" they were Milan. Similarly I'd expect that Turin would be defined as roughly "Milan-plus-some-AVX-512-features" and pretty forward-compatible. Importantly these are related to but not directly representative of real CPUs; as an example I'd expect "Turin"-the-instance-CPU-platform to be able to run on a Turin Dense CPU. Conversely, there's probably not a reason _to_ define a "Turin Dense" CPU platform since from a guest perspective they'd look about the same.

But at the same time the lineage through the AMD server part family kind of splits at Zen 4, with Zen 4- vs Zen 4c-based parts, and similarly with Zen 5/c. It's somewhat hard (I think) to predict what workloads would be sensitive to this. And as #8730 gets into a bit, the details of a processor's packaging (core topology, frequency, cache size) can vary substantially even inside one CPU family. The important part here is that we do not expect CPU platforms to cover these details, and it would probably be cumbersome to try; if the instance's constraint is "I want AVX256, and I want to be on high-frequency-capable processors only", then it doesn't actually matter if it's run on a Turin or a Milan, and tying it to that CPU platform may be overly restrictive.

On instance CPU platforms, the hope is that by focusing on CPU features we're able to present a more linear path as the microarchitectures grow.

I've walked back the initial description of an instance's CPU platform as the "minimum CPU platform". As present in other systems, "minimum CPU platform" would more analogously mean "can we put you on a Rome Gimlet or must we put you on a Milan Gimlet?", or "Genoa Cosmo vs Turin Cosmo?" - it doesn't seem _possible_ to say "this instance must have AVX 512, but otherwise I don't care what kind of hardware it runs on", but that's more what _we mean_ by CPU platform.

In a "minimum CPU platform" interpretation, we _could_ provide a bunch of Turin CPUID bits to a VM that said it wanted Milan. But since there's no upper bound here, if an OS has an issue with a future "Zen 14" or whatever, a user would discover that by their "minimum-Milan" instance getting scheduled on the new space-age processor and exploding on boot or something. OSes _shouldn't_ do that, but...

Implementation-wise, this is really just about the names right now. You always get Milan CPUID leaves for the time being. When there are Turin CPUID leaves defined for the instance CPU platform, and Cosmos on which they make sense, this becomes more concrete.

RFD 314 has a section now, and I've added a stub function, covering some more obvious ways that CPU platforms would be *incompatible*. This is particularly fraught if we consider being incorrect about topology an incompatibility, but even setting that aside, several bits in CPUID are descriptive of architectural behaviors and are not easily (or at all) able to be emulated.

`functionally_same()` and the CPUID profiles here may be fated to move out of Omicron and into another crate which can be shared with Propolis, where it can ensure that a requested profile is consistent with the hardware on which Propolis would create a VM (not to mention test uses).

---------

Co-authored-by: Greg Colombo <greg@oxidecomputer.com>
I'd not noticed the sheer volume of comments on https://github.com/oxidecomputer/rfd/pull/515, so I'm taking a pass through those, and the exact bits in `MILAN_CPUID` may be further tweaked. I suspect the fixed array needs at least a few more tweaks anyway: cross-referencing RFD 314 turns out to make for awkward review, and it's hard to eyeball the semantics of bits here (or which are to be filled in by some later component of the stack!). As-is: I think this would be OK to merge, but it is not quite as polished as I'd like it to be, so it's a real PR but I expect further changes.
"are these CPU platforms compatible?"
RFD 314 deserves a section outlining how we determine a new CPU can be a stand-in for an older platform, talking about this and other leaves (particularly if we pass topology information through, elsewhere). I've added a few words about the problem of a CPUID profile being reasonable to present on some hardware's different CPUID profile, but I've also outlined what I've spotted as potential incompatibilities in the currently-dead `functionally_same()`. This is one of those things that will have the shape of "very boring and straightforward for eight years, and then a bit will change (or be defined) that's kind of annoying".