@jmpesp jmpesp commented Dec 18, 2025

Bump `MAX_DISKS_PER_INSTANCE` from 8 to 12. This plus the cloud-init volume means that instances can now see up to 13 disks.

All affected sagas were able to absorb this change easily, since they are parameterized either by that constant or by integer literals (in the case of the saga action node repetition syntax).

`slot_to_pci_bdf` will place these new disks after all existing ones, which will not break existing instances: if disks were added to the beginning of the device range instead, guests would number the devices in a different order.
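
As a point of reference, here is a minimal sketch of the mapping this describes, reconstructed from the match arms quoted in the review below. The `PciDeviceKind` variants and the first three arms come from the diff; the `+ 0x12` offset for slots 8..12 is an assumption (that arm's body is truncated in the quoted diff), and the return type is simplified to a bare PCI device number:

```rust
#[derive(Clone, Copy)]
enum PciDeviceKind {
    Nic,
    Disk,
    CloudInitDisk,
}

/// Sketch of the slot mapping, simplified to return just the PCI device
/// number. The arms for NICs, the first eight disks, and the cloud-init
/// disk match the quoted diff; the offset for disks 8..12 is assumed.
fn slot_to_pci_device(kind: PciDeviceKind, logical_slot: u8) -> Option<u8> {
    match kind {
        // NICs occupy devices 0x08..0x10.
        PciDeviceKind::Nic if logical_slot < 8 => Some(logical_slot + 0x8),
        // The first eight disks occupy 0x10..0x18.
        PciDeviceKind::Disk if logical_slot < 8 => Some(logical_slot + 0x10),
        // The cloud-init disk sits at 0x18.
        PciDeviceKind::CloudInitDisk if logical_slot == 0 => Some(0x18),
        // Disks 8..12 land after everything that already exists, so guests
        // keep enumerating the older devices in the same order.
        PciDeviceKind::Disk if (8..12).contains(&logical_slot) => {
            Some(logical_slot + 0x12) // assumed offset
        }
        _ => None,
    }
}

fn main() {
    // Disk logical slot 9 maps past the cloud-init disk at 0x18.
    assert_eq!(slot_to_pci_device(PciDeviceKind::Disk, 9), Some(0x1B));
}
```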

The trickiest part here was the schema migration: the original schema added a CHECK constraint to a disk's slot, and that constraint wasn't named the way regular constraints are. A little digging into cockroach's internal tables turned up the generated constraint name (which the upgrader matches for compatibility).
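
For anyone repeating that exercise, here is a sketch of one way to surface the generated name in CockroachDB (the PR went through the internal tables; this uses the equivalent SHOW statement, and `check_slot_slot` is the name the migration quoted below drops):

```sql
-- List the constraints on the disk table and pick out the CHECK whose
-- name CockroachDB generated automatically.
SELECT constraint_name, constraint_type
FROM [SHOW CONSTRAINTS FROM omicron.public.disk]
WHERE constraint_type = 'CHECK';
```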

Fixes #9513

@jmpesp jmpesp requested a review from hawkw December 18, 2025 04:18
@sudomateo commented:

@lgfa29 you should take a look at this to see if/how it affects the work you're doing at the hypervisor layer to enable more disks for the CSI plugin.

@hawkw hawkw (Member) left a comment:

No major concerns from me, provided that works with what it sounds like @lgfa29 is doing --- maybe worth waiting to hear back from him before merging.

Comment on lines +144 to +154
// disks get 16 through 23, and the cloud-init disk is device 24:
//
//   0               1
//   0123456789ABCDEF0123456789ABCDEF
//           NNNNNNNNDDDDDDDDC DDDD
//                              ^^^^
//
// The additional disks at the end (marked with ^) were added to support up
// to 12 disks. Adding to the end of the range won't break existing
// instances: if disks were added to the beginning of the range, then guests
// (like Linux) would number the devices in a different order.
@hawkw (Member) commented:

thank you for this!

        PciDeviceKind::Nic if logical_slot < 8 => logical_slot + 0x8,
        PciDeviceKind::Disk if logical_slot < 8 => logical_slot + 0x10,
        PciDeviceKind::CloudInitDisk if logical_slot == 0 => 0x18,
        PciDeviceKind::Disk if logical_slot >= 8 && logical_slot < 12 => {
@hawkw (Member) commented:

fwiw, I think this could be

Suggested change
        PciDeviceKind::Disk if logical_slot >= 8 && logical_slot < 12 => {
        PciDeviceKind::Disk if (8..12).contains(&logical_slot) => {

which...is either more or less clear, i think. probably.
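
Both guards accept exactly slots 8 through 11; a standalone check of the equivalence (note that `contains` takes its argument by reference):

```rust
fn main() {
    // Exhaustively confirm the two guard styles agree for every u8 value.
    for logical_slot in 0u8..=255 {
        let explicit = logical_slot >= 8 && logical_slot < 12;
        let range = (8..12).contains(&logical_slot);
        assert_eq!(explicit, range);
    }
    println!("guards agree for all u8 values");
}
```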

    client: &ClientTestContext,
    name: &str,
) -> Vec<Disk> {
    let url = format!("/v1/instances/{name}/disks?project={}", PROJECT_NAME);
@hawkw (Member) commented:

unimportant nit: could also inline the PROJECT_NAME:

Suggested change
    let url = format!("/v1/instances/{name}/disks?project={}", PROJECT_NAME);
    let url = format!("/v1/instances/{name}/disks?project={PROJECT_NAME}");

Comment on lines +175 to +176
        static_assertions::const_assert!(MAX_DISKS_PER_INSTANCE == 12);
        seq!(N in 0..12 {
@hawkw (Member) commented:

i presume that the `seq!` macro needs a literal and cannot be `N in 0..MAX_DISKS_PER_INSTANCE`?

@jmpesp (author) replied:

yeah

error: expected integer
   --> nexus/src/app/sagas/instance_start.rs:175:22
    |
175 |         seq!(N in 0..MAX_DISKS_PER_INSTANCE {
    |                      ^^^^^^^^^^^^^^^^^^^^^^
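
So the literal stays in the macro invocation and the constant is kept honest by the compile-time assertion. A self-contained sketch of that pattern, assuming the `seq-macro` and `static_assertions` crates the snippet above implies:

```rust
use seq_macro::seq;
use static_assertions::const_assert;

const MAX_DISKS_PER_INSTANCE: usize = 12;

// seq! requires literal bounds, so the constant is mirrored by the literal
// below and the assertion keeps the two from drifting apart.
const_assert!(MAX_DISKS_PER_INSTANCE == 12);

fn main() {
    let mut slots = Vec::new();
    seq!(N in 0..12 {
        // One repetition per slot; the saga uses this to emit one action
        // node per disk slot (a hypothetical stand-in body here).
        slots.push(N);
    });
    assert_eq!(slots.len(), MAX_DISKS_PER_INSTANCE);
}
```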

ALTER TABLE
    omicron.public.disk
DROP CONSTRAINT IF EXISTS
    check_slot_slot;
@hawkw (Member) commented:

cool that you were able to find this!

@rmustacc commented:

> @lgfa29 you should take a look at this to see if/how it affects the work you're doing at the hypervisor layer to enable more disks for the CSI plugin.

Realistically that needs the punted-on phase 2 work outlined in #9513 (comment).

@lgfa29 lgfa29 commented Dec 18, 2025

> @lgfa29 you should take a look at this to see if/how it affects the work you're doing at the hypervisor layer to enable more disks for the CSI plugin.

Thanks for the ping, but yeah, I was following the discussions and there's no impact on my work, so feel free to merge this whenever it's ready. Increasing the number of disks was something I was planning for a later time since it's not strictly a blocker for CSI.

@jmpesp jmpesp enabled auto-merge (squash) December 18, 2025 21:24
@jmpesp jmpesp merged commit 09b3715 into oxidecomputer:main Dec 19, 2025
16 checks passed
@jmpesp jmpesp self-assigned this Dec 19, 2025
@jmpesp jmpesp deleted the support_up_to_12_disks branch December 19, 2025 17:38
@AlejandroME AlejandroME added this to the 18 milestone Dec 22, 2025

Linked issue (closed by this merge): Increase the MAX_DISKS_PER_INSTANCE limit to support using up to all disks in a sled for local storage