Skip to content

Randomize VCR UUIDs for read-only disks#9882

Merged
jmpesp merged 4 commits intooxidecomputer:mainfrom
jmpesp:randomize_read_only_ids
Feb 20, 2026
Merged

Randomize VCR UUIDs for read-only disks#9882
jmpesp merged 4 commits intooxidecomputer:mainfrom
jmpesp:randomize_read_only_ids

Conversation

@jmpesp
Copy link
Contributor

@jmpesp jmpesp commented Feb 18, 2026

Each read-only disk's VCR currently shares the same set of UUIDs, and even read-only downstairs do not support multiple connections that share the same upstairs UUID. Fix this here.

Note this won't fix any currently created read-only disks. They'll work just fine as long as only one read-only disk created from a particular snapshot or image is attached to a running instance, but using multiple will run into this issue.

Each read-only disk's VCR currently shares the same set of UUIDs, and
even read-only downstairs do not support multiple connections that share
the same upstairs UUID. Fix this here.

Note this won't fix any currently created read-only disks. They'll work
just fine as long as only one read-only disk created from a particular
snapshot or image is attached to a running instance, but using multiple
will run into this issue.
@jmpesp jmpesp requested a review from hawkw February 18, 2026 22:19
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this, and especially thank you for including the test. I had some nitpicks, but none of them are worth blocking merging the fix over --- up to you if you want to address any of them.

// Randomize the IDs in the copied VCR: even read-only downstairs will
// still be sad if multiple connections share upstairs UUIDs and try to
// connect.
let copy_of_vcr = Self::randomize_ids(&copy_of_vcr).map_err(|e| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit of a nitpick on unchanged code, sorry --- i was a bit surprised to discover that this function is defined in the volume module rather than here, but has a rather generic name; it's kinda weird that there's a function DataStore::randomize_ids that doesn't ever actually tell the reader what kind of IDs its randomizing. it seems like it would read nicer if this was a free function in the volume module rather than being defined as an associated function on DataStore, and then we would call it here by saying volume::randomize_ids(&copy_of_vcr), which IMO makes it a bit clearer what is being done.

no pressure to change it in this PR if you really don't want to, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also, i noticed that there is an existing volume_checkout_randomize_ids method, which it seems like we cannot use here as it is not intended for use inside a transaction? that seems a little sad but i don't actually care)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've been thinking about this for the past few days. I plan on imminently fixing this with a higher level Volume type, because it's a mess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, okay, let's save that for a follow-up then!

Comment on lines +2000 to +2001
err.bail(Error::InternalError {
internal_message: format!("failed to randomize VCR IDs: {e}"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the current implementation of randomize_ids, it looks like there is currently only one case where it will return an error, which is if there's a non-readonly region encountered: https://github.com/jmpesp/omicron/blob/b352ffbccbca5e0df1a86bfce254dbcfd3d6abe4/nexus/db-queries/src/db/datastore/volume.rs#L1112-L1121

i kinda wonder if this error message ought to say something a bit more explicitly about that: it would be a bug in higher-level code if we were trying to construct a read-only disk from a volume that contained non-read-only regions (since we should only permit creation of read-only disks from snapshots/images). on the other hand, that might be weird if randomize_ids was later changed to return other errors under other circumstances. i dunno.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a rough draft but I'm planning on eventually addressing this in the higher level type as well, something like a randomize_ids function that consumes the object and potentially returns a specific error variant, like:

fn randomize_ids(self) -> Result<Self, SomeErrorType> {
  // return a SomeErrorType::TryingToRandomizeAReadWriteVolume in that case
}

Naming is still one of the N hardest problems in computer science

Comment on lines +3040 to +3041
#[nexus_test]
async fn test_read_only_disk_different_vcr(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the thing we're testing here is a bit "in the weeds" (rather than being a basic sort of "does this API operation work?" test), it might be worth leaving a comment on this for future readers explaining what we're trying to verify here?

something like:

Suggested change
#[nexus_test]
async fn test_read_only_disk_different_vcr(
/// Tests that multiple read-only disks created from the same underlying
/// snapshot (or image) receive unique VCRs, so that the Crucible upstairs
/// instances for the two read-only disks can co-exist peacefully.
#[nexus_test]
async fn test_read_only_disk_different_vcr(

or something like that? feel free to change the wording if you can explain it better than I can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it - added in af39f7c with a minor clarification about what part of the VCR is unique.

assert_eq!(volume_1_ids.intersection(&volume_2_ids).count(), 0);
}

pub fn gather_ids(ids: &mut HashSet<Uuid>, vcr: &VolumeConstructionRequest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turbo nit: does this need to be pub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope! 21e6a3a

read_only_parent,
..
} => {
ids.insert(*id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what should happen if the same ID is already in the HashSet? does that mean something is wrong, or is it okay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, that's bad news if the same ID is used for different region sets in the same vcr. I changed gather_ids to not care about the Volume ID, and only about the Region ID, as that is what is given to upstairs.

added in b373381

}

VolumeConstructionRequest::Region { opts, .. } => {
ids.insert(opts.id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, should we be panicking or something if we already encountered this ID, or is this okay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see also b373381

@jmpesp jmpesp enabled auto-merge (squash) February 19, 2026 22:07
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, looks good to me!

@jmpesp jmpesp merged commit c3bf6cc into oxidecomputer:main Feb 20, 2026
16 checks passed
sunshowers added a commit that referenced this pull request Feb 20, 2026
@jmpesp jmpesp deleted the randomize_read_only_ids branch February 20, 2026 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants