Randomize VCR UUIDs for read-only disks #9882
Conversation
Each read-only disk's VCR currently shares the same set of UUIDs, and even read-only downstairs do not support multiple connections that share the same upstairs UUID. Fix this here. Note this won't fix any currently created read-only disks. They'll work just fine as long as only one read-only disk created from a particular snapshot or image is attached to a running instance, but using multiple will run into this issue.
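To illustrate the fix, here is a minimal sketch using simplified stand-in types (the real Omicron `VolumeConstructionRequest` is richer, and real code uses `Uuid::new_v4()` rather than a counter): walk the VCR tree and give every node a fresh ID, so two disks built from the same snapshot never present the same upstairs UUID to a downstairs.

```rust
// Simplified stand-in for VolumeConstructionRequest (not the real type).
#[derive(Debug, Clone)]
enum Vcr {
    Volume { id: u128, sub_volumes: Vec<Vcr>, read_only_parent: Option<Box<Vcr>> },
    Region { id: u128 },
}

// Stand-in for Uuid::new_v4(): a counter, so the sketch needs no crates.
fn fresh_id(next: &mut u128) -> u128 {
    *next += 1;
    *next
}

// Recursively replace every ID in the tree with a fresh one.
fn randomize_ids(vcr: &mut Vcr, next: &mut u128) {
    match vcr {
        Vcr::Volume { id, sub_volumes, read_only_parent } => {
            *id = fresh_id(next);
            for sv in sub_volumes.iter_mut() {
                randomize_ids(sv, next);
            }
            if let Some(rop) = read_only_parent {
                randomize_ids(rop, next);
            }
        }
        Vcr::Region { id } => *id = fresh_id(next),
    }
}

fn main() {
    // Two copies of the same VCR, as when two read-only disks are
    // created from one snapshot.
    let original = Vcr::Volume {
        id: 1,
        sub_volumes: vec![Vcr::Region { id: 2 }],
        read_only_parent: None,
    };
    let mut copy_a = original.clone();
    let mut copy_b = original.clone();
    let mut next = 100;
    randomize_ids(&mut copy_a, &mut next);
    randomize_ids(&mut copy_b, &mut next);
    // The two copies now share no IDs with each other or the original.
    println!("{:?}\n{:?}", copy_a, copy_b);
}
```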
hawkw
left a comment
Thanks for fixing this, and especially thank you for including the test. I had some nitpicks, but none of them are worth blocking merging the fix over --- up to you if you want to address any of them.
```rust
// Randomize the IDs in the copied VCR: even read-only downstairs will
// still be sad if multiple connections share upstairs UUIDs and try to
// connect.
let copy_of_vcr = Self::randomize_ids(&copy_of_vcr).map_err(|e| {
```
this is a bit of a nitpick on unchanged code, sorry --- i was a bit surprised to discover that this function is defined in the volume module rather than here, but has a rather generic name; it's kinda weird that there's a function DataStore::randomize_ids that doesn't ever actually tell the reader what kind of IDs it's randomizing. it seems like it would read nicer if this was a free function in the volume module rather than being defined as an associated function on DataStore, and then we would call it here by saying volume::randomize_ids(&copy_of_vcr), which IMO makes it a bit clearer what is being done.
no pressure to change it in this PR if you really don't want to, though.
(also, i noticed that there is an existing volume_checkout_randomize_ids method, which it seems like we cannot use here as it is not intended for use inside a transaction? that seems a little sad but i don't actually care)
Yeah, I've been thinking about this for the past few days. I plan on imminently fixing this with a higher level Volume type, because it's a mess.
cool, okay, let's save that for a follow-up then!
```rust
err.bail(Error::InternalError {
    internal_message: format!("failed to randomize VCR IDs: {e}"),
```
looking at the current implementation of randomize_ids, it looks like there is currently only one case where it will return an error, which is if there's a non-readonly region encountered: https://github.com/jmpesp/omicron/blob/b352ffbccbca5e0df1a86bfce254dbcfd3d6abe4/nexus/db-queries/src/db/datastore/volume.rs#L1112-L1121
i kinda wonder if this error message ought to say something a bit more explicitly about that: it would be a bug in higher-level code if we were trying to construct a read-only disk from a volume that contained non-read-only regions (since we should only permit creation of read-only disks from snapshots/images). on the other hand, that might be weird if randomize_ids was later changed to return other errors under other circumstances. i dunno.
Still a rough draft but I'm planning on eventually addressing this in the higher level type as well, something like a randomize_ids function that consumes the object and potentially returns a specific error variant, like:
```rust
fn randomize_ids(self) -> Result<Self, SomeErrorType> {
    // return a SomeErrorType::TryingToRandomizeAReadWriteVolume in that case
}
```

Naming is still one of the N hardest problems in computer science
```rust
#[nexus_test]
async fn test_read_only_disk_different_vcr(
```
since the thing we're testing here is a bit "in the weeds" (rather than being a basic sort of "does this API operation work?" test), it might be worth leaving a comment on this for future readers explaining what we're trying to verify here?
something like:
```rust
/// Tests that multiple read-only disks created from the same underlying
/// snapshot (or image) receive unique VCRs, so that the Crucible upstairs
/// instances for the two read-only disks can co-exist peacefully.
#[nexus_test]
async fn test_read_only_disk_different_vcr(
```
or something like that? feel free to change the wording if you can explain it better than I can.
I like it - added in af39f7c with a minor clarification about what part of the VCR is unique.
```rust
    assert_eq!(volume_1_ids.intersection(&volume_2_ids).count(), 0);
}

pub fn gather_ids(ids: &mut HashSet<Uuid>, vcr: &VolumeConstructionRequest) {
```
turbo nit: does this need to be pub?
```rust
    read_only_parent,
    ..
} => {
    ids.insert(*id);
```
what should happen if the same ID is already in the HashSet? does that mean something is wrong, or is it okay?
hmm, that's bad news if the same ID is used for different region sets in the same vcr. I changed gather_ids to not care about the Volume ID, and only about the Region ID, as that is what is given to upstairs.
added in b373381
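A sketch of the reworked helper under those constraints (simplified stand-in types again; the real code walks `VolumeConstructionRequest`): collect only Region IDs, since those are what the upstairs presents to each downstairs, and return whether every insert was new so the "same region ID appears twice in one VCR" case the review asked about is surfaced rather than silently collapsed.

```rust
use std::collections::HashSet;

// Simplified stand-in for VolumeConstructionRequest (not the real type).
#[derive(Debug)]
enum Vcr {
    Volume { sub_volumes: Vec<Vcr> },
    Region { id: u128 },
}

// Collect only Region IDs. Returns false if any ID was already present.
fn gather_region_ids(ids: &mut HashSet<u128>, vcr: &Vcr) -> bool {
    match vcr {
        Vcr::Volume { sub_volumes } => sub_volumes
            .iter()
            // fold (not all) so every subtree is visited even after a dup.
            .fold(true, |all_new, sv| gather_region_ids(ids, sv) && all_new),
        // HashSet::insert returns false when the ID was already present.
        Vcr::Region { id } => ids.insert(*id),
    }
}

fn main() {
    let vcr = Vcr::Volume {
        sub_volumes: vec![Vcr::Region { id: 1 }, Vcr::Region { id: 2 }],
    };
    let mut ids = HashSet::new();
    assert!(gather_region_ids(&mut ids, &vcr));
    assert_eq!(ids.len(), 2);

    // A duplicated region ID in one VCR is flagged.
    let dup = Vcr::Volume {
        sub_volumes: vec![Vcr::Region { id: 1 }, Vcr::Region { id: 1 }],
    };
    let mut ids = HashSet::new();
    assert!(!gather_region_ids(&mut ids, &dup));
    println!("ok");
}
```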
```rust
}

VolumeConstructionRequest::Region { opts, .. } => {
    ids.insert(opts.id);
```
similarly, should we be panicking or something if we already encountered this ID, or is this okay?