Skip to content

Conversation

@sunshowers
Copy link
Contributor

Last part of oxidecomputer/omicron#8922 for Propolis.

Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
@sunshowers
Copy link
Contributor Author

looking at the migration_smoke test for example, the server is returning a 500: https://buildomat.eng.oxide.computer/wg/0/artefact/01K717KJSAFFZEZRRCPGK10RJ5/bvmawp89NoBGXZ5jEjKUtQkXmSjjd2pTVbq79DYSSWuQxW0x/01K717KWC9JNT7MDN1FB8QG55C/01K719297J2K1SXCT1GQNJCXXY/phd-runner.log?format=x-bunyan#L88.

downloading the log file tarball, within migration_smoke.stdout.log, I'm seeing:

{"msg":"failed to determine request API version","v":0,"name":"propolis-server","level":50,"time":"2025-10-08T06:38:57.426243649Z","hostname":"buskin","pid":868,"uri":"/instance/migrate/0dd308a4-bf7d-4bee-b8f6-c8bee8e9248f/start","method":"GET","req_id":"3e08a376-c7c1-4385-86d7-6493b7a4e76f","remote_addr":"127.0.0.1:57147","local_addr":"127.0.0.1:9000","error":"HttpError { status_code: 400, error_code: None, external_message: \"missing expected header \\\"api-version\\\"\", internal_message: \"missing expected header \\\"api-version\\\"\", headers: None }"}

That looks like the phd-runner is somehow using a Progenitor version < 0.10.0 (ie before the first version that sent the api-version header). But the Cargo.lock doesn't have any references to Progenitor 0.9 at all. Could the phd runner be using some kind of binary that needs to be rebuilt?

@sunshowers
Copy link
Contributor Author

Ah, so the issue is that this code is constructing its own client which of course won't include the API header.

// Build upgrade request to the source instance
// (we do this by hand because it's hidden from the OpenAPI spec)
// TODO(#165): https (wss)
// TODO: We need to make sure the src_addr is a valid target
let src_migrate_url = format!(
"ws://{}/instance/migrate/{}/start",
migrate_info.src_addr, migration_id,
);
info!(log, "Begin migration"; "src_migrate_url" => &src_migrate_url);
let (mut conn, _) =
tokio_tungstenite::connect_async(src_migrate_url).await?;

Going to dig into what to do here.

@sunshowers
Copy link
Contributor Author

So this is propolis-server instances making direct requests to other propolis-server instances to start migrations. I'm not sure how this would work in the context of self-service update, where the propolis-server instances might be on different versions.

There are two parts to this:

  1. The InstanceMigrateStartRequest path params, which seems to just have the migration ID.
  2. The actual migration protocol (though there seems to be protocol negotiation built into that part of migration, so maybe version skew isn't an issue there)

Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
@sunshowers sunshowers changed the base branch from master to sunshowers/spr/master.propolis-make-api-versioned October 9, 2025 01:29
@sunshowers sunshowers changed the base branch from sunshowers/spr/master.propolis-make-api-versioned to master October 9, 2025 02:06
Created using spr 1.3.6-beta.1
@sunshowers
Copy link
Contributor Author

We addressed the above concern in #957.

@sunshowers
Copy link
Contributor Author

Ah so the migrate-from-base test will fail until the api-version header is passed in. So I guess I'll need to split that change into its own PR.

sunshowers added a commit that referenced this pull request Oct 10, 2025
Split out of #955 -- for the `phd-run-migrate-from-base` tests to work, this has to be in its own PR that's landed first.
Created using spr 1.3.6-beta.1
@sunshowers sunshowers marked this pull request as ready for review October 10, 2025 18:10
@sunshowers sunshowers merged commit 8e92529 into master Oct 11, 2025
11 checks passed
@sunshowers sunshowers deleted the sunshowers/spr/propolis-make-api-versioned branch October 11, 2025 02:50
iximeow added a commit to oxidecomputer/omicron that referenced this pull request Oct 18, 2025
Propolis changes:

* oxidecomputer/propolis#950
* oxidecomputer/propolis#952
* oxidecomputer/propolis#951
* oxidecomputer/propolis#954
* oxidecomputer/propolis#957
* oxidecomputer/propolis#960
* oxidecomputer/propolis#961
* oxidecomputer/propolis#955

Crucible changes:

* oxidecomputer/crucible#1773
* oxidecomputer/crucible#1774
* oxidecomputer/crucible#1780
* oxidecomputer/crucible#1778

Crucible shouldn't have functional changes here, Propolis' big ones are
@sunshowers' work moving Propolis to versioned APIs, plus propolis#960
turning the crank on MAXCPU.

propolis#961 changes the initial Milan CPU profile one last time before
the release in service of propolis#959. Propolis will clear [this
bit](https://github.com/oxidecomputer/omicron/blob/d74f5e3f1ae0a378dcdb9795a0ada2426702b046/nexus/src/app/instance_platform/cpu_platform.rs#L423).
Later we want to actually set up leaf 8000_001E, so after this merges
I'll have a followup to remove that leaf from the inital Milan
definition to keep the profile constant when `propolis-server` is
smarter about the leaf.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants