I tried to create an instance on dogfood after updating this week, and it ended up stuck in Starting.
Switch 0 on dogfood seems to be in some kind of nebulous unhappy state (issue to come; the short of it is that we accidentally filled the switch zone with a core file while copying an old one out, and everything there went sideways). Separately, though, the instance-start saga for this instance seems stuck in instance_start.dpd_ensure:
```
root@oxz_switch1:~# /tmp/omdb-saga db sagas show 727d4812-9383-4df3-985b-0c1bce68d5ad
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:109::3]:32221,[fd00:1122:3344:105::3]:32221,[fd00:1122:3344:10b::3]:32221,[fd00:1122:3344:107::3]:32221,[fd00:1122:3344:108::3]:32221/omicron?sslmode=disable
WARN: found schema version 144.0.0, expected 7.0.0
It's possible the database is running a version that's different from what this
tool understands. This may result in errors or incorrect output.
id | time_created | name | state
--------------------------------------+--------------------------------+----------------+--------------------------
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.112472 UTC | instance-start | SagaCachedState(Running)
saga id | event time | node id | event type | data
------------------------------------ | ------------------------------ | ---------------------------------------- | ---------- | ---
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.120631 UTC | 10: start | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.126739 UTC | 10: start | succeeded |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.130584 UTC | 0: instance_start.generate_propolis_id | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.134919 UTC | 0: instance_start.generate_propolis_id | succeeded | "b5bf8281-09fc-43e1-b12c-c91c0bb18543"
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.138944 UTC | 1: instance_start.alloc_server | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.179417 UTC | 1: instance_start.alloc_server | succeeded | "b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88"
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.183787 UTC | 2: instance_start.alloc_propolis_ip | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.193104 UTC | 2: instance_start.alloc_propolis_ip | succeeded | "fd00:1122:3344:106::1:9b7"
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.196560 UTC | 3: instance_start.create_vmm_record | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.205571 UTC | 3: instance_start.create_vmm_record | succeeded | {"id":"b5bf8281-09fc-43e1-b12c-c91c0bb18543","instance_id":"bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4","propolis_ip":"fd00:1122:3344:106::1:9b7/128","propolis_port":12400,"runtime":{"gen":1,"state":"Creating","time_state_updated":"2025-05-23T00:24:21.200216Z"},"sled_id":"b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88","time_created":"2025-05-23T00:24:21.200216Z","time_deleted":null}
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.208748 UTC | 4: instance_start.mark_as_starting | started |
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.314530 UTC | 4: instance_start.mark_as_starting | succeeded | {"auto_restart":{"cooldown":null,"policy":null},"boot_disk_id":null,"hostname":"ixi-600g-mem","identity":{"description":"beeeeeg memory (shouldn't panic a sled, probably)","id":"bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4","name":"ixi-600g-mem","time_created":"2025-05-23T00:24:19.204040Z","time_deleted":null,"time_modified":"2025-05-23T00:24:19.204040Z"},"intended_state":"Running","memory":644245094400,"ncpus":2,"project_id":"9c4152f9-4317-4269-9018-66142964d21c","runtime_state":{"dst_propolis_id":null,"gen":3,"migration_id":null,"nexus_state":"Vmm","propolis_id":"b5bf8281-09fc-43e1-b12c-c91c0bb18543","time_last_auto_restarted":null,"time_updated":"2025-05-23T00:24:19.204040Z"},"updater_gen":1,"updater_id":null,"user_data":[]}
727d4812-9383-4df3-985b-0c1bce68d5ad | 2025-05-23 00:24:21.318499 UTC | 5: instance_start.dpd_ensure | started |
```
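The reason to say the saga is stuck in dpd_ensure is that node 5 has a started event and nothing after it, while every earlier node has a matching succeeded row. A minimal sketch of that check in Rust, assuming the log rows have already been parsed into (node id, event type) pairs in event-time order; unfinished_nodes is a hypothetical helper written for illustration, not an omdb feature.

```rust
use std::collections::HashMap;

/// Hypothetical helper: given saga log rows as (node id, event type) pairs in
/// event-time order, return the nodes whose latest event is "started", i.e.
/// actions that began but never recorded any later event.
fn unfinished_nodes<'a>(events: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    // Latest event type seen for each node.
    let mut latest: HashMap<&'a str, &'a str> = HashMap::new();
    // Nodes in the order they first appear, so the output order is stable.
    let mut first_seen: Vec<&'a str> = Vec::new();
    for &(node, event) in events {
        if latest.insert(node, event).is_none() {
            first_seen.push(node);
        }
    }
    first_seen
        .into_iter()
        .filter(|node| latest[node] == "started")
        .collect()
}
```

Applied to the rows above, this would report only 5: instance_start.dpd_ensure, since nodes 10 and 0 through 4 each have a succeeded event.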
Unfortunately, enough of the instance's state had already been recorded that we started by looking for a Propolis issue, and came up empty for a while, even though that explanation looks convincing from omdb:
```
root@oxz_switch1:~# omdb db instance info bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:109::3]:32221,[fd00:1122:3344:105::3]:32221,[fd00:1122:3344:10b::3]:32221,[fd00:1122:3344:107::3]:32221,[fd00:1122:3344:108::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (144.0.0)
== INSTANCE ====================================================================
ID: bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4
project ID: 9c4152f9-4317-4269-9018-66142964d21c
name: ixi-600g-mem
description: beeeeeg memory (shouldn't panic a sled, probably)
created at: 2025-05-23 00:24:19.204040 UTC
last modified at: 2025-05-23 00:24:19.204040 UTC
== CONFIGURATION ===============================================================
vCPUs: 2
memory: 600 GiB
hostname: ixi-600g-mem
boot disk: None
auto-restart:
InstanceAutoRestart {
    policy: None,
    cooldown: None,
}
== RUNTIME STATE ===============================================================
nexus state: Vmm
(i) external API state: Starting
intended state: running
last updated at: 2025-05-23T00:24:19.204040Z (generation 3)
needs reincarnation: false
karmic status: saṃsāra (reincarnation enabled)
last reincarnated at: None
active VMM ID: Some(b5bf8281-09fc-43e1-b12c-c91c0bb18543)
target VMM ID: None
migration ID: None
updater lock: UNLOCKED at generation: 1
== ACTIVE VMM ==================================================================
ID: b5bf8281-09fc-43e1-b12c-c91c0bb18543
instance ID: bf2e1d9e-fcb4-47fe-9cc5-c2e9a268fda4
created at: 2025-05-23 00:24:21.200216 UTC
state: creating
updated at: 2025-05-23T00:24:21.200216Z (generation 1)
propolis address: fd00:1122:3344:106::1:9b7:12400
sled ID: b886b58a-1e3f-4be1-b9f2-0c2e66c6bc88
```
At the very least, shouldn't we have timed out and failed the instance start at some point, rather than leaving the saga stuck in dpd_ensure?
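One generic way to get that behavior is to put a deadline on the step and treat expiry as a failure, so the saga can unwind instead of leaving the instance in Starting. A minimal std-only sketch, assuming a blocking step run on a worker thread; run_with_deadline and StepError are hypothetical names, not Nexus APIs, and the real saga action may well be async, in which case an async timeout would play the same role.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for a step that did not complete.
#[derive(Debug, PartialEq)]
enum StepError {
    /// The step produced no result before the deadline.
    TimedOut,
    /// The worker thread exited (e.g. panicked) without sending a result.
    WorkerPanicked,
}

/// Run `step` on a worker thread and stop waiting if it has not produced a
/// result within `limit`. The worker is not cancelled; it is only abandoned.
fn run_with_deadline<T, F>(step: F, limit: Duration) -> Result<T, StepError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A send error only means the caller already gave up waiting.
        let _ = tx.send(step());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(StepError::TimedOut),
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(StepError::WorkerPanicked),
    }
}
```

Because the abandoned worker keeps running after the deadline, whatever undo logic follows a timeout would have to tolerate the operation completing late.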