lxc start fails despite stopped state #13453

Closed
holmanb opened this issue May 6, 2024 · 10 comments
Labels
Bug Confirmed to be a bug
Milestone
lxd-6.1

Comments

@holmanb
Member

holmanb commented May 6, 2024

Required information

  • Distribution: Ubuntu
  • Distribution version: all
  • The output of "snap list --all lxd core20 core22 core24 snapd":
Name    Version         Rev    Tracking       Publisher   Notes
core22  20240111        1122   latest/stable  canonical✓  base,disabled
core22  20240408        1380   latest/stable  canonical✓  base
lxd     5.21.1-d46c406  28460  5.21/stable    canonical✓  -
snapd   2.61.2          21184  latest/stable  canonical✓  snapd,disabled
snapd   2.62            21465  latest/stable  canonical✓  snapd
  • The output of "lxc info" or if that fails:
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- init_preseed_storage_volumes
- metrics_instances_count
- server_instance_type_info
- resources_disk_mounted
- server_version_lts
- oidc_groups_claim
- loki_config_instance
- storage_volatile_uuid
- import_instance_devices
- instances_uefi_vars
- instances_migration_stateful
- container_syscall_filtering_allow_deny_syntax
- access_management
- vm_disk_io_limits
- storage_volumes_all
- instances_files_modify_permissions
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: holmanb
auth_user_method: unix
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB3TCCAWOgAwIBAgIQBECwI03mRzmVUgAKkav1ejAKBggqhkjOPQQDAzAiMQww
    CgYDVQQKEwNMWEQxEjAQBgNVBAMMCXJvb3RAaml2ZTAeFw0yNDA1MDYwODMxNDFa
    Fw0zNDA1MDQwODMxNDFaMCIxDDAKBgNVBAoTA0xYRDESMBAGA1UEAwwJcm9vdEBq
    aXZlMHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEC2ZKsbCIKcuVrBWpLCY8eaL13dBc
    bro6wgVAg4014UeIBfDpmNKb/mJKKt/DxlRIq9/w7kvxMHHpLa9+NPB+pr6H/R51
    Vcz24YlY7Gp+almRBnWJIVjBT2tbFUjp+0lco14wXDAOBgNVHQ8BAf8EBAMCBaAw
    EwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAnBgNVHREEIDAeggRq
    aXZlhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUCMDKI
    vl5f2CGnF+m6AilFUAEIYZk+HYL6zkFlc+vWBVAVqxRiuQu0AkrqWKa0j1hgxAIx
    AOdkEP6KYCB3XcXH6cw6b9o+yZkCB2S8lQnKyk1lH76dHCx8e2Ivu/mhiYRfr9lz
    Sw==
    -----END CERTIFICATE-----
  certificate_fingerprint: 4a5b695606afb8fa1b37f921e76fc748f910a84e6d21c22a868b8dc7c9b80090
  driver: lxc | qemu
  driver_version: 6.0.0 | 8.2.1
  instance_types:
  - container
  - virtual-machine
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.8.0-31-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "24.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: jive
  server_pid: 3915
  server_version: 5.21.1
  server_lts: true
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.48.0
    remote: false
  - name: powerflex
    version: 1.16 (nvme-cli)
    remote: true
  - name: zfs
    version: 2.2.2-0ubuntu9
    remote: false
  - name: btrfs
    version: 5.16.2
    remote: false
  - name: ceph
    version: 17.2.7
    remote: true
  - name: cephfs
    version: 17.2.7
    remote: true
  - name: cephobject
    version: 17.2.7
    remote: true

Issue description

When an instance has recently shut down and reports a state of STOPPED, an attempt to start the instance may fail with an error: The instance is already running. I observe this occasionally while running commands manually, but the behavior never bothered me enough to report it. I've now found that it prevents us from running certain tests in our test framework, so I'm reporting it.

Steps to reproduce

I see this in an integration test that fails intermittently, and I can reproduce it locally by running the test in a loop. A trivial reproducer could launch an instance, shut it back down (our test uses guest-initiated shutdown), wait for the STOPPED state, and then run lxc start.
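
A minimal sketch of such a reproducer (the ubuntu:24.04 image and the instance name are assumptions; the full script I ended up using is in a comment below):

#!/bin/sh
# hedged sketch of a trivial reproducer; image and instance name are assumptions
NAME=repro
lxc launch ubuntu:24.04 "$NAME"

# wait until dbus is available in the guest (required for shutdown to work)
lxc exec "$NAME" -- sh -c 'while [ ! -S /run/dbus/system_bus_socket ]; do sleep 0.1; done'

# guest-initiated shutdown
lxc exec "$NAME" -- shutdown -H now

# wait for the reported state to become STOPPED
while [ "$(lxc list "$NAME" -cs --format csv)" != "STOPPED" ]; do
    sleep 0.1
done

# this occasionally fails with "The instance is already running"
lxc start "$NAME"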

Information to attach

Note that messages about instance shutdown appear both before and after the failed start, even though the code that initiates the lxc start does not run until the reported instance state is STOPPED.

@holmanb
Member Author

holmanb commented May 7, 2024

@tomponline thanks for brainstorming this issue earlier. I've collected some additional information to ease debugging and discovered a couple more relevant things:

  1. Like you suspected, immediately after the lxc start fails, lxc list shows the state RUNNING.

  2. This does not reproduce when running lxc stop manually rather than a guest-initiated shutdown

The following script reproduces the error while collecting lxc monitor logs.

For me this fails reliably in about 5 seconds.

#!/bin/sh

NAME=me
LOG=./reproduced.log

# stop if started
lxc stop $NAME 2> /dev/null || true

while true; do

    # monitor, wipe log if repro failed
    (lxc monitor "$NAME" --pretty > "$LOG") &

    echo "starting instance"
    lxc start $NAME

    # wait until dbus is available (required for shutdown to work)
    lxc exec $NAME -- sh -c 'while [ ! -S /run/dbus/system_bus_socket ]; do sleep 0.1; done'

    # shutdown from the host
    echo "shutting down the instance"
    lxc exec $NAME -- shutdown -H now
    while true; do

        status=$(lxc list "$NAME" -cs --format csv)
        if [ "$status" = "STOPPED" ]; then

            # this will fail sometimes
            lxc start "$NAME"
            rc="$?"
            if [ "$rc" -ne 0 ]; then
                echo "lxc start failed"
                rerun="$(lxc list -cs --format csv "$NAME")"
                echo "lxc state immediately after failure: $rerun"
                exit "$rc"
            fi

            # no error, retry
            echo "did not repro, retrying"
            lxc stop $NAME
            break
        fi
    done
done

@holmanb
Member Author

holmanb commented May 7, 2024

And a monitor log using --pretty:

reproduced.log

@tomponline tomponline added this to the lxd-6.1 milestone May 8, 2024
@tomponline
Member

  • Like you suspected, immediately after the lxc start fails, lxc list shows the state RUNNING.

  • This does not reproduce when running lxc stop manually rather than a guest-initiated shutdown

Perfect, thanks. That confirms my suspicion: there is a tiny window in which liblxc reports the container's state as stopped before it has notified LXD that the guest has self-stopped. That notification triggers LXD's stop cleanup operation, and while that operation runs LXD reports the container's status as running.

@MggMuggins
Contributor

MggMuggins commented Nov 4, 2024

Thanks for the script, Brett.

When shutting down a container, LXC sets the container state to STOPPING, runs the on-stop hook, and then closes the command socket (to avoid a different but similar race). When the command socket is closed, get_state (from a client) returns STOPPED instead of an error.

If GET /1.0/instances/{name}/state starts a get_state command before the stop operation has been set up, and lxc closes the command socket before it finishes responding to the request, the client ends up with STOPPED instead of STOPPING. The next request then picks up the operation and reports RUNNING until the operation has completed.
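
For illustration, the flip can in principle be observed from the client side by polling the state endpoint directly; a rough observer sketch (the instance name and the use of jq are assumptions) that would log each distinct status the API reports:

#!/bin/sh
# illustrative observer only: log every distinct status reported by
# /1.0/instances/{name}/state; instance name and jq are assumptions
NAME=me
prev=""
while true; do
    status="$(lxc query "/1.0/instances/${NAME}/state" | jq -r .status)"
    if [ "$status" != "$prev" ]; then
        echo "$(date +%H:%M:%S.%N) status: $status"
        prev="$status"
    fi
done

Around a guest-initiated shutdown this can briefly report the stopped status, then running while the stop operation completes, then stopped again.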

The LXC log for the container with raw.lxc: lxc.log.level = 0 makes this look like a race in LXC:

lxc me 20241104211249.598 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:484 - Set container state to STOPPING
lxc me 20241104211249.598 TRACE    start - ../src/lxc/start.c:lxc_serve_state_clients:487 - No state clients registered
lxc me 20241104211249.598 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_USER_NS=/proc/59272/fd/18
lxc me 20241104211249.598 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_MNT_NS=/proc/59272/fd/19
lxc me 20241104211249.598 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_PID_NS=/proc/59272/fd/20
lxc me 20241104211249.598 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_UTS_NS=/proc/59272/fd/21
lxc me 20241104211249.598 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_IPC_NS=/proc/59272/fd/22
lxc me 20241104211249.598 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_NET_NS=/proc/59272/fd/4
lxc me 20241104211249.598 TRACE    start - ../src/lxc/start.c:lxc_expose_namespace_environment:907 - Set environment variable LXC_CGROUP_NS=/proc/59272/fd/23
lxc me 20241104211249.598 INFO     utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/home/wesley/.local/go/bin/lxd callhook /var/lib/lxd "default" "me" stopns" for container "me"
lxc me 20241104211249.598 TRACE    utils - ../src/lxc/utils.c:run_script_argv:633 - Set environment variable: LXC_HOOK_TYPE=stop
lxc me 20241104211249.598 TRACE    utils - ../src/lxc/utils.c:run_script_argv:638 - Set environment variable: LXC_HOOK_SECTION=lxc
lxc me 20241104211249.715 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_tree_remove:491 - Removed cgroup tree 10(lxc.payload.me)
lxc me 20241104211249.715 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:726 - Reusing 10(lxc.pivot) cgroup
lxc me 20241104211249.715 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgroup_tree_create:741 - Opened cgroup lxc.pivot as 3
lxc me 20241104211249.731 TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_destroy:927 - Removed cgroup tree 10(lxc.monitor.me)
lxc me 20241104211249.731 TRACE    start - ../src/lxc/start.c:lxc_end:964 - Closed command socket
lxc 20241104211249.731 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc me 20241104211249.731 TRACE    start - ../src/lxc/start.c:lxc_end:975 - Set container state to "STOPPED"
lxc 20241104211249.731 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_state"
lxc me 20241104211249.749 INFO     utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "me"
lxc me 20241104211249.749 TRACE    utils - ../src/lxc/utils.c:run_script_argv:633 - Set environment variable: LXC_HOOK_TYPE=post-stop
lxc me 20241104211249.749 TRACE    utils - ../src/lxc/utils.c:run_script_argv:638 - Set environment variable: LXC_HOOK_SECTION=lxc
lxc me 20241104211250.255 INFO     utils - ../src/lxc/utils.c:run_script_argv:590 - Executing script "/home/wesley/.local/go/bin/lxd callhook /var/lib/lxd "default" "me" stop" for container "me"
lxc me 20241104211250.255 TRACE    utils - ../src/lxc/utils.c:run_script_argv:633 - Set environment variable: LXC_HOOK_TYPE=post-stop
lxc me 20241104211250.255 TRACE    utils - ../src/lxc/utils.c:run_script_argv:638 - Set environment variable: LXC_HOOK_SECTION=lxc

The socket is closed partway through handling of the get_state command.

If lxc could close the socket and wait for existing requests to complete before continuing with the container stop, then we wouldn't see this behavior. @mihalicyn might disagree, but I doubt that this is super straightforward/possible. My naive attempt with setsockopt(fd, SOL_SOCKET, SO_LINGER, ...) was not sufficient.

I've done a little testing and this doesn't appear to impact VM instances.

@holmanb
Member Author

holmanb commented Nov 12, 2024

Thanks for digging into this @MggMuggins!

The socket is closed partway through handling of the get_state command.

If lxc could close the socket and wait for existing requests to complete before continuing with the container stop, then we wouldn't see this behavior. @mihalicyn might disagree, but I doubt that this is super straightforward/possible.

Do you think that this deserves an upstream bug report to lxc?

My naive attempt with setsockopt(fd, SOL_SOCKET, SO_LINGER, ...) was not sufficient.

Digging around in lxc's source code, it looks like some timeouts are set as well - I'm not sure whether they have an effect here. I'd be curious to take a look if you have a public copy of your effort.

MggMuggins added a commit to MggMuggins/lxc that referenced this issue Nov 13, 2024
This doesn't work (I think) because SO_LINGER only prevents queued
messages from being dropped; lxc hasn't queued a response to the client's
request yet, so the race still exists.

See canonical/lxd#13453

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
@MggMuggins
Contributor

When I say naive, I mean really naive: MggMuggins/lxc@0efd5c6

I considered a bug report but dismissed it since lxc does report a consistent transition from RUNNING -> STOPPING -> STOPPED; with fresh eyes today that seems like a bad excuse. I'll see if I can throw something together.

@holmanb
Member Author

holmanb commented Nov 14, 2024

When I say naive, I mean really naive: MggMuggins/lxc@0efd5c6

:-)

I considered a bug report but dismissed it since lxc does report a consistent transition from RUNNING -> STOPPING -> STOPPED; with fresh eyes today that seems like a bad excuse. I'll see if I can throw something together.

Sounds great, thanks for digging further. Please let me know how it goes either way.

MggMuggins added a commit to MggMuggins/lxd that referenced this issue Nov 14, 2024
Fixes canonical#13453

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
@MggMuggins
Contributor

MggMuggins commented Nov 14, 2024

I think my "fresh eyes" were just "poor memory eyes"... 😅. I got as far as our ask of upstream:

In an ideal world, lxc would partially close the command socket (shutdown(fd, SHUT_WR)?) and then block until all requests have been handled before continuing with the shutdown process.

But that doesn't resolve this; it just punts the race down the road a ways. Even if lxc doesn't interrupt get_state it will still pass back a status that could be stale.

Fundamentally, LXD (and clients) cannot make race-free decisions based on the state of instances because LXD does not maintain a canonical source for instance state; it is a middle-man between lxc|qemu and the client. We've run into this a few times recently with other features; it would be useful to have a more robust system for making decisions based on instance state. For what it's worth, I would consider it a dependency for some of our longer-term roadmap.
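
As an illustration of what that means for clients today, a harness that needs start-after-stop essentially has to tolerate a stale status and retry; a rough, purely client-side sketch (the instance name and retry budget are arbitrary assumptions, and this is not the LXD-side change):

#!/bin/sh
# client-side workaround sketch only, not the LXD fix: retry `lxc start` a few
# times in case it raced with the in-flight stop operation
NAME=me
attempts=0
until lxc start "$NAME"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 10 ]; then
        echo "giving up after $attempts attempts" >&2
        exit 1
    fi
    sleep 0.5
done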

Without a bunch of design work I don't think it's feasible to truly fix this. However, checking for an ongoing operation after lxc returns significantly reduces the likelihood that get_state and the stop hook interleave. I've opened #14463 with this change.

I suspect that my initial assessment WRT VMs was wrong, they are likely affected by a similar race.

tomponline added a commit that referenced this issue Nov 16, 2024
If an instance self-stops while `statusCode()` is waiting for
`getLxcState()` to finish, `statusCode()` may return a stale instance
state.

This PR is a workaround for the use-case in #13453 and significantly
reduces the likelihood that `statusCode` returns a stale status.

In an ideal world, LXD would maintain a canonical cluster-wide view of
instance state. This would allow making race-free decisions based on
whether an instance is running or not. For example:
- Project CPU/RAM limits could be enforced at instance start instead of
at instance creation
- Volumes with content-type block could be attached to more than one
instance without `security.shared`; instance start could fail if another
instance with any shared block volumes is already running.
@holmanb
Member Author

holmanb commented Nov 16, 2024

Thanks so much for working on this @MggMuggins and @tomponline! I'd like to get an idea of how frequently this race still occurs. In your testing, did the reproducer still trigger eventually with the fix applied?

@MggMuggins
Contributor

I got 15-20 iterations with no race; for comparison, the script always reproduced it on the first try for me. I didn't see it again after the fix.

tomponline pushed a commit to tomponline/lxd that referenced this issue Dec 4, 2024
Fixes canonical#13453

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
(cherry picked from commit a7e88b0)
tomponline pushed a commit to tomponline/lxd that referenced this issue Dec 9, 2024
Fixes canonical#13453

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
(cherry picked from commit a7e88b0)