Handle LXD not removing container host side nic #3189

Closed
7 tasks
dann1 opened this issue Apr 5, 2019 · 0 comments

dann1 commented Apr 5, 2019

Description
Sometimes, when a container is rebooted, LXD does not remove the host-side NIC of the veth pair. The next start then fails to create the veth pair, so the container does not boot.

To Reproduce
The issue does not always happen, but the VM log below shows the actions performed. This sequence of actions has reproduced the issue several times.

Fri Apr 5 00:14:03 2019 [Z0][VM][I]: New state is ACTIVE
Fri Apr 5 00:14:03 2019 [Z0][VM][I]: New LCM state is PROLOG
Fri Apr 5 00:14:05 2019 [Z0][VM][I]: New LCM state is BOOT
Fri Apr 5 00:14:05 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/135/deployment.0
Fri Apr 5 00:14:06 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Apr 5 00:14:07 2019 [Z0][VMM][I]: ExitCode: 0
Fri Apr 5 00:14:07 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Apr 5 00:14:08 2019 [Z0][VMM][I]: deploy: Using qcow2 mapper for /var/lib/one/datastores/0/135/disk.0
Fri Apr 5 00:14:08 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/lxd/storage-pools/default/containers/one-135/rootfs using device /dev/nbd8
Fri Apr 5 00:14:08 2019 [Z0][VMM][I]: deploy: Mounting /dev/nbd8 at /var/lib/lxd/storage-pools/default/containers/one-135/rootfs
Fri Apr 5 00:14:08 2019 [Z0][VMM][I]: deploy: Mapping disk at /var/lib/one/datastores/0/135/mapper/disk.1 using device /dev/loop1
Fri Apr 5 00:14:08 2019 [Z0][VMM][I]: deploy: Mounting /dev/loop1 at /var/lib/one/datastores/0/135/mapper/disk.1
Fri Apr 5 00:14:08 2019 [Z0][VMM][I]: ExitCode: 0
Fri Apr 5 00:14:08 2019 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
Fri Apr 5 00:14:08 2019 [Z0][VMM][I]: ExitCode: 0
Fri Apr 5 00:14:08 2019 [Z0][VMM][I]: Successfully execute network driver operation: post.
Fri Apr 5 00:14:08 2019 [Z0][VM][I]: New LCM state is RUNNING
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: Command execution failed (exit code: 1): 'if [ -x "/var/tmp/one/vmm/lxd/reboot" ]; then /var/tmp/one/vmm/lxd/reboot one-135 ubuntu1804-lxd-marketplace-99293-0.test 135 ubuntu1804-lxd-marketplace-99293-0.test; else exit 42; fi'
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: reboot: Name: one-135
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: Remote: unix://
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: Architecture: x86_64
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: Created: 2019/04/05 00:14 UTC
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: Status: Stopped
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: Type: persistent
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: Profiles: default
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: 
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: Log:
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: 
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: lxc one-135 20190405001414.173 ERROR network - network.c:instantiate_veth:106 - Operation not permitted - Failed to create veth pair "one-135-0" and "vethJNSNXD"
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: lxc one-135 20190405001414.173 ERROR network - network.c:lxc_create_network_priv:2457 - Failed to create network device
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: lxc one-135 20190405001414.173 ERROR start - start.c:lxc_spawn:1626 - Failed to create the network
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: lxc one-135 20190405001414.173 ERROR start - start.c:__lxc_start:1939 - Failed to spawn container "one-135"
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: lxc one-135 20190405001414.174 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:842 - Received container state "STOPPING" instead of "RUNNING"
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: 
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: /var/tmp/one/vmm/lxd/client.rb:101:in `wait': {"type"=>"sync", "status"=>"Success", "status_code"=>200, "operation"=>"", "error_code"=>0, "error"=>"", "metadata"=>{"id"=>"2a6ea158-501c-4058-b919-783c701cc6c2", "class"=>"task", "description"=>"Starting container", "created_at"=>"2019-04-05T00:14:14.109968029Z", "updated_at"=>"2019-04-05T00:14:14.109968029Z", "status"=>"Failure", "status_code"=>400, "resources"=>{"containers"=>["/1.0/containers/one-135"]}, "metadata"=>nil, "may_cancel"=>false, "err"=>"Failed to run: /usr/lib/lxd/lxd forkstart one-135 /var/lib/lxd/containers /var/log/lxd/one-135/lxc.conf: "}} (LXDError)
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:429:in `wait?'
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:441:in `change_state'
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/container.rb:184:in `start'
Fri Apr 5 00:14:14 2019 [Z0][VMM][I]: from /var/tmp/one/vmm/lxd/reboot:38:in `<main>'
Fri Apr 5 00:14:14 2019 [Z0][VMM][E]: Error rebooting VM, assume it's still running
Fri Apr 5 00:14:23 2019 [Z0][LCM][I]: VM running but monitor state is POWEROFF
Fri Apr 5 00:14:23 2019 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Fri Apr 5 00:14:23 2019 [Z0][VM][I]: New state is POWEROFF
Fri Apr 5 00:14:23 2019 [Z0][VM][I]: New LCM state is LCM_INIT
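
When triaging a log like the one above, the name of the NIC that blocked the start can be pulled out of the liblxc error line. The following is a minimal sketch, not driver code: the constant `VETH_ERROR` and the method `failed_veth_names` are hypothetical names, and the pattern only assumes the message wording seen in this log.

```ruby
# Matches the liblxc error reported when the veth pair cannot be created.
# The wording is taken from the log in this issue; other LXC versions may differ.
VETH_ERROR = /Failed to create veth pair "([^"]+)" and "([^"]+)"/

# Returns the first interface name of every failed veth pair found in the text.
# In the log above that first name is the leftover NIC (one-135-0).
def failed_veth_names(log)
    log.scan(VETH_ERROR).map(&:first).uniq
end
```

Feeding it the VM log text would yield the candidate interface names to inspect on the LXD node with `ip link show`.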

Expected behavior
As a workaround, delete the NIC that LXD left behind, in this case one-135-0. On the LXD node, run ip link delete one-135-0 as root.

The drivers should detect a leftover host-side NIC and delete it themselves.
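
One way the requested handling could look is sketched below. This is an illustration under assumptions, not the fix that was committed: the module `StaleNic` and its methods are hypothetical, and the only external commands used are `ip link show` and `ip link delete`. The command runner is injectable so the logic can be exercised without root privileges.

```ruby
require 'open3'

# Hypothetical helper, not part of the OpenNebula LXD driver.
# Removes a leftover host-side veth interface before the container starts.
module StaleNic
    # Returns true when `ip link show NAME` succeeds, i.e. the NIC exists.
    def self.exists?(name, runner = method(:run))
        _out, ok = runner.call(['ip', 'link', 'show', name])
        ok
    end

    # Deletes the NIC if it is present. Returns :absent, :deleted or :failed.
    def self.cleanup(name, runner = method(:run))
        return :absent unless exists?(name, runner)

        _out, ok = runner.call(['ip', 'link', 'delete', name])
        ok ? :deleted : :failed
    end

    # Runs a command without a shell and returns [output, success?].
    def self.run(cmd)
        out, status = Open3.capture2e(*cmd)
        [out, status.success?]
    rescue SystemCallError => e
        [e.message, false]
    end
end
```

Calling `StaleNic.cleanup('one-135-0')` as root before starting the container would mirror the manual workaround. Where such a call belongs in the driver's start or reboot path is left open here.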

Details

  • Affected Component: Virtualization Drivers
  • Hypervisor: LXD
  • Version: one-5.8.0

Additional context
The ubuntu_xenial image from the LXD marketplace seems to cause this issue frequently. Here is the VM template.

User template
ARCH = "x86_64"
LXD_SECURITY_PRIVILEGED = "true"
Template
AUTOMATIC_DS_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_NIC_REQUIREMENTS = "(\"CLUSTERS/ID\" @> 0)"
AUTOMATIC_REQUIREMENTS = "(CLUSTER_ID = 0) & !(PUBLIC_CLOUD = YES)"
CONTEXT = [
  DISK_ID = "1",
  ETH0_CONTEXT_FORCE_IPV4 = "",
  ETH0_DNS = "10.0.0.2",
  ETH0_EXTERNAL = "",
  ETH0_GATEWAY = "192.168.150.1",
  ETH0_GATEWAY6 = "",
  ETH0_IP = "192.168.150.100",
  ETH0_IP6 = "",
  ETH0_IP6_PREFIX_LENGTH = "",
  ETH0_IP6_ULA = "",
  ETH0_MAC = "02:00:c0:a8:96:64",
  ETH0_MASK = "",
  ETH0_MTU = "",
  ETH0_NETWORK = "",
  ETH0_SEARCH_DOMAIN = "",
  ETH0_VLAN_ID = "",
  ETH0_VROUTER_IP = "",
  ETH0_VROUTER_IP6 = "",
  ETH0_VROUTER_MANAGEMENT = "",
  NETWORK = "YES",
  ONEGATE_ENDPOINT = "http://192.168.150.1:5030",
  SSH_PUBLIC_KEY = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCYz+lkZoNyspRhrtXDKFN3cIEwN3w08mz0YGKpVDIiV0+/vgG8dAUQ70Irs3m83W9BHN+vNjKPgKcF+X+sSfxniOtavahxGCRjAhhs1IVm196C5ODbSgXVUWULdtmMHelXbLBJ8X340h/UO+eQ6eRLaRfslXUsgRqremVcvCCPz4LIuRiliGWiELAmqYcY+1zJLeg3QV2Pgn5vschM9e/A4AseKO+HnbGB/I5tnoeZT/Gc3FGfUZLNFVB2XsVGAEEzkqO8VI2msB7MCAZBHffIK6WfLIYgGP6Ha2JT1NWJU7Ncj9Xuql0ElF01VwWMDWzqc0DOiVSsTL89ugJKU6+h one",
  START_SCRIPT = "echo ok >/tmp/start_script",
  START_SCRIPT_BASE64 = "ZWNobyBvayA+L3RtcC9zdGFydF9zY3JpcHRfYmFzZTY0
",
  TARGET = "hda",
  TOKEN = "YES",
  VMID = "135" ]
CPU = "0.1"
CREATED_BY = "0"
DISK = [
  ALLOW_ORPHANS = "NO",
  CACHE = "unsafe",
  CLONE = "YES",
  CLONE_TARGET = "SYSTEM",
  CLUSTER_ID = "0",
  DATASTORE = "default",
  DATASTORE_ID = "1",
  DEV_PREFIX = "vd",
  DISK_ID = "0",
  DISK_SNAPSHOT_TOTAL_SIZE = "0",
  DISK_TYPE = "FILE",
  DRIVER = "qcow2",
  IMAGE = "ubuntu_xenial",
  IMAGE_ID = "27",
  IMAGE_STATE = "2",
  LN_TARGET = "NONE",
  ORIGINAL_SIZE = "1024",
  READONLY = "NO",
  SAVE = "NO",
  SIZE = "1024",
  SOURCE = "/var/lib/one//datastores/1/264659c1ba6d3074852b57301986dbb0",
  TARGET = "vda",
  TM_MAD = "qcow2",
  TYPE = "FILE" ]
GRAPHICS = [
  LISTEN = "0.0.0.0",
  PORT = "6035",
  TYPE = "vnc" ]
MEMORY = "768"
NIC = [
  AR_ID = "0",
  BRIDGE = "br0",
  BRIDGE_TYPE = "linux",
  CLUSTER_ID = "0",
  IP = "192.168.150.100",
  MAC = "02:00:c0:a8:96:64",
  MODEL = "virtio",
  NAME = "NIC0",
  NETWORK = "public",
  NETWORK_ID = "0",
  NIC_ID = "0",
  SECURITY_GROUPS = "0",
  TARGET = "one-135-0",
  VN_MAD = "dummy" ]
NIC_DEFAULT = [
  MODEL = "virtio" ]
SECURITY_GROUP_RULE = [
  PROTOCOL = "ALL",
  RULE_TYPE = "OUTBOUND",
  SECURITY_GROUP_ID = "0",
  SECURITY_GROUP_NAME = "default" ]
SECURITY_GROUP_RULE = [
  PROTOCOL = "ALL",
  RULE_TYPE = "INBOUND",
  SECURITY_GROUP_ID = "0",
  SECURITY_GROUP_NAME = "default" ]
TEMPLATE_ID = "153"
TM_MAD_SYSTEM = "qcow2"
VMID = "135"
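
The host-side NIC name that has to be checked is the TARGET of each NIC section in the template above (one-135-0 here). As a rough sketch, those names could be collected from the template text as follows. The method `nic_targets` is a hypothetical name, and the parsing assumes that no value inside a NIC section contains a closing square bracket.

```ruby
# Returns the TARGET value of every NIC = [ ... ] section in the template text.
# Sections such as NIC_DEFAULT, DISK or CONTEXT are ignored.
def nic_targets(template)
    sections = template.scan(/^NIC\s*=\s*\[(.*?)\]/m).flatten
    sections.map { |body| body[/\bTARGET\s*=\s*"([^"]*)"/, 1] }.compact
end
```

For the template in this issue the result would be the single name one-135-0, which is the interface removed by the manual workaround.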

Progress Status

  • Branch created
  • Code committed to development branch
  • Testing - QA
  • Documentation
  • Release notes - resolved issues, compatibility, known issues
  • Code committed to upstream release/hotfix branches
  • Documentation committed to upstream release/hotfix branches
@dann1 dann1 self-assigned this Apr 5, 2019
@dann1 dann1 added this to the Release 5.8.2 milestone Apr 5, 2019
dann1 added a commit to dann1/docs that referenced this issue Apr 8, 2019
rsmontero pushed a commit to OpenNebula/docs that referenced this issue Apr 8, 2019
rsmontero pushed a commit to OpenNebula/docs that referenced this issue Apr 8, 2019
@dann1 dann1 removed this from the Release 5.8.2 milestone Apr 23, 2019
dann1 added a commit to dann1/docs that referenced this issue Apr 23, 2019
rsmontero pushed a commit to OpenNebula/docs that referenced this issue Apr 26, 2019
rsmontero pushed a commit to OpenNebula/docs that referenced this issue Apr 26, 2019
dann1 added a commit to dann1/docs that referenced this issue May 5, 2019
@dann1 dann1 added this to the Release 5.10 milestone Jul 26, 2019
dann1 added a commit that referenced this issue Aug 19, 2019
dann1 added a commit that referenced this issue Aug 19, 2019
dann1 added a commit that referenced this issue Aug 19, 2019
dann1 added a commit that referenced this issue Aug 19, 2019
dann1 added a commit that referenced this issue Aug 21, 2019
dann1 added a commit that referenced this issue Aug 21, 2019
dann1 added a commit that referenced this issue Aug 21, 2019
dann1 added a commit that referenced this issue Aug 21, 2019
dann1 added a commit that referenced this issue Oct 1, 2019
dann1 added a commit that referenced this issue Oct 1, 2019
dann1 added a commit that referenced this issue Oct 2, 2019
dann1 added a commit that referenced this issue Oct 2, 2019
dann1 added a commit that referenced this issue Oct 2, 2019
dann1 added a commit that referenced this issue Oct 2, 2019
dann1 added a commit to OpenNebula/docs that referenced this issue Oct 2, 2019
rsmontero added a commit that referenced this issue Oct 2, 2019
rsmontero pushed a commit to OpenNebula/docs that referenced this issue Oct 2, 2019
dann1 added a commit that referenced this issue Nov 1, 2019
rsmontero pushed a commit that referenced this issue Nov 4, 2019
dann1 added a commit that referenced this issue Nov 4, 2019
dann1 added a commit that referenced this issue Nov 4, 2019
rsmontero pushed a commit that referenced this issue Nov 5, 2019
dann1 added a commit that referenced this issue Feb 18, 2020
dann1 added a commit that referenced this issue Feb 19, 2020
rsmontero pushed a commit that referenced this issue Feb 19, 2020
* F #3189: Fix lxd net hook

* F #3189: Fix LXD transitions

* F #3189:  Prioritize transition flag over status

* M #: Lint

* M #: Remove WIP hook

* M #: C7 compat

* F #3189: Remove flag only on native containers
rsmontero pushed a commit that referenced this issue Feb 19, 2020
* F #3189: Fix lxd net hook

* F #3189: Fix LXD transitions

* F #3189:  Prioritize transition flag over status

* M #: Lint

* M #: Remove WIP hook

* M #: C7 compat

* F #3189: Remove flag only on native containers

(cherry picked from commit e594c88)
atodorov-storpool pushed a commit to storpool/one that referenced this issue Feb 21, 2020
* F OpenNebula#3189: Fix lxd net hook

* F OpenNebula#3189: Fix LXD transitions

* F OpenNebula#3189:  Prioritize transition flag over status

* M #: Lint

* M #: Remove WIP hook

* M #: C7 compat

* F OpenNebula#3189: Remove flag only on native containers
dann1 added a commit that referenced this issue Feb 21, 2020
rsmontero pushed a commit that referenced this issue Feb 21, 2020
rsmontero pushed a commit that referenced this issue Feb 21, 2020
(cherry picked from commit 23b3aa3)
dann1 added a commit that referenced this issue Feb 26, 2020
dann1 added a commit that referenced this issue Feb 27, 2020
rsmontero pushed a commit that referenced this issue Feb 28, 2020