MCM should not try to drain machines that have not joined the cluster #465

rfranzke · 2020-05-28T16:00:51Z

What happened:
MCM is trying to delete some Machine objects. As part of the deletion it tries to drain the corresponding Node objects. However, because these Machines never joined the cluster, no such Node objects exist. The drain keeps failing with:

status:
  currentStatus:
    lastUpdateTime: "2020-05-28T15:30:18Z"
    phase: Terminating
  lastOperation:
    description: Drain failed - resource name may not be empty
    lastUpdateTime: "2020-05-28T15:45:52Z"
    state: Failed
    type: Delete

Logs of the machine-controller-manager:

I0528 15:45:42.432992       1 deployment.go:448] Processing the machinedeployment "shoot--foo--bar-cpu-worker" (with replicas 4)
W0528 15:45:42.639551       1 machine.go:658] Drain failed for machine "shoot--foo--bar-cpu-worker-7cdf986ff9-pzcgs".
Buf:
ErrBuf:
Err-Message:resource name may not be empty
W0528 15:45:42.785280       1 machine.go:658] Drain failed for machine "shoot--foo--bar-cpu-worker-7cdf986ff9-62mcz".
Buf:
ErrBuf:
Err-Message:resource name may not be empty
W0528 15:45:42.836888       1 machine.go:658] Drain failed for machine "shoot--foo--bar-cpu-worker-7cdf986ff9-k46s5".
Buf:
ErrBuf:
Err-Message:resource name may not be empty
I0528 15:45:42.883235       1 machine.go:551] Deleting Machine "shoot--foo--bar-cpu-worker-7cdf986ff9-pzcgs"
E0528 15:45:42.883357       1 drain.go:193] Error getting details for node: "". Err: resource name may not be empty
I0528 15:45:42.883371       1 drain.go:175] Machine drain ended on 2020-05-28 15:45:42.883368449 +0000 UTC m=+47.035130779 and took 90.578µs for ""

What you expected to happen:
MCM should not try to drain these nodes.

How to reproduce it (as minimally and precisely as possible):
Create machine objects that won't join the cluster and then try to delete them.

Environment:
MCM v0.29.0

rfranzke · 2020-05-28T16:00:58Z

/cc @tim-ebert

vlerenc · 2020-05-29T15:19:35Z

/bark @rfranzke
The /cc command is like /ping and the others. Maybe you want simply no reaction to /cc? Or only no sweets? ;-)

vpnachev · 2020-06-25T08:18:07Z

@gardener/mcm-maintainers any updates on this issue?

I have an Azure VM that failed to join the cluster, and now its deletion is stuck. However, the only resource I see in the Azure portal is the network interface; the VM itself has been deleted.

PS: I am using the latest MCM version, v0.31.0.

prashanth26 · 2020-06-26T07:39:27Z

Hi Guys,

We haven't made any progress on this issue yet. I think we need to prioritize it, as it is affecting multiple clusters.

Although these machines get deleted after the drain timeout, it is still not a good idea to keep such machines lying around.

/priority critical

cc: @hardikdr @amshuman-kr

rfranzke added the kind/bug Bug label May 28, 2020

gardener-robot added the priority/critical Needs to be resolved soon, because it impacts users negatively label Jun 26, 2020

prashanth26 mentioned this issue Jun 26, 2020

Bugfix: Drain machines with only a valid NodeName #480

Merged

hardikdr closed this as completed in #480 Jun 30, 2020
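
For readers who land here with the same symptom: the thread only gives the title of the fix ("Drain machines with only a valid NodeName"), not its code. The following is a minimal sketch of that idea, not the actual change from #480. The `machine` type, `drainNode`, and `deleteMachine` are hypothetical names introduced for illustration and are not MCM's API. The sketch assumes that a machine which never joined the cluster has an empty node name, which is what the `Error getting details for node: ""` log line above suggests.

```go
package main

import (
	"errors"
	"fmt"
)

// machine is a simplified, hypothetical stand-in for an MCM Machine object.
type machine struct {
	Name     string
	NodeName string // assumed to be empty if the machine never joined the cluster
}

// errEmptyNodeName mirrors the error message reported in this issue.
var errEmptyNodeName = errors.New("resource name may not be empty")

// drainNode is a hypothetical placeholder for the real drain logic.
// Like the behaviour reported above, it fails when given an empty node name.
func drainNode(nodeName string) error {
	if nodeName == "" {
		return errEmptyNodeName
	}
	// A real implementation would cordon the node and evict its pods here.
	return nil
}

// deleteMachine drains the node only when the machine has a node name.
// It reports whether a drain was performed.
func deleteMachine(m machine) (bool, error) {
	if m.NodeName == "" {
		// No node was ever registered, so there is nothing to drain.
		return false, nil
	}
	if err := drainNode(m.NodeName); err != nil {
		return false, fmt.Errorf("drain failed for machine %q: %w", m.Name, err)
	}
	return true, nil
}

func main() {
	machines := []machine{
		{Name: "worker-never-joined"},
		{Name: "worker-joined", NodeName: "node-1"},
	}
	for _, m := range machines {
		drained, err := deleteMachine(m)
		fmt.Printf("machine=%s drained=%t err=%v\n", m.Name, drained, err)
	}
}
```

With a guard like this, a machine without a node goes straight to the rest of the deletion flow instead of waiting for the drain timeout.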
