
Use ProviderID as a fallback for fetching the VM, InitializeMachine returns Uninitialized error code if VM is not found. #173

Merged: 7 commits merged on Sep 13, 2024


thiyyakat
Contributor

What this PR does / why we need it:

This PR introduces a fallback for fetching the VM instances associated with a machine.

It also changes InitializeMachine to return error code Uninitialized, instead of NotFound when instances cannot be found for the machine.
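The two changes described above can be modeled roughly as follows. This is a minimal, self-contained sketch, not the driver's code: `Code`, `StatusError`, `Instance`, `lookup`, and `matchingInstances` are hypothetical stand-ins for the status codes, the EC2 instance type, and the driver's lookup helpers. It assumes the ProviderID fallback runs when the tag-based lookup returns no instances, while errors from either lookup are returned immediately.

```go
package main

import "fmt"

// Code is a hypothetical stand-in for the machine status codes used by the driver.
type Code int

const (
	OK Code = iota
	NotFound
	Uninitialized
)

// StatusError is a hypothetical stand-in for a status error carrying a Code.
type StatusError struct {
	Code Code
	Msg  string
}

func (e *StatusError) Error() string { return e.Msg }

// Instance is a hypothetical, trimmed-down VM record.
type Instance struct{ ID string }

// lookup abstracts one strategy for finding the VMs behind a machine.
type lookup func() ([]Instance, error)

// matchingInstances tries the tag-based lookup first. Errors are returned
// immediately; only an empty result triggers the ProviderID fallback.
// If both strategies come back empty, it reports Uninitialized, not NotFound.
func matchingInstances(byTags, byProviderID lookup) ([]Instance, error) {
	instances, err := byTags()
	if err != nil {
		return nil, err
	}
	if len(instances) == 0 {
		instances, err = byProviderID()
		if err != nil {
			return nil, err
		}
	}
	if len(instances) == 0 {
		return nil, &StatusError{Code: Uninitialized, Msg: "AWS plugin is returning no VM instances backing this machine object"}
	}
	return instances, nil
}

func main() {
	none := func() ([]Instance, error) { return nil, nil }
	byID := func() ([]Instance, error) { return []Instance{{ID: "i-0abc"}}, nil }

	got, err := matchingInstances(none, byID)
	fmt.Println(got, err) // [{i-0abc}] <nil>: the fallback found the VM

	_, err = matchingInstances(none, none)
	fmt.Println(err.(*StatusError).Code == Uninitialized) // true
}
```

Reporting Uninitialized rather than NotFound tells the caller that the machine is not initialized yet, so initialization can be retried instead of being treated as a missing VM.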

Which issue(s) this PR fixes:
Fixes part of gardener/machine-controller-manager#933

Special notes for your reviewer:

The changes were manually tested by doing the following:

  1. Returned an error from getInstancesByTagsOrInstanceID() so that the instance ID had to be fetched by the fallback getInstanceByID(). The Machine was created successfully and reached the Running state.
  2. Returned codes.NotFound from getMatchingInstancesForMachine(). On triggering machine creation, initialization was retried in a loop after each shortRetry interval.

Release note:

Use `ProviderID` as a fallback for fetching the VM.
`InitializeMachine` returns `Uninitialized` error code if VM is not found.

…d if not found.

Use GetInstanceByID as fallback if getInstancesByTagsOrInstanceID does not succeed.

Change error value returned if VM not found to codes.Uninitialized.
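The fallback needs an instance ID to look up, which can be recovered from the machine's ProviderID. A minimal sketch, assuming the ProviderID has the form `aws:///<zone>/<instance-id>`; `instanceIDFromProviderID` is a hypothetical helper and not necessarily how the driver decodes the value.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceIDFromProviderID (hypothetical) returns the last path segment of a
// ProviderID assumed to look like "aws:///<zone>/<instance-id>".
// ok is false if the prefix is missing or no instance ID follows the last "/".
func instanceIDFromProviderID(providerID string) (id string, ok bool) {
	parts := strings.Split(providerID, "/")
	id = parts[len(parts)-1]
	if !strings.HasPrefix(providerID, "aws://") || id == "" {
		return "", false
	}
	return id, true
}

func main() {
	id, ok := instanceIDFromProviderID("aws:///eu-west-1a/i-0123456789abcdef0")
	fmt.Println(id, ok) // i-0123456789abcdef0 true
}
```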
@thiyyakat thiyyakat requested review from a team as code owners September 5, 2024 10:00
@gardener-robot gardener-robot added the needs/review Needs review label Sep 5, 2024
@gardener-robot

@thiyyakat Thank you for your contribution.

@gardener-robot gardener-robot added the size/s Size of pull request is small (see gardener-robot robot/bots/size.py) label Sep 5, 2024
@gardener-robot-ci-2
Contributor

Thank you @thiyyakat for your contribution. Before I can start building your PR, a member of the organization must set the required label(s) {'reviewed/ok-to-test'}. Once started, you can check the build status in the PR checks section below.

@rishabh-11 rishabh-11 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 5, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Sep 5, 2024
@gardener-robot gardener-robot added size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) and removed size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Sep 6, 2024
@rishabh-11 rishabh-11 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 9, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 9, 2024
pkg/aws/core.go Outdated
@@ -259,18 +259,20 @@ func (d *Driver) InitializeMachine(_ context.Context, request *driver.Initialize
if err != nil {
return nil, err
}
-instances, err := d.getInstancesFromMachineName(request.Machine.Name, providerSpec, request.Secret)
+instances, err := d.getMatchingInstancesForMachine(request.Machine, providerSpec, request.Secret)
Contributor


Can we just pass providerSpec.Tags to the getMachineInstancesByTagsAndStatus function? The cluster name and node role can then be extracted inside the function.

Contributor


There is no reason to pass the entire providerSpec; pass only the service instead. At present, InitializeMachine constructs the service twice, which is not required.
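Taken together, the two suggestions above amount to building the EC2 service once in the caller and passing it, along with the providerSpec tags, to the tag-based lookup, which then derives the cluster name and node role itself. A minimal sketch under assumptions: `instanceFinder`, `clusterAndRoleKeys`, `machineInstancesByTags`, and `fakeFinder` are hypothetical, and the `kubernetes.io/cluster/` and `kubernetes.io/role/` tag-key prefixes are assumed rather than taken from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed tag-key prefixes that identify a machine's cluster and node role.
const (
	clusterTagPrefix = "kubernetes.io/cluster/"
	roleTagPrefix    = "kubernetes.io/role/"
)

// instanceFinder is a hypothetical narrow stand-in for the EC2 service client.
type instanceFinder interface {
	FindByTags(machineName, clusterKey, roleKey string) []string
}

// clusterAndRoleKeys scans providerSpec tags for the cluster and node-role keys.
// ok is false if either key is missing.
func clusterAndRoleKeys(tags map[string]string) (clusterKey, roleKey string, ok bool) {
	for key := range tags {
		switch {
		case strings.HasPrefix(key, clusterTagPrefix):
			clusterKey = key
		case strings.HasPrefix(key, roleTagPrefix):
			roleKey = key
		}
	}
	return clusterKey, roleKey, clusterKey != "" && roleKey != ""
}

// machineInstancesByTags receives only the already-built service and the tags,
// and derives the cluster and role keys itself.
func machineInstancesByTags(svc instanceFinder, machineName string, tags map[string]string) ([]string, error) {
	clusterKey, roleKey, ok := clusterAndRoleKeys(tags)
	if !ok {
		return nil, fmt.Errorf("providerSpec tags for machine %q lack a cluster or role key", machineName)
	}
	return svc.FindByTags(machineName, clusterKey, roleKey), nil
}

// fakeFinder (hypothetical) returns a fixed instance ID when all filters match.
type fakeFinder struct{}

func (fakeFinder) FindByTags(machineName, clusterKey, roleKey string) []string {
	if machineName == "m-1" && clusterKey == "kubernetes.io/cluster/shoot--dev" && roleKey == "kubernetes.io/role/node" {
		return []string{"i-0abc"}
	}
	return nil
}

func main() {
	tags := map[string]string{
		"kubernetes.io/cluster/shoot--dev": "1",
		"kubernetes.io/role/node":          "1",
		"team":                             "infra",
	}
	ids, err := machineInstancesByTags(fakeFinder{}, "m-1", tags) // service built once, passed in
	fmt.Println(ids, err) // [i-0abc] <nil>
}
```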

}
if len(instances) == 0 {
errMessage := "AWS plugin is returning no VM instances backing this machine object"
return nil, status.Error(codes.NotFound, errMessage)
//if getMachineInstancesByTagsAndStatus returns an error, try fetching matching instances using ProviderID
Contributor


This comment is misleading. If getMachineInstancesByTagsAndStatus returns an error, execution never reaches this point; the function returns much earlier. Please change this comment.

pkg/aws/core_util.go (resolved)
// getMatchingInstancesForMachine extracts AWS Instance object for a given machine
func (d *Driver) getMatchingInstancesForMachine(machine *v1alpha1.Machine, providerSpec *api.AWSProviderSpec, secret *corev1.Secret) ([]*ec2.Instance, error) {
var (
clusterName string
Contributor


Considering that you will move the extraction of clusterName and nodeRole elsewhere, there is no reason to declare these variables here; you can just use short variable declarations (:=).

for _, reservation := range runResult.Reservations {
instances = append(instances, reservation.Instances...)
// getMatchingInstancesForMachine extracts AWS Instance object for a given machine
func (d *Driver) getMatchingInstancesForMachine(machine *v1alpha1.Machine, providerSpec *api.AWSProviderSpec, secret *corev1.Secret) ([]*ec2.Instance, error) {
Contributor


To prevent shadowing, you can name the return values, so the function signature becomes:

func (d *Driver) getMatchingInstancesForMachine(machine *v1alpha1.Machine, providerSpec *api.AWSProviderSpec, secret *corev1.Secret) (instances []*ec2.Instance, err error) {
}

pkg/aws/core_util.go (resolved)
pkg/aws/core_util.go (resolved)
pkg/aws/core.go Outdated
@@ -368,7 +369,7 @@ func (d *Driver) DeleteMachine(_ context.Context, req *driver.DeleteMachineReque

} else {
// ProviderID doesn't exist, hence check for any existing machine and then delete if exists
-instances, err = d.getInstancesFromMachineName(req.Machine.Name, providerSpec, req.Secret)
+instances, err = d.getMatchingInstancesForMachine(req.Machine, providerSpec, req.Secret)
Contributor


In the if condition you have already checked that the providerID is empty, so just calling getMachineInstancesByTagsAndStatus should be sufficient, right? This also avoids recreating the service, which you have already created once.
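The suggestion can be sketched as a single branch on ProviderID, with the service constructed once per request and reused by both paths. Everything here is hypothetical (`machine`, `ec2Service`, `instancesByProviderID`, `instancesByTags`, `instancesToDelete`), the ProviderID format `aws:///<zone>/<instance-id>` is assumed, and an in-memory map stands in for EC2.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// machine is a hypothetical, trimmed-down Machine object.
type machine struct {
	Name       string
	ProviderID string
}

// ec2Service is a hypothetical stand-in for the EC2 client, built once per request.
type ec2Service struct {
	instances map[string]string // instance ID -> machine name found in its tags
}

// instancesByProviderID decodes the instance ID (assumed last path segment) and checks it exists.
func (s *ec2Service) instancesByProviderID(providerID string) []string {
	id := providerID[strings.LastIndex(providerID, "/")+1:]
	if _, ok := s.instances[id]; ok {
		return []string{id}
	}
	return nil
}

// instancesByTags returns every instance whose name tag matches the machine, sorted.
func (s *ec2Service) instancesByTags(machineName string) []string {
	var ids []string
	for id, name := range s.instances {
		if name == machineName {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

// instancesToDelete branches once: with an empty ProviderID, the tag lookup is
// the only option, so it is called directly instead of via a combined helper.
func instancesToDelete(svc *ec2Service, m machine) []string {
	if m.ProviderID != "" {
		return svc.instancesByProviderID(m.ProviderID)
	}
	return svc.instancesByTags(m.Name)
}

func main() {
	svc := &ec2Service{instances: map[string]string{"i-0abc": "m-1", "i-0def": "m-2"}}
	fmt.Println(instancesToDelete(svc, machine{Name: "m-1", ProviderID: "aws:///eu-west-1a/i-0abc"})) // [i-0abc]
	fmt.Println(instancesToDelete(svc, machine{Name: "m-2"}))                                         // [i-0def]
}
```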

pkg/aws/core.go Outdated (resolved)
pkg/aws/util.go (resolved)
@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Sep 9, 2024
@rishabh-11 rishabh-11 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 9, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 9, 2024
Contributor

@unmarshall unmarshall left a comment


/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/changes Needs (more) changes needs/review Needs review labels Sep 11, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 11, 2024
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 11, 2024
@rishabh-11 rishabh-11 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Sep 11, 2024
@rishabh-11 rishabh-11 merged commit 1337c46 into gardener:master Sep 13, 2024
7 checks passed
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Sep 13, 2024
Labels
reviewed/lgtm Has approval for merging reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)

6 participants