
Tag master machines for adoption by machine controller #479

Merged: 1 commit into openshift:master on Oct 19, 2018

Conversation

@csrwng (Contributor) commented Oct 17, 2018

Add a clusterid tag to masters so they can be adopted by the machine controller in the future.
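For illustration, a minimal Terraform sketch of the kind of tagging this adds (variable names such as `cluster_id` and `extra_tags` are assumptions, not the PR's actual diff):

```hcl
# Hypothetical sketch: tag master instances so the AWS machine
# actuator can later match them to Machine objects by cluster id.
resource "aws_instance" "master" {
  count = var.instance_count

  ami           = var.ec2_ami
  instance_type = var.instance_type

  tags = merge(var.extra_tags, {
    # The tag the AWS machine actuator matches on when adopting instances.
    "clusterid" = var.cluster_id
    "Name"      = "${var.cluster_name}-master-${count.index}"
  })
}
```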

@csrwng (Contributor, Author) commented Oct 17, 2018

@abhinavdahiya ptal

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 17, 2018
@abhinavdahiya (Contributor):

My idea of adoption was that we create Machine objects for each master in the installer, and the EC2 instances created by Terraform are identical to what the controller would have created in response to those Machine objects, thereby effecting adoption.

@wking (Member) commented Oct 17, 2018

Are there docs for the clusterid tag? It sounds a lot like our existing kubernetes.io/cluster/${var.cluster_name} (which itself is poorly documented, but see here). Also, this fix looks AWS-specific. Should whatever tag we settle on also go into the libvirt and OpenStack setups?

@csrwng (Contributor, Author) commented Oct 17, 2018

> My idea of adoption was that we create Machine objects for each master in installer

Yes, that's coming :) Just wanted to get this small change in first. The one blocking issue I have with creating the master Machine assets at install time is that I don't have a good way to assign AZs to each; my understanding is that Terraform resolves those once it is running. However, we will need to know the AZs anyway in order to support multi-AZ MachineSets for compute.

> Are there docs for the clusterid tag?

Not that I know of, but it's used by the AWS actuator to determine whether a machine already exists:
https://github.com/openshift/cluster-api-provider-aws/blob/master/pkg/cloud/aws/actuators/machine/utils.go#L114-L132
^^ @enxebre, do you know if there's an official doc for it?

> Also, this fix looks AWS-specific. Should whatever tag we settle on also go into the libvirt and OpenStack setups?

It's only needed for the AWS actuator to match an existing instance with a Machine object. For libvirt, the machine name is sufficient (https://github.com/openshift/cluster-api-provider-libvirt/blob/master/cloud/libvirt/actuators/machine/utils/domain.go#L617-L633). For OpenStack, the name, image, and flavor are used (https://github.com/openshift/cluster-api-provider-openstack/blob/master/cloud/openstack/machineactuator.go#L434-L452).

@abhinavdahiya (Contributor):

> Yes, that's coming :)

👍

@abhinavdahiya (Contributor):

e2e-aws failure:

```
Error: Error applying plan:

6 error(s) occurred:

* module.vpc.aws_eip.nat_eip[1]: 1 error(s) occurred:

* aws_eip.nat_eip.1: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
	status code: 400, request id: 68d2f8d1-521f-45b9-8a01-1f5814bd79cf
* module.vpc.aws_eip.nat_eip[5]: 1 error(s) occurred:

* aws_eip.nat_eip.5: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
	status code: 400, request id: d32497d3-b5fb-42cc-8b15-711417c27cb4
* module.vpc.aws_eip.nat_eip[0]: 1 error(s) occurred:

* aws_eip.nat_eip.0: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
	status code: 400, request id: 94ece091-7cd6-49a5-9377-fa5a4bf7c19e
* module.vpc.aws_eip.nat_eip[3]: 1 error(s) occurred:

* aws_eip.nat_eip.3: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
	status code: 400, request id: 88ed257a-abfc-4420-b40a-9b1863e73a92
* module.vpc.aws_eip.nat_eip[2]: 1 error(s) occurred:

* aws_eip.nat_eip.2: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
	status code: 400, request id: b31bc1ab-2680-4430-b3ad-5e3a58980468
* module.vpc.aws_eip.nat_eip[4]: 1 error(s) occurred:

* aws_eip.nat_eip.4: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
	status code: 400, request id: 1e76b769-fd09-4883-be84-d13e149f51a1

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure
```

Hopefully CI cleared up, fingers crossed.
/retest

@enxebre (Member) commented Oct 18, 2018

@csrwng @abhinavdahiya I don't think there are docs on that tag yet; I'll fix that.

> I don't have a good way to assign AZs to each. My understanding is that Terraform resolves those once it is running. However, we will need to know the AZs anyway in order to support multi-AZ MachineSets for compute.

My understanding is that Terraform assigns each master a subnet (https://github.com/openshift/installer/blob/master/data/data/aws/master/main.tf#L83), and each of those subnets happens to be in a different AZ (https://github.com/openshift/installer/blob/master/data/data/aws/vpc/vpc-public.tf#L36-L45), so we could just adopt machines by name.
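For illustration, a minimal Terraform sketch of that subnet-per-AZ pattern (resource and variable names are assumptions, not the installer's actual code):

```hcl
# Hypothetical sketch: one subnet per AZ, masters assigned round-robin.
data "aws_availability_zones" "available" {}

resource "aws_subnet" "master" {
  count = length(data.aws_availability_zones.available.names)

  vpc_id     = var.vpc_id
  cidr_block = cidrsubnet(var.vpc_cidr, 4, count.index)

  # One subnet per availability zone.
  availability_zone = data.aws_availability_zones.available.names[count.index]
}

resource "aws_instance" "master" {
  count = var.master_count

  ami           = var.ec2_ami
  instance_type = var.instance_type

  # element() wraps around, so masters are spread round-robin across
  # the subnets, and therefore across AZs; the AZ itself is only known
  # once Terraform resolves the subnet at apply time.
  subnet_id = element(aws_subnet.master.*.id, count.index)
}
```

Since the AZ is implied by the chosen subnet, this is consistent with the point above that Terraform only resolves the AZs while it is running.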

@csrwng (Contributor, Author) commented Oct 18, 2018

/retest

@abhinavdahiya (Contributor):

/test all

@abhinavdahiya (Contributor):

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 19, 2018
@openshift-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, csrwng

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2018
@openshift-merge-robot openshift-merge-robot merged commit 61bc533 into openshift:master Oct 19, 2018
@wking (Member) commented Dec 4, 2018

> It sounds a lot like our existing kubernetes.io/cluster/${var.cluster_name} (which itself is poorly documented, but see here).

Closing the loop on this, here is @csrwng's link from this comment pinned to a particular commit (vs. floating with master). And the difference is that kubernetes.io/cluster/{cluster-name} is per-AWS-provider via the cloud config, while the clusterid tag is per-machine via the sigs.k8s.io/cluster-api-cluster label. We started setting those labels in #468. And the clusterid tag is only used for instances, while the in-Kubernetes AWS provider only uses kubernetes.io/cluster/{cluster-name} for ELBs and route tables (kubernetes/kubernetes#41695).

The cluster-API AWS provider has also been setting kubernetes.io/cluster/ to the per-machine sigs.k8s.io/cluster-api-cluster since openshift/cluster-api-provider-aws@72b482323c (openshift/cluster-api-provider-aws#2), although I'm not clear on why it's setting both that and clusterid.

And we are setting the same sigs.k8s.io/cluster-api-cluster in each of our machine configs. That makes both approaches effectively per-cluster, and I don't see the need for two separate tags for the same purpose. I've filed openshift/cluster-api-provider-aws#119 to ask about dropping clusterid in favor of kubernetes.io/cluster/{id}. If we end up going with that, we'd be able to revert this PR.

wking added a commit to wking/openshift-installer that referenced this pull request Jan 11, 2019
Centralize extra-tag inclusion on aws/main.tf.  This reduces the
number of places we need to think about what tags should be ;).

Also keep kubernetes.io/cluster/{name} localized in the aws module.
See 8a37f72 (modules/aws/bootstrap: Pull AWS bootstrap setup into a
module, 2018-09-05, openshift#217) for why we need to keep it on the bootstrap
instance.  But the bootstrap resources will be removed after the
bootstrap-complete event comes through, and we don't want Kubernetes
controllers trying to pick them up.

This commit updates the internal Route 53 zone from KubernetesCluster
to kubernetes.io/cluster/{name}: owned, catching it up to
kubernetes/kubernetes@0b5ae539 (AWS: Support shared tag, 2017-02-18,
kubernetes/kubernetes#41695).  That tag originally landed on the zone
back in 75fb49a (platforms/aws: apply tags to internal route53 zone,
2017-05-02, coreos/tectonic-installer#465).

Only the master instances need the clusterid tag, as described in
6c7a5f0 (Tag master machines for adoption by machine controller,
2018-10-17, openshift#479).

A number of VPC resources have moved from "shared" to "owned".  The
shared values are from 45dfc2b (modules/aws,azure: use the new tag
format for k8s 1.6, 2017-05-04, coreos/tectonic-installer#469).  The
commit message doesn't have much to say for motivation, but Brad Ison
said [1]:

  I'm not really sure if anything in Kubernetes actually uses the
  owned vs. shared values at the moment, but in any case, it might
  make more sense to mark subnets as shared.  That was actually one of
  the main use cases for moving to this style of tagging -- being able
  to share subnets between clusters.

But we aren't sharing these resources; see 6f55e67 (terraform/aws:
remove option to use an existing vpc in aws, 2018-11-11, openshift#654).

[1]: coreos/tectonic-installer#469 (comment)
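For illustration, a minimal Terraform sketch of the centralized-tag approach the commit message describes (the module layout and variable names are assumptions, not the actual installer code):

```hcl
# Hypothetical sketch: compute the shared tag map once in aws/main.tf
# and pass it down, instead of repeating tag logic in every module.
locals {
  tags = merge(var.extra_tags, {
    "kubernetes.io/cluster/${var.cluster_name}" = "owned"
  })
}

module "vpc" {
  source = "./vpc"
  tags   = local.tags
}

module "masters" {
  source = "./master"

  # Only the master instances additionally carry the clusterid tag
  # that the machine controller adopts on.
  tags = merge(local.tags, { "clusterid" = var.cluster_id })
}
```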