Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/asset/cluster: Pull metadata into the Cluster asset #454

Merged

Conversation

wking
Copy link
Member

@wking wking commented Oct 11, 2018

Terraform can die part-way through launching a cluster for many reasons, and we want it to be easy for users to clean up. destroy-cluster gets its teardown information from metadata.json, but before this commit we were only writing the file after a successful Terraform run. That left users with failed Terraform runs scrambling to delete their cluster on their own (e.g. with virsh-cleanup.sh or other external tools).

One solution to this problem would be to move the Metadata asset before the Cluster asset in targetAssets. But @abhinavdahiya wants:

  • To provide metadata about the cluster (e.g. bootstrap and master IPs) that is only available after Terraform wraps up.

  • To minimize the number of assets, so no pre-terraform-metadata.json and post-terraform-metadata.json as separate assets.

This commit squashes the metadata file into the Cluster asset, so we can fill it in as we go.

I've also updated openshift-install to write any files from failed assets before exiting, so we can land the metadata and recovered Terraform state where the user can find them before we die. To avoid nil-dereference panics after this change, I've also updated a number of Files() implementations to avoid returning a [<nil>] slice.

The SilenceError and SilenceUsage additions work around spf13/cobra#340.

I've also add -input=false to our Terraform invocations, details on that in the 2b5aee7 commit message. I can spin that off into a separate PR if it makes review easier.

/assign @abhinavdahiya

Terraform can die part-way through launching a cluster for many
reasons, and we want it to be easy for users to clean up.
destroy-cluster gets its teardown information from metadata.json, but
before this commit we were only writing the file after a successful
Terraform run.  That left users with failed Terraform runs scrambling
to delete their cluster on their own (e.g. with virsh-cleanup.sh or
other external tools).

One solution to this problem would be to move the Metadata asset
before the Cluster asset in targetAssets.  But Abhinav wants:

* To provide metadata about the cluster (e.g. bootstrap and master
  IPs) that is only available after Terraform wraps up.

* To minimize the number of assets, so no pre-terraform-metadata.json
  and post-terraform-metadata.json as separate assets.

This commit squashes the metadata file into the Cluster asset, so we
can fill it in as we go.

I've also updated openshift-install to write any files from failed
assets before exiting, so we can land the metadata and recovered
Terraform state where the user can find them before we die.  To avoid
nil-dereference panics after this change, I've also updated a number
of Files() implementations to avoid returning a [<nil>] slice.

The SilenceError and SilenceUsage additions work around [1].

[1]: spf13/cobra#340
Avoid things like:

  $ openshift-install cluster
  ...
  var.tectonic_base_domain
    The base DNS domain of the cluster. It must NOT contain a trailing period. Some
    DNS providers will automatically add this if necessary.

    Example: `openshift.example.com`.

    Note: This field MUST be set manually prior to creating the cluster.
    This applies only to cloud platforms.

    Enter a value:
  ...

With this commit, we instead get:

  $ openshift-install cluster
  INFO Fetching OS image...
  INFO Using Terraform to create cluster...
  ERROR Failed to read tfstate: open /tmp/openshift-install-716419668/terraform.tfstate: no such file or directory
  ERROR Error: Required variable not set: tectonic_base_domain
  FATAL Error executing openshift-intall: exit status 1

with an introduced mismatch:

  --- a/pkg/tfvars/tfvars.go
  +++ b/pkg/tfvars/tfvars.go
  @@ -17,7 +17,7 @@ import (
   type config struct {
   			ClusterID  string `json:"tectonic_cluster_id,omitempty"`
   			Name       string `json:"tectonic_cluster_name,omitempty"`
  -			BaseDomain string `json:"tectonic_base_domain,omitempty"`
  +			BaseDomain string `json:"xtectonic_base_domain,omitempty"`
   			Masters    int    `json:"tectonic_master_count,omitempty"`
   			Workers    int    `json:"tectonic_worker_count,omitempty"`

Ideally we won't have any mismatches, but it's good to handle them
cleanly when we do instead of allowing the caller to work around them.
Generated with:

  $ openshift-install graph | dot -Tsvg >docs/design/resource_dep.svg

using:

  $ dot -V
  dot - graphviz version 2.30.1 (20170916.1124)
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 11, 2018
@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 11, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 6d9337d into openshift:master Oct 11, 2018
@wking wking deleted the move-metadata-into-cluster-asset branch October 11, 2018 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants