This repository has been archived by the owner on May 13, 2024. It is now read-only.

Fix Anubis Setup Destroy #768

Merged: 6 commits, Aug 23, 2019
Changes from 5 commits
2 changes: 2 additions & 0 deletions README.md
@@ -267,6 +267,8 @@ bff/bin/anubis --results <ACTION_ID>
./anubis-setup --region us-east-1 --prefix-list-id pl-xxxxxxxx --destroy
```

*Note: there is a bug in Terraform where security group rules are not revoked before the security group is deleted, which causes a timeout: https://github.com/hashicorp/terraform/issues/8617*

## Great, what's next?

Write your own benchmarks!
4 changes: 4 additions & 0 deletions baictl/Makefile
@@ -29,6 +29,10 @@ destroy-infra: check-parameter-region venv _destroy-infra
_destroy-infra:
./baictl destroy infra --aws-region=$(AWS_REGION)

sync-infra: check-parameter-region venv _sync-infra
_sync-infra:
./baictl sync infra --aws-region=$(AWS_REGION) --mode=pull

publish:
echo "Nothing to publish"

79 changes: 57 additions & 22 deletions ci/anubis-driver.py
@@ -120,12 +120,12 @@ def add_args(cls, parser):
def s3_remote_state_bucket(config, region, session):
# Ensure bucket exists for remote state
sts = session.client("sts")
if os.path.exists(".terraform/ci-backend-config"):
if os.path.exists(os.path.join(os.path.dirname(__file__), ".terraform/ci-backend-config")):
ci_backend_config = open(".terraform/ci-backend-config", "r").read()
if sts.get_caller_identity()["Account"] not in ci_backend_config:
os.remove(".terraform/ci-backend-config")
config["bucket"] = None
if os.path.exists(".terraform/terraform.tfstate"):
if os.path.exists(os.path.join(os.path.dirname(__file__), ".terraform/terraform.tfstate")):
os.remove(".terraform/terraform.tfstate")

if config["bucket"] is None:
@@ -179,6 +179,35 @@ def add_current_user_arn(config, session):
config["extra_users"] = ",".join(extra_users_config)


def sync_baictl(session, region):
os.environ["AWS_REGION"] = region

print(f"=> Calling `./baictl sync infra --aws-region={region}` in baictl to get kubeconfig")

if os.path.exists(os.path.join(os.path.dirname(__file__), "../baictl/drivers/aws/cluster/.terraform")):
return_code = subprocess.call(["rm", "-rf", "drivers/aws/cluster/.terraform"], cwd="../baictl")
Contributor:

os.path.join(os.path.dirname(__file__)

Contributor Author:

What?

Contributor:

You still have relative references based on the user's CWD in your code.
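
The relative-path concern raised here can be sketched as follows. This is a minimal illustration rather than code from the PR: the helper name `repo_path` is hypothetical, and it assumes the driver script sits one directory below the repository root (as `ci/anubis-driver.py` does).

```python
from pathlib import Path


def repo_path(script_file, *parts):
    """Join parts onto the repository root.

    Hypothetical helper: the root is assumed to be the parent of the
    directory that contains script_file.
    """
    script_dir = Path(script_file).resolve().parent
    return script_dir.parent.joinpath(*parts)
```

With a helper like this, `cwd="../baictl"` could become `cwd=str(repo_path(__file__, "baictl"))`, so the call would no longer depend on the directory the script was launched from.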

if return_code != 0:
raise Exception(f"Failure calling `rm -rf drivers/aws/cluster/.terraform` in baictl: {return_code}")

return_code = subprocess.call(["make", "sync-infra"], cwd="../baictl")
if return_code != 0:
raise Exception(f"Failure calling `make sync-infra` in baictl: {return_code}")


def undeploy_services():
parent_dir = os.path.join(os.path.dirname(__file__), "..")
Contributor:

What about the services that are not deployable, such as puller, kafka-utils, and metrics-pusher?

They have stubs, right?

Contributor Author:

Currently it's broken on #688, so I can't get any further to test.

service_dirs = [
f
for f in os.listdir(parent_dir)
if os.path.isdir(os.path.join(parent_dir, f)) and os.path.exists(os.path.join(parent_dir, f) + "/Makefile")
Contributor:

Please don't repeat expressions.
Extract variables.

Contributor Author:

This references the for-loop iterator, so it can't be broken out.
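
One way to avoid repeating the `os.path.join(parent_dir, f)` expression inside the comprehension is to iterate over `pathlib.Path` objects, which already carry the joined path. A minimal sketch: the function name `find_service_dirs` is hypothetical, and unlike the version in the diff it sorts the result and requires `Makefile` to be a regular file.

```python
from pathlib import Path


def find_service_dirs(parent_dir):
    """Return the names of immediate subdirectories that contain a Makefile."""
    return sorted(
        child.name
        for child in Path(parent_dir).iterdir()
        # A Makefile can only exist below `child` if `child` is a directory,
        # so no separate is_dir() check is needed.
        if (child / "Makefile").is_file()
    )
```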

]

for folder in service_dirs:
return_code = subprocess.call(["make", "undeploy"], cwd=f"../{folder}")
if return_code != 0:
raise Exception(f"Failure calling `make undeploy` in {folder}: {return_code}")


def main():
print(Figlet().renderText("anubis setup"))
parser = argparse.ArgumentParser()
@@ -212,7 +241,9 @@ def main():

# Destroy pipeline and infrastructure
if args.destroy:
sync_baictl(session, region)
destroy_pipeline(region)
undeploy_services()
Contributor:

Is this the right order of execution? I'd expect undeploy services to happen first

Contributor Author:

The pipeline and the services are independent of each other, so the order doesn't matter.

Contributor:

ok

destroy_infrastructure(region)
return

@@ -228,7 +259,6 @@ def main():
# Create pipeline which creates infrastructure and deploys services
print(f"=> Calling `terraform plan --out=terraform.plan`")
return_code = subprocess.call(["terraform", "plan", "--out=terraform.plan"])
Contributor:

Not for this PR.
Please confirm that the Terraform Python API doesn't satisfy our needs.

Contributor Author:

I tried the terraform-python package and it was garbage
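
If the driver keeps shelling out to `terraform` and `make`, the call-then-raise pattern repeated throughout the diff could be folded into one helper. The sketch below is an illustration only: the name `run_or_fail` is hypothetical, and it raises `RuntimeError` where the diff raises a bare `Exception`.

```python
import subprocess


def run_or_fail(cmd, cwd=None):
    """Run cmd and raise RuntimeError, naming the command, if it exits non-zero."""
    return_code = subprocess.call(cmd, cwd=cwd)
    if return_code != 0:
        raise RuntimeError(f"Failure calling `{' '.join(cmd)}`: {return_code}")
```

A call such as `run_or_fail(["terraform", "plan", "--out=terraform.plan"])` would then replace the three-line check after each `subprocess.call`.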


if return_code != 0:
raise Exception(f"Failure calling `terraform plan`: {return_code}")
print("=> Calling `terraform apply`")
@@ -252,33 +282,38 @@ def rm(path):


def destroy_pipeline(region):
# HACK: Rules don't get revoked causing timeout on security group destroy
group_id = subprocess.check_output(["terraform", "output", "blackbox_vpc_default_group_id"]).strip()
source_group = subprocess.check_output(["terraform", "output", "blackbox_public_group_id"]).strip()
return_code = subprocess.call(
[
"aws",
"ec2",
"revoke-security-group-ingress",
"--region",
region,
"--group-id",
group_id,
"--source-group",
source_group,
"--protocol",
"all",
]
)
# HACK: Rules don't get revoked causing timeout on security group destroy https://github.com/hashicorp/terraform/issues/8617
devnull = open(os.devnull, "w")
if (
Contributor:

I'm still not convinced that this is the right thing to do. I rather feel like we're working around an issue here, so I'd like to see this elaborated.

Contributor Author:

Contributor:

@marcoabreu
Your objection is reasonable and noted. Now, could you please offer an alternative solution that can be implemented within a reasonable time?

Contributor:

I'm still trying to understand which resource specifically has this security group assigned. So far, the situation is not very transparent. As soon as I have all the data, I'm happy to assist with a solution. But so far, I'm still at the information-gathering stage.

subprocess.call(["terraform", "output", "blackbox_vpc_default_group_id"], stderr=devnull) == 0
and subprocess.call(["terraform", "output", "blackbox_public_group_id"], stderr=devnull) == 0
):
group_id = subprocess.check_output(["terraform", "output", "blackbox_vpc_default_group_id"]).strip()
source_group = subprocess.check_output(["terraform", "output", "blackbox_public_group_id"]).strip()
return_code = subprocess.call(
Contributor:

Could you elaborate on why this is necessary? We should not use the AWS CLI in any case, because any resource that exists in this account was created by some kind of stateful automation. The removal should then happen through the same automation; otherwise we're corrupting the state.

Contributor Author:

Terraform isn't smart enough to disassociate dependent rules before deleting security groups, so it just hangs on "deleting security group", while the console reports that there are dependent references to that security group elsewhere.

Contributor:

Who created and associated these security groups?

Contributor Author (@Chancebair, Aug 23, 2019):

Terraform

Contributor:

@marcoabreu I get where you are coming from: we should walk the same path backward as we do forward. This shows that an out-of-band actor is inserted in the reverse path. It turns out that this is somewhat unavoidable, as this affordance is not available directly in TF.
So this is unfortunately the way we need to go. @Chancebair Please document this caveat so that posterity is aware of this decision point.

Contributor Author:

Commented the hack in the script

Contributor (@marcoabreu, Aug 23, 2019):

Which resource specifically has these groups assigned so that we run into that bug? Is it an instance, an ELB or something else? Also, it would be great if you could point me to the line where this "buggy" reference is being created in the first place.
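
The guard under discussion runs `terraform output` up to four times. As a sketch of the same check with a single call, `terraform output -json` prints all outputs as one JSON object of the form `{"name": {"value": ...}}`, which can be parsed once. The helper names below are hypothetical; the output names and the `aws ec2 revoke-security-group-ingress` arguments are taken from the diff.

```python
import json
import subprocess


def parse_group_ids(outputs_json, names):
    """Return {name: value} when every name is among the Terraform outputs, else None."""
    outputs = json.loads(outputs_json) if outputs_json.strip() else {}
    if not all(name in outputs for name in names):
        return None
    return {name: outputs[name]["value"] for name in names}


def build_revoke_command(region, group_id, source_group):
    """Build the same AWS CLI invocation the driver uses to revoke the ingress rule."""
    return [
        "aws", "ec2", "revoke-security-group-ingress",
        "--region", region,
        "--group-id", group_id,
        "--source-group", source_group,
        "--protocol", "all",
    ]


def read_group_ids(names):
    """Read the outputs with one `terraform output -json` call; None if it fails."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # decode stdout to str instead of bytes
    )
    if result.returncode != 0:
        return None
    return parse_group_ids(result.stdout, names)
```

Under these assumptions, `destroy_pipeline` would call `read_group_ids(["blackbox_vpc_default_group_id", "blackbox_public_group_id"])` and run the revoke command only when the result is not `None`.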

[
"aws",
"ec2",
"revoke-security-group-ingress",
"--region",
region,
"--group-id",
group_id,
"--source-group",
source_group,
"--protocol",
"all",
]
)
print("=> Calling `terraform destroy` to destroy pipeline")
return_code = subprocess.call(["terraform", "destroy", "-auto-approve"])
if return_code != 0:
raise Exception(f"Failure calling `terraform destroy`: {return_code}")


def destroy_infrastructure(region):
print("=> Calling `make destroy-infra` in baictl to destroy infrastructure")
os.environ["AWS_REGION"] = region
print("=> Calling `make destroy-infra` in baictl to destroy infrastructure")
return_code = subprocess.call(["make", "destroy-infra"], cwd="../baictl")
if return_code != 0:
raise Exception(f"Failure calling `make destroy` in baictl: {return_code}")