
adding in black hole subnets for proxy testing #10355

Merged

Conversation


@ewolinetz ewolinetz commented Jul 20, 2020

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 20, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 20, 2020
@ewolinetz ewolinetz force-pushed the blackhole_proxy_multistage branch 6 times, most recently from 4c30c4b to 26c826a Compare July 20, 2020 20:13
@ewolinetz
Contributor Author

/refresh

@ewolinetz ewolinetz force-pushed the blackhole_proxy_multistage branch 2 times, most recently from 142bc5f to f315eb3 Compare July 20, 2020 21:10
@ewolinetz
Contributor Author

Logs from the bootstrap node:

Jul 20 23:34:58 ip-10-0-2-115 bootkube.sh[10306]: Skipped "secret-kube-apiserver-to-kubelet-signer.yaml" secrets.v1./kube-apiserver-to-kubelet-signer -n openshift-kube-apiserver-operator as it already exists
Jul 20 23:34:58 ip-10-0-2-115 bootkube.sh[10306]: Skipped "secret-loadbalancer-serving-signer.yaml" secrets.v1./loadbalancer-serving-signer -n openshift-kube-apiserver-operator as it already exists
Jul 20 23:34:59 ip-10-0-2-115 bootkube.sh[10306]: Skipped "secret-localhost-serving-signer.yaml" secrets.v1./localhost-serving-signer -n openshift-kube-apiserver-operator as it already exists
Jul 20 23:34:59 ip-10-0-2-115 bootkube.sh[10306]: Skipped "secret-service-network-serving-signer.yaml" secrets.v1./service-network-serving-signer -n openshift-kube-apiserver-operator as it already exists
Jul 20 23:35:16 ip-10-0-2-115 bootkube.sh[10306]: E0720 23:35:16.138238       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
Jul 20 23:35:16 ip-10-0-2-115 bootkube.sh[10306]: E0720 23:35:16.152156       1 reflector.go:251] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to watch *v1.Pod: Get https://localhost:6443/api/v1/pods?watch=true: dial tcp [::1]:6443: connect: connection refused
Jul 20 23:35:17 ip-10-0-2-115 bootkube.sh[10306]: E0720 23:35:17.152941       1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://localhost:6443/api/v1/pods: dial tcp [::1]:6443: connect: connection refused

from the install logs:

level=info msg="Waiting up to 30m0s for bootstrapping to complete..."
E0720 23:35:16.393103      47 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ConfigMap: Get "https://api.ci-op-dlxy481c-0ad99.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dbootstrap&resourceVersion=3763&timeoutSeconds=574&watch=true": dial tcp 54.176.14.91:6443: connect: connection refused
E0720 23:35:17.553806      47 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ConfigMap: Get "https://api.ci-op-dlxy481c-0ad99.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dbootstrap&resourceVersion=3763&timeoutSeconds=407&watch=true": dial tcp 54.176.14.91:6443: connect: connection refused

@wking those nodes should have ingress access to them, right? Is there a way we can check that on the EC2 console?

…updating proxy e2e to use this instead

Using openshift@1b21187 for reference

Populated by running:

for REGION in us-east-1 us-east-2 us-west-1 us-west-2
do
  COUNT=3
  if test us-west-1 = "${REGION}"
  then
    COUNT=2
  fi
  for INDEX in 1
  do
    NAME="do-not-delete-shared-vpc-blackhole-${INDEX}"
    aws --region "${REGION}" cloudformation create-stack --stack-name "${NAME}" --template-body "$(cat ci-operator/step-registry/ipi/conf/aws/blackholenetwork/blackhole_vpc.yaml)" --parameters "ParameterKey=AvailabilityZoneCount,ParameterValue=${COUNT}" >/dev/null
    aws --region "${REGION}" cloudformation wait stack-create-complete --stack-name "${NAME}"
    SUBNETS="$(aws --region "${REGION}" cloudformation describe-stacks --stack-name "${NAME}" | jq -c '[.Stacks[].Outputs[] | select(.OutputKey | endswith("SubnetIds")).OutputValue | split(",")[]]' | sed "s/\"/'/g")"
    echo "${REGION}_$((INDEX - 1))) subnets=\"${SUBNETS}\";;"
  done
done
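
Each echoed line is a ready-to-paste case branch mapping a region to its black-hole subnet IDs. A minimal sketch of how such branches might be consumed by a step script follows; the variable names and scaffolding are illustrative assumptions, not the contents of the real ipi-conf-aws-blackholenetwork-commands.sh, and the subnet IDs are copied from the stack outputs quoted further down this page:

  # Illustrative only: select the black-hole subnets for the current region.
  # REGION and INDEX are placeholders for however the step learns its lease.
  REGION="${REGION:-us-east-1}"
  INDEX=0
  case "${REGION}_${INDEX}" in
  us-east-1_0) subnets="['subnet-0a7491aa76f9b88d7','subnet-0f0b2dcccdcbc7c1d','subnet-0680badf68cbf198c','subnet-02b25dd65f806e41b','subnet-010235a3bff34cf6f','subnet-085c78d8c562b5a51']";;
  us-west-1_0) subnets="['subnet-0d003f08a541855a2','subnet-04007c47f50891b1d','subnet-02cdb70a3a4beb754','subnet-0d813eca318034290']";;
  *) echo "no black-hole subnets configured for ${REGION}" >&2; exit 1;;
  esac
  echo "using subnets: ${subnets}"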
@ewolinetz ewolinetz changed the title [WIP] adding in black hole subnets for proxy testing adding in black hole subnets for proxy testing Jul 21, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 21, 2020
@ewolinetz
Contributor Author

Currently known BZ during install: https://bugzilla.redhat.com/show_bug.cgi?id=1859360


wking commented Jul 21, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ewolinetz, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 868e27e into openshift:master Jul 21, 2020
@openshift-ci-robot
Contributor

@ewolinetz: Updated the following 2 configmaps:

  • step-registry configmap in namespace ci at cluster api.ci using the following files:
    • key OWNERS using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/OWNERS
    • key blackhole_vpc_yaml.md using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/blackhole_vpc_yaml.md
    • key ipi-conf-aws-blackholenetwork-chain.yaml using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/ipi-conf-aws-blackholenetwork-chain.yaml
    • key ipi-conf-aws-blackholenetwork-commands.sh using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/ipi-conf-aws-blackholenetwork-commands.sh
    • key ipi-conf-aws-blackholenetwork-ref.yaml using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/ipi-conf-aws-blackholenetwork-ref.yaml
    • key ipi-conf-aws-proxy-chain.yaml using file ci-operator/step-registry/ipi/conf/aws/proxy/ipi-conf-aws-proxy-chain.yaml
  • step-registry configmap in namespace ci at cluster app.ci using the following files:
    • key OWNERS using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/OWNERS
    • key blackhole_vpc_yaml.md using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/blackhole_vpc_yaml.md
    • key ipi-conf-aws-blackholenetwork-chain.yaml using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/ipi-conf-aws-blackholenetwork-chain.yaml
    • key ipi-conf-aws-blackholenetwork-commands.sh using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/ipi-conf-aws-blackholenetwork-commands.sh
    • key ipi-conf-aws-blackholenetwork-ref.yaml using file ci-operator/step-registry/ipi/conf/aws/blackholenetwork/ipi-conf-aws-blackholenetwork-ref.yaml
    • key ipi-conf-aws-proxy-chain.yaml using file ci-operator/step-registry/ipi/conf/aws/proxy/ipi-conf-aws-proxy-chain.yaml

In response to this:

This is an effort to close out #5308 and instead move it to the step registry

Addresses:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wking added a commit to wking/openshift-release that referenced this pull request Sep 17, 2020
…_yaml: Add EC2 endpoint

The machine-API currently ignores the proxy configuration, although a
future machine-API release might grow support for it [1].  That means
CI jobs in the blackhole VPC die on i/o timeouts trying to reach
https://ec2.${region}.amazonaws.com/ while provisioning compute
machines, and the install subsequently dies because we fail to
schedule monitoring, ingress, and other compute-hosted workloads [2].
This commit adds a VPC endpoint to allow EC2 access from inside the
cluster [3].  It's similar to the existing S3 VPC endpoint, but:

* It's an interface type, while S3 needs the older gateway type.  This
  avoids:

    Endpoint type (Gateway) does not match available service types
    ([Interface]). (Service: AmazonEC2; Status Code: 400; Error Code:
    InvalidParameter; Request ID: ...; Proxy: null)

  while creating the stack.

* There are no RouteTableIds, because the interface type does not
  support them.  This avoids:

    Route table IDs are only supported for Gateway type VPC
    Endpoint. (Service: AmazonEC2; Status Code: 400; Error Code:
    InvalidParameter; Request ID: ...; Proxy: null)

  while creating the stack.

* I've created a new security group allowing HTTPS connections to the
  endpoint, because SecurityGroupIds is required for interface
  endpoints [3].  I've also placed the network interfaces in the
  public subnets, because SubnetIds is required for interface
  endpoints [3].

* I've set PrivateDnsEnabled [3] so the machine-API operator doesn't
  have to do anything special for DNS to route EC2 API traffic to the
  endpoint interfaces.  (See the illustrative CLI sketch below.)
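
For illustration, here is roughly the same configuration expressed as AWS CLI calls rather than the CloudFormation template the commit actually edits; this is a sketch only, and the REGION, VPC_ID, VPC_CIDR, and PUBLIC_SUBNET_IDS variables are hypothetical placeholders:

  # Hypothetical CLI equivalent of the template change, for illustration only;
  # the real change lives in the blackhole_vpc CloudFormation template.
  SG_ID="$(aws --region "${REGION}" ec2 create-security-group \
    --group-name ec2-endpoint-https \
    --description "Allow HTTPS to the EC2 interface endpoint" \
    --vpc-id "${VPC_ID}" --query GroupId --output text)"
  aws --region "${REGION}" ec2 authorize-security-group-ingress \
    --group-id "${SG_ID}" --protocol tcp --port 443 --cidr "${VPC_CIDR}"
  # Interface endpoint for EC2: no route table IDs, interfaces placed in the
  # public subnets, private DNS enabled so ec2.<region>.amazonaws.com resolves
  # to the endpoint from inside the VPC.
  aws --region "${REGION}" ec2 create-vpc-endpoint \
    --vpc-id "${VPC_ID}" \
    --vpc-endpoint-type Interface \
    --service-name "com.amazonaws.${REGION}.ec2" \
    --subnet-ids ${PUBLIC_SUBNET_IDS} \
    --security-group-ids "${SG_ID}" \
    --private-dns-enabled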

Rolled out to the CI account following 9b39dd2 (Creating private
subnets without direct external internet access and updating proxy e2e
to use this instead, 2020-07-20, openshift#10355):

  for REGION in us-east-1 us-east-2 us-west-1 us-west-2
  do
    COUNT=3
    if test us-west-1 = "${REGION}"
    then
      COUNT=2
    fi
    for INDEX in 1
    do
      NAME="do-not-delete-shared-vpc-blackhole-${INDEX}"
      aws --region "${REGION}" cloudformation update-stack --stack-name "${NAME}" --template-body "$(cat ci-operator/step-registry/ipi/conf/aws/blackholenetwork/blackhole_vpc_yaml.md)" --parameters "ParameterKey=AvailabilityZoneCount,ParameterValue=${COUNT}" >/dev/null
      aws --region "${REGION}" cloudformation wait stack-update-complete --stack-name "${NAME}"
      SUBNETS="$(aws --region "${REGION}" cloudformation describe-stacks --stack-name "${NAME}" | jq -c '[.Stacks[].Outputs[] | select(.OutputKey | endswith("SubnetIds")).OutputValue | split(",")[]]' | sed "s/\"/'/g")"
      echo "${REGION}_$((INDEX - 1))) subnets=\"${SUBNETS}\";;"
    done
  done

We could also have deleted the previous stacks, used 'create-stack'
instead of 'update-stack', and used 'stack-create-complete' instead of
'stack-update-complete'.
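
For reference, that delete-and-recreate alternative would have looked roughly like this (a sketch only, reusing the same REGION and NAME variables as the loop above):

  aws --region "${REGION}" cloudformation delete-stack --stack-name "${NAME}"
  aws --region "${REGION}" cloudformation wait stack-delete-complete --stack-name "${NAME}"
  # ...then re-run the create-stack / wait / describe-stacks loop from openshift#10355.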

Unsurprisingly, since we were not updating the subnets themselves, the
output has not changed:

  us-east-1_0) subnets="['subnet-0a7491aa76f9b88d7','subnet-0f0b2dcccdcbc7c1d','subnet-0680badf68cbf198c','subnet-02b25dd65f806e41b','subnet-010235a3bff34cf6f','subnet-085c78d8c562b5a51']";;
  us-east-2_0) subnets="['subnet-0ea117d9499ef624f','subnet-00adc83d4719d4176','subnet-0b9399990fa424d7f','subnet-060d997b25f5bb922','subnet-015f4e65b0ef1b0e1','subnet-02296b47817923bfb']";;
  us-west-1_0) subnets="['subnet-0d003f08a541855a2','subnet-04007c47f50891b1d','subnet-02cdb70a3a4beb754','subnet-0d813eca318034290']";;
  us-west-2_0) subnets="['subnet-05d8f8ae35e720611','subnet-0f3f254b13d40e352','subnet-0e23da17ea081d614','subnet-0f380906f83c55df7','subnet-0a2c5167d94c1a5f8','subnet-01375df3b11699b77']";;

so no need to update ipi-conf-aws-blackholenetwork-commands.sh.

I generated the reaper keep-list following 1b21187 (ci-operator:
Fresh AWS shared subnets for us-east-2, etc., 2020-01-30, openshift#6949):

  for REGION in us-east-1 us-east-2 us-west-1 us-west-2
  do
    for INDEX in 1
    do
      NAME="do-not-delete-shared-vpc-blackhole-${INDEX}"
      aws --region "${REGION}" resourcegroupstaggingapi get-resources --tag-filters "Key=aws:cloudformation:stack-name,Values=${NAME}" --query 'ResourceTagMappingList[].ResourceARN[]' | jq -r ".[] | . + \"  # CI exclusion per DPP-5789, ${REGION} ${NAME}\""
    done
  done | sort

and passed that along to the Developer Productivity Platform (DPP)
folks so they can update their reaper config.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1769223
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1875773
[3]: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-vpcendpoint.html