kops-controller can't determine AWS region #9856

Closed
w3irdrobot opened this issue Sep 1, 2020 · 7 comments · Fixed by #9857 or #9575

Comments

@w3irdrobot

w3irdrobot commented Sep 1, 2020

1. What kops version are you running? The command kops version, will display
this information.

➜ kops version
Version 1.18.0 (git-698bf974d8)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

➜ kubectl version --short
Client Version: v1.19.0
Server Version: v1.18.8

3. What cloud provider are you using?

AWS GovCloud

4. What commands did you run? What is the simplest way to reproduce this issue?

This happened while upgrading a cluster created with kops v1.17.1 from Kubernetes v1.17.10 to Kubernetes v1.18.8 using kops v1.18.0. We upgraded the cluster and removed the pinned Docker version (originally added to fix Docker issues on Amazon Linux). It went something like the following:

kops upgrade cluster --yes
kops edit cluster # removed the docker stuff
kops update cluster --yes
kops rolling-update cluster --yes

5. What happened after the commands executed?

Everything came up again smoothly, except that the role labels weren't being added to the worker nodes. Looking at the kops-controller logs, we saw errors like this:

I0901 17:32:15.966765       1 s3context.go:325] unable to read /sys/devices/virtual/dmi/id/product_uuid, assuming not running on EC2: open /sys/devices/virtual/dmi/id/product_uuid: permission denied
I0901 17:32:15.967033       1 s3context.go:170] defaulting region to "us-east-1"
I0901 17:32:16.311839       1 s3context.go:191] unable to get bucket location from region "us-east-1"; scanning all regions: InvalidToken: The provided token is malformed or otherwise invalid.
     status code: 400, request id: 4Z4P1MCS0S6K8P0R, host id: HOST-ID
E0901 17:32:16.647080       1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="unable to load cluster object for node ip-172-20-40-221.us-gov-west-1.compute.internal: error loading Cluster \"s3://ciphermorph-com-k8s-local-kops/ciphermorph.com.k8s.local/cluster.spec\": Unable to list AWS regions: AuthFailure: AWS was not able to validate the provided access credentials\n\tstatus code: 401, request id: 870266f7-a2ac-417e-9fb3-e7496e5ba629"  "controller"="node" "request"={"Namespace":"","Name":"ip-172-20-40-221.us-gov-west-1.compute.internal"}

6. What did you expect to happen?

I expected the kops-controller to be able to determine the region I'm running in by reading that file, and not error out. The kops-controller would then be able to proceed and add the labels to the node.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2020-04-11T19:12:40Z"
  generation: 7
  name: our.cluster.k8s.local
spec:
  additionalPolicies:
    node: |
      [
        {"Effect":"Allow","Action":["autoscaling:DescribeAutoScalingGroups","autoscaling:DescribeAutoScalingInstances","autoscaling:DescribeLaunchConfigurations","autoscaling:DescribeTags","autoscaling:SetDesiredCapacity","autoscaling:TerminateInstanceInAutoScalingGroup"],"Resource":"*"}
      ]
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://our-bucket/our.cluster.k8s.local
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-us-gov-west-1a
      name: a
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-us-gov-west-1a
      name: a
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.18.8
  masterInternalName: api.internal.our.cluster.k8s.local
  masterPublicName: api.our.cluster.k8s.local
  networkCIDR: 172.172.0.0/16
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 127.0.0.1/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.172.172.0/19
    name: us-gov-west-1a
    type: Public
    zone: us-gov-west-1a
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

This is running in AWS GovCloud region us-gov-west-1.

Digging into the code, I found the place where this check is happening. By setting the AWS_REGION environment variable in the DaemonSet, I was able to get things working. However, this won't be a long-term solution, since I assume it will get overwritten when we do another upgrade.
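For readers who want to try the same workaround, here is a minimal sketch of how the variable could be set from the command line. It assumes the controller runs as a DaemonSet named kops-controller in the kube-system namespace; the function name set_controller_region and the KUBECTL override are hypothetical and only exist in this example. As noted above, kops may overwrite the change on the next update.

```shell
# Sketch only: set AWS_REGION on the kops-controller DaemonSet.
# Assumption: the DaemonSet is kube-system/kops-controller.
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
set_controller_region() {
  region="$1"
  k="${KUBECTL:-kubectl}"

  if [ -z "$region" ]; then
    echo "usage: set_controller_region <aws-region>" >&2
    return 1
  fi

  # Add or update the environment variable on the pod template,
  # then wait for the restarted pods to become ready.
  "$k" --namespace kube-system set env daemonset/kops-controller "AWS_REGION=$region" &&
    "$k" --namespace kube-system rollout status daemonset/kops-controller
}
```

For the cluster in this report the call would be `set_controller_region us-gov-west-1`.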

I looked at the file /sys/devices/virtual/dmi/id/product_uuid directly on the node, since it is the file the controller can't open, and saw that it is owned by root. Initially I thought maybe only Amazon Linux makes root own it and other distros don't. That was incorrect: I spun up an Ubuntu instance, and it too had that file owned by root.

I also tried updating to the most recent AMI of Amazon Linux, but that didn't fix anything either.
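The log lines in step 5 and the AWS_REGION workaround suggest a lookup order roughly like the one sketched below. This is an illustration inferred from those messages, not the kops source: the function names resolve_region and metadata_region are hypothetical, and the "ec2" prefix check on the product UUID is an assumption about how EC2 is recognised.

```shell
# Hypothetical helper: ask the EC2 instance metadata service for the region.
metadata_region() {
  curl -s --max-time 2 http://169.254.169.254/latest/meta-data/placement/region
}

# Sketch of the fallback chain implied by the logs (not the actual kops code).
resolve_region() {
  uuid_file="${1:-/sys/devices/virtual/dmi/id/product_uuid}"

  # 1. An explicit AWS_REGION wins; this is what the workaround relies on.
  if [ -n "${AWS_REGION:-}" ]; then
    echo "$AWS_REGION"
    return 0
  fi

  # 2. Otherwise try to recognise EC2 from the DMI product UUID.
  if [ -r "$uuid_file" ]; then
    uuid=$(cat "$uuid_file")
    case "$uuid" in
      ec2*|EC2*)
        metadata_region
        return
        ;;
    esac
  else
    echo "unable to read $uuid_file, assuming not running on EC2" >&2
  fi

  # 3. Nothing detected: fall back to the default seen in the log.
  echo "us-east-1"
}
```

Under these assumptions, a process that cannot read the UUID file and has no AWS_REGION set ends up at step 3, which would explain why requests went to us-east-1 and were rejected in GovCloud.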

@hakman
Member

hakman commented Sep 2, 2020

@SearsAW I think I have a fix for this. In case you want to try it, let me know.

@w3irdrobot
Author

@hakman I'm down for trying.

@hakman
Member

hakman commented Sep 2, 2020

Would it be ok to create a new cluster based on Kops 1.19 like the one below?

$ wget https://storage.googleapis.com/kops-ci/pulls/pull-kops-e2e-kubernetes-aws/pull-095d7b1c0f/linux/amd64/kops
$ export KOPS_BASE_URL=https://storage.googleapis.com/kops-ci/pulls/pull-kops-e2e-kubernetes-aws/pull-095d7b1c0f
$ kops create cluster ...

@johngmyers
Member

#9575 also fixes this.

@w3irdrobot
Author

> Would it be ok to create a new cluster based on Kops 1.19 like the one below?
>
> $ wget https://storage.googleapis.com/kops-ci/pulls/pull-kops-e2e-kubernetes-aws/pull-095d7b1c0f/linux/amd64/kops
> $ export KOPS_BASE_URL=https://storage.googleapis.com/kops-ci/pulls/pull-kops-e2e-kubernetes-aws/pull-095d7b1c0f
> $ kops create cluster ...

I will give this a shot tonight after work.

@w3irdrobot
Author

@hakman I get the following error when creating/updating the cluster.

➜ kops update cluster
W0902 18:51:13.320481   30222 urls.go:91] Using base url from KOPS_BASE_URL env var: "https://storage.googleapis.com/kops-ci/pulls/pull-kops-e2e-kubernetes-aws/pull-095d7b1c0f"
I0902 18:51:13.466500   30222 context.go:272] hit maximum retries 1 with error file does not exist
I0902 18:51:13.531857   30222 context.go:272] hit maximum retries 1 with error file does not exist
I0902 18:51:14.176505   30222 context.go:272] hit maximum retries 2 with error file does not exist
I0902 18:51:14.823044   30222 context.go:272] hit maximum retries 2 with error file does not exist
I0902 18:51:16.542754   30222 context.go:272] hit maximum retries 3 with error file does not exist
I0902 18:51:18.230344   30222 context.go:272] hit maximum retries 3 with error file does not exist

cannot determine hash for "https://storage.googleapis.com/kops-ci/pulls/pull-kops-e2e-kubernetes-aws/pull-095d7b1c0f/images/protokube.tar.gz" (have you specified a valid file location?)

@hakman
Member

hakman commented Sep 3, 2020

@SearsAW you seem to be using the kops binary from your $PATH. Maybe try the downloaded one instead: ./kops.
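A quick way to confirm which binary a bare command name picks up is sketched below. The helper name check_binary is hypothetical and only exists in this example. It relies on the shell resolving bare names through $PATH, so a file in the current directory is normally ignored unless it is invoked with an explicit ./ prefix.

```shell
# Sketch only: report which file a bare command name resolves to, and flag a
# same-named executable in the current directory that a bare name would skip.
check_binary() {
  name="$1"
  resolved=$(command -v "$name" 2>/dev/null || true)
  echo "bare '$name' resolves to: ${resolved:-<not found>}"
  if [ -x "./$name" ]; then
    echo "local binary present: run it as ./$name"
  fi
}
```

Running `check_binary kops` in the download directory shows whether the installed kops shadows the downloaded one. A file fetched with wget is usually not executable, so `chmod +x ./kops` may be needed before `./kops version` works.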
