Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use an AWS deployer user to fetch kubernetes credentials in our deployer #673

Merged
merged 7 commits into from
Sep 14, 2021

Conversation

damianavila
Copy link
Contributor

Summary:

The auth_aws function I recently added to the deployer is now authenticating kops-based clusters.
Farallon would be the first one.
You can see more details in the commit messages.
You may find interesting the usage of if and else keyword in the schema to properly validate the config file.

Btw, this one still needs to be more tested but I believe is complete so putting it out there to get early feedback.

Closes #381

We currently manage EKS and kops clusters on AWS land. With this commit
we are adding support and validation (by jsonschema) for the kops-based
cluster that requires the state store to properly get the state of the
cluster and succeed in the next step at the time to get export the
kubeconfig.

Notice I am using the if and then keywords [1] to require the stateStore
property only if the kops value is configured.

The encrypted key file referenced in the condfig file will be provided
in a later commit.

[1] https://json-schema.org/understanding-json-schema/reference/conditionals.html#if-then-else
We identified a 2i2c-engineers groups in Farallon AWS and we have
created a deployer user under that group. We used common awscli commands
to perform this task and retrieve the credentials. More details in [1].

Finally, we encrypted the file with sops accordingly with the current
established workflows to manage secret files.

[1] #381 (comment)
Previously, we have added an auth_aws function to retrieve the
kubeconfig and connect to EKS-based cluster. With this commit, we are
adding a new subprocess "brach" to conditionally `kops export` the
kubeconfig (if the kops option is configured) and get access to
kops-based clusters, such as the Farallon one.
I am still not sure how the action performs for real  but it is the only
one available in the marketplace and we should probably give it a try.
I have also added a commented openscapes line for the future ;-)
Copy link
Member

@GeorgianaElena GeorgianaElena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks generally awesome! Thanks @damianavila

.github/workflows/deploy-hubs.yaml Show resolved Hide resolved
.github/workflows/deploy-hubs.yaml Outdated Show resolved Hide resolved
config/hubs/schema.yaml Show resolved Hide resolved
@damianavila
Copy link
Contributor Author

damianavila commented Sep 13, 2021

Thanks for the feedback, @GeorgianaElena!

Btw, local testing on top of staging seems to be working as expected:

$ python3 deployer deploy farallon staging --skip-hub-health-test
kops has set your kubectl context to farallonhub.k8s.local
Running helm upgrade --install --create-namespace --wait --namespace staging staging hub-templates/daskhub -f /var/folders/mn/8h0hm_395l31nrtn235h29900000gn/T/tmpt92t1imr -f /var/folders/mn/8h0hm_395l31nrtn235h29900000gn/T/tmp903tmgl3
Release "staging" has been upgraded. Happy Helming!
NAME: staging
LAST DEPLOYED: Mon Sep 13 16:16:16 2021
NAMESPACE: staging
STATUS: deployed
REVISION: 13
TEST SUITE: None
(pilot-hubs) Damians-MacBook-Pro:pilot-hubs damian$ kubectl -n staging get pods
NAME                                               READY   STATUS    RESTARTS   AGE
api-staging-dask-gateway-f47fb549-p9vf4            1/1     Running   0          82d
autohttps-6b8c745849-vcq7r                         2/2     Running   0          82s
controller-staging-dask-gateway-54fdd4bfdc-drjrw   1/1     Running   0          99d
hub-8567fc7d8f-dkcn5                               2/2     Running   0          50s
proxy-56cdf9b78c-kk86b                             1/1     Running   0          72s
traefik-staging-dask-gateway-7df6cffd45-8jdl2      1/1     Running   2          99d

Also, run successfully with the test enabled 😉

Running hub health check...
.                                                                                                                                                                                                                                      [100%]
1 passed in 332.02s (0:05:32)

Copy link
Member

@sgibson91 sgibson91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for this work @damianavila!

@damianavila
Copy link
Contributor Author

Tested the kops action in a test repo and it is not working 😢
Going with a custom step to install kops... something along these lines: https://github.com/damianavila/test/blob/main/.github/workflows/test_kops_gha.yml

@damianavila
Copy link
Contributor Author

@GeorgianaElena and @sgibson91, I pushed another commit that I think you would be interested in.
I am not a GH actions user, so let me know if you have any further feedback about it.
Thanks!

@sgibson91
Copy link
Member

@damianavila Looks good to me! That's a totally valid workflow step :)

@sgibson91
Copy link
Member

sgibson91 commented Sep 13, 2021

The only thing we might want to do is, instead of always pulling the latest release of kops, is use an environment variable to pin the version we want to use for now instead. Then we're in control of bumping to the latest version when we want. See what we do with the HELM_VERSION variable over in mybinder.org-deploy https://github.com/jupyterhub/mybinder.org-deploy/blob/89cfc75dff4f711bb9427da7cb9af0a24a28dfa9/.github/workflows/cd.yml#L29

@damianavila
Copy link
Contributor Author

@sgibson91, I think the last commit addressed what you ask for, let me know if that is not the case.

@damianavila
Copy link
Contributor Author

OK, I will proceed with the merge since I have at least one approval and nobody is working in the prod hub now.
Let's see how this one goes...

@damianavila damianavila merged commit e547635 into master Sep 14, 2021
@damianavila damianavila deleted the aws_kops_auth branch September 14, 2021 00:34
@damianavila
Copy link
Contributor Author

And it obviously failed 😛
Looking at it soon...

@damianavila
Copy link
Contributor Author

@sgibson91, I am surprised this is failing because, accordingly to the docs that should work

If you use the workflow file's run key to read environment variables from within the runner operating system (as shown in the example above), the variable is substituted in the runner operating system after the job is sent to the runner.

unless I am missing something, which is obviously the case 😉
Thoughts? Maybe this is related to the fact we are using a matrix somehow?

@damianavila
Copy link
Contributor Author

Well... testing the same on my test repo works as expected... mmm...

https://github.com/damianavila/test/blob/main/.github/workflows/test_kops_gha.yml

@damianavila
Copy link
Contributor Author

So, it seems there is some permission error preventing curl to write the file on this repo... wondering why the behavior is different from my repo.

@choldgraf
Copy link
Member

Hmmm could it be that we have run into https://github.blog/changelog/2021-04-20-github-actions-control-permissions-for-github_token/ ?

I believe @consideRatio had set the default permissions of all repos to be read only

@sgibson91
Copy link
Member

@choldgraf i would agree except GITHUB_TOKEN isn't being invoked 🤔

@sgibson91
Copy link
Member

This could be a closed pipe issue and I suspect that if we reran CI, it would be a flaky test and pass sometimes and not others

helm/helm#2802

@consideRatio
Copy link
Member

@choldgraf no change made by me regarding that, but I suggest that permissions for GITHUB_TOKEN are explicitly specified - especially in any templates passed to others, so our org and other orgs can narrow down permissions as a security measure and still have the workflows functional.

This is #359

@damianavila
Copy link
Contributor Author

I think this is curl-specific, I will switch to wget and try again 😉

@damianavila
Copy link
Contributor Author

damianavila commented Sep 17, 2021

For posterity, this was not a curl nor a wget issue.
The problem was we already have a kops directory in the repo and I was trying to overwrite it at the time to install kops 🤦 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use an AWS IAM Role / User to fetch kubernetes credentials in our deployer
5 participants