OOI deployment failing #759

Closed · TomAugspurger opened this issue Sep 28, 2020 · 11 comments

TomAugspurger commented Sep 28, 2020

The OOI deployment is failing somewhere in hubploy.deploy.

Deleting outdated charts
Traceback (most recent call last):
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 66, in helm_upgrade
    kubernetes.config.load_kube_config(config_file=kubeconfig)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 739, in load_kube_config
    persist_config=persist_config)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 695, in _get_kube_config_loader_for_yaml_file
    kcfg = KubeConfigMerger(filename)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 650, in __init__
    self.load_config(path)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 664, in load_config
    config_merged[item] = []
TypeError: 'NoneType' object does not support item assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/circleci/repo/venv/bin/hubploy", line 11, in <module>
    load_entry_point('hubploy==0.1.1', 'console_scripts', 'hubploy')()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/__main__.py", line 139, in main
    args.cleanup_on_fail,
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 187, in deploy
    cleanup_on_fail,
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 68, in helm_upgrade
    kubernetes.config.load_incluster_config()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/incluster_config.py", line 94, in load_incluster_config
    cert_filename=SERVICE_CERT_FILENAME).load_and_set()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/incluster_config.py", line 45, in load_and_set
    self._load_config()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/incluster_config.py", line 51, in _load_config
    raise ConfigException("Service host/port is not set.")
kubernetes.config.config_exception.ConfigException: Service host/port is not set.

Exited with code exit status 1
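
Reading the traceback: helm_upgrade first tries to load a kubeconfig file and, when that raises, falls back to in-cluster config, so the original kubeconfig TypeError gets masked by the in-cluster ConfigException. A paraphrased sketch of that logic (not hubploy's exact source):

    import kubernetes.config

    def load_config(kubeconfig=None):
        # Paraphrased from the traceback above, not hubploy's exact code:
        # prefer an explicit kubeconfig file, fall back to in-cluster config.
        try:
            kubernetes.config.load_kube_config(config_file=kubeconfig)
        except Exception:
            # In CircleCI there is no in-cluster service host/port, so this
            # fallback raises the ConfigException shown above.
            kubernetes.config.load_incluster_config()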

@salvis2 any guesses? IIRC you were working on Kubernetes auth in hubploy recently?

salvis2 commented Sep 28, 2020

I was looking at that a bit last Thursday. The error seems familiar, but I can't remember what the issue was. Yeah, I did some of the auth updates to hubploy for AWS. I was also looking into it last week to see if there were obvious changes to make on the Azure side of things.

TomAugspurger commented Sep 28, 2020

I might temporarily revert to the older version of hubploy just to get a working deploy done if that's OK. Will that break anything on the AWS deploy?

salvis2 commented Sep 28, 2020

I think it will be fine. It was working before the hubploy update.

TomAugspurger added a commit that referenced this issue Sep 28, 2020

salvis2 commented Sep 28, 2020

So we've reverted hubploy to commit berkeley-dsep-infra/hubploy@d619b2d, from March 13th, 2020.

The most relevant subsequent commit to helm.py is berkeley-dsep-infra/hubploy@2622456, from April 8th, 2020. It just adds the option to specify the KUBECONFIG environment variable and pass it to load_kube_config().
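
In other words, the change boils down to something like this (a paraphrased sketch, not the exact diff):

    import os
    import kubernetes.config

    # Paraphrased sketch of berkeley-dsep-infra/hubploy@2622456: honor an
    # explicit KUBECONFIG path rather than only the default ~/.kube/config.
    kubeconfig = os.environ.get("KUBECONFIG")  # None falls back to the default
    kubernetes.config.load_kube_config(config_file=kubeconfig)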

So the OOI deployment works when hubploy assumes the kubeconfig file is at the default location (from https://app.circleci.com/pipelines/github/pangeo-data/pangeo-cloud-federation/1397/workflows/1b88e9cc-8c22-493a-b96b-7914f43b4ba5/jobs/1560):

Merged "ooi-pangeo" as current context in /home/circleci/.kube/config

The newer version of hubploy also prints the above message in CI (https://app.circleci.com/pipelines/github/pangeo-data/pangeo-cloud-federation/1394/workflows/475a8afa-f77a-427d-99c9-210ac3bf0dcd/jobs/1557). However, the kubeconfig file is supposed to be at a different location; the relevant hubploy code is here: https://github.com/yuvipanda/hubploy/blob/6742809fc1d8676859fe5442478c95c41c7ad050/hubploy/auth.py#L209-L213

        temp_kubeconfig = tempfile.NamedTemporaryFile()
        orig_kubeconfig = os.environ.get("KUBECONFIG", None)

        try:
            os.environ["KUBECONFIG"] = temp_kubeconfig.name
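
Note that tempfile.NamedTemporaryFile() creates an empty file. If nothing ever writes credentials into it, load_kube_config() ends up parsing an empty YAML document, which loads as None; that would explain the TypeError: 'NoneType' object does not support item assignment above, since KubeConfigMerger deep-copies the parsed (None) config and then tries to assign into it.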

Looking at the second CI link (where OOI failed to deploy), the AWS deploy step shows:

Added new context arn:aws:eks:us-west-2:783380859522:cluster/pangeo to /tmp/tmpdkim24d3

That is exactly the kind of temporary kubeconfig filename I'm expecting. Now to look into why the OOI deployment doesn't do the same.

salvis2 commented Sep 28, 2020

OK, so the az aks get-credentials command has an optional argument:

--file -f
    Kubernetes configuration file to update. Use "-" to print YAML to stdout instead.
    Default value: ~/.kube/config

So I think if we add this flag to the command hubploy runs, it will actually write credentials to the temporary kubeconfig file. I guess this isn't an issue for AWS or GCP?
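
A minimal sketch of what that could look like in hubploy's Azure auth path (the cluster and resource-group names are placeholders, and hubploy has already pointed KUBECONFIG at the temp file by this point):

    import os
    import subprocess

    cluster = "ooi-cluster"    # placeholder name
    resource_group = "ooi-rg"  # placeholder name

    # Tell the Azure CLI to write credentials into the temporary kubeconfig
    # that KUBECONFIG points at, instead of the default ~/.kube/config.
    subprocess.check_call([
        "az", "aks", "get-credentials",
        "--name", cluster,
        "--resource-group", resource_group,
        "--file", os.environ["KUBECONFIG"],
    ])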

tjcrone commented Sep 28, 2020

Just to weigh in here: wherever we are right now, both OOI prod and OOI staging are up and working well. It would be great if we could keep it like this, if you think that's possible. I have discussed the possibility of a stable version of this repo where we don't run a ton of bleeding-edge code. Is this something that would be of interest?

tjcrone commented Sep 28, 2020

Also thank you all very much for helping to get us back up and running. Woot!

salvis2 commented Sep 28, 2020

Yeah @tjcrone, it would be nice if you didn't have to worry about this beyond this point.

I'm wondering if we should break the CI for each hub into its own step, so they could have different hubploy versions, etc. That's a step in the direction of what you're suggesting (or of having a separate repo for each hub). We were also looking to move to GitHub Actions, which might help break the CI into bite-sized pieces, but we can leave OOI out of that while the class is ongoing.

tjcrone commented Sep 28, 2020

Isolating CI components is an interesting idea that I think we should consider. It might be possible to run each deploy step in a separate image, or at least pin versions of tools like hubploy that might differ between deployments. We could also look into only running CI for the deployments where changes occurred. This sort of structure might be much better than forked repos. Since you are in the process of migrating to GitHub Actions, this seems like a great time to explore these options. I'm happy to discuss further and help.

salvis2 commented Oct 14, 2020

The OOI deployment is now succeeding!

salvis2 closed this as completed Oct 14, 2020