operator service account is deleted by mistake during the upgrade #1879

Closed
horis233 opened this issue Nov 25, 2020 · 0 comments · Fixed by #1881
Labels
kind/bug Categorizes issue or PR as related to a bug.


horis233 commented Nov 25, 2020

Bug Report

The issue can be observed intermittently during the operator upgrade.

What did you do?

To test the upgrade, we installed an old version of our operators, then switched the CatalogSource image to trigger the upgrade.

What did you expect to see?

We expected all the operators to be upgraded to the target version.

What did you see instead? Under which circumstances?

Some operators hang in the Installing status, and the operator pods complain that their service account token secrets can't be found.

Let me use a 5 Whys analysis to explain the cause:

Why1: Why does it happen? We investigated and found that the service account token secrets can't be found because the original service account was deleted.

Why2: Why was the service account deleted? According to the audit log, it was deleted by the garbage collector.

Why3: Why did the garbage collector delete it? Because the old version of the operator CSV was deleted.

Why4: Why does deleting the old operator CSV affect the service account? Because the owner of the service account is the old operator CSV.

Why5: Why isn't the owner of the service account updated to refer to the new CSV? My guess is that the service account update is managed by the catalog operator, while CSV deletion is managed by the OLM operator. This is only a guess; I haven't checked the OLM code to verify it. Could the OLM team please analyze the root cause and how to avoid this issue?
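
The chain above hinges on owner references: an object whose only owner no longer exists becomes eligible for garbage collection. The following is a toy Python model of that behavior, not OLM or Kubernetes code. The `Obj` class, `collect_garbage`, `service_account_survives`, and every name and UID in it are hypothetical, and the model ignores finalizers and deletion propagation policies.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Toy stand-in for a Kubernetes object (hypothetical, not a real API type)."""
    uid: str
    name: str
    owner_uids: list = field(default_factory=list)


def collect_garbage(objects):
    """Return the objects that survive a simplified owner-reference GC pass.

    An object with no owner references is never collected. An object with
    owner references is kept as long as at least one owner still exists.
    """
    live = dict(objects)
    changed = True
    while changed:
        changed = False
        for uid, obj in list(live.items()):
            if obj.owner_uids and not any(o in live for o in obj.owner_uids):
                del live[uid]
                changed = True
    return live


def service_account_survives(owner_updated):
    """Model the cluster state right after the old CSV has been deleted."""
    old_csv = Obj("uid-old", "my-operator.v1")
    new_csv = Obj("uid-new", "my-operator.v2")
    # Assumption: the service account has exactly one owner reference.
    owner = new_csv.uid if owner_updated else old_csv.uid
    sa = Obj("uid-sa", "my-operator-sa", [owner])
    # The old CSV is intentionally absent: it was removed by the upgrade.
    cluster = {o.uid: o for o in (new_csv, sa)}
    return sa.uid in collect_garbage(cluster)
```

In this model, `service_account_survives(False)` corresponds to the situation described in Why4 (the service account still points at the old CSV and is collected), while `service_account_survives(True)` corresponds to the owner reference having been moved to the new CSV before the old one was removed.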

cc @hchenxa @chenzhiwei @DanielXLee @cheewaio

Environment

  • operator-lifecycle-manager version:

From clusteroperator

    - name: operator-lifecycle-manager
      version: 0.16.1
  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0+d59ce34", GitCommit:"d59ce3486ae3ca3a0c36e5498e56f51594076596", GitTreeState:"clean", BuildDate:"2020-10-08T15:58:07Z", GoVersion:"go1.15.0", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:
OCP

Possible Solution

Workaround: delete the operator pod so that it is recreated with the token of the new service account.
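
As a minimal sketch of the workaround, the helper below builds the `kubectl delete pod` command for one operator pod and only runs it when asked. The function name, pod name, and namespace are hypothetical, and it assumes the pod is managed by a controller (for example a Deployment) that recreates it after deletion.

```python
import subprocess


def restart_operator_pod(pod, namespace, dry_run=True):
    """Build, and optionally run, the kubectl command that deletes one pod.

    With dry_run=True (the default) nothing is executed; the command is
    only returned so it can be reviewed first.
    """
    cmd = ["kubectl", "delete", "pod", pod, "--namespace", namespace]
    if not dry_run:
        # Raises CalledProcessError if kubectl exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `restart_operator_pod("my-operator-abc", "operators")` returns the command without touching the cluster; passing `dry_run=False` would execute it.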


horis233 added the kind/bug label Nov 25, 2020