Bug Report
The issue can be observed intermittently during the operator upgrade.
What did you do?
To test the upgrade, we install an old version of our operators, then switch the CatalogSource image to trigger the upgrade.
What did you expect to see?
We expect all the operators to be upgraded to the target version.
What did you see instead? Under which circumstances?
Some operators hang in the Installing status, and the operator pods complain that the service account token secrets can't be found.
Let me use a 5 Whys analysis to explain the cause:
Why1: Why does it happen? We investigated and found that the service account token secrets can't be found because the original service account was deleted.
Why2: Why was the service account deleted? According to the audit log, it was deleted by the garbage collector.
Why3: Why did the garbage collector delete it? Because the old version of the operator CSV was deleted.
Why4: Why does deleting the old operator CSV affect the service account? Because the old CSV is the owner of the service account.
Why5: Why isn't the owner of the service account updated to refer to the new CSV? My guess is that the update of the service account is managed by the catalog operator, while CSV deletion is managed by the OLM operator. This is only a guess; I haven't checked the OLM code to verify it. Could the OLM team please analyze the root cause and advise how to avoid this issue?
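To make the ownership chain in Why3 and Why4 concrete, here is a minimal Python sketch that inspects a ServiceAccount manifest (as a plain dict, for example parsed from `kubectl get serviceaccount -o json`) and reports whether all of its owners are CSVs that no longer exist. It is not taken from the OLM code. The operator and CSV names are hypothetical, and the check simplifies the garbage collector's behavior by treating any non-CSV owner as still alive.

```python
from typing import Any, Dict, List, Set


def dangling_owner_csvs(service_account: Dict[str, Any],
                        live_csv_names: Set[str]) -> List[str]:
    """Return names of owning CSVs that are no longer present in the namespace."""
    owners = service_account.get("metadata", {}).get("ownerReferences", [])
    return [
        owner["name"]
        for owner in owners
        if owner.get("kind") == "ClusterServiceVersion"
        and owner["name"] not in live_csv_names
    ]


def would_be_garbage_collected(service_account: Dict[str, Any],
                               live_csv_names: Set[str]) -> bool:
    """True if the object has owners and every one of them is a deleted CSV.

    Simplification (assumption): owners of any other kind count as alive.
    """
    owners = service_account.get("metadata", {}).get("ownerReferences", [])
    if not owners:
        return False  # objects without owners are not collected
    return len(dangling_owner_csvs(service_account, live_csv_names)) == len(owners)


# Hypothetical example: the service account still points at the old CSV only.
service_account = {
    "metadata": {
        "name": "example-operator",
        "ownerReferences": [
            {"kind": "ClusterServiceVersion", "name": "example-operator.v1.0.0"},
        ],
    }
}

# After the upgrade only the new CSV exists, so the old owner is dangling.
print(would_be_garbage_collected(service_account, {"example-operator.v1.1.0"}))
```

Running a check like this against the service account just before the old CSV is removed would show whether the owner reference was still pointing only at the old CSV, which is the condition suspected in Why5.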
cc @hchenxa @chenzhiwei @DanielXLee @cheewaio
Environment
From clusteroperator
Possible Solution
Workaround: delete the operator pod so that the recreated pod picks up the token of the new service account.
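To find the pods that need this workaround, a rough Python sketch could compare the secrets each pod mounts against the secrets that currently exist in the namespace. It assumes the pod mounts its service account token as a secret volume, as the error message in this report suggests, and takes manifests as plain dicts. The pod and secret names are hypothetical.

```python
from typing import Any, Dict, List, Set


def missing_secret_volumes(pod: Dict[str, Any],
                           existing_secret_names: Set[str]) -> List[str]:
    """Return secret names referenced by the pod's volumes that no longer exist."""
    missing = []
    for volume in pod.get("spec", {}).get("volumes", []):
        secret_name = (volume.get("secret") or {}).get("secretName")
        if secret_name and secret_name not in existing_secret_names:
            missing.append(secret_name)
    return missing


# Hypothetical example: the pod still mounts the token of the deleted service account.
pod = {
    "metadata": {"name": "example-operator-abc123"},
    "spec": {
        "volumes": [
            {"name": "token", "secret": {"secretName": "example-operator-token-old"}},
            {"name": "config", "configMap": {"name": "example-config"}},
        ]
    },
}

stale = missing_secret_volumes(pod, {"example-operator-token-new"})
if stale:
    # A pod flagged here is a candidate for the workaround (delete it so it is recreated).
    print(pod["metadata"]["name"], "references missing secrets:", stale)
```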