🐛 Agents stop update managedcluster status when clock is out of sync. #770
base: main
Codecov Report

@@            Coverage Diff             @@
##             main     #770      +/-   ##
==========================================
+ Coverage   63.60%   63.77%   +0.16%
==========================================
  Files         192      192
  Lines       18433    18596     +163
==========================================
+ Hits        11725    11860     +135
- Misses       5735     5757      +22
- Partials      973      979       +6
What are the consequences of not being able to update the ManagedCluster status? Do we lose any features?
@jgato OCM is not designed to work when clocks are out of sync; if they are, the first priority is to fix the clocks. This PR prevents status flapping by ensuring the managed cluster remains in the Unknown state instead of alternating between Unknown and Available.
/assign @qiujian16
/assign @elgnay
We also need to document this change on the website.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: qiujian16, xuezhaojun
Agreed, I will add a task for it.
Signed-off-by: xuezhaojun <zxue@redhat.com>
stop = runAgent(managedClusterName, agentOptions, commOptions, spokeCfg)
defer stop()

assertAvailableCondition(managedClusterName, metav1.ConditionTrue, 0)
@qiujian16 After running the agent again, the test now checks that the Available condition becomes True. Please take another look.
@qiujian16 A task for updating the docs has been added: open-cluster-management-io/open-cluster-management-io.github.io#443
Summary
This patch aims to stop agents from updating the managed cluster status when the clock is out of sync.
The issue occurs when the hub sets the managed cluster status to unknown due to an outdated lease, while the agent simultaneously sets the managed cluster status to available because the API server is accessible. This conflict causes the managed cluster status to flap back and forth frequently, triggering numerous reconciliations and resulting in performance issues.
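
To make the intended behavior concrete, here is a minimal Go sketch of the guard, not the code in this PR. The names `clockSkew` and `agentMayUpdateStatus` and the 5-minute tolerance are assumptions for illustration only; this thread does not describe how the agent actually detects the skew.

```go
package main

import (
	"fmt"
	"time"
)

// clockSkew returns the absolute difference between the hub's clock and a
// timestamp written by the agent (for example, a lease renew time).
// Hypothetical helper; not taken from the OCM code base.
func clockSkew(hubNow, agentRenew time.Time) time.Duration {
	d := hubNow.Sub(agentRenew)
	if d < 0 {
		return -d
	}
	return d
}

// agentMayUpdateStatus is a hypothetical guard: the agent writes the
// Available condition only while the skew stays within the tolerance.
// When it returns false, the agent leaves the status alone, so the hub's
// Unknown state is not overwritten and the status does not flap.
func agentMayUpdateStatus(skew, tolerance time.Duration) bool {
	return skew <= tolerance
}

func main() {
	hubNow := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	tolerance := 5 * time.Minute // assumed value, chosen for the example

	renewTimes := []time.Time{
		hubNow.Add(-30 * time.Second), // clocks roughly in sync
		hubNow.Add(-2 * time.Hour),    // agent clock far behind the hub
	}
	for _, agentRenew := range renewTimes {
		skew := clockSkew(hubNow, agentRenew)
		fmt.Printf("skew=%v agentUpdatesStatus=%v\n",
			skew, agentMayUpdateStatus(skew, tolerance))
	}
}
```

With a guard of this shape, only the hub writes the condition while the clocks disagree, which is what keeps the cluster in Unknown instead of alternating between Unknown and Available.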