fix(controller): fix race condition in updating ephemeral metadata #3975
Conversation
Published E2E Test Results: 4 files, 4 suites, 3h 15m 11s ⏱️. Results for commit 4a0dc71.

Published Unit Test Results: 2 280 tests, 2 280 ✅, 2m 59s ⏱️. Results for commit 4a0dc71.
Force-pushed from 3690743 to 4a0dc71. Signed-off-by: Youssef Rabie <youssef.rabie@procore.com>
Codecov Report — Attention: Patch coverage is

```
@@            Coverage Diff             @@
##           master    #3975      +/-  ##
=========================================
+ Coverage   82.69%   82.72%   +0.02%
=========================================
  Files         163      163
  Lines       22895    22903       +8
=========================================
+ Hits        18934    18947      +13
+ Misses       3087     3084       -3
+ Partials      874      872       -2
```
Currently, the order of the two steps for updating ephemeral metadata can leave some stable pods with the old ephemeral metadata. As the comment in the code notes, if we first fetch the set of pods and update them, and only then update the ReplicaSet template spec, any pods created in the interim are built from the not-yet-updated ReplicaSet template spec, i.e. they carry the old ephemeral metadata.
And, since the function checks at the beginning whether the ReplicaSet already carries the desired metadata, the next sync won't go in and update those dangling pods: the ReplicaSet itself has already been updated and hasn't been modified since.
I hit this at a scale of 500 pods, with Karpenter consolidation actively evicting pods and causing new ones to be created, conditions under which this kind of race is likely to occur.
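To make the ordering issue concrete, here is a minimal Go sketch of the race and the fix. All types and function names below are hypothetical illustrations, not the actual controller code: a `ReplicaSet` holds template labels that new pods copy at creation time, and a pod created mid-sync simulates Karpenter churn.

```go
package main

import "fmt"

// ReplicaSet holds the template labels that new Pods copy at creation.
// (Hypothetical simplification of the real Kubernetes objects.)
type ReplicaSet struct{ TemplateLabels map[string]string }

type Pod struct{ Labels map[string]string }

// newPod simulates the kubelet creating a Pod from the current template.
func newPod(rs *ReplicaSet) Pod {
	l := map[string]string{}
	for k, v := range rs.TemplateLabels {
		l[k] = v
	}
	return Pod{Labels: l}
}

// buggySync patches the Pods it listed first and updates the template
// afterwards. A Pod created in between copies the stale template.
func buggySync(rs *ReplicaSet, pods []Pod, desired map[string]string) []Pod {
	for i := range pods {
		pods[i].Labels = desired // update the pods we fetched
	}
	interim := newPod(rs) // pod created before the template update
	rs.TemplateLabels = desired
	return append(pods, interim)
}

// fixedSync updates the template first, so any Pod created during the
// sync already inherits the new ephemeral metadata.
func fixedSync(rs *ReplicaSet, pods []Pod, desired map[string]string) []Pod {
	rs.TemplateLabels = desired
	interim := newPod(rs) // pod created mid-sync now sees new metadata
	for i := range pods {
		pods[i].Labels = desired
	}
	return append(pods, interim)
}

func main() {
	desired := map[string]string{"role": "stable"}
	old := map[string]string{"role": "canary"}

	rs1 := &ReplicaSet{TemplateLabels: map[string]string{"role": "canary"}}
	p1 := buggySync(rs1, []Pod{{Labels: old}}, desired)
	fmt.Println("buggy order, interim pod role:", p1[1].Labels["role"]) // canary (stale)

	rs2 := &ReplicaSet{TemplateLabels: map[string]string{"role": "canary"}}
	p2 := fixedSync(rs2, []Pod{{Labels: old}}, desired)
	fmt.Println("fixed order, interim pod role:", p2[1].Labels["role"]) // stable
}
```

With the buggy ordering, the interim pod keeps the old `canary` metadata and, because the ReplicaSet now looks up to date, no later sync repairs it; reversing the two steps closes that window.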