-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect driver.FlagDefaults even if --extra-config is set #7509
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: medyagh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report
@@ Coverage Diff @@
## master #7509 +/- ##
=======================================
Coverage 36.69% 36.69%
=======================================
Files 146 146
Lines 8976 8976
=======================================
Hits 3294 3294
Misses 5290 5290
Partials 392 392
|
/ok-to-test |
Error: running mkcmp: exit status 1 |
All Times minikube: [ 67.486090 63.650628 65.347021] Average minikube: 65.494580 Averages Time Per Log
|
if err := ExpectAppsRunning(cs, expected); err != nil { | ||
if time.Since(start) > minLogCheckTime { | ||
glog.Infof("error waiting for apps to be running: %v", err) | ||
time.Sleep(kconst.APICallRetryInterval * 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this multiply the standard retry interval by 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was something I stole from the simmilar func system pods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do it to avoid pegging the system because they are running:
announceProblems(r, bs, cfg, cr)
You may want to call the same function here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont have the parameters that announceProblems needs in this function. (r cruntime.Manager, bs bootstrapper.Bootstrapper,)
Can you give this PR a title that reflects the change? It doesn't seem to be specific to containerd. |
if err := ExpectAppsRunning(cs, expected); err != nil { | ||
if time.Since(start) > minLogCheckTime { | ||
glog.Infof("error waiting for apps to be running: %v", err) | ||
time.Sleep(kconst.APICallRetryInterval * 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do it to avoid pegging the system because they are running:
announceProblems(r, bs, cfg, cr)
You may want to call the same function here.
All Times Minikube (PR 7509): [ 65.755934 65.131846 64.926113] Average minikube: 64.917641 Averages Time Per Log
|
TestStartStop/group/old-docker is related to Cert Flake #6956 |
All Times minikube: [ 69.184964 65.362801 67.889831] Average minikube: 67.479199 Averages Time Per Log
|
this one line change is result of a full day of deep investigation and yes it fixes it !
here is what was happpenning .
Before this PR:
was failiing on waiting because coreDNS container was stuck in Creating:
because
since we added a new --wait component "aps_running" the 'TestStartStop' for 'Crio' and 'Containerd' revealed the hidden failure that we had not seen but always existed (on our previous stable releases it exists)
infact users had reported this issue #7354 but it was hidden to our eyes before the new --wait flag.
What was happening ?
for crio and contanerd on docker driver we need a CNI to make it work.
and we automatically set the Extra Options for the Pod CIDR to be routed to use the kic CNI (overlay).
however ---> if someone adds an Extra Option through the flag , we that messed up the auto setting the CNI Extra option!
(thank you so much anyone who wrote that Integration Tests to start with extra options, you did a great job)
This one line change was found thanks to the new --wait=true that waits for more stuff and revealed all the secret bugs we had covering under a lot of dust.
(also tweaking the logs and the retry logic to be more efficient and produce less noise)
Cheers !
After this PR :
Works like a Charm !
this PR fixes this issue for Docker and Podman Drivers.
I still hope one day we unify the CNI solution across VM and container and Multinode I hope we do this in the future : #7428
not fixed in this PR
this PR still does not fix the CRIO problem with CNI on docker driver. that would be subject for another investigation. see more here #7522