fix(activator): Correctly return noop value from probePodIPs based on changes #14347
Codecov Report
All modified lines are covered by tests ✅

Additional details and impacted files:
@@            Coverage Diff             @@
##             main   #14347      +/-   ##
==========================================
- Coverage   86.09%   86.08%   -0.02%
==========================================
  Files         196      196
  Lines       14783    14871      +88
==========================================
+ Hits        12728    12802      +74
- Misses       1749     1759      +10
- Partials      306      310       +4
Depending on configuration, `probePodIPs` will sometimes return true for `noop` when there are in fact changes. This happens because the healthy endpoints can change outside of probing.

- Change the unchanged value to simply compare the existing healthy set with the new one.
- Add tests to cover most of the different behavior cases of the `probePodIPs` function.

NOTE: There is one test case, `no changes without probes`, that now behaves differently from the prior code. Prior code would return false for `noop`. After reviewing the calling code, returning false did not seem to make sense for a call that neither probes nor updates anything (given that the other non-probe changes are now accounted for), so the new behavior was kept with the simple unchanged value logic.
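To make the failure mode concrete, here is a minimal Go sketch contrasting two ways of deriving `noop`. It is not the activator code: `stringSet`, `newSet`, `noopFromProbesOnly`, and `noopFromSetComparison` are hypothetical names, and `setEqual` is a hand-written helper that only mirrors the idea of comparing the old and new healthy sets. The scenario (a pod joining the healthy set without any probe reporting a change) is an assumed example of a non-probe update.

```go
package main

import "fmt"

// stringSet is a minimal, hypothetical stand-in for the activator's set of pod IPs.
type stringSet map[string]struct{}

// newSet builds a stringSet from the given members.
func newSet(items ...string) stringSet {
	s := make(stringSet, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// setEqual reports whether a and b contain exactly the same members.
func setEqual(a, b stringSet) bool {
	if len(a) != len(b) {
		return false
	}
	for k := range a {
		if _, ok := b[k]; !ok {
			return false
		}
	}
	return true
}

// noopFromProbesOnly models the problematic pattern: "unchanged" is derived
// only from whether a probe changed some pod's health, so updates to the
// healthy set that happen outside probing are invisible to it.
func noopFromProbesOnly(probeChangedHealth bool) bool {
	return !probeChangedHealth
}

// noopFromSetComparison models the fix: compare the existing healthy set with
// the new one, however the change came about.
func noopFromSetComparison(oldHealthy, newHealthy stringSet) bool {
	return setEqual(oldHealthy, newHealthy)
}

func main() {
	before := newSet("10.0.0.1")
	// Hypothetical scenario: a pod is added to the healthy set by a non-probe
	// code path, so no probe reported a health change.
	after := newSet("10.0.0.1", "10.0.0.2")

	fmt.Println("probe-only noop:", noopFromProbesOnly(false))                 // true: the change is missed
	fmt.Println("set-comparison noop:", noopFromSetComparison(before, after)) // false: the change is detected
}
```

The set comparison does not care which code path modified the healthy set, which is why it covers probe and non-probe updates alike.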
b2de4aa to 8229a1c
Thanks for the PR.
Looks good, just some minor stuff. Can you also fix the deprecation warnings the linter noted?
},
},
{
name: "only ready pods healthy without probe optimization", // NOTE: prior test is effectively this one with probe optimization
Non-blocking question:
Just to confirm: probe optimization=false doesn't have an effect, and probing still happens?
I thought that with probe optimization disabled it would not probe at all (since the revision has a container with an exec probe, and the queue proxy can't intercept the container's healthy state).
The probe optimization in this `probePodIPs` context just allows non-ready pods to be marked as healthy after a successful probe. Without the optimization enabled, a pod needs both a successful probe and to be ready to be marked healthy. Interestingly enough, it still probes the pod regardless. It could definitely be updated to only probe the pods it could actually mark healthy, but I don't really want to start adding changes unrelated to the fix here.
I think part of the current behavior may have to do with the mesh mode Auto behavior for mesh detection, since it needs to probe to detect.
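A minimal Go sketch of the health rule described in this reply, assuming a simplified view in which each pod's probe result and readiness are already known. `podState` and `markHealthy` are hypothetical names, not the activator's API, and mesh mode is not modeled.

```go
package main

import "fmt"

// podState is a hypothetical, simplified view of what the activator knows
// about a pod when deciding whether to mark it healthy.
type podState struct {
	IP      string
	Ready   bool // pod readiness as reported by Kubernetes
	ProbeOK bool // result of the activator's own probe
}

// markHealthy sketches the rule described above: a successful probe is always
// required, and a non-ready pod is accepted only when probe optimization is on.
func markHealthy(pods []podState, probeOptimization bool) []string {
	var healthy []string
	for _, p := range pods {
		// The probe result is consulted for every pod, mirroring the
		// observation that probing happens regardless of the option.
		if !p.ProbeOK {
			continue
		}
		if p.Ready || probeOptimization {
			healthy = append(healthy, p.IP)
		}
	}
	return healthy
}

func main() {
	pods := []podState{
		{IP: "10.0.0.1", Ready: true, ProbeOK: true},
		{IP: "10.0.0.2", Ready: false, ProbeOK: true},
		{IP: "10.0.0.3", Ready: true, ProbeOK: false},
	}
	fmt.Println(markHealthy(pods, true))  // [10.0.0.1 10.0.0.2]
	fmt.Println(markHealthy(pods, false)) // [10.0.0.1]
}
```

Ignoring the mesh-detection reason mentioned above, the sketch also shows why probing non-ready pods is extra work when the optimization is off: their probe result can never change the outcome.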
@dprotaso The
- Rename one test case
- Move both rw and fakeRT inside test function
Fair enough. Let me know if you want to tackle this; otherwise I'll make a new issue.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: arsenetar, dprotaso
/override "style / Golang / Lint"
@dprotaso: Overrode contexts on behalf of dprotaso: style / Golang / Lint
/override "test (v1.27.x, istio-ambient, e2e)"
@dprotaso: Overrode contexts on behalf of dprotaso: test (v1.27.x, istio-ambient, e2e)
- Backport activator fixes from knative#14303 and knative#14347 from 1.12
- Add custom patches for logs and probe durations
- Update to Go 1.20
- Add patch from knative#14022
- Add custom CI workflows
Proposed Changes
NOTE: This builds off of #14303 and includes those commits as well.
Update the activator `probePodIPs` to handle cases where the healthy pods are updated but the update is not propagated because the `noop` return value is set to true. This issue is specific to certain combinations of activator/revision options, such as the mesh mode and probe optimization. Instead of unchanged logic specific to the probes, this uses the `setEqual` operation to check for unchanged, which covers all possible code paths for updates.

Tests have been expanded to include more explicit coverage of different combinations of options and states passed to `probePodIPs`, to help cover this behavior moving forward. There is one test case, `no changes without probes`, that now behaves differently from the prior code. Prior code would return false for `noop`. After reviewing the calling code, returning false did not seem to make sense for a call that neither probes nor updates anything (especially given that the other non-probe changes are now accounted for), so the new behavior was kept with the simple unchanged value logic.
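The expanded tests enumerate combinations of options and states. The table-driven Go sketch below mimics that style against a hypothetical, simplified `sketchProbePodIPs`; its signature, and its assumption that healthy pods no longer present are dropped without probing, are illustrative only and not taken from the real function. The first case reproduces the `no changes without probes` behavior described above: with set-comparison logic it reports `noop` as true.

```go
package main

import "fmt"

// set is a hypothetical string set of pod IPs used only for this sketch.
type set map[string]struct{}

// newSet builds a set from the given pod IPs.
func newSet(ips ...string) set {
	s := make(set, len(ips))
	for _, ip := range ips {
		s[ip] = struct{}{}
	}
	return s
}

// setEqual reports whether a and b contain exactly the same pod IPs.
func setEqual(a, b set) bool {
	if len(a) != len(b) {
		return false
	}
	for ip := range a {
		if _, ok := b[ip]; !ok {
			return false
		}
	}
	return true
}

// sketchProbePodIPs is a hypothetical, simplified stand-in for probePodIPs.
// Assumptions (not from the real code): healthy pods that are no longer
// present are dropped without probing, and a successful probe marks a pod
// healthy only if it is ready or probe optimization is enabled.
func sketchProbePodIPs(current, present, ready set, probes map[string]bool, probeOpt bool) (set, bool) {
	next := make(set)
	// Non-probe path: keep currently healthy pods that are still present.
	for ip := range current {
		if _, ok := present[ip]; ok {
			next[ip] = struct{}{}
		}
	}
	// Probe path: apply this round's probe results.
	for ip, ok := range probes {
		_, isReady := ready[ip]
		if ok && (isReady || probeOpt) {
			next[ip] = struct{}{}
		}
	}
	// noop is true exactly when the healthy set did not change.
	return next, setEqual(current, next)
}

func main() {
	cases := []struct {
		name     string
		current  set
		present  set
		ready    set
		probes   map[string]bool
		probeOpt bool
		wantNoop bool
	}{
		{"no changes without probes", newSet("a"), newSet("a"), newSet("a"), nil, false, true},
		{"healthy pod removed without probes", newSet("a", "b"), newSet("a"), newSet("a"), nil, false, false},
		{"non-ready pod healthy with probe optimization", newSet(), newSet("a"), newSet(), map[string]bool{"a": true}, true, false},
		{"non-ready pod ignored without probe optimization", newSet(), newSet("a"), newSet(), map[string]bool{"a": true}, false, true},
	}
	for _, c := range cases {
		_, noop := sketchProbePodIPs(c.current, c.present, c.ready, c.probes, c.probeOpt)
		status := "ok"
		if noop != c.wantNoop {
			status = "MISMATCH"
		}
		fmt.Printf("%-50s noop=%v %s\n", c.name, noop, status)
	}
}
```

The second case is the kind of non-probe change that a probe-only notion of "unchanged" would miss.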
This patch has been running in a production environment without issues for a while now and has not shown the problematic behavior from #14200 so far.
Ref #14200
Release Note