fix(activator): Correctly return noop value from probePodIPs based on changes #14347

arsenetar · 2023-09-08T14:48:26Z

Proposed Changes

NOTE: This builds off of #14303 and includes those commits as well.

Update the activator probePodIPs to handle cases where the healthy pods are updated, but the update is not propagated due to the noop return value being set to true. This issue is specific to certain combinations of activator/revision options such as the mesh mode and probe optimization. Instead of specific unchanged logic for just the probes this uses the set Equal operation to check for unchanged which covers all possible code paths for updates.

Tests have been expanded to include more explicit coverage of different combinations of options and states to probePodIPs to help cover this behavior moving forward. There is one test case no changes without probes that now shows different behavior than prior code. Prior code would return a false for noop. After reviewing the calling code this did not seem to make sense
for a non-probing non-updating call to update the endpoints (especially given the other non-probe changes are now accounted for). So this was left as it is now with the simple unchanged value logic.

This patch has been running in a production environment without issues for awhile now and has not shown the prior problematic behavior from #14200 thus far.

Ref #14200

Release Note

Activator correctly propagates pod health when triggered by changes other than pod probes.

knative-prow · 2023-09-08T14:48:36Z

Hi @arsenetar. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

codecov · 2023-09-11T01:41:40Z

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (18b3f4f) 86.09% compared to head (5a6a187) 86.08%.
Report is 50 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #14347      +/-   ##
==========================================
- Coverage   86.09%   86.08%   -0.02%     
==========================================
  Files         196      196              
  Lines       14783    14871      +88     
==========================================
+ Hits        12728    12802      +74     
- Misses       1749     1759      +10     
- Partials      306      310       +4

Files	Coverage Δ
pkg/activator/net/revision_backends.go	`92.75% <100.00%> (-0.07%)`	⬇️

... and 11 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

The probePodIPs sometimes (depending on configuration) will return a true for noop when in fact there are changes. This is due to changes to the healthy endpoints being possible outside of probing. - Change the unchanged value to just compare the existing healthy set with the new one. - Add tests to cover most of the different cases of behavior for the probePodIPs function. NOTE: There is one test case `no changes without probes` that now shows different behavior than prior code. Prior code would return a false for noop. After reviewing the calling code this did not seem to make sense for a non-probing non-updating call to update the endpoints (given the other non-probe changes are now accounted for). So this was left as it is now with the simple unchanged value logic.

dprotaso

thanks for the PR

looks good - just some minor stuff. Can you also fix the deprecation warnings the linter noted

pkg/activator/net/revision_backends_test.go

dprotaso · 2023-09-21T19:47:15Z

pkg/activator/net/revision_backends_test.go

+			},
+		},
+		{
+			name: "only ready pods healthy without probe optimization", // NOTE: prior test is effectively this one with probe optimization


non blocking question:

I guess to confirm - probe optimization=false doesn't have an effect and probing happens

I thought probe optimization when disabled would not probe at all - (since the revision has a container with an exec probe and the queue proxy can't intercept the container's healthy state)

The probe optimization in this probePodIPs context just allows non-ready pods to be marked as healthy after successful probe. Without the optimization enabled the pod needs to have a successful probe and be ready to be marked healthy. Interestingly enough it still probes the pod regardless. Definitely could update to just only probe the ones it could actually mark healthy... but I don't really want to start adding changes unrelated to the fix here.

I think part of the current behavior has to do possibly with the mesh mode Auto behavior for Mesh detection. Since it needs to probe to detect.

arsenetar · 2023-09-22T23:47:10Z

@dprotaso The sets.String linter error is due to matching the types used in the code for the tests, if we wanted to fix the linter issue there then the usage of those values which propagate throughout the codebase would also have to migrate to the new sets.Set[string] type. That seems a bit beyond the intention of this PR.

- Rename one test case - Move both rw and fakeRT inside test function

dprotaso · 2023-09-25T18:25:02Z

That seems a bit beyond the intention of this PR.

Fair enough - let me know if you want to tackle this otherwise i'll make a new issue

/lgtm
/approve

knative-prow · 2023-09-25T18:25:06Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arsenetar, dprotaso

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dprotaso]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

dprotaso · 2023-09-25T18:29:37Z

/override "style / Golang / Lint"

knative-prow · 2023-09-25T18:29:41Z

@dprotaso: Overrode contexts on behalf of dprotaso: style / Golang / Lint

In response to this:

/override "style / Golang / Lint"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dprotaso · 2023-09-25T18:29:55Z

/override "test (v1.27.x, istio-ambient, e2e)"

knative-prow · 2023-09-25T18:29:58Z

@dprotaso: Overrode contexts on behalf of dprotaso: test (v1.27.x, istio-ambient, e2e)

In response to this:

/override "test (v1.27.x, istio-ambient, e2e)"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

- Backport activator fixes from knative#14303 and knative#14347 from 1.12 - Add custom patches for logs and probe durations - Update to go 1.20 - Add patch from knative#14022 - Add custom CI workflows

knative-prow bot requested review from jsanin-vmw and kauana September 8, 2023 14:48

knative-prow bot added area/autoscale area/networking needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 8, 2023

nak3 added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 11, 2023

arsenetar force-pushed the as/activator-pod-probes branch from b2de4aa to 8229a1c Compare September 12, 2023 16:52

dprotaso mentioned this pull request Sep 15, 2023

Activator not updating healthy pods when probe fails #13531

Closed

arsenetar mentioned this pull request Sep 20, 2023

Activator can fail to route requests to backing pods during scaling events #14200

Closed

dprotaso reviewed Sep 21, 2023

View reviewed changes

fix(tests): Refactor new tests

5a6a187

- Rename one test case - Move both rw and fakeRT inside test function

knative-prow bot assigned dprotaso Sep 25, 2023

knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Sep 25, 2023

knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 25, 2023

knative-prow bot merged commit 9e7956f into knative:main Sep 25, 2023
63 of 65 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(activator): Correctly return noop value from probePodIPs based on changes #14347

fix(activator): Correctly return noop value from probePodIPs based on changes #14347

arsenetar commented Sep 8, 2023

knative-prow bot commented Sep 8, 2023

codecov bot commented Sep 11, 2023 •

edited

Loading

dprotaso left a comment

dprotaso Sep 21, 2023

arsenetar Sep 22, 2023

arsenetar commented Sep 22, 2023

dprotaso commented Sep 25, 2023

knative-prow bot commented Sep 25, 2023

dprotaso commented Sep 25, 2023

knative-prow bot commented Sep 25, 2023

dprotaso commented Sep 25, 2023

knative-prow bot commented Sep 25, 2023

fix(activator): Correctly return noop value from probePodIPs based on changes #14347

fix(activator): Correctly return noop value from probePodIPs based on changes #14347

Conversation

arsenetar commented Sep 8, 2023

Proposed Changes

knative-prow bot commented Sep 8, 2023

codecov bot commented Sep 11, 2023 • edited Loading

Codecov Report

dprotaso left a comment

Choose a reason for hiding this comment

dprotaso Sep 21, 2023

Choose a reason for hiding this comment

arsenetar Sep 22, 2023

Choose a reason for hiding this comment

arsenetar commented Sep 22, 2023

dprotaso commented Sep 25, 2023

knative-prow bot commented Sep 25, 2023

dprotaso commented Sep 25, 2023

knative-prow bot commented Sep 25, 2023

dprotaso commented Sep 25, 2023

knative-prow bot commented Sep 25, 2023

codecov bot commented Sep 11, 2023 •

edited

Loading