Tweak probing retry timings. #8083

JRBANCEL · 2020-05-27T02:37:11Z

Context

Until now, probing has used the default rate limiter, workqueue.DefaultControllerRateLimiter:

10 QPS
Exponential back-off with a 5 ms base (10, 20, 40, 80 ms, etc.)

In the best-case scenario (single node, no load, 1 Envoy Pod, 1 Knative Service), it takes at least 150 ms for Istio to apply a VirtualService change. Because we mistakenly call AddRateLimited on the first enqueue, rate limiting applies to the first probe as well, so the probing timeline is:

t=10 ms : 🔴
t=30 ms : 🔴
t=70 ms : 🔴
t=150 ms : 🟢

Proposed Changes

Do not use AddRateLimited when enqueueing the first time (i.e. only throttle retries)
Add a 200 ms delay when enqueueing the first time (trying before the ~150 ms Istio lower bound is pointless)
Increase the base of the back-off from 5 ms to 50 ms (5 ms is super aggressive)
Increase the QPS limit to 50 QPS (probing is limited to 10 goroutines anyway)

Result

In the best-case scenario, only a single request is now necessary.
In other scenarios, more requests can be executed per unit of time, while fewer requests are executed per VirtualService.

/cc @yuzisun

knative-prow-robot · 2020-05-27T02:37:12Z

@JRBANCEL: GitHub didn't allow me to request PR reviews from the following users: yuzisun.

Note that only knative members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Fixes #8054

knative-prow-robot

@JRBANCEL: 0 warnings.

ZhiminXiang · 2020-05-27T02:43:06Z

/lgtm
/approve

vagababov · 2020-05-27T03:42:46Z

Need to update the go.mod.

knative-test-reporter-robot · 2020-05-27T19:41:24Z

The following jobs failed:

Test name	Triggers	Retries
pull-knative-serving-integration-tests		0/3
pull-knative-serving-unit-tests		0/3

Failed non-flaky tests preventing automatic retry of pull-knative-serving-integration-tests:

test/e2e/istio.TestClusterLocalAuthorization
test/e2e/istio.TestIstioProbing
test/conformance/api/v1.TestUpdateConfigurationMetadata
test/conformance/api/v1.TestContainerErrorMsg
test/conformance/api/v1.TestContainerExitingMsg
test/conformance/api/v1.TestTranslation
test/conformance/api/v1.TestServiceAccountValidation
test/conformance/api/v1.TestAnnotationPropagation

and 46 more.

knative-metrics-robot · 2020-05-27T19:55:59Z

The following is the coverage report on the affected files.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/network/status/status.go	98.4%	98.4%	0.1

JRBANCEL · 2020-05-27T21:10:32Z

@ZhiminXiang please re-approve, I fixed go.mod.

ZhiminXiang · 2020-05-27T21:17:00Z

/lgtm
/approve

ZhiminXiang · 2020-05-27T21:19:07Z

/assign @tcnghia

tcnghia · 2020-06-02T17:04:29Z

/approve

knative-prow-robot · 2020-06-02T17:04:40Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JRBANCEL, tcnghia, ZhiminXiang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [tcnghia]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tcnghia · 2020-06-02T20:25:58Z

/test pull-knative-serving-upgrade-tests

@chizhg does @JRBANCEL need to rebase on master to get my change #8158?

chizhg · 2020-06-02T21:22:03Z

/test pull-knative-serving-upgrade-tests

@chizhg does @JRBANCEL need to rebase on master to get my change #8158?

The PR is now ready to be merged, so Tide will merge it with the HEAD of master and retrigger the tests, so the tests could pass. But if the PR is not in the merge pool, the tests are run directly against the commit in the PR.
It's slightly weird behavior for Prow.

googlebot added the cla: yes label (indicates the PR's author has signed the CLA) May 27, 2020

knative-prow-robot added the size/S (denotes a PR that changes 10-29 lines, ignoring generated files) and approved (indicates a PR has been approved by an approver from all required OWNERS files) labels May 27, 2020

knative-prow-robot reviewed May 27, 2020

knative-prow-robot added the area/networking label May 27, 2020

knative-prow-robot assigned ZhiminXiang May 27, 2020

knative-prow-robot added the lgtm label (indicates that a PR is ready to be merged) May 27, 2020

Tweak probing retry timings.

fc84aa9

JRBANCEL force-pushed the increase-probing-qps branch from da72ee0 to bfd46dc Compare May 27, 2020 18:09

knative-prow-robot removed the lgtm and approved labels May 27, 2020

JRBANCEL force-pushed the increase-probing-qps branch from bfd46dc to d53d2e5 Compare May 27, 2020 18:23

knative-prow-robot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/S label May 27, 2020

Rebase & add dependency to go.mod & go mod vendor.

f0ff75b

JRBANCEL force-pushed the increase-probing-qps branch from d53d2e5 to f0ff75b Compare May 27, 2020 19:49

knative-prow-robot added the size/S label and removed the size/XL label May 27, 2020

knative-prow-robot added the lgtm label May 27, 2020

knative-prow-robot assigned tcnghia May 27, 2020

knative-prow-robot added the approved label Jun 2, 2020

knative-prow-robot merged commit e8fd05b into knative:master Jun 2, 2020
