-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tweak probing retry timings. #8083
Tweak probing retry timings. #8083
Conversation
@JRBANCEL: GitHub didn't allow me to request PR reviews from the following users: yuzisun. Note that only knative members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JRBANCEL: 0 warnings.
In response to this:
Fixes #8054
Context
Until now, the probing is using the default rate limiter
workqueue.DefaultControllerRateLimiter
:
- 10 QPS
- Exponential back-off with a 5 ms base (10, 20, 40, 80, etc...)
In the most optimal scenario (single node, no load, 1 Envoy Pod, 1 Knative Service), it takes ~150 ms for Istio to apply a VirtualService change. The timeline of probing is (because we mistakenly call
AddRateLimited
the first time enqueueing so the rate limiting is applied):
t=10 ms
: 🔴t=30 ms
: 🔴t=70 ms
: 🔴t=150 ms
: 🟢Proposed Changes
- Do not use
AddRateLimited
when enqueueing the first time (i.e. only throttle retries)- Add a 200 ms delay when enqueueing the first time (trying before the ~150 ms Istio lower bound is pointless)
- Increase the base of the back-off from 5 ms to 50 ms (5 ms is super aggressive)
- Increase the QPS limit to 50 QPS (it is limited to 10 go-routines anyway)
Result
In the optimal scenario, only a single request is now necessary.
In other scenarios, more requests can be executed simultaneously while per VirtualService fewer requests will be executed./cc @yuzisun
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/lgtm |
Need to update the go.mod. |
da72ee0
to
bfd46dc
Compare
bfd46dc
to
d53d2e5
Compare
The following jobs failed:
Failed non-flaky tests preventing automatic retry of pull-knative-serving-integration-tests:
and 46 more. |
d53d2e5
to
f0ff75b
Compare
The following is the coverage report on the affected files.
|
@ZhiminXiang please re-approve, I fixed |
/lgtm |
/assign @tcnghia |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JRBANCEL, tcnghia, ZhiminXiang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The PR is now ready to be merged, so Tide will merge it with HEAD of master and retrigger the tests, so the tests could pass. But if the PR is not in the merge pool, the tests will be directly run against the commit in the PR. |
Fixes #8054
Context
Until now, the probing is using the default rate limiter
workqueue.DefaultControllerRateLimiter
:In the most optimal scenario (single node, no load, 1 Envoy Pod, 1 Knative Service), it takes at least 150 ms for Istio to apply a VirtualService change. The timeline of probing is (because we mistakenly call
AddRateLimited
the first time enqueueing so the rate limiting is applied):t=10 ms
: 🔴t=30 ms
: 🔴t=70 ms
: 🔴t=150 ms
: 🟢Proposed Changes
AddRateLimited
when enqueueing the first time (i.e. only throttle retries)Result
In the optimal scenario, only a single request is now necessary.
In other scenarios, more requests can be executed by unit of time while per VirtualService fewer requests will be executed.
/cc @yuzisun