improve tunnel availability #375
@aholic: GitHub didn't allow me to assign the following users: your_reviewer. Note that only openyurtio members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
/assign @Fei-Guo
/lgtm
@@ -150,7 +150,7 @@ func (o *ServerOptions) Config() (*config.Config, error) {
 	if err != nil {
 		return nil, err
 	}
-	cfg.SharedInformerFactory = informers.NewSharedInformerFactory(cfg.Client, 10*time.Second)
+	cfg.SharedInformerFactory = informers.NewSharedInformerFactory(cfg.Client, 24*time.Hour)
Why did you change the resync time?
In my situation, I saw thousands of tunnel certificate signing requests (CSRs) in pending/approved status. All of them were enqueued every 10 seconds, which filled up the work queue. As a result, the pending CSRs could not be approved (after pending for 24 hours, they were deleted automatically). So I did two things:
- Make the resync period longer. I don't think the resync period needs to be so short; it's just a way to recover from unexpected situations.
- Filter before enqueueing, so that only pending tunnel CSRs are enqueued.
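To illustrate the second point, here is a minimal sketch of a filter applied before enqueueing. It is not the code from this PR: the `csr` type, its `ForTunnel` field, and the `shouldEnqueue` helper are hypothetical stand-ins for the real CertificateSigningRequest object, and "pending" is assumed to mean "neither approved nor denied".

```go
package main

import "fmt"

// csr is a hypothetical, simplified stand-in for a Kubernetes
// CertificateSigningRequest; it is not the real API type.
type csr struct {
	Name       string
	ForTunnel  bool     // assumed to be derived from the request's subject or labels
	Conditions []string // e.g. "Approved" or "Denied"; empty while pending
}

// isPending reports whether the request has been neither approved nor denied.
func isPending(c csr) bool {
	for _, cond := range c.Conditions {
		if cond == "Approved" || cond == "Denied" {
			return false
		}
	}
	return true
}

// shouldEnqueue is the filter applied before a request is added to the work queue.
func shouldEnqueue(c csr) bool {
	return c.ForTunnel && isPending(c)
}

func main() {
	requests := []csr{
		{Name: "tunnel-pending", ForTunnel: true},
		{Name: "tunnel-approved", ForTunnel: true, Conditions: []string{"Approved"}},
		{Name: "other-pending", ForTunnel: false},
	}
	for _, r := range requests {
		fmt.Printf("%s enqueue=%v\n", r.Name, shouldEnqueue(r))
	}
}
```

With a filter like this, already-approved CSRs and CSRs that do not belong to the tunnel never reach the queue, so a resync no longer crowds out the requests that still need approval.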
Understood. Alternatively, we could set it to 0 to disable resync.
Yes, 0 is also OK. Let's just make it a relatively long period, in case of some unexpected situation.
@Fei-Guo Setting a relatively long resync period (like 24 * time.Hour) is more reasonable than disabling resync, in case of some unexpected situation.
cmd/yurt-tunnel-agent/app/start.go
Outdated
@@ -91,6 +94,17 @@ func Run(cfg *config.CompletedConfig, stopCh <-chan struct{}) error {
 	}
 	agentCertMgr.Start()

+	// 2.1. waiting for the certificate is generated
+	_ = wait.PollUntil(15*time.Second, func() (bool, error) {
Is a 15-second poll interval too long?
I did it to reduce QPS pressure on the API server. I don't want to see an avalanche when thousands of nodes come online (or in other situations). A 5-second or 15-second interval will not hurt the user experience, but an avalanche will.
Having everybody wait 15 seconds does not eliminate the bursty requests; it only delays them. You will need to add a random delay to smooth the QPS, or just set a reasonable delay for getting a signed certificate.
Let's set it to 5 seconds first.
I will update it to a random delay sometime later, when I'm free.
/lgtm
@Fei-Guo If you don't have any other comments, I'd like to approve this PR.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aholic, Fei-Guo The full list of commands accepted by this bot can be found here. The pull request process is described here
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
None
Special notes for your reviewer:
/assign @rambohe-ch
Does this PR introduce a user-facing change?
other Note