Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a keepalive timeout on the locket client #12

Merged
merged 1 commit into from
Apr 19, 2023

Conversation

klapkov
Copy link
Contributor

@klapkov klapkov commented Jan 5, 2023

With this change we add a keepalive timeout to the locket gRPC client. We also make the keepalive "Time" property part of the locket client config, so both of them can be changed. This change brings great improvements to the locket client's ability to reconnect to another server faster under network delays.

More info in the Diego issue: cloudfoundry/diego-release#627
PR in the Diego repo for the changes in the jobs config: cloudfoundry/diego-release#675

Copy link

@PlamenDoychev PlamenDoychev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Correlates to the following PR: cloudfoundry/diego-release#675

@klapkov klapkov reopened this Apr 11, 2023
@geofffranks
Copy link
Contributor

geofffranks commented Apr 18, 2023

Just making sure I understand this fully + why there's no default in locket itself:

If config values are omitted entirely from a locket client, the behavior will be unchanged from current (10s default interval applied by grpc, and infinite timeout). If clients add this config, the behaviors will be changed to what was desired in diego-release/#627. Correct?

Are there any downsides associated with modifying other (non-diego) locket clients outside of diego to add these defaults?

@klapkov
Copy link
Contributor Author

klapkov commented Apr 18, 2023

@geofffranks Your assumption is partially right. If someone does not configure the client, the interval will be 10s and the timeout will be 22s. The value for the default timeout was proposed by @ameowlia and seems reasonable to me as well. With this approach, we leave the door open for users of the locket client to configure it how they want. As to any downsides, from what I know, there should be none.

@geofffranks
Copy link
Contributor

@geofffranks Your assumption is partially right. If someone does not configure the client, the interval will be 10s and the timeout will be 22s.

But this is only because of the defaults set in diego-release's jobs' spec files, correct? If something like tps or policy-server-asg-syncer connected to locket without specifying this in their configs, it would get the GRPC defaults (10s + infinity) which were the same as before this patch, right?

1 similar comment
@geofffranks
Copy link
Contributor

@geofffranks Your assumption is partially right. If someone does not configure the client, the interval will be 10s and the timeout will be 22s.

But this is only because of the defaults set in diego-release's jobs' spec files, correct? If something like tps or policy-server-asg-syncer connected to locket without specifying this in their configs, it would get the GRPC defaults (10s + infinity) which were the same as before this patch, right?

@klapkov
Copy link
Contributor Author

klapkov commented Apr 18, 2023

Yes, I think this would happen, but to be honest I have not tested this. Will do it tomorrow

@klapkov
Copy link
Contributor Author

klapkov commented Apr 19, 2023

So I tested it with the tps_watcher job. Both the timeout and keepalive time were 0 when set in the gRPC client. But I am not sure it matters. As seen here https://github.com/grpc/grpc-go/blob/v1.54.0/dialoptions.go#L471, if the time is less than 10s, it still set's it to 10. As to the timeout, I think 0 just means infinity again. So yes, even though both params are 0, It will most probably behave the way it used to before.

@geofffranks geofffranks merged commit 1f63346 into cloudfoundry:main Apr 19, 2023
PlamenDoychev added a commit to PlamenDoychev/community that referenced this pull request Mar 9, 2024
[DIEGO-RELEASE] Add locket client keepalive time and timeout to jobs: cloudfoundry/diego-release#722
[DIEGO-RELEASE] Make max_containers prop configurable: cloudfoundry/diego-release#876
[BBS] Add request metrics for BBS - still open: cloudfoundry/bbs#80
[BBS] Use scheduling info instead of the whole desiredLRP: cloudfoundry/bbs#79
[BBS] Add routing info endpoint: 
cloudfoundry/bbs#66
[BBS] Remove cpu_weight limit: 
cloudfoundry/bbs#81
[EXECUTOR] Improve error handling on process start - still open: cloudfoundry/executor#91
[LOCKET]  Add a keepalive timeout on the locket client: cloudfoundry/locket#12
[GROOTFS] UsedVolumesSize and UsedStoreInBytes metrics: cloudfoundry/grootfs#155
[ROUTE-EMITTER] Use routing info bbs endpoint when syncing: cloudfoundry/route-emitter#23
[ROUTE-EMITTER]  Use routing_info for desired_lrp's when there are missing actual_lrp's: 
cloudfoundry/route-emitter#26
[SILK-RELEASE] Deduplicate Iptables Rules with Dynamic ASG's: cloudfoundry/silk-release#101
[SILK-RELEASE] Make container_metadata_file_check_timeout on silk-shutdown configurable: 
cloudfoundry/silk-release#111
ameowlia pushed a commit to cloudfoundry/community that referenced this pull request Mar 12, 2024
[DIEGO-RELEASE] Add locket client keepalive time and timeout to jobs: cloudfoundry/diego-release#722
[DIEGO-RELEASE] Make max_containers prop configurable: cloudfoundry/diego-release#876
[BBS] Add request metrics for BBS - still open: cloudfoundry/bbs#80
[BBS] Use scheduling info instead of the whole desiredLRP: cloudfoundry/bbs#79
[BBS] Add routing info endpoint:
cloudfoundry/bbs#66
[BBS] Remove cpu_weight limit:
cloudfoundry/bbs#81
[EXECUTOR] Improve error handling on process start - still open: cloudfoundry/executor#91
[LOCKET]  Add a keepalive timeout on the locket client: cloudfoundry/locket#12
[GROOTFS] UsedVolumesSize and UsedStoreInBytes metrics: cloudfoundry/grootfs#155
[ROUTE-EMITTER] Use routing info bbs endpoint when syncing: cloudfoundry/route-emitter#23
[ROUTE-EMITTER]  Use routing_info for desired_lrp's when there are missing actual_lrp's:
cloudfoundry/route-emitter#26
[SILK-RELEASE] Deduplicate Iptables Rules with Dynamic ASG's: cloudfoundry/silk-release#101
[SILK-RELEASE] Make container_metadata_file_check_timeout on silk-shutdown configurable:
cloudfoundry/silk-release#111

wip
ameowlia pushed a commit to cloudfoundry/community that referenced this pull request Mar 12, 2024
[DIEGO-RELEASE] Add locket client keepalive time and timeout to jobs: cloudfoundry/diego-release#722
[DIEGO-RELEASE] Make max_containers prop configurable: cloudfoundry/diego-release#876
[BBS] Add request metrics for BBS - still open: cloudfoundry/bbs#80
[BBS] Use scheduling info instead of the whole desiredLRP: cloudfoundry/bbs#79
[BBS] Add routing info endpoint:
cloudfoundry/bbs#66
[BBS] Remove cpu_weight limit:
cloudfoundry/bbs#81
[EXECUTOR] Improve error handling on process start - still open: cloudfoundry/executor#91
[LOCKET]  Add a keepalive timeout on the locket client: cloudfoundry/locket#12
[GROOTFS] UsedVolumesSize and UsedStoreInBytes metrics: cloudfoundry/grootfs#155
[ROUTE-EMITTER] Use routing info bbs endpoint when syncing: cloudfoundry/route-emitter#23
[ROUTE-EMITTER]  Use routing_info for desired_lrp's when there are missing actual_lrp's:
cloudfoundry/route-emitter#26
[SILK-RELEASE] Deduplicate Iptables Rules with Dynamic ASG's: cloudfoundry/silk-release#101
[SILK-RELEASE] Make container_metadata_file_check_timeout on silk-shutdown configurable:
cloudfoundry/silk-release#111

wip
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants