feat(sockmem-plugin): unified solution for TCP memory limitation #365 #366

Merged
merged 4 commits into from
Nov 22, 2023

Conversation

lubinszARM
Contributor

@lubinszARM lubinszARM commented Nov 15, 2023

What type of PR is this?
Features

What this PR does / why we need it:
This feature provides a unified solution for limiting TCP memory at both the cgroup level and the global (host) level.

Which issue(s) this PR fixes:
In our production environment we hit a critical case in which certain services consumed large amounts of TCP memory (tcpmem). cgroup v1 pods with default settings have no tcpmem limit, so global tcpmem usage reached its limit. As a result, network performance on the entire machine collapsed.

Special notes for your reviewer:
The feature has three parts:
1. Set the global tcpmem limit by changing net.ipv4.tcp_mem.
The default value is 20% of total host memory.
2. Do nothing under cgroup v2.
3. Set the pod tcpmem limit under cgroup v1.
The default value is the same as memory.limit_in_bytes.
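
To illustrate part 1, here is a minimal Go sketch of how a percentage of host memory could be turned into a net.ipv4.tcp_mem value. It is not taken from this PR: the function names `maxTCPMemPages` and `withNewMax`, the sample tcp_mem string, and the choice to update only the third ("max") field are assumptions made for illustration. The one fact it relies on is that net.ipv4.tcp_mem holds three values (min, pressure, max) counted in pages, not bytes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxTCPMemPages converts a percentage of host memory into a page count,
// the unit used by net.ipv4.tcp_mem. Hypothetical helper, not from the PR.
// pageSize must be non-zero (for example, the value of os.Getpagesize()).
func maxTCPMemPages(totalBytes, pageSize, ratioPercent uint64) uint64 {
	return totalBytes / 100 * ratioPercent / pageSize
}

// withNewMax replaces the third field of a tcp_mem value ("min pressure max").
// Keeping min and pressure unchanged is an assumption of this sketch.
func withNewMax(current string, newMax uint64) (string, error) {
	fields := strings.Fields(current)
	if len(fields) != 3 {
		return "", fmt.Errorf("unexpected tcp_mem value %q", current)
	}
	fields[2] = strconv.FormatUint(newMax, 10)
	return strings.Join(fields, " "), nil
}

func main() {
	const gib = 1 << 30
	// 20% of a 64 GiB host with 4 KiB pages.
	pages := maxTCPMemPages(64*gib, 4096, 20)
	updated, err := withNewMax("190791 254390 381582", pages)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(updated) // prints "190791 254390 3355443"
}
```

The resulting string is what would be written to /proc/sys/net/ipv4/tcp_mem. The write itself is left out because it requires root and changes host-wide behavior.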

codecov bot commented Nov 15, 2023

Codecov Report

Attention: 82 lines in your changes are missing coverage. Please review.

Comparison is base (e333907) 53.31% compared to head (cd1f144) 53.49%.

| Files | Patch % | Lines |
|---|---|---|
| ...m-plugins/memory/handlers/sockmem/sockmem_linux.go | 44.32% | 50 Missing and 4 partials ⚠️ |
| ...qrm-plugins/memory/handlers/sockmem/utils_linux.go | 77.50% | 6 Missing and 3 partials ⚠️ |
| ...g/agent/qrm-plugins/memory/dynamicpolicy/policy.go | 12.50% | 6 Missing and 1 partial ⚠️ |
| ...md/katalyst-agent/app/options/qrm/memory_plugin.go | 57.14% | 6 Missing ⚠️ |
| pkg/util/cgroup/manager/v1/fs_linux.go | 0.00% | 5 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #366      +/-   ##
==========================================
+ Coverage   53.31%   53.49%   +0.17%     
==========================================
  Files         437      439       +2     
  Lines       48155    48322     +167     
==========================================
+ Hits        25674    25848     +174     
+ Misses      19574    19553      -21     
- Partials     2907     2921      +14     
Flag Coverage Δ
unittest 53.49% <50.89%> (+0.17%) ⬆️


@lubinszARM lubinszARM force-pushed the pr_sockmem_handler branch 5 times, most recently from c089dbe to 86808e6 Compare November 16, 2023 02:18
@lubinszARM
Contributor Author

Hi @waynepeking348
I have reorganized the code around 'RegisterPeriodicalHandler'.

@lubinszARM lubinszARM changed the title feat(mem-plugin): unified solution for TCP memory limitation #365 feat(sockmem-plugin): unified solution for TCP memory limitation #365 Nov 16, 2023
// register qrm memory-handlers registered in adapter
func init() {
var errList []error
errList = append(errList, periodicalhandler.RegisterPeriodicalHandler(qrm.QRMMemoryPluginPeriodicalHandlerGroupName,
Collaborator

Would it be better to apply this strategy through the cgroup manager or CRI? cc @csfldf

Contributor Author

@csfldf Any idea?

Collaborator

Can we move the registration code to DynamicPolicy.Start and refine it as below?

if p.enableXxx {
	err := periodicalhandler.RegisterPeriodicalHandler(...)

	if err != nil {
		return fmt.Errorf("xxx failed with error: %v", err)
	}
}

Contributor Author

Done.

pkg/util/cgroup/manager/v1/fs_linux.go (review thread resolved)
@waynepeking348
Collaborator

Please DON'T close and reopen the PR arbitrarily; we need to keep track of the comments. @lubinszARM

@waynepeking348 waynepeking348 added workflow/need-review review: test succeeded, need to review enhancement New feature or request labels Nov 20, 2023
@lubinszARM lubinszARM force-pushed the pr_sockmem_handler branch 2 times, most recently from c12b4d9 to af42f13 Compare November 20, 2023 06:04
@lubinszARM lubinszARM force-pushed the pr_sockmem_handler branch 7 times, most recently from e446446 to b9239b6 Compare November 21, 2023 14:11
Signed-off-by: Robin Lu <robin.lu@bytedance.com>
@waynepeking348 waynepeking348 merged commit c7f2e91 into kubewharf:main Nov 22, 2023
10 checks passed
Labels
enhancement New feature or request workflow/need-review review: test succeeded, need to review
3 participants