-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise loop and chunk limit for newer kernels #1795
Conversation
547ba54
to
e4697eb
Compare
Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
e4697eb
to
47b7bef
Compare
Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
09409f1
to
dd11bcb
Compare
Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@@ -431,6 +432,20 @@ auto SocketTraceConnector::InitPerfBufferSpecs() { | |||
} | |||
|
|||
Status SocketTraceConnector::InitBPF() { | |||
// set BPF loop limit and chunk limit based on kernel version | |||
auto kernel = system::GetCachedKernelVersion(); | |||
int loop_limit = 42; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be gflags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to gflags
Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
bb6822d
to
2a7f0f5
Compare
// Kernels >= 5.1 have higher BPF instruction limits (1 million for verifier). | ||
// This enables a 21x increase to our loop and chunk limits | ||
FLAGS_stirling_bpf_loop_limit = 882; | ||
FLAGS_stirling_bpf_chunk_limit = 84; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you pick these numbers? How close are we to the instruction limit with these numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I manually raised the loop/chunk limits to see when the verifier would throw an error and chose values just below that upper bound, which is around 22x our previous limits.
bpf: Argument list too long. Program too large (... insns), at most 4096 insns
Confusingly, the bpf(2)
syscall returns the same error whether the program size or its complexity exceeds the limits. See stackoverflow
Summary: Dynamically increase the loop limit for newer kernels with higher instruction limits (1 million for kernels > 5.1) by 21x to reduce data loss and raise ingest. More details in #1755.
One open question is whether we want to add vizier flag to toggle this behavior in case there are unforseen performance bottlenecks for certain clusters.
Type of change: /kind feature
Test Plan: Existing targets + perf/demo tests outlined in #1755.