Raise loop and chunk limit for newer kernels #1795

Merged
merged 4 commits into from
Jan 23, 2024

benkilimnik
Member

@benkilimnik benkilimnik commented Nov 29, 2023

Summary: Dynamically raise the BPF loop and chunk limits by 21x on newer kernels, whose verifier instruction limit is higher (1 million for kernels > 5.1), to reduce data loss and increase ingest. More details in #1755.

One open question is whether we want to add a Vizier flag to toggle this behavior in case there are unforeseen performance bottlenecks on certain clusters.

Type of change: /kind feature

Test Plan: Existing targets + perf/demo tests outlined in #1755.

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@benkilimnik benkilimnik changed the title Raise loop and chunk limit for kernels >5.1 by 21x Raise loop and chunk limit for kernels >5.1 Dec 6, 2023
@benkilimnik benkilimnik changed the title Raise loop and chunk limit for kernels >5.1 Raise loop and chunk limit for newer kernels Dec 6, 2023
Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@benkilimnik benkilimnik requested a review from a team December 7, 2023 18:30
@benkilimnik benkilimnik marked this pull request as ready for review December 7, 2023 18:30
Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@@ -431,6 +432,20 @@ auto SocketTraceConnector::InitPerfBufferSpecs() {
}

Status SocketTraceConnector::InitBPF() {
  // set BPF loop limit and chunk limit based on kernel version
  auto kernel = system::GetCachedKernelVersion();
  int loop_limit = 42;
Member

These should be gflags.

Member Author

Changed to gflags
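
For readers following the thread, the selection logic under discussion can be sketched roughly as below. This is a minimal, self-contained illustration and not the merged code: `KernelVersion`, `BpfLimits`, `HasRaisedInsnLimit`, and `SelectLimits` are hypothetical names, and plain constants stand in for the gflags. The base loop limit of 42 comes from the diff; the base chunk limit of 4 is inferred from 84 / 21. The "newer than 5.1" comparison follows the PR title, while the diff comment says ">= 5.1", so the exact threshold here is an assumption.

```cpp
// Hypothetical stand-in for the value returned by system::GetCachedKernelVersion().
struct KernelVersion {
  int major;
  int minor;
};

// Pair of limits compiled into the BPF program.
struct BpfLimits {
  int loop_limit;
  int chunk_limit;
};

// Baseline limits: 42 appears in the diff, 4 is inferred from 84 / 21.
constexpr BpfLimits kBaseLimits{42, 4};

// Multiplier applied on kernels with the 1 million instruction verifier limit.
constexpr int kNewKernelMultiplier = 21;

// Assumption: "newer kernel" means strictly newer than 5.1, as in the PR title.
inline bool HasRaisedInsnLimit(KernelVersion v) {
  return v.major > 5 || (v.major == 5 && v.minor > 1);
}

// Returns the baseline limits on older kernels and 21x the baseline otherwise.
inline BpfLimits SelectLimits(KernelVersion v) {
  if (!HasRaisedInsnLimit(v)) {
    return kBaseLimits;
  }
  return BpfLimits{kBaseLimits.loop_limit * kNewKernelMultiplier,
                   kBaseLimits.chunk_limit * kNewKernelMultiplier};
}
```

With these assumptions, `SelectLimits({5, 10})` yields a loop limit of 882 and a chunk limit of 84, matching the values in the diff, while `SelectLimits({4, 14})` keeps the baseline. Exposing the two limits as flags, as requested above, would let an operator override either result without rebuilding.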

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
@benkilimnik benkilimnik requested a review from a team January 18, 2024 19:34
// Kernels >= 5.1 have higher BPF instruction limits (1 million for verifier).
// This enables a 21x increase to our loop and chunk limits
FLAGS_stirling_bpf_loop_limit = 882;
FLAGS_stirling_bpf_chunk_limit = 84;
Member

How did you pick these numbers? How close are we to the instruction limit with these numbers?

Member Author

I manually raised the loop/chunk limits until the verifier threw an error, then chose values just below that upper bound. The bound is around 22x our previous limits, so 21x is the largest multiplier that still loads. The verifier error looks like this:

bpf: Argument list too long. Program too large (... insns), at most 4096 insns

Confusingly, the bpf(2) syscall returns the same error whether the program's size or its complexity exceeds the limit. See Stack Overflow.
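
The limits in this PR were found by hand. If the search ever needs repeating, it could be automated along the lines below. This is a sketch using a binary search, which is a swapped-in technique rather than what was done here. `LargestPassingMultiplier` is a hypothetical helper, and the `loads` callback is assumed to rebuild the BPF program with the given multiplier and report whether the verifier accepted it. It also assumes acceptance is monotonic: every multiplier below the bound loads, and every one above it fails.

```cpp
#include <functional>

// Returns the largest multiplier in [lo, hi] for which `loads` returns true.
// Assumes `loads` is monotonic: true up to some bound, false beyond it.
// Returns lo - 1 (using the original lo) if no multiplier in the range loads.
inline int LargestPassingMultiplier(int lo, int hi,
                                    const std::function<bool(int)>& loads) {
  int best = lo - 1;
  while (lo <= hi) {
    const int mid = lo + (hi - lo) / 2;
    if (loads(mid)) {
      best = mid;    // mid loads; look for a larger passing multiplier.
      lo = mid + 1;
    } else {
      hi = mid - 1;  // mid is rejected; the bound lies below it.
    }
  }
  return best;
}
```

Because the syscall reports the same error for size and complexity violations, the callback can only treat any load failure as "too large"; it cannot tell which limit was hit.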

@JamesMBartlett JamesMBartlett merged commit 8cacddc into pixie-io:main Jan 23, 2024
34 checks passed