new(modern_bpf): [EXPERIMENTAL] support variable number of ring buffers and online CPUs #820
Conversation
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
/milestone 0.10.1
❤️ I like this approach, and btw thanks for fixing this. During tests the extra memory allocated made a huge difference, for instance on machines with 64 processors...
This is beautiful!
/approve
LGTM label has been added. Git tree hash: 93d3a3537791e593df0ec9889a7b8ccc39f8b0a0
@Andreagit97 Not sure if this question is a bit out-of-scope... but what will the behavior be in case of CPU hot-plugging/un-plugging?
This is a good question indeed. The modern probe, unlike the old drivers, is able to manage hot-plug because it opens a ring buffer for every possible CPU in the system, and this concept is kept also in this patch. So let's consider an example, where (X) means an offline CPU:

```
CPU 0 (X) \
           RING BUF 0
CPU 1 (X) /
CPU 2     \
           RING BUF 1
CPU 3     /
CPU 4 (X) \
           RING BUF 2
CPU 5     /
```

In this case we have a ring buffer for every CPU pair, but the rationale is the same for all other cases. As you can see, both CPU 0 and CPU 1 are offline, but the ring buffer is still allocated for this pair, because these CPUs are available in the system and may come online in the future. Same for CPU 4: the CPU is linked to RING BUF 2, and when it comes online it will already have the ring buffer in place for sending events. Not sure this answers your question; if not, feel free to ask :)
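To make the pairing concrete, here is a minimal sketch of the index computation, assuming a simple `cpu_id / cpus_for_each_buffer` layout; the helper name is made up, and this is not the actual libpman code:

```cpp
#include <cstdio>

// Hypothetical helper: maps a possible-CPU id to the index of the ring
// buffer it shares with its neighbors. With cpus_for_each_buffer = 2 this
// reproduces the pairing in the diagram above (CPU 0/1 -> buf 0, ...).
// Offline CPUs get an index too: their buffer is ready if they come online.
static unsigned ringbuf_index_for_cpu(unsigned cpu_id, unsigned cpus_for_each_buffer)
{
    if(cpus_for_each_buffer == 0)
    {
        return 0; // special value: one single buffer shared by all CPUs
    }
    return cpu_id / cpus_for_each_buffer;
}

int main()
{
    const unsigned n_possible_cpus = 6;
    const unsigned cpus_for_each_buffer = 2;
    for(unsigned cpu = 0; cpu < n_possible_cpus; cpu++)
    {
        std::printf("CPU %u -> RING BUF %u\n", cpu,
                    ringbuf_index_for_cpu(cpu, cpus_for_each_buffer));
    }
    return 0;
}
```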
/hold
It helped, and I have seen that the PR uses ...
Thank you very much for this suggestion!

Let's say the final solution that I see here is: allocate a ring buffer array with ... The other hidden dream is to use a single ring buffer for all possible CPUs (so ...). Let's say this PR is a sort of: let's provide users with many configurations since we are in an experimental phase, let's gather some useful info about winning deployment solutions, and then release something really production-ready! In this sense I would like to add to this PR the support for only online CPUs.
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Force-pushed from 0fc5abe to 2c91e41
I added the online CPUs feature. Sorry for the rebase, but since we need to cherry-pick it without tests, it was better to do it like that.
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Force-pushed from 2c91e41 to e7d5ef9
```c
ringubuf_array_fd = bpf_map__fd(g_state.skel->maps.ringbuf_maps);
if(ringubuf_array_fd <= 0)
```

```c
/* CPU 0 is always online */
if(cpu_id == 0)
```
Hmm... I would say this assumption is true on a single-CPU system. Otherwise, CPU 0 could be set offline, e.g., with `chcpu -d 0`:

```
CPU(s):                4
On-line CPU(s) list:   1,2
Off-line CPU(s) list:  0,3
```
That's true. The real issue here is that cpu0 does not have the `/sys/devices/system/cpu/cpu0/online` file that we use here... same for the old BPF: https://github.com/falcosecurity/libs/blob/master/userspace/libscap/engine/bpf/scap_bpf.c#L1471
Might have changed these days:

```
# cat /sys/devices/system/cpu/cpu0/online
1
```

However, fine if that's a problem on older systems. Thanks for the clarification.
Mmmh, `cpu0/online` does not exist for me either, on:

```
$ uname -r
6.1.4-arch1-1
```

Perhaps some CPUs allow cpu0 to be disabled, and the kernel then exposes its `online` file too?
Could be, or it could also be related to the `CONFIG_HOTPLUG_CPU` kernel config.
This is interesting: I have the `CONFIG_HOTPLUG_CPU=y` config but no `online` file for cpu0.

> Perhaps some CPUs allow cpu0 to be disabled, and the kernel then exposes its `online` file too?

It could be 🤔
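For reference, here is a minimal sketch of the check being discussed, assuming the fallback semantics above; this mirrors the idea, not the exact libscap code: read `/sys/devices/system/cpu/cpuN/online` and, when the file is missing (typically cpu0, or kernels without `CONFIG_HOTPLUG_CPU`), treat the CPU as online.

```cpp
#include <cstdio>
#include <string>

// Sketch: a CPU whose sysfs "online" file is missing is assumed online,
// since the kernel gives us nothing to read for it.
static bool cpu_is_online(int cpu_id)
{
    std::string path =
        "/sys/devices/system/cpu/cpu" + std::to_string(cpu_id) + "/online";
    FILE* f = std::fopen(path.c_str(), "r");
    if(f == nullptr)
    {
        return true; // no 'online' file: assume the CPU is online
    }
    int online = 0;
    if(std::fscanf(f, "%d", &online) != 1)
    {
        online = 1; // unreadable content: be conservative, assume online
    }
    std::fclose(f);
    return online == 1;
}

int main()
{
    for(int cpu = 0; cpu < 4; cpu++)
    {
        std::printf("cpu%d online: %d\n", cpu, cpu_is_online(cpu));
    }
    return 0;
}
```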
```diff
@@ -194,7 +194,7 @@ void open_engine(sinsp& inspector)
 	}
 	else if(!engine_string.compare(MODERN_BPF_ENGINE))
 	{
-		inspector.open_modern_bpf(buffer_bytes_dim, ppm_sc, tp_set);
+		inspector.open_modern_bpf(buffer_bytes_dim, DEFAULT_CPU_FOR_EACH_BUFFER, true, ppm_sc, tp_set);
```
If I see this right, in scap-open the default is `false`, whereas for libsinsp it is `true`. I would just like to confirm this difference.
Yeah, IMHO the default should be only online CPUs, just to be consistent with the other 2 drivers, but this is a good point. I can change it in scap-open, thank you :)
You are welcome. Thank you!
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it> Co-authored-by: Hendrik Brueckner <brueckner@de.ibm.com>
/approve
LGTM label has been added. Git tree hash: b929d88e7e6919dda9f0c17be7e3c9e8f06b2510
Thanks a lot!
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, hbrueckner

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing `/approve` in a comment.
/unhold
I want to highlight that these features are experimental, which means they could change over libs releases. More in general, the whole modern probe is still experimental, so don't expect a stable interface, at least for this release :)
What type of PR is this?
/kind feature
Any specific area of the project related to this PR?
/area driver-modern-bpf
/area libscap-engine-modern-bpf
/area libpman
/area tests
Does this PR require a change in the driver versions?
No
What this PR does / why we need it:
The way in which the modern probe references memory is a little bit different from the old BPF probe.
Usually, we ask our drivers to open an 8 MB buffer for every CPU, so the kernel allocates 8 MB, and then userspace maps this memory twice in order to read collected events efficiently. This is not completely true in the modern BPF probe. When we ask it to allocate 8 MB, the kernel under the hood maps this memory twice (https://github.com/torvalds/linux/blob/5a41237ad1d4b62008f93163af1d9b1da90729d8/kernel/bpf/ringbuf.c#L107-L123); in this way both the kernel and user-space implementations are simpler and more efficient. Then userspace maps the entire 16 MB exposed by the kernel. So, in the end, the virtual address space of the process is the same with the modern and the old BPF probe; what changes is the referenced memory, since in one case we have only 8 MB while in the other we have 16 MB!
Here is an example just to be more explicit: say we have 8 available CPUs and we allocate a 1 GB ring buffer for each one (see the accounting sketch below). This could become an issue if we have many CPUs or if we want buffers larger than 8 MB. This is the reason behind this patch!
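A rough back-of-the-envelope for that 8-CPU, 1 GB-per-buffer example, assuming the 2x kernel mapping described above; the numbers are illustrative only:

```cpp
#include <cstdio>

int main()
{
    const long long GB = 1024LL * 1024 * 1024;
    const long long n_cpus = 8;
    const long long buf_dim = 1 * GB; // requested size of each ring buffer

    // Old BPF probe: the kernel allocates exactly buf_dim per CPU;
    // userspace maps it twice, but the referenced memory stays buf_dim.
    long long old_bpf = n_cpus * buf_dim;

    // Modern BPF probe: the kernel itself maps every buffer twice,
    // so each 1 GB request ends up referencing 2 GB.
    long long modern_bpf = n_cpus * buf_dim * 2;

    std::printf("old BPF:    %lld GB referenced\n", old_bpf / GB);    // 8 GB
    std::printf("modern BPF: %lld GB referenced\n", modern_bpf / GB); // 16 GB
    return 0;
}
```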
Now the modern BPF engine exposes 2 new params:

- `cpus_for_each_buffer` allows users to specify how many CPUs they want to associate with a ring buffer. For example, `cpus_for_each_buffer = 1` means that we want a ring buffer for every CPU, like today; `cpus_for_each_buffer = 2` means that we want a ring buffer every 2 CPUs, so a ring buffer will be shared between 2 CPUs. `0` is a special value and means that we want a single ring buffer shared between all the CPUs. The rule is: `cpus_for_each_buffer` must be `>= 0` and `<= max_possible_CPUs_in_the_system`. `cpus_for_each_buffer = 0` and `cpus_for_each_buffer = max_possible_CPUs_in_the_system` do exactly the same thing; `0` is just a simpler way to specify it without knowing the number of CPUs in our system (see the validation sketch after this list).
- `allocate_online_only` allows users to allocate ring buffers only for online CPUs. Before this patch, the modern probe allocated a ring buffer for every available CPU, not only for online CPUs! This param can be used in combination with `cpus_for_each_buffer`, so also in this case we can associate more than one CPU with a ring buffer.

I think these 2 new parameters could offer great flexibility to the end user.
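A hypothetical sketch of the rule above; this is not the real libpman check, and the function name is made up:

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical validation: 0 <= cpus_for_each_buffer <= max_possible_cpus,
// where 0 is shorthand for "one buffer shared by all possible CPUs".
static uint32_t normalize_cpus_for_each_buffer(uint32_t cpus_for_each_buffer,
                                               uint32_t max_possible_cpus)
{
    if(cpus_for_each_buffer > max_possible_cpus)
    {
        throw std::invalid_argument(
            "cpus_for_each_buffer (" + std::to_string(cpus_for_each_buffer) +
            ") is greater than the max possible CPUs (" +
            std::to_string(max_possible_cpus) + ")");
    }
    // 0 and max_possible_cpus mean the same thing: a single shared buffer.
    return cpus_for_each_buffer == 0 ? max_possible_cpus : cpus_for_each_buffer;
}

int main()
{
    std::printf("%u\n", normalize_cpus_for_each_buffer(0, 8)); // prints 8
    std::printf("%u\n", normalize_cpus_for_each_buffer(2, 8)); // prints 2
    return 0;
}
```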
Let's consider a final example to clarify the solution. Imagine we have a system with 8 CPUs. With the old probe, we will have an 8 MB buffer for every CPU (therefore a total of 64 MB). With the modern probe, we have different ways to obtain the same referenced memory, for example one smaller buffer per CPU, one buffer per CPU pair, or a single buffer shared by all CPUs (see the sketches below). Which is the best solution really depends on your system load, but now you have the power to find the optimal solution for your deployment.
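For instance, here are three illustrative configurations that keep the total referenced memory at 64 MB on that 8-CPU box, using the `open_modern_bpf` signature shown in the diff above. These lines assume the 2x mapping and are meant to drop into the `open_engine` function from that diff, reusing its `inspector`, `ppm_sc`, and `tp_set` variables; the buffer sizes are examples, not recommendations:

```cpp
// 8 buffers (one per CPU) of 4 MB each: 8 x (2 x 4 MB) = 64 MB referenced.
inspector.open_modern_bpf(4 * 1024 * 1024, 1, true, ppm_sc, tp_set);

// 4 buffers (one per CPU pair) of 8 MB each: 4 x (2 x 8 MB) = 64 MB.
inspector.open_modern_bpf(8 * 1024 * 1024, 2, true, ppm_sc, tp_set);

// 1 buffer shared by all CPUs, 32 MB: 1 x (2 x 32 MB) = 64 MB.
inspector.open_modern_bpf(32 * 1024 * 1024, 0, true, ppm_sc, tp_set);
```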
Which issue(s) this PR fixes:
Special notes for your reviewer:
The PR seems huge but most of the lines are due to tests or comments :)
Does this PR introduce a user-facing change?: