
new(modern_bpf): [EXPERIMENTAL] support variable number of ring buffers and online CPUs #820

Merged
merged 6 commits into from
Jan 16, 2023

Conversation

Andreagit97
Member

@Andreagit97 Andreagit97 commented Jan 10, 2023

I want to highlight that these features are experimental, which means they could change across libs releases. More in general, the whole modern probe is still experimental, so don't expect a stable interface at least for this release :)

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area driver-modern-bpf

/area libscap-engine-modern-bpf

/area libpman

/area tests

Does this PR require a change in the driver versions?

No

What this PR does / why we need it:

The way in which the modern probe references memory is a little different from the old BPF probe.
Usually, we ask our drivers to open an 8 MB buffer for every CPU, so the kernel allocates 8 MB, and then userspace maps this memory twice in order to efficiently read collected events. This is not exactly what happens in the modern BPF probe. When we ask to allocate 8 MB, the kernel under the hood maps this area twice (https://github.com/torvalds/linux/blob/5a41237ad1d4b62008f93163af1d9b1da90729d8/kernel/bpf/ringbuf.c#L107-L123); in this way both the kernel and user-space implementations are simplified and also more efficient. Userspace then maps the entire 16 MB exposed by the kernel. So, in the end, the virtual address space of the process is the same with the modern and old BPF probes; what changes is the referenced memory, since in one case we have only 8 MB while in the other we have 16 MB!

Here is an example just to be more explicit: we have 8 available CPUs and we allocate a 1 GB ring buffer for each one:

PID    USER      PR  NI    VIRT    RES    SHR  S     %CPU       %MEM     TIME+ COMMAND   
32257 root      20   0     16,0g  16,0g  16,0g S      3,0       51,6     0:01.31 scap-open (with modern bpf) 
32647 root      20   0     16,1g   8,1g   8,1g S      1,7       26,2     0:01.73 scap-open (with old probe)

This could become an issue if we have many CPUs or if we want buffers greater than 8 MB. This is the reason behind this patch!
Now the modern bpf engine exposes 2 new params:

  • cpus_for_each_buffer allows users to specify how many CPUs they want to associate with a ring buffer. For example, cpus_for_each_buffer = 1 means that we want a ring buffer for every CPU, as today; cpus_for_each_buffer = 2 means that we want a ring buffer every 2 CPUs, so a ring buffer will be shared between 2 CPUs. 0 is a special value and means that we want only one ring buffer shared between all the CPUs.
    The rule is: cpus_for_each_buffer must be >= 0 and <= max_possible_CPUs_in_the_system.
    cpus_for_each_buffer = 0 and cpus_for_each_buffer = max_possible_CPUs_in_the_system do exactly the same thing; 0 is just a simpler way to specify it without knowing the number of available CPUs on our system.
  • allocate_online_only allows users to allocate ring buffers only for online CPUs. Before this patch, the modern probe allocated a ring buffer for every available CPU, not only for online CPUs! This param can be used in combination with cpus_for_each_buffer, so also in this case we can associate more than one CPU with a ring buffer.

I think that these 2 new parameters offer great flexibility to the end user.
Let's consider a final example to clarify the solution. Imagine we have a system with 8 CPUs. With the old probe, we will have an 8 MB buffer for every CPU (therefore a total of 64 MB). With the modern probe, we have different ways to obtain the same referenced memory:

  • a 4 MB ring buffer for every CPU
  • an 8 MB ring buffer for every CPU pair
  • a 32 MB ring buffer shared between all the 8 CPUs

Which solution is best really depends on your system load, but now you have the power to find the optimal configuration for your deployment.

Which issue(s) this PR fixes:

Special notes for your reviewer:

The PR seems huge but most of the lines are due to tests or comments :)

Does this PR introduce a user-facing change?:

new(modern_bpf): support variable number of ring buffers and online CPUs

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
@FedeDP
Contributor

FedeDP commented Jan 10, 2023

/milestone 0.10.1

@poiana poiana added this to the 0.10.1 milestone Jan 10, 2023
@incertum
Contributor

❤️ I like this approach, and by the way thanks for fixing this. During tests, the extra memory allocated made a huge difference, for instance on machines with 64 processors...

@FedeDP FedeDP mentioned this pull request Jan 11, 2023
FedeDP
FedeDP previously approved these changes Jan 11, 2023
Contributor

@FedeDP FedeDP left a comment


This is beautiful!
/approve

@poiana
Contributor

poiana commented Jan 11, 2023

LGTM label has been added.

Git tree hash: 93d3a3537791e593df0ec9889a7b8ccc39f8b0a0

@hbrueckner
Contributor

@Andreagit97 Not sure if this question is a bit out of scope... but what will the behavior be in case of CPU hot-plugging/un-plugging?

@Andreagit97
Member Author

Andreagit97 commented Jan 11, 2023

@Andreagit97 Not sure if this question is a bit out of scope... but what will the behavior be in case of CPU hot-plugging/un-plugging?

This is a good question indeed. The modern probe, unlike the old drivers, is able to manage hot-plug because it opens a ring buffer for every possible CPU in the system, and this concept is kept in this patch too. So let's consider an example:

(X) means offline CPU

CPU 0 (X) \
           RING BUF 0
CPU 1 (X) /

CPU 2  \
           RING BUF 1
CPU 3  /

CPU 4 (X) \
           RING BUF 2
CPU 5     /

In this case we have a ring buffer for every CPU pair, but the rationale is the same for all other cases. As you can notice, both CPU 0 and CPU 1 are offline but the ring buffer is still allocated for this pair, because these CPUs are available in the system and may become online in the future. Same for CPU 4: the CPU is linked to RING BUF 2, and when it comes online it will have the ring buffer already in place for sending events.

Not sure this answers your question; if not, feel free to ask :)

@Andreagit97
Member Author

/hold

@hbrueckner
Contributor

This is a good question indeed. The modern probe, unlike the old drivers, is able to manage hot-plug because it opens a ring buffer for every possible CPU in the system, and this concept is kept in this patch too. So let's consider an example:

Not sure this answers your question; if not, feel free to ask :)

It helped, and I have seen that the PR uses sysconf(_SC_NPROCESSORS_CONF) to set up the ring buffers for configured CPUs. There is the corner case of attaching additional CPUs and configuring them later. In that case, the number of configured CPUs would increase over time. Here is a small example:

# lscpu |grep -A2 '^CPU(s):'
CPU(s):                          3
On-line CPU(s) list:             0,1
Off-line CPU(s) list:            2

# vmcp def cpu 04
CPU 04 defined

# chcpu -r
Triggered rescan of CPUs

# lscpu |grep -A2 '^CPU(s):'
CPU(s):                          4
On-line CPU(s) list:             0,1
Off-line CPU(s) list:            2,3

@Andreagit97
Member Author

It helped, and I have seen that the PR uses sysconf(_SC_NPROCESSORS_CONF) to set up the ring buffers for configured CPUs. There is the corner case of attaching additional CPUs and configuring them later. In that case, the number of configured CPUs would increase over time. Here is a small example:

Thank you very much for this suggestion! libbpf uses /sys/devices/system/cpu/possible under the hood (https://github.com/libbpf/libbpf/blob/3423d5e7cdab356d115aef7f987b4a1098ede448/src/libbpf.c#L12188). But I think it is still vulnerable to your corner case (https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-system-cpu):

		possible: cpus that have been allocated resources and can be
		brought online if they are present.

Let's say the final solution that I see here is: allocate a ring buffer array with entries for all possible CPUs (not sure what the reliable way to do that is 🤔 /sys/devices/system/cpu/kernel_max is too big unfortunately) and then, according to what happens in the system, remove and add ring buffers at run-time following the various hot-plugs. When a CPU starts and doesn't have a ringbuf, we can receive a notification in userspace in order to create a dedicated buffer. We can try to do that in the future!

The other hidden dream is to use a single ring buffer for all possible CPUs (so cpus_for_each_buffer = 0); this would be amazing, but I don't think we can really support it unless we reduce the number of events we send to userspace. Right now we have a huge throughput of data :(

Let's say this PR is a sort of: let's provide users with many configurations since we are in an experimental phase, let's gather some useful info about winning deployment solutions, and then release something really production-ready! In this sense, I would like to add to this PR the support for online-only CPUs; this could be a game changer in huge environments with many CPUs disabled.

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
@Andreagit97 Andreagit97 changed the title new(modern_bpf): support variable number of ring buffers new(modern_bpf): support variable number of ring buffers and online CPUs Jan 13, 2023
@poiana poiana removed the lgtm label Jan 13, 2023
@poiana poiana requested a review from FedeDP January 13, 2023 10:02
@Andreagit97
Member Author

I added the online CPUs feature. Sorry for the rebase, but since we need to cherry-pick it without tests it was better to do it like that.

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
@Andreagit97 Andreagit97 changed the title new(modern_bpf): support variable number of ring buffers and online CPUs new(modern_bpf): [EXPERIMENTAL] support variable number of ring buffers and online CPUs Jan 13, 2023
ringubuf_array_fd = bpf_map__fd(g_state.skel->maps.ringbuf_maps);
if(ringubuf_array_fd <= 0)
/* CPU 0 is always online */
if(cpu_id == 0)
Contributor

@hbrueckner hbrueckner Jan 13, 2023


Hmm... I would say this assumption is true on a single-CPU system. Otherwise, CPU 0 could be set offline, e.g., chcpu -d 0

CPU(s):                  4
  On-line CPU(s) list:   1,2
  Off-line CPU(s) list:  0,3

Member Author


That's true. The real issue here is that cpu0 does not have the /sys/devices/system/cpu/cpu0/online file that we use here... same for the old BPF https://github.com/falcosecurity/libs/blob/master/userspace/libscap/engine/bpf/scap_bpf.c#L1471

Contributor

@hbrueckner hbrueckner Jan 13, 2023


might have changed these days:

# cat /sys/devices/system/cpu/cpu0/online 
1

However, fine if that's a problem on older systems. Thanks for clarification.

Contributor


Mmmh cpu0/online does not exist for me either, on

uname -r
6.1.4-arch1-1

Perhaps some CPUs allow cpu0 to be disabled, and the kernel then exposes its online file too?

Contributor


Could be or it could be also related to CONFIG_HOTPLUG_CPU kernel config.

Member Author


this is interesting, I have the CONFIG_HOTPLUG_CPU=y config but no online file for cpu0

Perhaps some cpus allow the cpu0 to be disabled and the kernel then exposes their online file too?

It could be 🤔

@@ -194,7 +194,7 @@ void open_engine(sinsp& inspector)
}
else if(!engine_string.compare(MODERN_BPF_ENGINE))
{
inspector.open_modern_bpf(buffer_bytes_dim, ppm_sc, tp_set);
inspector.open_modern_bpf(buffer_bytes_dim, DEFAULT_CPU_FOR_EACH_BUFFER, true, ppm_sc, tp_set);
Contributor


If I see this right, in scap-open the default is false whereas for libsinsp it is true. I'd just like to confirm this difference.

Member Author


Yeah, IMHO the default should be online CPUs only, just to be compliant with the other 2 drivers. But this is a good point, I can change it in scap-open, thank you :)

Contributor


You are welcome. Thank you!

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Co-authored-by: Hendrik Brueckner <brueckner@de.ibm.com>
Contributor

@FedeDP FedeDP left a comment


/approve

@poiana poiana added the lgtm label Jan 16, 2023
@poiana
Contributor

poiana commented Jan 16, 2023

LGTM label has been added.

Git tree hash: b929d88e7e6919dda9f0c17be7e3c9e8f06b2510

Contributor

@hbrueckner hbrueckner left a comment


Thanks a lot!
/approve

@poiana
Contributor

poiana commented Jan 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, hbrueckner

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Andreagit97
Member Author

/unhold
