Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adopt specific-engine open methods (new libsinsp APIs) #2164

Closed

Conversation

Andreagit97
Copy link
Member

What type of PR is this?

/kind cleanup

/kind feature

Any specific area of the project related to this PR?

/area build

/area engine

What this PR does / why we need it:

This PR is based on 👉 falcosecurity/libs#540.
It introduces 4 main novelties:

  1. a new option in the configuration file single_buffer_dimension, this will be the dimension that a single buffer in our drivers will have. (BPF, kmod, modern BPF).
  2. the FALCO_BPF_PROBE env variable is now managed by the consumer (i.e Falco) and no more by the libraries. BTW the behavior is always the same so the final user won't notice any difference.
  3. a new command line flag to enable the modern_bpf engine, right now you can only try it by using FALCOSECURITY_LIBS_SOURCE_DIR and DRIVER_SOURCE_DIR options.
  4. every engine has now a dedicated wrapper method to open the inspector, see the libs PR to have more information on this update(libsinsp)!: merge all the opening methods into just one libs#540.

Which issue(s) this PR fixes:

Special notes for your reviewer:

This PR is based on 👉 falcosecurity/libs#540, the [WIP] will be removed when the libs PR will be merged

Does this PR introduce a user-facing change?:

new(cmdline/configurations): new flag `--modern-bpf` and new configuration option `single_buffer_dimension`

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
@poiana
Copy link
Contributor

poiana commented Aug 12, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Andreagit97
Once this PR has been reviewed and has the lgtm label, please assign fededp for approval by writing /assign @fededp in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Andreagit97 and others added 5 commits August 12, 2022 10:18
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
the input plugin will be set during the inspector open phase

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
Co-authored-by: Jason Dellaluce <jasondellaluce@gmail.com>
every engine has a dedicated open method

Signed-off-by: Andrea Terzolo <andrea.terzolo@polito.it>
@Andreagit97 Andreagit97 force-pushed the falco_refactor_open_args branch from 497588b to c74793c Compare August 12, 2022 10:19
@incertum
Copy link
Contributor

OMG this is amazing 💯 . Thanks for all the hard work! This is a significant and much needed cleanup and also extends configurability.

Few suggestions re single_buffer_dimension:

(1) How to choose correct values? Would this for example for eBPF mean the total_size value, so it's not just ring_size but also includes header_size? How do we know an end user provided value does not break things? Would a mathematical sanitization and correction to nearest best value in the source code be safer (see (3))?

https://github.com/falcosecurity/libs/blob/master/userspace/libscap/engine/bpf/scap_bpf.c#L149

static const int BUF_SIZE_PAGES = 2048;

int page_size = getpagesize();
int ring_size = page_size * BUF_SIZE_PAGES;
int header_size = page_size;
int total_size = ring_size * 2 + header_size;

(2) How about writing down the "mathematical equations" for each driver in the yaml parameter explanation and even show how the default value is calculated? For driver-bpf for example:

Assuming page_size is 4096 bytes we get a total of ~16.78 MB hard-coded buffer size
 [(4096 bytes * 2048 * 2) + 4096 bytes]

Almost wonder if the parameter should just be the multiple of page_size and not in bytes as page_size is typically fetched dynamically? No need to change the parameter name - single_buffer_dimension would still work.

(3) For eBPF ring buffer for examples reading that ring size has to be a power of 2. How does this translate to kmod and current perf buffer in driver-bpf? Never seen any hard coded values not power of 2 AFAIK ... see (1) user input sanitization.

https://www.kernel.org/doc/html/latest/bpf/ringbuf.html?highlight=ring+buffer
Key and value sizes are enforced to be zero. max_entries is used to specify the size of ring buffer and has to be a power of 2 value.

(4) Lastly after clarifying (1) - (3), could we give 5 concrete recommendations in the yaml parameter definition? Maybe 1-2 for decrease and 3-4 good steps to increase the value? End users probably wouldn't want to become mathematicians. Maybe also include some notes on what effects of increasing or decreasing buffer sizes are? E.g. too large of a buffer on little server load could slow down the engine. A buffer too small can cause more kernel side event drops. We can iterate on such recommendations as we gather field experience, happy to help, because some of what I just mentioned may be based more on reputation and is not backed up by data and perf studies.

[This feedback applies to other parameters in falco.yaml as well so it's not specific to this PR, I think end users would appreciate even more guidance on what can happen if you increase or decrease some of the values ...]

@@ -212,6 +212,10 @@ void falco_configuration::init(string conf_filename, const vector<string> &cmdli
m_webserver_ssl_enabled = m_config->get_scalar<bool>("webserver.ssl_enabled", false);
m_webserver_ssl_certificate = m_config->get_scalar<string>("webserver.ssl_certificate", "/etc/falco/falco.pem");

// we put this value in the configuration file because in this way we can change the dimension
// at every reload.
m_single_buffer_dimension = m_config->get_scalar<uint64_t>("single_buffer_dimension", 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should or could we be doing more to ensure end user parameter input is not bogus? This would apply to any parameter and is not specific to this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what concern the validity checks for sure we have to implement them in the final drivers since they are the end users of this info, we can put also something in Falco, BTW if we use the page number as a value there is not too much to check probably that it is greater than 0 and that it is a power of 2, right?

@Andreagit97
Copy link
Member Author

Andreagit97 commented Aug 19, 2022

Hey @incertum these are all good points:

(1)(2) This was also my doubt, not sure what could be the best choice for the end user, specify the value in bytes or specify the number of pages (in this second case we need to provide the page size of the system in some way, maybe through a simple flag --print-page-size that only prints this info (?)). The second approach (number of page) would probably be better since it simplifies all the checks and is also less error-prone. Any opinion on this @jasondellaluce @FedeDP @leogr?

(3) yeah in both BPF probes we always need a power of 2, in the kmod I have to check, if I remember well we don't have this constraint here.

(4) Absolutely agree with that, probably at the beginning they would be quite silly suggestions but as we gather more experience we can enrich them as you said!

# - This number is expressed in bytes.
# - This number must be a multiple of your system page size, otherwise the allocation will fail.
# - If you leave `0`, every driver will set its internal default dimension.
single_buffer_dimension: 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
single_buffer_dimension: 0
syscall_buffer_size: 0

Since this option affects the syscall data source only, I'd prefix it with syscall_ and move it close to other syscall_ options.
Also, I'd simplify the name and explain in the comment what that size means and how it affects the operations (eg. the driver will create a buffer per CPU, syscall_buffer_size specifies the size in bytes of each buffer...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completely agree with the new name!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to have this param be just a multiple of page size need to adjust the _size at the end and explain.

[nit] More additional explanations for end users less familiar with architecture. Something like:

Falco uses a shared buffer between kernel and userspace to serve events up to userspace for subsequent processing and rules matching. With the TBD_BUFFER_PARAM_NAME you can increase or decrease the default buffer size for either driver (BPF, kmod, modern BPF) wrt the system call tracing functionality. In more detail, Falco uses one buffer for every online CPU, but you generically adjust the size once with this parameter. Increasing the buffer can help reducing kernel side syscall drops while decreasing the buffer can help speed up the entire engine.

Copy link
Contributor

@incertum incertum Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thought that just occurred to me, maybe have default value be actual default value and not 0 to be consistent with other parameters? @Andreagit97 @leogr
But now I can see an issue for kmod vs eBPF default values, hmmm ... and we should always gracefully start up Falco with default value like it's the case right now.

@leogr
Copy link
Member

leogr commented Aug 19, 2022

Hey @Andreagit97 and @incertum

Since the buffer size is not just a driver thing but also directly affects the performance of the userspace program, I would like the parameter to behave consistently regardless of the driver used. This would also simplify the user experience. However, I am not yet clear about the best way to achieve this.

(1)(2) This was also my doubt, not sure what could be the best choice for the end user, specify the value in bytes or specify the number of pages (in this second case we need to provide the page size of the system in some way, maybe through a simple flag --print-page-size that only prints this info (?)). The second approach (number of page) would probably be better since it simplifies all the checks and is also less error-prone. Any opinion on this @jasondellaluce @FedeDP @leogr?

Considering what I said above, specifying the value in bytes is probably not the best choice.
Does the number of pages as a unit work well with all drivers? If yes, can we explain that to the user simply and straightforwardly? 🤔

Moreover, are there other alternatives?

(3) yeah in both BPF probes we always need a power of 2, in the kmod I have to check, if I remember well we don't have this constraint here.

If possible, I'd not introduce a special case for a particular type of driver. I'd directly or indirectly force the result of the math to be a power of 2 unless a compelling use case exists for the kmod that does not work with this constraint. IMO these are the kind of questions we need to ask ourselves before making such a decision.

(4) Absolutely agree with that, probably at the beginning they would be quite silly suggestions but as we gather more experience we can enrich them as you said!

👍

In general, I'd prefer to keep things as simple as possible. Although that may be opinionated, I'd also validate the value against hard-coded min and max to make the configuration less error-prone (if we don't want to enforce limits, we can just emit warnings). I guess we can calculate the min value in some way. For the max limit, we may choose a reasonably high value. We are already enforcing arbitrary limits in similar situations. For example 👇

if(m_metadata_download_max_mb > 1024)
{
throw logic_error("Error reading config file(" + m_config_file + "): metadata download maximum size should be < 1024 Mb");
}

@leogr
Copy link
Member

leogr commented Aug 19, 2022

/milestone 0.33.0

@poiana poiana added this to the 0.33.0 milestone Aug 19, 2022
@incertum
Copy link
Contributor

@Andreagit97 @leogr @jasondellaluce @FedeDP

Go with the multiples of page sizes as parameter?
Hard-code an array of let's say 10-12 values, communicate the default value as anchor point?
That way it's on us to do the math. In addition sanitization check would be very simplified and user experience simplest and safest?

Agreed with it needing to be consistent across drivers @leogr and very simple.

@Andreagit97
Copy link
Member Author

As said here falcosecurity/libs#540 (comment) i will continue the discussion here:

Go with the multiples of page sizes as parameter?

Yeah, I think this is the right way.

Hard-code an array of let's say 10-12 values, communicate the default value as anchor point? That way it's on us to do the math.

Not sure about that it, this could be too much restrictive i think that upper and lower bound + check the power of 2 could be enough, WDYT?

In addition sanitization check would be very simplified and user experience simplest and safest?

As said in the previous point we need at least these 2 checks:

  • upper and lower bound
  • check the power of 2

And I think we need it in Falco as a consumer, but also in scap/sinsp since they are the final users of this size, so at least 2 levels

Agreed with it needing to be consistent across drivers @leogr and very simple.

Yeah also agree on that it would be easier and clearer to manage them all in the same way!

Comment on lines +171 to +172
#ifdef HAS_MODERN_BPF
("modern-bpf", "Use BPF modern probe to capture new events.", cxxopts::value(modern_bpf)->default_value("false"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to write here something about the fact that is experimental

@incertum
Copy link
Contributor

As said in the previous point we need at least these 2 checks:

  • upper and lower bound
  • check the power of 2

And I think we need it in Falco as a consumer, but also in scap/sinsp since they are the final users of this size, so at least 2 levels

Works and agreed on performing the checks twice.

[nit]
Think it would be very important to be transparent about communicating the default values and give guidance re how to best adjust the values (see previous comments).

@Andreagit97
Copy link
Member Author

Close it in favor of #2201

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants