new(falco): introduce new metrics w/ Falco internal: metrics snapshot option and new metrics config #2333

Merged 16 commits on May 23, 2023
2 changes: 2 additions & 0 deletions cmake/modules/falcosecurity-libs.cmake
@@ -53,6 +53,8 @@ if(MUSL_OPTIMIZED_BUILD)
endif()

set(SCAP_HOST_ROOT_ENV_VAR_NAME "HOST_ROOT")
set(SCAP_HOSTNAME_ENV_VAR "FALCO_HOSTNAME")
set(SINSP_AGENT_CGROUP_MEM_PATH_ENV_VAR "FALCO_CGROUP_MEM_PATH")

if(NOT LIBSCAP_DIR)
set(LIBSCAP_DIR "${FALCOSECURITY_LIBS_SOURCE_DIR}")
111 changes: 110 additions & 1 deletion falco.yaml
@@ -564,4 +564,113 @@ metadata_download:
#
base_syscalls:
  custom_set: []
  repair: false

# metrics: [EXPERIMENTAL] periodic metric snapshots
# (including stats and resource utilization) captured at regular intervals
#
# --- [Description]
#
# Consider these key points about the `metrics` feature in Falco:
#
# - It introduces a redesigned stats/metrics system.
# - It natively supports resource utilization metrics and specialized performance metrics.
# - Metrics are emitted as monotonic counters at predefined intervals (snapshots).
# - All metrics are consolidated into a single log message, adhering to the established
# rules schema and naming conventions.
# - Additional info fields complement the metrics and facilitate customized
# statistical analyses and correlations.
# - The metrics framework is designed for easy future extension.
#
# The `metrics` feature follows a specific schema and field naming convention. All metrics
# are collected as subfields under the `output_fields` key, similar to regular Falco rules.
# Each metric field name adheres to the grammar used in Falco rules.
# There are two new field classes introduced: `falco.` and `scap.`.
# The `falco.` class represents userspace counters, statistics, resource utilization,
# or useful information fields.
# The `scap.` class represents counters and statistics mostly obtained from Falco's
# kernel instrumentation before events are sent to userspace, but can include scap
# userspace stats as well.
#
# It's important to note that the output fields and their names can be subject to change
# until the metrics feature reaches a stable release.
#
# To customize the hostname in Falco, you can set the environment variable `FALCO_HOSTNAME`
# to your desired hostname. This is particularly useful in Kubernetes deployments
# where the hostname can be set to the pod name.
#
# --- [Usage]
#
# `enabled`:
# Disabled by default.
#
# `interval`:
# The stats interval in Falco follows the time duration definitions used by Prometheus.
# https://prometheus.io/docs/prometheus/latest/querying/basics/#time-durations
#
# Time durations are specified as a number, followed immediately by one of the following units:
# ms - milliseconds
# s - seconds
# m - minutes
# h - hours
# d - days - assuming a day always has 24h
# w - weeks - assuming a week always has 7d
# y - years - assuming a year always has 365d
#
# Example of a valid time duration: 1h30m20s10ms
#
# A minimum interval of 100ms is enforced for metric collection. However, for production environments,
# we recommend selecting one of the following intervals for optimal monitoring:
# 15m
# 30m
# 1h
# 4h
# 6h
#
# `output_rule`:
# To enable seamless metrics and performance monitoring, we recommend emitting metrics as the rule
# "Falco internal: metrics snapshot". This option is particularly useful when Falco logs are preserved
# in a data lake.
# Please note that to use this option, the `log_level` must be set to `info` at a minimum.
#
# `output_file`:
# Append stats to a `jsonl` file. Use with caution in production as Falco does not automatically rotate the file.
#
# `resource_utilization_enabled`:
# Emit CPU and memory usage metrics. CPU usage is reported as a percentage of one CPU and
# can be normalized to the total number of CPUs to determine overall usage.
# Memory metrics are provided in raw units (`kb` for `RSS`, `PSS` and `VSZ` or
# `bytes` for `container_memory_used`) and can be uniformly converted
# to megabytes (MB) using the `convert_memory_to_mb` functionality.
# In environments such as Kubernetes, it is crucial to track Falco's container memory usage.
# To customize the path of the memory metric file, you can create an environment variable
# named `FALCO_CGROUP_MEM_PATH` and set it to the desired file path. By default, Falco uses
# the file `/sys/fs/cgroup/memory/memory.usage_in_bytes` to monitor container memory usage,
# which aligns with Kubernetes' `container_memory_working_set_bytes` metric.
#
# `kernel_event_counters_enabled`:
# Emit kernel side event and drop counters, as an alternative to `syscall_event_drops`,
# but with some differences. These counters reflect monotonic values since Falco's start
# and are exported at a constant stats interval.
#
# `libbpf_stats_enabled`:
# Exposes statistics similar to `bpftool prog show`, providing information such as the number
# of invocations of each BPF program attached by Falco and the time spent in each program
# measured in nanoseconds.
# To enable this feature, the kernel must be >= 5.1, and the sysctl `kernel.bpf_stats_enabled`
# (`/proc/sys/kernel/bpf_stats_enabled`) must be set to 1. This option, or an equivalent statistics
# feature, is not available for non-`*bpf*` drivers.
# Additionally, please be aware that the current implementation of `libbpf` does not
# support granularity of statistics at the bpf tail call level.
#
# todo: prometheus export option
# todo: syscall_counters_enabled option

metrics:
  enabled: false
  interval: 1h
  output_rule: true
  # output_file: /tmp/falco_stats.jsonl
  resource_utilization_enabled: true
  kernel_event_counters_enabled: true
  libbpf_stats_enabled: true
  convert_memory_to_mb: true
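
Note for consumers of these snapshots: the unit handling described in the config comments (CPU as a percentage of one CPU, memory in `kb` or `bytes`, counters monotonic since Falco's start) can be sketched as below. This is an illustration only, not Falco code. The helper names `normalize_cpu_pct`, `kb_to_mb`, `bytes_to_mb` and `rate_per_second` are hypothetical, and the 1024-based MB conversion is an assumption that this diff does not confirm.

```cpp
#include <cstdint>

// Hypothetical helpers for post-processing a metrics snapshot.
// None of these names exist in Falco.

// CPU usage is reported as a percentage of one CPU. Dividing by the
// number of CPUs gives the share of the host's total CPU capacity.
double normalize_cpu_pct(double cpu_pct_of_one_cpu, unsigned num_cpus)
{
    return num_cpus == 0 ? 0.0 : cpu_pct_of_one_cpu / num_cpus;
}

// RSS, PSS and VSZ are reported in kb when no conversion is applied.
// Assumption: 1 MB = 1024 kb.
double kb_to_mb(std::uint64_t kb)
{
    return kb / 1024.0;
}

// container_memory_used is reported in bytes.
// Assumption: 1 MB = 1024 * 1024 bytes.
double bytes_to_mb(std::uint64_t bytes)
{
    return bytes / (1024.0 * 1024.0);
}

// Counters are monotonic, so a per-second rate comes from the delta
// between two consecutive snapshots divided by the snapshot interval.
double rate_per_second(std::uint64_t prev, std::uint64_t cur, std::uint64_t interval_ms)
{
    if(interval_ms == 0 || cur < prev)
    {
        return 0.0; // no interval, or counter reset (e.g. Falco restarted)
    }
    return (cur - prev) * 1000.0 / interval_ms;
}
```

For instance, a reported CPU usage of 50 on a 4-CPU host normalizes to `normalize_cpu_pct(50.0, 4)`, which is 12.5.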
32 changes: 32 additions & 0 deletions unit_tests/engine/test_falco_utils.cpp
@@ -36,3 +36,35 @@ TEST(FalcoUtils, is_unix_scheme)
char url_char[] = "unix:///falco.sock";
ASSERT_EQ(falco::utils::network::is_unix_scheme(url_char), true);
}

TEST(FalcoUtils, parse_prometheus_interval)
{
/* Test matrix around correct time conversions. */
ASSERT_EQ(falco::utils::parse_prometheus_interval("1ms"), 1UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("1s"), 1000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("1m"), 60000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("1h"), 3600000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("1d"), 86400000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("1w"), 604800000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("1y"), 31536000000UL);

ASSERT_EQ(falco::utils::parse_prometheus_interval("300ms"), 300UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("255s"), 255000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("5m"), 300000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("15m"), 900000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("30m"), 1800000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("60m"), 3600000UL);

/* Test matrix for concatenated time interval examples. */
ASSERT_EQ(falco::utils::parse_prometheus_interval("1h3m2s1ms"), 3600000UL + 3 * 60000UL + 2 * 1000UL + 1UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("1y1w1d1h1m1s1ms"), 31536000000UL + 604800000UL + 86400000UL + 3600000UL + 60000UL + 1000UL + 1UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("2h5m"), 2 * 3600000UL + 5 * 60000UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("2h 5m"), 2 * 3600000UL + 5 * 60000UL);

ASSERT_EQ(falco::utils::parse_prometheus_interval("200"), 200UL);

/* Invalid or non-Prometheus-compliant input (wrong unit order, unknown unit) results in 0ms. */
ASSERT_EQ(falco::utils::parse_prometheus_interval("1ms1y"), 0UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("1t1y"), 0UL);
ASSERT_EQ(falco::utils::parse_prometheus_interval("1t"), 0UL);
}
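
The behaviour these tests pin down can be reproduced in a standalone sketch. The version below uses `std::regex` from the standard library instead of RE2 so that it compiles on its own. `parse_duration_ms` is a hypothetical name and this is not the PR's implementation. It assumes decimal input and does not guard against overflow for very large values.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical standalone sketch of a Prometheus-style duration parser.
// Returns the duration in milliseconds, or 0 for invalid input.
std::uint64_t parse_duration_ms(std::string s)
{
    // Strip whitespace so that "2h 5m" is treated like "2h5m".
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    if(s.empty())
    {
        return 0;
    }

    // Legacy form: a bare number is taken as milliseconds.
    if(std::all_of(s.begin(), s.end(),
                   [](unsigned char c) { return std::isdigit(c) != 0; }))
    {
        return std::stoull(s);
    }

    // Units must appear in this fixed order: y w d h m s ms.
    static const std::regex re(
        "^(?:([0-9]+)y)?(?:([0-9]+)w)?(?:([0-9]+)d)?(?:([0-9]+)h)?"
        "(?:([0-9]+)m)?(?:([0-9]+)s)?(?:([0-9]+)ms)?$");
    // Milliseconds per unit, in the same order as the capture groups.
    static const std::array<std::uint64_t, 7> to_ms = {
        31536000000ULL, 604800000ULL, 86400000ULL, 3600000ULL,
        60000ULL, 1000ULL, 1ULL};

    std::smatch m;
    if(!std::regex_match(s, m, re))
    {
        return 0; // wrong ordering or unknown unit
    }

    std::uint64_t total = 0;
    for(std::size_t i = 0; i < to_ms.size(); ++i)
    {
        if(m[i + 1].matched)
        {
            total += std::stoull(m[i + 1].str()) * to_ms[i];
        }
    }
    return total;
}
```

For example, `parse_duration_ms("1h30m20s10ms")` evaluates to 5420010 (3600000 + 1800000 + 20000 + 10), and `parse_duration_ms("1ms1y")` evaluates to 0 because the units are out of order.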
5 changes: 1 addition & 4 deletions userspace/engine/banned.h
@@ -1,5 +1,5 @@
/*
-Copyright (C) 2019 The Falco Authors.
+Copyright (C) 2023 The Falco Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -37,9 +37,6 @@ limitations under the License.
#undef strcat
#define strcat(a, b) BAN(strcat)

-#undef strncat
-#define strncat(a, b, c) BAN(strncat)
-
#undef strncpy
#define strncpy(a, b, c) BAN(strncpy)

97 changes: 97 additions & 0 deletions userspace/engine/falco_utils.cpp
@@ -23,12 +23,109 @@ limitations under the License.
#include "utils.h"
#include "banned.h" // This raises a compilation error when certain functions are used

#include <re2/re2.h>

#define RGX_PROMETHEUS_TIME_DURATION "^((?P<y>[0-9]+)y)?((?P<w>[0-9]+)w)?((?P<d>[0-9]+)d)?((?P<h>[0-9]+)h)?((?P<m>[0-9]+)m)?((?P<s>[0-9]+)s)?((?P<ms>[0-9]+)ms)?$"

// using pre-compiled regex
static re2::RE2 s_rgx_prometheus_time_duration(RGX_PROMETHEUS_TIME_DURATION);

// Prometheus time durations: https://prometheus.io/docs/prometheus/latest/querying/basics/#time-durations
#define PROMETHEUS_UNIT_Y "y" ///> assuming a year always has 365d
#define PROMETHEUS_UNIT_W "w" ///> assuming a week always has 7d
#define PROMETHEUS_UNIT_D "d" ///> assuming a day always has 24h
#define PROMETHEUS_UNIT_H "h" ///> hour
#define PROMETHEUS_UNIT_M "m" ///> minute
#define PROMETHEUS_UNIT_S "s" ///> second
#define PROMETHEUS_UNIT_MS "ms" ///> millisecond

// standard time unit conversions to milliseconds
#define ONE_MS_TO_MS 1UL
#define ONE_SECOND_TO_MS 1000UL
// parenthesized so each macro expands as a single value inside larger expressions
#define ONE_MINUTE_TO_MS (ONE_SECOND_TO_MS * 60UL)
#define ONE_HOUR_TO_MS (ONE_MINUTE_TO_MS * 60UL)
#define ONE_DAY_TO_MS (ONE_HOUR_TO_MS * 24UL)
#define ONE_WEEK_TO_MS (ONE_DAY_TO_MS * 7UL)
#define ONE_YEAR_TO_MS (ONE_DAY_TO_MS * 365UL)

namespace falco
{

namespace utils
{

uint64_t parse_prometheus_interval(std::string interval_str)
{
uint64_t interval = 0;
/* Sanitize user input, remove possible whitespaces. */
interval_str.erase(remove_if(interval_str.begin(), interval_str.end(), isspace), interval_str.end());

if(!interval_str.empty())
{
/* Option 1: Passing interval directly in ms. Will be deprecated in the future. */
if(std::all_of(interval_str.begin(), interval_str.end(), ::isdigit))
{
/* todo: deprecate for Falco 0.36. */
/* Base 10, so that a leading zero is not interpreted as octal. */
interval = std::stoull(interval_str, nullptr, 10);
}
/* Option 2: Passing a Prometheus compliant time duration.
* https://prometheus.io/docs/prometheus/latest/querying/basics/#time-durations
*/
else
{
re2::StringPiece input(interval_str);
std::string args[14];
re2::RE2::Arg arg0(&args[0]);
re2::RE2::Arg arg1(&args[1]);
re2::RE2::Arg arg2(&args[2]);
re2::RE2::Arg arg3(&args[3]);
re2::RE2::Arg arg4(&args[4]);
re2::RE2::Arg arg5(&args[5]);
re2::RE2::Arg arg6(&args[6]);
re2::RE2::Arg arg7(&args[7]);
re2::RE2::Arg arg8(&args[8]);
re2::RE2::Arg arg9(&args[9]);
re2::RE2::Arg arg10(&args[10]);
re2::RE2::Arg arg11(&args[11]);
re2::RE2::Arg arg12(&args[12]);
re2::RE2::Arg arg13(&args[13]);
const re2::RE2::Arg* const matches[14] = {&arg0, &arg1, &arg2, &arg3, &arg4, &arg5, &arg6, &arg7, &arg8, &arg9, &arg10, &arg11, &arg12, &arg13};

const std::map<std::string, int>& named_groups = s_rgx_prometheus_time_duration.NamedCapturingGroups();
int num_groups = s_rgx_prometheus_time_duration.NumberOfCapturingGroups();
re2::RE2::FullMatchN(input, s_rgx_prometheus_time_duration, matches, num_groups);

static const char* all_prometheus_units[7] = {
PROMETHEUS_UNIT_Y, PROMETHEUS_UNIT_W, PROMETHEUS_UNIT_D, PROMETHEUS_UNIT_H,
PROMETHEUS_UNIT_M, PROMETHEUS_UNIT_S, PROMETHEUS_UNIT_MS };

static const uint64_t all_prometheus_time_conversions[7] = {
ONE_YEAR_TO_MS, ONE_WEEK_TO_MS, ONE_DAY_TO_MS, ONE_HOUR_TO_MS,
ONE_MINUTE_TO_MS, ONE_SECOND_TO_MS, ONE_MS_TO_MS };

for(size_t i = 0; i < sizeof(all_prometheus_units) / sizeof(const char*); i++)
{
std::string cur_interval_str;
uint64_t cur_interval = 0;
const auto &group_it = named_groups.find(all_prometheus_units[i]);
if(group_it != named_groups.end())
{
cur_interval_str = args[group_it->second - 1];
if(!cur_interval_str.empty())
{
/* Base 10, so that a leading zero is not interpreted as octal. */
cur_interval = std::stoull(cur_interval_str, nullptr, 10);
}
if(cur_interval > 0)
{
interval += cur_interval * all_prometheus_time_conversions[i];
}
}
}
}
}
return interval;
}

std::string wrap_text(const std::string& in, uint32_t indent, uint32_t line_len)
{
std::istringstream is(in);
2 changes: 2 additions & 0 deletions userspace/engine/falco_utils.h
@@ -43,6 +43,8 @@ namespace falco
namespace utils
{

uint64_t parse_prometheus_interval(std::string interval_str);

std::string wrap_text(const std::string& in, uint32_t indent, uint32_t linelen);

void readfile(const std::string& filename, std::string& data);
24 changes: 24 additions & 0 deletions userspace/falco/app/actions/load_config.cpp
@@ -15,10 +15,32 @@ limitations under the License.
*/

#include "actions.h"
#include "falco_utils.h"

using namespace falco::app;
using namespace falco::app::actions;

// applies legacy/in-deprecation options to the current config
static void apply_deprecated_options(
const falco::app::options& opts,
const std::shared_ptr<falco_configuration>& cfg)
{
if (!opts.stats_output_file.empty() || !opts.stats_interval.empty())
{
falco_logger::log(LOG_WARNING, "Options '-s' and '--stats-interval' will be deprecated in the future; metrics must be configured through the config file");
if (!opts.stats_output_file.empty())
{
cfg->m_metrics_enabled = true;
cfg->m_metrics_output_file = opts.stats_output_file;
if (!opts.stats_interval.empty())
{
cfg->m_metrics_interval_str = opts.stats_interval;
cfg->m_metrics_interval = falco::utils::parse_prometheus_interval(cfg->m_metrics_interval_str);
}
}
}
}

falco::app::run_result falco::app::actions::load_config(falco::app::state& s)
{
try
@@ -51,6 +73,8 @@ falco::app::run_result falco::app::actions::load_config(falco::app::state& s)

s.config->m_buffered_outputs = !s.options.unbuffered_outputs;

apply_deprecated_options(s.options, s.config);

return run_result::ok();
}
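
The config comments state that a minimum interval of 100ms is enforced, and the unit tests show that unparsable input yields 0. How Falco reacts in either case is not visible in this diff. The following is only a hypothetical sketch of a check a caller could run on the value returned by `parse_prometheus_interval`. `kMinIntervalMs`, `IntervalCheck` and `check_interval_ms` are made-up names.

```cpp
#include <cstdint>

// Hypothetical constant mirroring the documented 100ms minimum.
constexpr std::uint64_t kMinIntervalMs = 100;

// Hypothetical result type for classifying a parsed interval.
enum class IntervalCheck
{
    ok,
    unparsable, // parser returned 0: invalid string (or a literal zero duration)
    too_short   // below the documented minimum
};

// Classify an interval (in milliseconds) as returned by the parser.
IntervalCheck check_interval_ms(std::uint64_t interval_ms)
{
    if(interval_ms == 0)
    {
        return IntervalCheck::unparsable;
    }
    if(interval_ms < kMinIntervalMs)
    {
        return IntervalCheck::too_short;
    }
    return IntervalCheck::ok;
}
```

Because 0 doubles as the error value, a sketch like this cannot tell an invalid string from an explicit zero duration. A real implementation might prefer a separate error channel.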
