Falco "syscall event drop" #2657

Closed
shalevpenker97 opened this issue Jun 26, 2023 · 8 comments

@shalevpenker97

Describe the bug

When deploying Falco on Kubernetes we see syscall event drops, but it takes time for a Falco pod to start dropping events; once it starts dropping, it does not stop until the pod is restarted. The workload pods on the node show no change in syscall behavior that would explain this.

How to reproduce it

Deploy Falco at scale with this configuration:

```yaml
syscall_event_drops:
  # -- The messages are emitted when the percentage of dropped system calls
  # with respect to the number of events in the last second
  # is greater than the given threshold (a double in the range [0, 1]).
  threshold: .1
  # -- Actions to be taken when system calls were dropped from the circular buffer.
  actions:
    - log
    - alert
  # -- Rate at which log/alert messages are emitted.
  rate: .03333
  # -- Max burst of messages emitted.
  max_burst: 1
  # -- Flag to simulate drops for debug purposes.
  simulate_drops: false

# -- Buffer size preset.
syscall_buf_size_preset: 10

# -- Custom syscalls.
base_syscalls:
  custom_set: [clone, clone3, fork, vfork, execve, execveat, close]
  repair: false

# -- Number of CPUs for each syscall buffer.
modern_bpf:
  cpus_for_each_syscall_buffer: 2
```
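For context on what these numbers mean (the per-preset buffer sizes are taken from the comments in the stock falco.yaml and are worth double-checking against your version): `threshold: .1` fires only when more than 10% of the last second's events were dropped; `rate: .03333` with `max_burst: 1` rate-limits the log/alert actions via a token bucket to roughly one message every 30 seconds (1 / 0.03333 ≈ 30 s); and if preset 10 corresponds to 512 MB per ring buffer, a 40-core node with `cpus_for_each_syscall_buffer: 2` allocates 20 buffers, i.e. about 10 GB of kernel buffer space.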

We expected the syscall event drop alerts to trigger sooner (not after ~2 hours), or not to happen at all.
As shown in the image below, the Falco logs reported a high drop rate; after the pods were restarted at 17:00, it took another ~1.5 hours before the drops started again at around 18:35.

[Screenshot 2023-06-26 at 13:11:47: Falco drop rate over time]

Environment

  • Falco version:

0.35.0

  • System info:

```
Mon Jun 26 09:59:35 2023: Falco version: 0.35.0 (x86_64)
Mon Jun 26 09:59:35 2023: Falco initialized with configuration file: /etc/falco/falco.yaml
Mon Jun 26 09:59:35 2023: Loading plugin 'k8saudit' from file /usr/share/falco/plugins/libk8saudit.so
Mon Jun 26 09:59:35 2023: Loading plugin 'json' from file /usr/share/falco/plugins/libjson.so
Mon Jun 26 09:59:35 2023: Loading rules from file /etc/falco/falco_rules.yaml
Mon Jun 26 09:59:35 2023: Loading rules from file /etc/falco/k8s_audit_rules.yaml
{
  "machine": "x86_64",
  "nodename": "falco-v9bh2",
  "release": "5.10.167-200.el7.x86_64",
  "sysname": "Linux",
  "version": "#1 SMP Sun Feb 12 13:08:57 UTC 2023"
}
```

  • Cloud provider or hardware configuration:

On-prem deployment: a 40-core server with 190 GB of memory.

  • OS:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Kernel:

```
5.10.149-200.el7.x86_64 #1 SMP Sun Oct 23 08:59:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
```

  • Installation method:

Kubernetes

@Andreagit97
Member

That's interesting, thank you for reporting!

Side question:
Looking at your config I saw this:

```yaml
# -- Buffer size preset.
syscall_buf_size_preset: 10

# -- Custom syscalls.
base_syscalls:
  custom_set: [clone, clone3, fork, vfork, execve, execveat, close]
  repair: false
```

Are you using the -k option? It seems quite strange to see this huge number of drops with only seven syscalls enabled and huge buffers like in your case 🤔

@shalevpenker97
Author

Hi,
Yes, I'm using the -k option:

```yaml
- /usr/bin/falco
- --modern-bpf
- --cri
- /run/containerd/containerd.sock
- -K
- /var/run/secrets/kubernetes.io/serviceaccount/token
- -k
- https://$(KUBERNETES_SERVICE_HOST)
- --k8s-node
- $(FALCO_K8S_NODE_NAME)
- -pk
```

@Andreagit97
Member

Oh ok, that's outside the initial scope of the issue, but if you want to drastically reduce drops I suggest you disable it. We are working on fixing the k8s client; the current one doesn't work so well, sorry.
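For reference, disabling the K8s metadata client here means dropping the `-k`, `-K`, and `-pk` flags from the container args quoted above. A sketch of the trimmed arg list, derived from that snippet rather than an official recommendation (`--k8s-node` is kept, though it presumably only matters while the metadata client is enabled):

```yaml
- /usr/bin/falco
- --modern-bpf
- --cri
- /run/containerd/containerd.sock
- --k8s-node
- $(FALCO_K8S_NODE_NAME)
```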

@shalevpenker97
Author

I have disabled it and the drops did not decrease.

@Andreagit97 Andreagit97 added this to the 0.36.0 milestone Aug 31, 2023
@Andreagit97
Member

Hey @shalevpenker97, do you mind trying to collect some metrics with the `metrics:` config?

That way we could try to understand which syscalls the drops come from and why. Thank you!
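A minimal sketch of what that `metrics:` section could look like; the key names follow the stock falco.yaml of that release line and should be verified against the installed version:

```yaml
metrics:
  enabled: true
  interval: 15m        # how often to emit a stats snapshot
  output_rule: true    # emit the snapshot through a Falco rule output
  resource_utilization_enabled: true
  state_counters_enabled: true
  kernel_event_counters_enabled: true   # event/drop counters from the kernel side
  libbpf_stats_enabled: true
  convert_memory_to_mb: true
  include_empty_values: false
```

The kernel event counters are the relevant part here: they report `n_evts`/`n_drops` style counters that would show where the drops come from.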

@Andreagit97 Andreagit97 modified the milestones: 0.36.0, 0.37.0 Sep 2, 2023
@leogr
Member

leogr commented Sep 11, 2023

cross-linking #1403

@Andreagit97
Member

Any update on #2657 (comment)?

@Andreagit97
Member

I will close this since, without further information, it is a duplicate of #1403. Please feel free to reopen if you have further details.

@Andreagit97 Andreagit97 self-assigned this Jan 3, 2024