Frequent syscall event drops when falco runs as a binary #615

Closed
devdua opened this issue May 21, 2019 · 14 comments
Labels
kind/bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@devdua

devdua commented May 21, 2019

Falco has been generating the alerts below frequently. What can be done to reduce syscall drops? Is there a recommended configuration I should consider?

I am running falco as a binary right now.

Tue May 21 12:55:58 2019: Falco internal: syscall event drop. 1 system calls dropped in last second.
12:55:58.461500387: Critical Falco internal: syscall event drop. 1 system calls dropped in last second.(ebpf_enabled=0 n_drops=1 n_drops_buffer=1 n_drops_bug=0 n_drops_pf=0 n_evts=33814)
Tue May 21 12:56:45 2019: Falco internal: syscall event drop. 2 system calls dropped in last second.
12:56:45.211539735: Critical Falco internal: syscall event drop. 2 system calls dropped in last second.(ebpf_enabled=0 n_drops=2 n_drops_buffer=2 n_drops_bug=0 n_drops_pf=0 n_evts=33249)
Tue May 21 12:59:59 2019: Falco internal: syscall event drop. 3 system calls dropped in last second.
12:59:59.370921648: Critical Falco internal: syscall event drop. 3 system calls dropped in last second.(ebpf_enabled=0 n_drops=3 n_drops_buffer=3 n_drops_bug=0 n_drops_pf=0 n_evts=30425)
nc
13:00:47.102841198: Notice Network tool launched in container (user=root command=nc parent_process=bash container_id=f44277b1534a container_name=dazzling_saha image=nginx:latest)
13:00:56.863274771: Warning Bulk data has been removed from disk (user=root command=shred file=<NA>)
Tue May 21 13:02:59 2019: Falco internal: syscall event drop. 3 system calls dropped in last second.
13:02:59.075831072: Critical Falco internal: syscall event drop. 3 system calls dropped in last second.(ebpf_enabled=0 n_drops=3 n_drops_buffer=3 n_drops_bug=0 n_drops_pf=0 n_evts=34591)
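
To put alerts like these in proportion, it can help to pull the counters out of the trailing `(key=value ...)` group and compare `n_drops` with `n_evts`. The snippet below is a minimal sketch, not part of Falco: the helper names `parse_drop_fields` and `drop_fraction` are made up for illustration, and it assumes the text format shown in the log lines above.

```python
import re

# Matches the trailing "(key=value key=value ...)" group of an alert line.
FIELDS_RE = re.compile(r"\(([^()]*=[^()]*)\)\s*$")


def parse_drop_fields(line):
    """Return the counters of a syscall-drop alert as a dict of ints.

    Lines that are not drop alerts (for example ordinary rule alerts)
    yield an empty dict. Hypothetical helper, not a Falco API.
    """
    if "syscall event drop" not in line:
        return {}
    match = FIELDS_RE.search(line)
    if match is None:
        return {}
    fields = {}
    for pair in match.group(1).split():
        key, _, value = pair.partition("=")
        fields[key] = int(value)
    return fields


def drop_fraction(fields):
    """Fraction of events dropped in the reported one-second window."""
    n_evts = fields.get("n_evts", 0)
    return fields.get("n_drops", 0) / n_evts if n_evts else 0.0


line = (
    "12:55:58.461500387: Critical Falco internal: syscall event drop. "
    "1 system calls dropped in last second."
    "(ebpf_enabled=0 n_drops=1 n_drops_buffer=1 n_drops_bug=0 "
    "n_drops_pf=0 n_evts=33814)"
)
fields = parse_drop_fields(line)
print(fields)
print(f"dropped fraction: {drop_fraction(fields):.6%}")
```

In the lines above the drops are a handful out of roughly 30,000 events per second, which is useful context when deciding how to react to the Critical priority.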
@fntlnz
Contributor

fntlnz commented May 29, 2019

@devdua can you please tell us more about your deployment? Where is it running, and what does your falco.yml look like? What kind of load do you have there?

What version of Falco are you running?

I need a way to understand how to reproduce the issue you are reporting.

That message was introduced by this PR #561

@zwicker

zwicker commented Jun 17, 2019

I'm also seeing this, running in K8s. falco.yml below:

#
# Copyright (C) 2016-2018 Draios Inc dba Sysdig.
#
# This file is part of falco .
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# File(s) or Directories containing Falco rules, loaded at startup.
# The name "rules_file" is only for backwards compatibility.
# If the entry is a file, it will be read directly. If the entry is a directory,
# every file in that directory will be read, in alphabetical order.
#
# falco_rules.yaml ships with the falco package and is overridden with
# every new software version. falco_rules.local.yaml is only created
# if it doesn't exist. If you want to customize the set of rules, add
# your customizations to falco_rules.local.yaml.
#
# The files will be read in the order presented here, so make sure if
# you have overrides they appear in later files.
rules_file:
 - /etc/falco/falco_rules.yaml
 - /etc/falco/falco_rules.local.yaml
 - /etc/falco/k8s_audit_rules.yaml
 - /etc/falco/rules.d

# If true, the times displayed in log messages and output messages
# will be in ISO 8601. By default, times are displayed in the local
# time zone, as governed by /etc/localtime.
time_format_iso_8601: false

# Whether to output events in json or text
json_output: false

# When using json output, whether or not to include the "output" property
# itself (e.g. "File below a known binary directory opened for writing
# (user=root ....") in the json output.
json_include_output_property: true

# Send information logs to stderr and/or syslog Note these are *not* security
# notification logs! These are just Falco lifecycle (and possibly error) logs.
log_stderr: true
log_syslog: true

# Minimum log level to include in logs. Note: these levels are
# separate from the priority field of rules. This refers only to the
# log level of falco's internal logging. Can be one of "emergency",
# "alert", "critical", "error", "warning", "notice", "info", "debug".
log_level: info

# Minimum rule priority level to load and run. All rules having a
# priority more severe than this level will be loaded/run.  Can be one
# of "emergency", "alert", "critical", "error", "warning", "notice",
# "info", "debug".
priority: debug

# Whether or not output to any of the output channels below is
# buffered. Defaults to false
buffered_outputs: false

# Falco uses a shared buffer between the kernel and userspace to pass
# system call information. When falco detects that this buffer is
# full and system calls have been dropped, it can take one or more of
# the following actions:
#   - "ignore": do nothing. If an empty list is provided, ignore is assumed.
#   - "log": log a CRITICAL message noting that the buffer was full.
#   - "alert": emit a falco alert noting that the buffer was full.
#   - "exit": exit falco with a non-zero rc.
#
# The rate at which log/alert messages are emitted is governed by a
# token bucket. The rate corresponds to one message every 30 seconds
# with a burst of 10 messages.

syscall_event_drops:
  actions:
    - log
    - alert
  rate: .03333
  max_burst: 10

# A throttling mechanism implemented as a token bucket limits the
# rate of falco notifications. This throttling is controlled by the following configuration
# options:
#  - rate: the number of tokens (i.e. right to send a notification)
#    gained per second. Defaults to 1.
#  - max_burst: the maximum number of tokens outstanding. Defaults to 1000.
#
# With these defaults, falco could send up to 1000 notifications after
# an initial quiet period, and then up to 1 notification per second
# afterward. It would gain the full burst back after 1000 seconds of
# no activity.

outputs:
  rate: 1
  max_burst: 1000

# Where security notifications should go.
# Multiple outputs can be enabled.

syslog_output:
  enabled: true

# If keep_alive is set to true, the file will be opened once and
# continuously written to, with each output message on its own
# line. If keep_alive is set to false, the file will be re-opened
# for each output message.
#
# Also, the file will be closed and reopened if falco is signaled with
# SIGUSR1.

file_output:
  enabled: false
  keep_alive: false
  filename: ./events.txt

stdout_output:
  enabled: true

# Falco contains an embedded webserver that can be used to accept K8s
# Audit Events. These config options control the behavior of that
# webserver. (By default, the webserver is disabled).
#
# The ssl_certificate is a combination SSL Certificate and corresponding
# key contained in a single file. You can generate a key/cert as follows:
#
# $ openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days 365 -out certificate.pem
# $ cat certificate.pem key.pem > falco.pem
# $ sudo cp falco.pem /etc/falco/falco.pem

webserver:
  enabled: true
  listen_port: 8765
  k8s_audit_endpoint: /k8s_audit
  ssl_enabled: false
  ssl_certificate: /etc/falco/falco.pem

# Possible additional things you might want to do with program output:
#   - send to a slack webhook:
#         program: "jq '{text: .output}' | curl -d @- -X POST https://hooks.slack.com/services/XXX"
#   - logging (alternate method than syslog):
#         program: logger -t falco-test
#   - send over a network connection:
#         program: nc host.example.com 80

# If keep_alive is set to true, the program will be started once and
# continuously written to, with each output message on its own
# line. If keep_alive is set to false, the program will be re-spawned
# for each output message.
#
# Also, the program will be closed and reopened if falco is signaled with
# SIGUSR1.
program_output:
  enabled: false
  keep_alive: false
  program: "jq '{text: .output}' | curl -d @- -X POST https://hooks.slack.com/services/XXX"

http_output:
  enabled: false
  url: http://some.url
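
The config above describes two token buckets: `syscall_event_drops` (rate .03333, max_burst 10) and `outputs` (rate 1, max_burst 1000). The following is a rough sketch of how such a bucket behaves, not Falco's actual implementation. The `TokenBucket` class, the assumption that the bucket starts full, and the one-attempt-per-second driver are all illustrative choices.

```python
class TokenBucket:
    """Generic token bucket; an illustration, not Falco's code."""

    def __init__(self, rate, max_burst):
        self.rate = rate                 # tokens gained per second
        self.max_burst = max_burst       # maximum tokens outstanding
        self.tokens = float(max_burst)   # assumption: bucket starts full
        self.last = 0.0                  # time of the previous claim attempt

    def claim(self, now):
        """Refill for the elapsed time, then try to spend one token."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.max_burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def count_emitted(rate, max_burst, seconds):
    """Count messages that get through with one attempt every second."""
    bucket = TokenBucket(rate, max_burst)
    return sum(bucket.claim(float(t)) for t in range(1, seconds + 1))


# syscall_event_drops settings from the config above
print(count_emitted(0.03333, 10, 60))
# outputs settings from the config above
print(count_emitted(1, 1000, 1500))
```

Under these assumptions, a node that drops events every second spends the initial burst first and is then limited to about one drop message every 30 seconds, so the number of messages in the log understates how often drops actually occur.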

@fntlnz
Contributor

fntlnz commented Jun 17, 2019

Hi @zwicker, can you provide the following about your environment?

  • Falco version (use falco --version):
  • System info
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools (e.g. in kubernetes, rpm, deb, from source):
  • Others:

@fntlnz
Contributor

fntlnz commented Jun 17, 2019

related to #669

@stale

stale bot commented Aug 16, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 16, 2019
@stale stale bot closed this as completed Aug 23, 2019
@fntlnz
Contributor

fntlnz commented Aug 23, 2019

Reopening this; we still want to fix it.

@fntlnz fntlnz reopened this Aug 23, 2019
@stale stale bot removed the wontfix label Aug 23, 2019
@fntlnz
Contributor

fntlnz commented Aug 23, 2019

/kind bug

@stale

stale bot commented Oct 22, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 22, 2019
@leodido
Member

leodido commented Oct 25, 2019 via email

@stale stale bot removed the wontfix label Oct 25, 2019
@heydonovan

heydonovan commented Oct 29, 2019

We are experiencing the same thing as well. Here is how our environment looks:

root@falco-nzqsh:/# falco --version
Falco version: 0.17.1
root@falco-nzqsh:/# falco --support | jq .system_info
Rule company_whitelist: warning (trailing-evttype):
container and container_started and not company_trusted_containers
         does not have all evt.type restrictions at the beginning of the condition,
         or uses a negative match (i.e. "not"/"!=") for some evt.type restriction.
         This has a performance penalty, as the rule can not be limited to specific event types.
         Consider moving all evt.type restrictions to the beginning of the rule and/or
         replacing negative matches with positive matches if possible.
{
  "machine": "x86_64",
  "nodename": "falco-nzqsh",
  "release": "4.14.133-113.105.amzn2.x86_64",
  "sysname": "Linux",
  "version": "#1 SMP Wed Jul 10 16:57:02 UTC 2019"
}
root@falco-nzqsh:/# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux bullseye/sid"
NAME="Debian GNU/Linux"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@falco-nzqsh:/# uname -a
Linux falco-nzqsh 4.14.133-113.105.amzn2.x86_64 #1 SMP Wed Jul 10 16:57:02 UTC 2019 x86_64 GNU/Linux
ok
{"output":"Falco internal: syscall event drop. 1 system calls dropped in last second.","output_fields":{"ebpf_enabled":"0","n_drops":"1","n_drops_buffer":"1","n_drops_bug":"0","n_drops_pf":"0","n_evts":"32132"},"priority":"Critical","rule":"Falco internal: syscall event drop","time":"2019-10-29T23:32:03.078002564Z"}

It's the latest stable release of falco deployed via helm chart on EKS v1.13
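
With JSON output, the drop counters arrive as strings under `output_fields`. A small sketch of reading them follows, using the alert from this comment as input. The function name `drop_counters` is hypothetical, and the rule name it matches on is taken from the alert itself.

```python
import json

# The alert as it appears in this comment.
raw = (
    '{"output":"Falco internal: syscall event drop. 1 system calls dropped '
    'in last second.","output_fields":{"ebpf_enabled":"0","n_drops":"1",'
    '"n_drops_buffer":"1","n_drops_bug":"0","n_drops_pf":"0",'
    '"n_evts":"32132"},"priority":"Critical",'
    '"rule":"Falco internal: syscall event drop",'
    '"time":"2019-10-29T23:32:03.078002564Z"}'
)

DROP_RULE = "Falco internal: syscall event drop"


def drop_counters(alert_json):
    """Return the drop counters as ints, or None for any other alert."""
    alert = json.loads(alert_json)
    if alert.get("rule") != DROP_RULE:
        return None
    # The counters are encoded as strings in the JSON shown above.
    return {name: int(value) for name, value in alert["output_fields"].items()}


counters = drop_counters(raw)
print(counters)
```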

@fntlnz
Contributor

fntlnz commented Nov 5, 2019

Can you test it out with falco 0.18.0? It was a big release with a lot of performance and stability improvements. @heydonovan

/triage needs-information

@poiana poiana added the triage/needs-information Indicates an issue needs more information in order to work on it. label Nov 5, 2019
@devdua
Author

devdua commented Nov 13, 2019

Hi @fntlnz, I can close this issue for the moment, as the project I was working on has been shelved for now. Thanks for the support!

@devdua devdua closed this as completed Nov 13, 2019
@alistaircross

@fntlnz When do you predict a helm stable/falco release for 0.18.0?

@alistaircross

Hi there, using the 0.18.0 image tag with the helm charts from 0.17.1, I still see the following in the logs:

03:43:37.355343721: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=1 n_evts=22642)
03:44:53.415220943: Critical Falco internal: syscall event drop. 2 system calls dropped in last second. (ebpf_enabled=0 n_drops=2 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=2 n_evts=21023)
03:44:58.420431312: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=1 n_evts=20232)
03:45:37.446802423: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=1 n_evts=34899)
03:45:47.453500536: Critical Falco internal: syscall event drop. 2 system calls dropped in last second. (ebpf_enabled=0 n_drops=2 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=2 n_evts=33029)
03:46:37.489517368: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=1 n_evts=24936)
03:46:51.498312215: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=1 n_evts=23206)
03:46:53.499545137: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=1 n_evts=20005)
03:47:07.510469096: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=1 n_drops_bug=0 n_drops_pf=0 n_evts=17898)
03:47:12.513330275: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=1 n_evts=21208)
03:47:38.532467226: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=1 n_evts=17701)
03:47:50.538770653: Critical Falco internal: syscall event drop. 2 system calls dropped in last second. (ebpf_enabled=0 n_drops=2 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=2 n_evts=15813)
03:48:07.552425136: Critical Falco internal: syscall event drop. 2 system calls dropped in last second. (ebpf_enabled=0 n_drops=2 n_drops_buffer=2 n_drops_bug=0 n_drops_pf=0 n_evts=23296)
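
In these lines most drops are counted under `n_drops_pf` rather than `n_drops_buffer`, unlike the earlier reports in this thread. A quick way to see that split across a log is to total each counter. This is only a sketch: the `tally` helper is hypothetical and the sample is three lines copied from the output above.

```python
import re
from collections import Counter

# Per-cause counters plus the event count, as named in the alert text.
COUNTER_RE = re.compile(r"\b(n_drops_buffer|n_drops_bug|n_drops_pf|n_evts)=(\d+)")


def tally(lines):
    """Sum the per-cause drop counters over all drop alerts in `lines`."""
    totals = Counter()
    for line in lines:
        if "syscall event drop" not in line:
            continue  # skip ordinary rule alerts
        for name, value in COUNTER_RE.findall(line):
            totals[name] += int(value)
    return totals


sample = [
    "03:43:37.355343721: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=0 n_drops_bug=0 n_drops_pf=1 n_evts=22642)",
    "03:47:07.510469096: Critical Falco internal: syscall event drop. 1 system calls dropped in last second. (ebpf_enabled=0 n_drops=1 n_drops_buffer=1 n_drops_bug=0 n_drops_pf=0 n_evts=17898)",
    "03:48:07.552425136: Critical Falco internal: syscall event drop. 2 system calls dropped in last second. (ebpf_enabled=0 n_drops=2 n_drops_buffer=2 n_drops_bug=0 n_drops_pf=0 n_evts=23296)",
]
totals = tally(sample)
for name in ("n_drops_buffer", "n_drops_pf", "n_drops_bug", "n_evts"):
    print(name, totals[name])
```

Knowing which counter dominates should help the maintainers narrow down whether the buffer is filling up or the drops have a different cause.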


7 participants