Auditbeat fails on Google Container-OS >=69 / GKE >=1.10 #8523

Closed
jordansissel opened this issue Oct 1, 2018 · 6 comments

Comments

@jordansissel
Contributor

jordansissel commented Oct 1, 2018

Testing fails with auditbeat 6.4.1 on Google Container-Optimized OS (COS) version 69.

Symptom: auditbeat is unable to receive any audit logs

Possibly relevant log message:

2018-09-19T22:48:31.893Z        ERROR   [auditd]        auditd/audit_linux.go:153       Failure receiving audit events  {"error": "failed to set audit PID. An audit process is already running (PID 91)"}

auditbeat show auditd-status output:

$ docker run -it --cap-add=AUDIT_CONTROL --cap-add=AUDIT_READ --pid=host docker.elastic.co/beats/auditbeat:6.4.1 show auditd-status 
enabled 1
failure 0
pid 91
rate_limit 0
backlog_limit 8192
lost 0
backlog 0
backlog_wait_time 0
features 0x3f

$ ps -fp 91
UID          PID    PPID  C STIME TTY          TIME CMD
root          91       1  0 Sep20 ?        00:02:29 /usr/lib/systemd/systemd-journald
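To make the check above repeatable, here is a minimal sketch that pulls the `pid` value out of the status output. The function name `audit_pid_from_status` is hypothetical (it is not part of auditbeat); it only assumes the `key value` line layout shown above.

```shell
#!/bin/sh
# audit_pid_from_status: hypothetical helper, not part of auditbeat.
# Reads `auditbeat show auditd-status` output on stdin and prints the value
# of the "pid" line. Exits non-zero if no "pid" line is present.
# A value of 0 means no process is registered as the audit daemon.
audit_pid_from_status() {
    awk '$1 == "pid" { print $2; found = 1; exit } END { if (!found) exit 1 }'
}

# Intended use on the host (not run here):
#   pid=$(auditbeat show auditd-status | audit_pid_from_status)
#   [ "$pid" -ne 0 ] && ps -fp "$pid"
```

With the output above this would print 91, which `ps` then resolves to systemd-journald.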

Background: We are using Google's Kubernetes Engine (GKE) and have currently deployed GKE 1.9.7 where we run auditbeat as a daemonset. This configuration works well! However, when testing GKE 1.10.7, we noticed that auditbeat is unable to collect audit logs. Testing in isolation, GKE 1.9.7 uses COS 65 where auditbeat works; GKE 1.10.7 uses COS 69 where auditbeat fails.

I compared systemd-journald configuration on COS 65 (where it works) and COS 69 (where it fails) and was unable to find anything indicative.

@jordansissel
Contributor Author

I've been working with Google's support team to troubleshoot this and have no answers yet. However, we have a possible workaround: completely disable journald's audit socket (this is only necessary on COS 69, not COS 65):

systemctl stop systemd-journald-audit.socket
systemctl mask systemd-journald-audit.socket
systemctl restart systemd-journald

@jordansissel
Contributor Author

Sample auditbeat.yml to reproduce:


auditbeat.modules:
- module: auditd
  audit_rules: |
    -a always,exit -F arch=b64 -S execve -S execveat -S exit -S exit_group -S fork -S clone -S vfork -S accept -S accept4 -S connect -S bind -S listen

output.console:
  pretty: true

Sample output:

auditd log output
jls@auditbeat-test-cos-69 ~ $ docker run -it --cap-add=AUDIT_CONTROL --cap-add=AUDIT_READ --pid=host -v $PWD/auditbeat.yml:/e
/auditbeat/auditbeat.yml docker.elastic.co/beats/auditbeat:6.4.1  -c /etc/auditbeat/auditbeat.yml -e
2018-10-01T17:08:57.995Z        INFO    instance/beat.go:544    Home path: [/usr/share/auditbeat] Config path: [/usr/share/auditbeat] Data path: [/usr/share/auditbeat/data] Logs path: [/usr/share/auditbeat/logs]
2018-10-01T17:08:57.998Z        INFO    instance/beat.go:551    Beat UUID: 0790907d-a5cd-42fb-a4c3-1b01684f1c51
2018-10-01T17:08:57.999Z        INFO    [seccomp]       seccomp/seccomp.go:116  Syscall filter successfully installed
2018-10-01T17:08:57.999Z        INFO    [beat]  instance/beat.go:768    Beat info       {"system_info": {"beat": {"path": {"config": "/usr/share/auditbeat", "data": "/usr/share/auditbeat/data", "home": "/usr/share/auditbeat", "logs": "/usr/share/auditbeat/logs"}, "type": "auditbeat", "uuid": "0790907d-a5cd-42fb-a4c3-1b01684f1c51"}}}
2018-10-01T17:08:58.000Z        INFO    [beat]  instance/beat.go:777    Build info      {"system_info": {"build": {"commit": "37b5f2d2a20f2734b2373a454b4b4cbb2627e841", "libbeat": "6.4.1", "time": "2018-09-13T21:23:13.000Z", "version": "6.4.1"}}}
2018-10-01T17:08:58.000Z        INFO    [beat]  instance/beat.go:780    Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":1,"version":"go1.10.3"}}}
2018-10-01T17:08:58.002Z        INFO    [beat]  instance/beat.go:784    Host info       {"system_info": {"host": {"architecture":"x86_64","boot_time":"2018-09-20T16:53:39Z","containerized":false,"hostname":"a009eb576d4b","ips":["127.0.0.1/8","::1/128","172.17.0.2/16","fe80::42:acff:fe11:2/64"],"kernel_version":"4.14.65+","mac_addresses":["02:42:ac:11:00:02"],"os":{"family":"redhat","platform":"centos","name":"CentOS Linux","version":"7 (Core)","major":7,"minor":5,"patch":1804,"codename":"Core"},"timezone":"UTC","timezone_offset_sec":0,"id":"14759c8d771e43a2b10f7402e8060d8a"}}}
2018-10-01T17:08:58.003Z        INFO    [beat]  instance/beat.go:813    Process info    {"system_info": {"process": {"capabilities": {"inheritable":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","audit_control","setfcap","audit_read"],"permitted":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","audit_control","setfcap","audit_read"],"effective":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","audit_control","setfcap","audit_read"],"bounding":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","audit_control","setfcap","audit_read"],"ambient":null}, "cwd": "/usr/share/auditbeat", "exe": "/usr/share/auditbeat/auditbeat", "name": "auditbeat", "pid": 113551, "ppid": 113535, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2018-10-01T17:08:56.900Z"}}}
2018-10-01T17:08:58.003Z        INFO    instance/beat.go:273    Setup Beat: auditbeat; Version: 6.4.1
2018-10-01T17:08:58.005Z        INFO    pipeline/module.go:98   Beat name: a009eb576d4b
2018-10-01T17:08:58.005Z        INFO    [auditd]        auditd/audit_linux.go:104       auditd module is running as euid=0 on kernel=4.14.65+
2018-10-01T17:08:58.056Z        INFO    [auditd]        auditd/audit_linux.go:131       socket_type=unicast will be used.
2018-10-01T17:08:58.057Z        INFO    instance/beat.go:367    auditbeat start running.
2018-10-01T17:08:58.057Z        INFO    [monitoring]    log/log.go:114  Starting metrics logging every 30s
2018-10-01T17:09:04.672Z        INFO    [auditd]        auditd/audit_linux.go:241       Deleted 2 pre-existing audit rules.
2018-10-01T17:09:04.672Z        INFO    [auditd]        auditd/audit_linux.go:260       Successfully added 2 of 2 audit rules.
2018-10-01T17:09:04.724Z        INFO    [auditd]        auditd/audit_linux.go:284       audit status from kernel at start     {"audit_status": {"Mask":0,"Enabled":1,"Failure":0,"PID":91,"RateLimit":0,"BacklogLimit":8192,"Lost":0,"Backlog":4,"FeatureBitmap":63,"BacklogWaitTime":0}}
2018-10-01T17:09:04.724Z        INFO    [auditd]        auditd/audit_linux.go:308       Setting kernel backlog wait time to prevent backpressure propagating to the kernel.
2018-10-01T17:09:04.724Z        ERROR   [auditd]        auditd/audit_linux.go:153       Failure receiving audit events  {"error": "failed to set audit PID. An audit process is already running (PID 91)"}

@jordansissel
Contributor Author

On COS 70 this problem still exists, and I consider it a breaking change with Google's OS. In the meantime, here is a workaround I have tested successfully:

systemctl stop systemd-journald-audit.socket
systemctl mask systemd-journald-audit.socket
systemctl restart systemd-journald

With a test configuration:

auditbeat.modules:
- module: auditd
  audit_rules: |
    -a always,exit -F arch=b64 -S execve -S execveat -S exit -S exit_group -S fork -S clone -S vfork -S accept -S accept4 -S connect -S bind -S listen


output.console:

Then running auditbeat is successful:

$ docker run -it --cap-add=AUDIT_CONTROL --cap-add=AUDIT_READ --pid=host -v $PWD/auditbeat.yml:/usr/share/auditbeat/auditbeat.yml docker.elastic.co/beats/auditbeat:6.4.1
...
2018-12-05T00:10:33.381Z        INFO    [auditd]        auditd/audit_linux.go:104       auditd module is running as euid=0 on kernel=4.14.67+
2018-12-05T00:10:33.432Z        INFO    [auditd]        auditd/audit_linux.go:131       socket_type=unicast will be used.
2018-12-05T00:10:43.809Z        INFO    instance/beat.go:367    auditbeat start running.
..
2018-12-05T00:10:50.338Z        INFO    [auditd]        auditd/audit_linux.go:241       Deleted 2 pre-existing audit rules.
2018-12-05T00:10:50.338Z        INFO    [auditd]        auditd/audit_linux.go:260       Successfully added 2 of 2 audit rules.
2018-12-05T00:10:50.389Z        INFO    [auditd]        auditd/audit_linux.go:284       audit status from kernel at start       {"audit_status": {"Mask":0,"Enabled":1,"Failure":0,"PID":0,"RateLimit":0,"BacklogLimit":8192,"Lost":0,"Backlog":4,"FeatureBitmap":63,"BacklogWaitTime":0}}
2018-12-05T00:10:50.389Z        INFO    [auditd]        auditd/audit_linux.go:308       Setting kernel backlog wait time to prevent backpressure propagating to the kernel.
...

@jordansissel
Contributor Author

I'm content to close this as it seems to be an undocumented breaking change in Google Container OS and not necessarily a bug in auditbeat. As noted above, disabling systemd-journald-audit.socket works around this problem on Google COS 69 and 70.

If I hear any news from Google's team about this change, I'll try to keep this ticket updated for posterity.

@jordansissel
Contributor Author

The proposed workaround doesn't work. systemctl stop ... simply emits an error when invoked from a container on Kubernetes:

Running in chroot, ignoring request.

Same error with systemctl restart ...

If I try to do directly what systemctl is probably doing, it also fails, but for a different reason:

          # simulate: systemctl mask systemd-journald-audit.socket                                                       
          ln -sf /dev/null /host/etc/systemd/system/systemd-journald-audit.socket                                        
          # Try to restart systemd-journald without invoking `systemctl`                                                 
          # simulate: systemctl restart systemd-journald
          pkill -u root -f systemd-journald -HUP

The above successfully restarts systemd-journald, but it doesn't clean up the audit pid. Running auditbeat after the above (which is executed in an initContainer on Kubernetes), results in this:

Failure receiving audit events  {"error": "failed to set audit PID. An audit process is already running (PID 113)"}

However, pid 113 was the original systemd-journald process that has since been restarted by the initContainer:

sh-4.2# auditbeat show status
enabled 1
failure 0
pid 113
rate_limit 0
backlog_limit 8192
lost 0
backlog 4
backlog_wait_time 0
features 0x3f
sh-4.2# ps -p 113
    PID TTY          TIME CMD

sh-4.2# ps -fp $(pgrep -f systemd-journald)
UID          PID    PPID  C STIME TTY          TIME CMD
root      108936       1  0 05:56 ?        00:00:05 /usr/lib/systemd/systemd-journald
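The situation above (the kernel still reporting a PID that no longer exists) can be detected with a small check. This is only a sketch: `audit_pid_state` is a hypothetical name, and it assumes the host PID namespace is visible (for example `--pid=host`), so that `/proc/<pid>` reflects host processes.

```shell
#!/bin/sh
# audit_pid_state: hypothetical helper, not part of auditbeat or systemd.
# Takes the PID reported by the audit status and prints one of:
#   none  - PID is 0, no audit daemon is registered
#   live  - a process with that PID exists
#   stale - the kernel reports a PID that no longer exists
# Assumes /proc shows host processes (host PID namespace).
audit_pid_state() {
    pid=$1
    if [ "$pid" -eq 0 ]; then
        echo none
    elif [ -d "/proc/$pid" ]; then
        echo live
    else
        echo stale
    fi
}
```

In the session above, `audit_pid_state 113` would report `stale`, since `ps -p 113` finds no process.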

Any ideas on what to do for the next step?

The constraints here are that Google provides the OS, and as such, I am not doing any configuration management on the host OS, so it's difficult to figure out a solution for this.

@jordansissel jordansissel changed the title Auditbeat fails on Google Container-OS 69 Auditbeat fails on Google Container-OS >=69 / GKE >=1.10 Jan 3, 2019
@jordansissel
Contributor Author

jordansissel commented Jan 3, 2019

After asking some friends in hangops and looking at the systemd source code, I found you can set SYSTEMD_IGNORE_CHROOT=1

Running this works:

          export SYSTEMD_IGNORE_CHROOT=1
          systemctl stop systemd-journald-audit.socket
          systemctl mask systemd-journald-audit.socket
          systemctl restart systemd-journald

Also notable: in Kubernetes, one must mount these hostPath volumes: /sys/fs/cgroup, /run, and /etc (perhaps not all three are required, but this works for me).
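For anyone scripting this, the commands above could be wrapped roughly as follows. This is a sketch, not a tested deployment artifact: the function name `disable_journald_audit` and the `SYSTEMCTL` override variable are assumptions added here so the sequence can be dry-run; only the three `systemctl` invocations and `SYSTEMD_IGNORE_CHROOT=1` come from the steps above.

```shell
#!/bin/sh
# disable_journald_audit: hypothetical wrapper around the workaround commands.
# SYSTEMCTL is an illustrative override (not a systemd feature); set
# SYSTEMCTL=echo to print the commands instead of running them.
disable_journald_audit() {
    # Tell systemctl not to skip the request when it detects a chroot.
    export SYSTEMD_IGNORE_CHROOT=1
    # Stop on the first failing step so a partial result is not reported as success.
    "${SYSTEMCTL:-systemctl}" stop systemd-journald-audit.socket &&
    "${SYSTEMCTL:-systemctl}" mask systemd-journald-audit.socket &&
    "${SYSTEMCTL:-systemctl}" restart systemd-journald
}

# Dry run:  SYSTEMCTL=echo disable_journald_audit
# Real run: disable_journald_audit   (needs the host mounts noted above)
```

Chaining with `&&` is a design choice: if masking the socket fails, journald is not restarted, which makes the failing step easier to spot in initContainer logs.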
