
Set eventRecordQPS to 0 #391

Closed
wants to merge 1 commit

Conversation


@Pluies Pluies commented Jan 8, 2020

Issue: #99

Description of changes:

See CIS Benchmark 4.2.9

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@mogren mogren left a comment


LGTM


stefansedich commented Jan 21, 2020

I had mentioned in the original PR #347 (comment) whether or not doing this is actually reasonable, considering that setting this to 0 effectively disables rate limiting of events. Can someone with more knowledge on the subject comment?


Pluies commented Jan 22, 2020

@stefansedich for reference and discussion, this is the relevant section in the CIS benchmark:

4.2.9 Ensure that the --event-qps argument is set to 0 or a level which ensures appropriate event capture (Not Scored)

Profile Applicability: Level 2

Description:
Security relevant information should be captured. The --event-qps flag on the Kubelet can be used to limit the rate at which events are gathered. Setting this too low could result in relevant events not being logged; however, the unlimited setting of 0 could result in a denial of service on the kubelet.

Rationale:
It is important to capture all events and not restrict event creation. Events are an important source of security information and analytics that ensure that your environment is consistently monitored using the event data.
Audit:
Run the following command on each node:
ps -ef | grep kubelet
Review the value set for the --event-qps argument and determine whether this has been set to an appropriate level for the cluster. The value of 0 can be used to ensure that all events are captured.
If the --event-qps argument does not exist, check that there is a Kubelet config file specified by --config and review the value in this location.

Remediation:
If using a Kubelet config file, edit the file to set eventRecordQPS to an appropriate level. If using command line arguments, edit the kubelet service file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf on each worker node and set the below parameter in the KUBELET_SYSTEM_PODS_ARGS variable.
Based on your system, restart the kubelet service. For example:
systemctl daemon-reload
systemctl restart kubelet.service

Impact:
Setting this parameter to 0 could result in a denial of service condition due to excessive events being created. The cluster's event processing and storage systems should be scaled to handle expected event loads.

Default Value:
By default, the --event-qps argument is set to 5.

References:

  1. https://kubernetes.io/docs/admin/kubelet/
  2. https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/kubeletconfig/v1beta1/types.go
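
For concreteness, a minimal audit sketch along the lines of the benchmark's procedure. The config file path below is an assumption (it is where this AMI is expected to place the kubelet config), so check the --config flag on your own nodes:

# Check for the deprecated command-line flag (per CIS 4.2.9)
ps -ef | grep kubelet | grep -o -- '--event-qps[^ ]*'
# If the flag is absent, inspect the kubelet config file instead;
# /etc/kubernetes/kubelet/kubelet-config.json is an assumed path.
grep -i 'eventRecordQPS' /etc/kubernetes/kubelet/kubelet-config.json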

@Pluies
Copy link
Contributor Author

Pluies commented Jan 22, 2020

In my opinion, the tradeoff is between "you may lose security events" and "you may slow down your kubelet because of load", which I'd personally rate as Very Bad and Bad, respectively.

I think it makes sense to err on the side of caution and follow the CIS recommendation here, but I'll happily defer to people with more expertise 👍


@max-lobur max-lobur left a comment


👍

@mogren mogren requested a review from nckturner April 13, 2020 01:54

zmarouf commented Aug 6, 2020

Any update on this? :)
I understand it warranted an internal discussion but it's been a few months now.


eldada commented Sep 16, 2020

Would also love to hear an update on this.
I see Microsoft Azure AKS sets it explicitly to 0.
I also see recommendations like 12-kubernetes-configuration-best-practices and the GKE cis-benchmarks that suggest setting it to 0.


tomhuang12 commented Sep 28, 2020

Wondering about this too. Has anyone manually set it to 0 and experienced significant impact?
Or could CloudWatch be configured to store those events permanently?


zot24 commented Oct 23, 2020

I just wanted to highlight a section from one of the links shared in a comment above by @eldada, which I think is key to understanding the implications of this variable's value: #391 (comment)

Events are Kubernetes objects stored in etcd. To avoid overwhelming etcd they are only kept for one hour, and are not an appropriate security auditing mechanism. Allowing unlimited events as suggested in this control exposes the cluster to unnecessary DoS risk and contradicts the recommendation to use admission EventRateLimits. Security relevant events that need permanent storage should be sent to logs.

source: https://cloud.google.com/anthos/gke/docs/on-prem/concepts/cis-benchmarks
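
For reference, the "admission EventRateLimits" mentioned in that quote is the EventRateLimit admission plugin on the API server. A minimal configuration sketch, with illustrative qps/burst numbers (note that on a managed control plane such as EKS, API server admission flags are generally not user-configurable):

# admission-config.yaml, passed to kube-apiserver via --admission-control-config-file
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: EventRateLimit
    path: eventconfig.yaml

# eventconfig.yaml: cap event creation per namespace
apiVersion: eventratelimit.admission.k8s.io/v1alpha1
kind: Configuration
limits:
  - type: Namespace
    qps: 50     # illustrative assumption
    burst: 100  # illustrative assumption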

@technotaff-nbs

This is still showing as an exception on our pen-tests.

Could we please get an answer on whether setting eventRecordQPS to 0 will cause problems, or whether mitigation is planned in a future release?

What is the advice from AWS?


joebowbeer commented May 3, 2022

I notice that the --event-qps advice is causing a lot of confusion and potential misconfiguration.

The advice is referring to the deprecated command line option, where 0 is documented to use the default value of 5, whereas in the config file, a value of 0 is documented to disable rate limiting, which is precisely what this rule is supposed to prevent.

https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/options/options.go

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/v1beta1/defaults.go
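
To make the distinction concrete, a sketch of the config-file form; the value shown is the documented default, not a recommendation specific to this AMI:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Omitting eventRecordQPS falls back to the default of 5.
# An explicit 0 here disables event rate limiting entirely,
# which is what CIS 4.2.9 warns against.
eventRecordQPS: 5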

@@ -25,6 +25,7 @@
     "hairpinMode": "hairpin-veth",
     "cgroupDriver": "cgroupfs",
     "cgroupRoot": "/",
+    "eventRecordQPS": 0,

@joebowbeer joebowbeer Sep 17, 2022


Suggested change (remove this line):
-    "eventRecordQPS": 0,

Nope. This would be unlimited, which the CIS Benchmark warns against.

Unfortunately, the CIS Benchmark is terribly confused in what it does recommend, but that is its problem.

To understand the current snafu, start by looking at the source code that implemented the deprecated --event-qps flag. It is this flag to which the CIS Benchmark recommendation originally applied, and for this flag a setting of 0 would be interpreted the same as if the flag were omitted, and the effective QPS would be 5, which is the default value for eventRecordQPS.

Then the flag was deprecated.

Unfortunately, setting eventRecordQPS=0 in the config file explicitly disables any rate limiting, whereas omitting eventRecordQPS results in the default value of 5.
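
One way to check which behavior a node actually ended up with is the kubelet's configz endpoint; a quick sketch, where NODE is a placeholder for one of your node names:

NODE=<your-node-name>  # placeholder
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/configz" | jq '.kubeletconfig.eventRecordQPS'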


joebowbeer commented Sep 17, 2022

@stefansedich https://github.com/awslabs/amazon-eks-ami/pull/391/files#r973550647

@technotaff-nbs My advice is to fix the CIS Benchmarks - which I am trying to do, by the way.


@joebowbeer joebowbeer left a comment


Do not set eventRecordQPS=0 (unlimited)

Let it default to 5 or tune it to a value appropriate to your cluster.

@cartermckinnon

@joebowbeer thanks for clearing this up; I'm in agreement that the CIS spec needs updating and we shouldn't make this change.
