fix(core): implement NRI handler #1674

dqsully · 2024-03-08T21:41:50Z

Purpose of PR?:

Implement NRI support in KubeArmor and mostly fix container confusion issues in BPF-LSM caused by reused Linux namespace IDs.

Does this PR introduce a breaking change? No

If the changes in this PR are manually verified, list down the scenarios covered: Verified functionality in my own sandbox environment.

Additional information for reviewer? :

This NRI solution isn't perfect, but it's much better than the existing containerd and CRI-O solutions at preventing false positives.

For whatever reason, Linux can reuse namespace IDs between containers so long as one container exits before the next one starts. Those namespace IDs are freed as soon as the container's root process exits, and they get created during the runc container initialization process. This means that if KubeArmor's enforcement ends too late for a given container, it could be incorrectly enforcing behaviors on a new container as if they were happening on the old container, including the runc init process to set up a container! On a default "Audit" posture, this is sorta ok, it will just produce false positive alerts, but on a default "Block" posture this will block the new container from being created.

This NRI solution mostly solves these namespace ID overlap issues by starting enforcement on a container after it's already started, and stopping enforcement on a container just before it stops. The only case I can come up with where this doesn't work is if a container crashes unexpectedly, in which case this NRI solution may produce false positive alerts and/or blocks, but it won't be any worse than the existing containerd and CRI-O solutions. That said, I've been running very similar NRI code in my fork of KubeArmor in production for over a month, and I haven't seen any namespace ID overlap issues and I can't cause it to happen synthetically either.

This solution does make one major tradeoff though, which is that there's a very short period of time after a container starts as well as however long it takes for the container to gracefully exit during termination where the container's actions won't be enforced. The existing containerd and CRI-O solutions don't enforce the very start of a container either, and since NRI is event-based it should both react faster on average and use less resources than the current solutions (we did see a significant decrease in CPU after switching from containerd to NRI in my environment).

Checklist:

Bug fix.
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update
PR Title follows the convention of <type>(<scope>): <subject>
Commit has unit tests
Commit has integration tests

Also, a quick note about using NRI:

As of writing this, the NRI API is currently in draft status, and while both containerd and CRI-O support NRI, it's not enabled by default in either runtime.

In my case, my company uses EKS with self-managed node groups (so we can tweak configuration as needed) and to enable NRI in the EKS nodes running containerd, we had to both update to the latest AMI and also tweak the user data of the EC2 instances to be the following script:

#!/bin/bash
set -o xtrace

KUBELET_CONFIG=/etc/kubernetes/kubelet/kubelet-config.json

# Enable the NRI plugin for containerd by modifying the containerd EKS source config file before the bootstrap script uses it
cat <<EOF >>/etc/eks/containerd/containerd-config.toml
[plugins."io.containerd.nri.v1.nri"]
    # Enable NRI support in containerd.
    disable = false
    # Allow connections from externally launched NRI plugins. (through the NRI unix socket)
    disable_connections = false
    # plugin_registration_timeout is the timeout for a plugin to register after connection.
    plugin_registration_timeout = "5s"
    # plugin_request_timeout is the timeout for a plugin to handle an event/request.
    plugin_request_timeout = "2s"
    # socket_path is the path of the NRI socket to create for plugins to connect to.
    socket_path = "/var/run/nri/nri.sock"
EOF

# Start the node
/etc/eks/bootstrap.sh

Hopefully NRI is stabilized and enabled by default in containerd and CRI-O soon, but until then I hope this extra information will help anyone else who needs to use NRI.

pkg/KubeArmorOperator/runtime/runtime.go

daemon1024

LGTM 👌🏽

pkg/KubeArmorOperator/runtime/runtime.go

In operator, for NRI detection, once it is known that containerd is available as runtime then rather setting it as runtime we go further and try to check for NRI availability using the same process/method. If NRI is not found then containerd would be set as the runtime. In implementation I've used the same logic which was used to detect containerd in case docker was available and then we were trying to see if containerd is available or not. Signed-off-by: Prateek <prateeknandle@gmail.com>

rksharma95

LGTM 👍

Aryan-sharma11

LGTM

rootxrishabh

LGTM 🚀

dqsully force-pushed the nri branch from 25536c9 to 83a1a68 Compare March 8, 2024 21:44

PrimalPimmy requested a review from daemon1024 March 11, 2024 08:43

daemon1024 requested a review from DelusionalOptimist March 15, 2024 07:50

daemon1024 mentioned this pull request Dec 4, 2024

feat(core/containerd): refactor containerd interaction to v2 APIs #1906

Merged

7 tasks

Prateeknandle force-pushed the nri branch from 83a1a68 to 90cd7f7 Compare December 5, 2024 22:21

rootxrishabh self-requested a review December 17, 2024 05:45

Prateeknandle force-pushed the nri branch 3 times, most recently from d1758ba to 8754364 Compare December 18, 2024 20:44

daemon1024 mentioned this pull request Dec 26, 2024

policy is applied randomly while using "kubectl run" to create a one-shot pod instead of a deployment #1925

Open

Prateeknandle force-pushed the nri branch 2 times, most recently from 2bd7f2a to 1758ace Compare January 14, 2025 16:48

daemon1024 reviewed Jan 15, 2025

View reviewed changes

pkg/KubeArmorOperator/runtime/runtime.go Outdated Show resolved Hide resolved

daemon1024 previously approved these changes Jan 15, 2025

View reviewed changes

rksharma95 reviewed Jan 15, 2025

View reviewed changes

pkg/KubeArmorOperator/runtime/runtime.go Show resolved Hide resolved

Prateeknandle dismissed daemon1024’s stale review via 1758ace January 15, 2025 14:16

Prateeknandle force-pushed the nri branch 4 times, most recently from 468e314 to 47f2eaf Compare January 15, 2025 14:39

daemon1024 mentioned this pull request Jan 16, 2025

feat(core): add support for kata and confidential containers #1888

Open

7 tasks

Prateeknandle force-pushed the nri branch from 47f2eaf to fd32e12 Compare January 17, 2025 13:14

Prateeknandle requested review from rksharma95 and daemon1024 January 17, 2025 13:58

rksharma95 closed this in #1906 Jan 17, 2025

daemon1024 reopened this Jan 17, 2025

Prateeknandle force-pushed the nri branch from fd32e12 to d080e8b Compare January 17, 2025 17:53

Prateeknandle force-pushed the nri branch from d080e8b to 410f8f2 Compare January 17, 2025 18:08

rksharma95 approved these changes Jan 20, 2025

View reviewed changes

Aryan-sharma11 approved these changes Jan 20, 2025

View reviewed changes

rootxrishabh approved these changes Jan 20, 2025

View reviewed changes

rksharma95 merged commit a5791b5 into kubearmor:main Jan 20, 2025
19 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): implement NRI handler #1674

fix(core): implement NRI handler #1674

dqsully commented Mar 8, 2024 •

edited

Loading

daemon1024 left a comment

rksharma95 left a comment •

edited

Loading

Aryan-sharma11 left a comment

rootxrishabh left a comment

fix(core): implement NRI handler #1674

fix(core): implement NRI handler #1674

Conversation

dqsully commented Mar 8, 2024 • edited Loading

daemon1024 left a comment

Choose a reason for hiding this comment

rksharma95 left a comment • edited Loading

Choose a reason for hiding this comment

Aryan-sharma11 left a comment

Choose a reason for hiding this comment

rootxrishabh left a comment

Choose a reason for hiding this comment

dqsully commented Mar 8, 2024 •

edited

Loading

rksharma95 left a comment •

edited

Loading