
ig container image: docs and fixes for kubectl-debug and Docker Desktop on Windows #1809

Merged
merged 3 commits into main from alban_ig_image on Jul 13, 2023

Conversation

@alban (Member) commented Jul 3, 2023

Since #1787, ig is published as a container image. You can fetch it with:

docker pull ghcr.io/inspektor-gadget/ig

This PR adds documentation and fixes for kubectl-debug and Docker Desktop on Windows.

This is part of removing things from entrypoint.sh as suggested in #801. This picks up the changes in entrypoint.sh required by Docker Desktop (#1768). See also #1789 (comment).

How to use

Using ig with kubectl debug

Examples of commands:

$ kubectl debug node/minikube-docker -ti --image=ghcr.io/inspektor-gadget/ig -- ig trace exec
$ kubectl debug node/minikube-docker -ti --image=ghcr.io/inspektor-gadget/ig -- ig list-containers -o json

Using ig in a container

Example of command:

$ docker run -ti --rm \
    --privileged \
    -v /sys/fs/bpf:/sys/fs/bpf \
    -v /sys/kernel/debug:/sys/kernel/debug \
    -v /run:/run \
    -v /:/host \
    -e HOST_ROOT=/host \
    ghcr.io/inspektor-gadget/ig \
    trace exec
CONTAINER    PID        PPID       COMM  RET ARGS
cool_wright  1163565    1154511    ls    0   /bin/ls

If you use Docker Desktop on Windows, please add --pid=host to the docker command.

Testing done

I tested the commands above with albantest.azurecr.io/ig:dev-pr1 on:

  • minikube with docker driver
  • Docker Desktop on Windows with WSL2

cc @mqasimsarfraz @eiffel-fl

Fixes #1783

@alban force-pushed the alban_ig_image branch from 83032d7 to 6ddcd12 on July 3, 2023 09:53
@alban (Member Author) commented Jul 3, 2023

$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev -- ig trace exec
Creating debugging pod node-debugger-minikube-docker-9t8br with container debugger on node minikube-docker.
If you don't see a command prompt, try pressing enter.
CONTAINER                                PID        PPID       COMM             RET ARGS
k8s_shell01_shell01_default_40bfae0e-51… 3294896    3294876    ls               0   /bin/ls

Together with @blanquicet's PR #1702, we should be able to see the Kubernetes namespace and pod with the kubectl debug command above.

@eiffel-fl (Member) left a comment

The goal of this PR is cool, but I have some questions.
I will also need to test it.

}

// Docker Desktop with WSL2 sets up host volumes with weird pidns.
if HostRoot != "/" {
Member
Basically, is this used to check whether we are running on Windows, or under WSL2?
Is there any other way to do so? I mean, using GOHOSTOS?

Member Author

I didn't find any environment variable that distinguishes Windows or WSL2. GOHOSTOS would be Linux because ig is compiled for Linux and runs on the Linux kernel.

$ uname -r
5.10.102.1-microsoft-standard-WSL2

But then I thought it is better to test the actual problem rather than test the OS. The reason we need this workaround is that the procfs /host/proc is mounted in a pid namespace which is neither the pid namespace of the ig container nor a parent pid namespace (therefore it is not the host pidns either). This seems to be a recent change in Docker Desktop for WSL2. So I am testing that below with readlink("/host/proc/self"); a short sketch of the idea follows this comment.

If Docker Desktop for WSL2 changes its implementation to no longer do that, then the workaround will not be used anymore.
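To make that concrete, here is a minimal sketch of the detection (the function name is illustrative, not the exact code in this PR), assuming the host filesystem is mounted at /host: /host/proc/self only resolves when the pid namespace in which /host/proc was mounted contains the current process, i.e. when it is the current pidns or one of its ancestors.

```
package host

import "os"

// hostProcFromForeignPidNs reports whether /host/proc appears to have been
// mounted in a pid namespace that is neither the current pid namespace nor
// one of its ancestors, so the current process is not visible in it — the
// Docker Desktop with WSL2 case described above. Illustrative sketch only.
func hostProcFromForeignPidNs() bool {
	// readlink("/host/proc/self") fails (ENOENT) when the current process is
	// not visible in the pid namespace of that procfs mount.
	_, err := os.Readlink("/host/proc/self")
	return err != nil
}
```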

Comment on lines 127 to 245
_, err = os.Stat("/run/systemd/private")
if err != nil {
	err := os.Symlink("/host/run/systemd/private", "/run/systemd/private")
	if err != nil {
		return fmt.Errorf("linking /run/systemd/private: %w", err)
	}
}
Member

Do you use this to ensure the file does not exist?
If so, isn't there another way to do so? Or does os.Stat() return a specific error in case the path does not exist?

Member Author

I do that because the systemdDbus Golang package hard-codes "/run/systemd/private", so I can't tell it to use "/host/run/systemd/private" instead. Normally the gadget DaemonSet uses the host volume /run:/run, so this would not be necessary. But here, the gadget pod was started by "kubectl debug node", so /run is not mounted from the host and is just a tmpfs in the container. So I create the symlink so that the systemdDbus Golang package works fine.

Yes, I can check the specific error code from Stat with errors.Is(err, os.ErrNotExist).
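Put together, the check discussed here could look like the following sketch (the function name is illustrative and error handling is kept minimal):

```
package host

import (
	"errors"
	"fmt"
	"os"
)

// ensureSystemdPrivateSocket makes the hard-coded /run/systemd/private path
// usable when /run is not a host volume (e.g. under "kubectl debug node") by
// pointing it at the host's socket. Illustrative sketch only.
func ensureSystemdPrivateSocket() error {
	_, err := os.Stat("/run/systemd/private")
	if err == nil {
		// The socket is already reachable (e.g. /run is a host volume).
		return nil
	}
	if !errors.Is(err, os.ErrNotExist) {
		return fmt.Errorf("checking /run/systemd/private: %w", err)
	}
	if err := os.Symlink("/host/run/systemd/private", "/run/systemd/private"); err != nil {
		return fmt.Errorf("linking /run/systemd/private: %w", err)
	}
	return nil
}
```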

@eiffel-fl (Member) Jul 3, 2023

systemdDbus Golang package hard codes "/run/systemd/private"

Is there any way to open an upstream contribution to change this behavior?

@alban force-pushed the alban_ig_image branch from 6ddcd12 to c3578e3 on July 3, 2023 14:07
@eiffel-fl (Member) left a comment

I give my approval, but only for the Linux part, as that is the only part I tested and I do not have easy access to a WSL machine:

$ docker run -ti --rm \
    --privileged \
    -v /sys/fs/bpf:/sys/fs/bpf \
    -v /sys/kernel/debug:/sys/kernel/debug \
    -v /run:/run \
    -v /:/host \
    -e HOST_ROOT=/host \
    ig-linux-amd64 \
    trace exec
WARN[0000] failed to get cgroups info of container b667ecc13325c2fba398c0cafeadff5d6a8aa763abca7a2e791bb87cb81e1950 from /proc/193164/cgroup: cgroup path not found in /proc/PID/cgroup 
ERRO[0000] cgroup enricher: failed to get cgroup paths on container b667ecc13325c2fba398c0cafeadff5d6a8aa763abca7a2e791bb87cb81e1950: cgroup path not found in /proc/PID/cgroup 
CONTAINER                                               PID        PPID       COMM             RET ARGS                                                                    
intelligent_visvesvaraya                                193274     193251     ls               0   /bin/ls                                                                 
$ kubectl debug node/minikube-docker -ti --image ig-linux-amd64 --image-pull-policy=Never -- ig list-containers
Creating debugging pod node-debugger-minikube-docker-8x9pm with container debugger on node minikube-docker.
If you don't see a command prompt, try pressing enter.
RUNTIME                                        ID                                     NAME                                                                                 
docker                                         54c10eea34b28b5e27ca052d3e54e2c0ed478… k8s_debugger_node-debugger-minikube-docker-8x9pm_default_c9f68cb1-dfc8-4183-b502-247…
docker                                         18c4911d32b396df4cecfc64b021dcf417b50… k8s_etcd_etcd-minikube-docker_kube-system_3602af6795c463d205ca440218f9a4af_1         
docker                                         1dc470aee6ec2c241536d9517c89b6f54d751… k8s_kube-apiserver_kube-apiserver-minikube-docker_kube-system_4b0f5be86866fc2e09a85c…
docker                                         1a0980bf465b671b017be1d32df3957827711… k8s_kube-controller-manager_kube-controller-manager-minikube-docker_kube-system_f0da…
docker                                         b797377fa6256e774f744964792c3657040cc… k8s_kube-proxy_kube-proxy-jnr98_kube-system_26a7281e-9377-4009-ac5c-8fed3fa25792_1   
docker                                         249e18e3891f0649dfb7528703b3d57b350de… k8s_kube-scheduler_kube-scheduler-minikube-docker_kube-system_8889d4e06ea110c9727048…
docker                                         e3be405bb7001455465bb284e82c591dda6a4… k8s_storage-provisioner_storage-provisioner_kube-system_76df6fba-9121-4b0e-aebe-67b4…
INFO[0000] systemd unit "kubectl-debug-ig-c227d051.service" returned "done"

docs/ig.md Outdated
Comment on lines 169 to 174
--privileged \
-v /sys/fs/bpf:/sys/fs/bpf \
-v /sys/kernel/debug:/sys/kernel/debug \
-v /run:/run \
-v /:/host \
-e HOST_ROOT=/host \
Member

I think it would be a good idea to explain why each option is needed.

fmt.Sprintf("/proc/%d/root/usr/bin/ig", os.Getpid()),
}
cmd = append(cmd, os.Args[1:]...)
properties := []systemdDbus.Property{
Member

Do we really need to create all this systemd-related stuff?
Is it specific to kubectl debug?

@alban (Member Author) Jul 3, 2023

Documentation for "kubectl debug node":

Although the container runs in the host IPC, Network, and PID namespaces, the pod isn't privileged.

kubectl debug starts a pod with some privileges but not all. It has the /host volume, some host namespaces and some capabilities but not CAP_SYS_ADMIN. And it does not have any option like --privileged to give the missing capabilities required by Inspektor Gadget.

So as a workaround, ig re-executes itself on the host in a fully privileged systemd unit.
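For context, a rough sketch of what "re-execute in a transient systemd unit" can look like with the go-systemd dbus package (imported as systemdDbus above); the unit name, properties, and error handling below are illustrative, not this PR's exact code:

```
package main

import (
	"context"
	"fmt"
	"os"

	systemdDbus "github.com/coreos/go-systemd/v22/dbus"
)

// runInTransientUnit asks the host's systemd (reachable through
// /run/systemd/private) to run the given command in a transient, fully
// privileged unit and waits for it to finish. Illustrative sketch only.
func runInTransientUnit(ctx context.Context, cmd []string) error {
	conn, err := systemdDbus.NewSystemdConnectionContext(ctx)
	if err != nil {
		return fmt.Errorf("connecting to systemd: %w", err)
	}
	defer conn.Close()

	properties := []systemdDbus.Property{
		systemdDbus.PropDescription("ig re-executed outside the kubectl-debug pod"),
		systemdDbus.PropExecStart(cmd, false),
	}

	unitName := fmt.Sprintf("kubectl-debug-ig-%d.service", os.Getpid())
	statusChan := make(chan string, 1)
	if _, err := conn.StartTransientUnitContext(ctx, unitName, "fail", properties, statusChan); err != nil {
		return fmt.Errorf("starting transient unit %q: %w", unitName, err)
	}
	if s := <-statusChan; s != "done" {
		return fmt.Errorf("transient unit %q returned %q", unitName, s)
	}
	return nil
}
```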

I am unsure if this workaround should be automatic. Maybe it could prompt the user on the terminal unless an environment variable $IG_AUTO_REEXEC_IN_SD_UNIT is set:

$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev \
    -e IG_AUTO_REEXEC_IN_SD_UNIT=1 -- ig trace exec
$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev -- ig trace exec
Creating debugging pod node-debugger-minikube-docker-9t8br with container debugger on node minikube-docker.
If you don't see a command prompt, try pressing enter.

ig is running in a container without CAP_SYS_ADMIN but with access to systemd.
Reexecute ig in a systemd unit? [yes/no] yes

CONTAINER                                PID        PPID       COMM             RET ARGS
k8s_shell01_shell01_default_40bfae0e-51… 3294896    3294876    ls               0   /bin/ls

What do you think, should I implement this prompt / envvar?

Member

Thank you for the clarification, it makes sense.

What do you think, should I implement this prompt / envvar?

I find it complicated.
In any case, without the workaround ig would not work when run from kubectl debug?
So, I think this workaround is needed.
We should nonetheless document it and I am wondering if this should just be a comment in the code or written somewhere in the documentation.

Maybe just one important question: this workaround cannot really do any harm, can it?

Member Author

I find it complicated.

I talked about this with colleagues over lunch and the conclusion was that users would not like if ig creates a systemd unit without telling the user, so a flag like --allow-sdunit would be better.

At the moment, all this code is implemented in an init() function, i.e. before CLI flags are parsed. It would need to be moved to run after flag parsing (a sketch of that follows this comment).

In any case, without the workaround ig would not work when run from kubectl debug?

Correct.

So, I think this workaround is needed.

Correct. But we should do it in a way that is not behind the user's back.

$ kubectl debug node/... -- ig trace exec
Error: container without CAP_SYS_ADMIN detected. Use --allow-sdunit to try again in a privileged systemd unit.
$ kubectl debug node/... -- ig trace exec --allow-sdunit
CONTAINER  PID        PPID       COMM             RET ARGS
foo        3294896    3294876    ls               0   /bin/ls

What do you think?
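For illustration, a minimal sketch of gating the workaround behind an opt-in flag with cobra (which ig's CLI already uses); the flag name and wiring below are illustrative, not necessarily what this PR ends up doing:

```
package main

import "github.com/spf13/cobra"

var allowSdUnit bool

// addAllowSdUnitFlag registers the opt-in flag and runs the check in
// PersistentPreRunE, i.e. after flag parsing rather than in init().
// Illustrative sketch only.
func addAllowSdUnitFlag(cmd *cobra.Command) {
	cmd.PersistentFlags().BoolVar(&allowSdUnit, "allow-sdunit", false,
		"re-execute ig in a privileged systemd unit when CAP_SYS_ADMIN is missing")
	cmd.PersistentPreRunE = func(cmd *cobra.Command, args []string) error {
		if !allowSdUnit {
			return nil
		}
		// Decide here whether the restart is useful and possible, then do it.
		return nil
	}
}
```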

Member

I talked about this with colleagues over lunch and the conclusion was that users would not like if ig creates a systemd unit without telling the user, so a flag like --allow-sdunit would be better.

This makes sense, let's go for this solution then!

@blanquicet (Member) Jul 4, 2023

The workaround should be described in the documentation, explaining what kubectl debug does, what is missing to be able to run ig, and what we are doing to have those missing configurations.

In addition, we should add a note in the documentation like: "To be sure the user is aware of this workaround, we intentionally let the command fail if --allow-sdunit is not passed", or something similar.

@mauriciovasquezbernal (Member) left a comment

I have mixed feelings about this PR. On one side, I think the idea of using systemd on the host to escalate the capabilities of the container is brilliant; however, I also think it's too hacky as it relies on a privilege escalation to work.

I'm slightly inclined not to add this (as it relies on a hack to work), but if you think there is added value and the hack is OK, I'll accept this without issues.

Btw, an upstream solution is "Debugging Profiles"; however, I don't see any implementation of that yet.

// Initialize IsHost*Ns
IsHostPidNs = isHostNamespace("pid")
IsHostNetNs = isHostNamespace("net")
}

func workarounds() error {


nit: What about having a different function for each workaround, so it's easier to track them all?
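A tiny sketch of that split; only workaroundMounts appears in this PR, the other names are hypothetical and the bodies are placeholders:

```
package host

import "fmt"

// Each workaround lives in its own, clearly named function so they are easy
// to track. The bodies are empty placeholders; only the structure matters.
func workaroundMounts() error        { return nil } // mount bpf/debugfs/tracefs if missing
func workaroundSystemdSocket() error { return nil } // hypothetical: symlink /run/systemd/private
func workaroundWSL2PidNs() error     { return nil } // hypothetical: fix /host/proc's pidns

func workarounds() error {
	for _, w := range []func() error{workaroundMounts, workaroundSystemdSocket, workaroundWSL2PidNs} {
		if err := w(); err != nil {
			return fmt.Errorf("applying workaround: %w", err)
		}
	}
	return nil
}
```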

}
}

// Docker Desktop with WSL2 sets up host volumes with weird pidns.


Is this an issue in WSL2?

Comment on lines 219 to 223
// workaroundMounts ensures that filesystems are mounted correctly.
// Some environments (e.g. minikube) runs with a read-only /sys without bpf
// https://github.com/kubernetes/minikube/blob/99a0c91459f17ad8c83c80fc37a9ded41e34370c/deploy/kicbase/entrypoint#L76-L81
// Docker Desktop with WSL2 also has filesystems unmounted.
func workaroundMounts() error {


nit: this should be part of the previous commit to keep it organized from the beginning.


Also, what about adding similar comments to the other functions? They're very dense and it's not easy to understand what they are for.

return fmt.Errorf("stopping transient unit %q: got `%s`", unitName, s)
}
}
os.Exit(0)


Deferred calls aren't executed. It doesn't seem like a very big issue with the current code, but what about fixing it? (like https://stackoverflow.com/a/27629493)
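The usual fix (as in the linked Stack Overflow answer) is to keep os.Exit out of the function that owns the deferred calls; a sketch:

```
package main

import "os"

// run owns all deferred cleanups and returns an exit code instead of calling
// os.Exit itself, so the deferred calls actually execute.
func run() int {
	cleanup := func() { /* stop the transient unit, close connections, ... */ }
	defer cleanup()

	// ... do the real work ...
	return 0
}

func main() {
	// os.Exit skips deferred functions, so it is only called here, after run()
	// has returned and its defers have already executed.
	os.Exit(run())
}
```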


select {
case s := <-statusChan:
log.Infof("systemd unit %q returned %q", unitName, s)


I don't see any of these logs. It'd be nice to show the user that this workaround is being used.

Member Author

This one is printed if the systemd unit returns by itself first. For example, with --timeout 5. If you press ctrl-c, it will go through the second case.

I'll add a debug to tell when the workaround is used.

But I think it should be a log.Debugf (only printed with --verbose): if the command returns json data, the log would still be parsable.


But I think it should be a log.Debugf (only printed with --verbose): if the command returns json data, the log would still be parsable.

It should not be a problem when using a logger since those messages go to stderr:

$ export IG_EXPERIMENTAL=true
# json output is parseable even if a log message is printed
$ go run -exec "sudo -E" ./cmd/ig/. trace exec -o json | jq
INFO[0000] Experimental features enabled                
{
  "container": "mycontainer3",
  "timestamp": 1688584017253994000,
  "type": "normal",
  "mountnsid": 4026535441,
  "pid": 1076064,
  "ppid": 1074187,
  "comm": "cat",
  "args": [
    "/bin/cat",
    "/dev/null"
  ],
  "uid": 0,
  "gid": 0,
  "loginuid": 4294967295,
  "sessionid": 4294967295
}

Member Author

Ah true...

When using the -t option of kubectl debug, it creates a tty, so stdout and stderr are both the same file descriptor to the tty. I guess in that case users should not use -t if they want to parse the json... and we should check if that works fine.

@blanquicet (Member) left a comment

Some comments, I'll test it once the allow-sdunit flag is implemented.

docs/ig.md Outdated
-v /sys/kernel/debug:/sys/kernel/debug \
-v /run:/run \
-v /:/host \
-e HOST_ROOT=/host \
Member

Is this still needed after defining ENV HOST_ROOT=/host in ig.Dockerfile?

Member Author

Not needed, I will remove it.


### Using ig with kubectl debug

Examples of commands:
Member

Please add a link to the documentation of kubectl debug.


@alban force-pushed the alban_ig_image branch 8 times, most recently from 64d463d to a551847 on July 6, 2023 12:58
@eiffel-fl (Member) left a comment

Some comments on the code itself.
I will test it again later.

Comment on lines 216 to 220
// if the host does not use systemd, we cannot use this workaround
_, err = os.Stat("/host/run/systemd/private")
if err != nil {
	return false, nil
}
Member

Suggested change:

// if the host does not use systemd, we cannot use this workaround
_, err = os.Stat("/host/run/systemd/private")
if errors.Is(err, os.ErrNotExist) {
	return false, nil
} else {
	return false, err
}

Member Author

With this change, when ig is running without CAP_SYS_ADMIN on a Linux distro without systemd, the error message will be about the non-existing systemd socket instead of being about failing later on because of the lack of CAP_SYS_ADMIN. Is that really what we want? The lack of systemd is not something that the user can fix, whereas the lack of CAP_SYS_ADMIN can be fixed by the user (by creating the pod manually instead of using "kubectl debug node").

Member

Hiding errors is something we should really avoid.

the error message will be about the non-existing systemd socket instead of being about failing later on because of the lack of CAP_SYS_ADMIN

If you do not have systemd, you will not be able to run the workaround, so better to fail here, right?

The lack of systemd is not something that the user can fix

Indeed, then better to explode yelling: "get a systemd distro".

whereas the lack of CAP_SYS_ADMIN can be fixed by the user (by creating the pod manually instead of using "kubectl debug node").

I do not understand.
This whole code is called in the systemd-related workaround which is used by kubectl debug node.
What should we do then? Print an error message advising the user to create a privileged pod by hand?

Member Author

This function is called autoSdUnitRestart and the flag is called --auto-sd-unit-restart: it was not meant to unconditionally execute the workaround but first checks whether it is useful and possible. If it finds out it is not necessary (e.g. we have CAP_SYS_ADMIN) or not possible (non-systemd distro), then it does not try the workaround and continues on the normal path (and most likely fails, unless it is a command such as ig version, ig --help, etc.).

I could implement your suggestion to fail on a non-systemd distro, but then I should rename the flag to --restart-in-sd-unit to imply to the user that it is not conditional.

I am fine with either choice, I just want to make the flag name match its semantics.

If we later want to support non-systemd distros with Docker, we could implement another workaround that creates a privileged Docker container by talking to the dockerd socket. With my semantics, we could have two flags --auto-sd-unit-restart and --auto-docker-restart, the user could specify both, and ig would select whatever it finds useful and possible. With your semantics, we could have two flags --restart-in-sd-unit and --restart-in-docker, and the user must use one of them because using the wrong one would fail.

Member

it was not meant to unconditionally execute the workaround but first checks whether it is useful and possible. If it finds out it is not necessary (e.g. we have CAP_SYS_ADMIN) or not possible (non-systemd distro), then it does not try the workaround and continues on the normal path (and most likely fails

I am not highly convinced.
On one hand, if we are almost sure it will fail, then it is better not to run ig at all and inform the user of why it will fail (lack of CAP_SYS_ADMIN or a non-systemd distro).
On the other hand, the above would prevent users from running "simple" commands, like ig --help.
Moreover, if it fails due to lack of CAP_SYS_ADMIN and/or systemd, what would ig print?
If the error message explains the problem well, then let's keep your solution as is; otherwise, adding an explanation somewhere should help.

Member

At one point, you said this workaround might not be needed in the future due to some modifications upstream in kubectl debug.
What is the status of this modification? Could we help bring it about?
If we work upstream, we would no longer need this workaround and the whole situation would be easier for everyone.

@alban (Member Author) Jul 10, 2023

On one hand, if we are almost sure it will fail, then it is better not to run ig at all and inform the user of why it will fail (lack of CAP_SYS_ADMIN or a non-systemd distro).
On the other hand, the above would prevent users from running "simple" commands, like ig --help.

The workarounds are now done in the local runtime, so simple commands like "version" and "help" should run without attempting the workaround.

Moreover, if it fails due to lack of CAP_SYS_ADMIN and/or systemd, what would ig print?

At the moment, because of lack of capabilities:

$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev3 -- ig trace exec
Creating debugging pod node-debugger-minikube-docker-khj6w with container debugger on node minikube-docker.
Error: initializing operators: initializing operator "LocalManager": initializing manager: failed to set memlock rlimit: operation not permitted

If the error message explains the problem well, then let's keep your solution as is; otherwise, adding an explanation somewhere should help.

Ok, I'll try to update this.

At one point, you said this workaround might not be needed in the future due to some modifications upstream in kubectl debug.
What is the status of this modification? Could we help bring it about?
If we work upstream, we would no longer need this workaround and the whole situation would be easier for everyone.

KEP-1441 (issue #1441) defines security profiles and kubernetes/kubernetes#114280 implemented some of them in kubectl v1.27. Unfortunately, even in the latest version, the security profile "sysadmin" with CAP_SYS_ADMIN is not yet implemented. There are still some PRs in progress about KEP-1441, such as kubernetes/kubernetes#118647, so I assume the work in KEP-1441 on the "sysadmin" profile will be completed at some point, but it might take some time. I will update the documentation to reflect that.

Comment on lines 1 to 2
//go:build !linux
// +build !linux
Member Author

The build flags here are a workaround for #1830

@alban force-pushed the alban_ig_image branch 2 times, most recently from 85651dd to 81c581a on July 10, 2023 10:12
@eiffel-fl (Member) left a comment

For the moment, the workaround is mandatory.
It is nonetheless documented well enough that people should understand the consequences of using it.
If we decide to go further with this PR, we need it in any case.
In an ideal future, we should rely on kubectl debug --profile=sysadmin once it exists.

I renew my approval, but please wait for someone else to give their opinion before merging.

@alban mentioned this pull request Jul 10, 2023
@eiffel-fl (Member) commented
Hum...
I just tested it because I wanted to check something, and ig lacks some permissions:

$ ./kubectl debug node/minikube -ti --image-pull-policy=Never --image=ig-linux-amd64 -- ig --auto-sd-unit-restart=true trace exec
Creating debugging pod node-debugger-minikube-chh2d with container debugger on node minikube.
Error: initializing operators: initializing operator "LocalManager": initializing manager: failed to set memlock rlimit: operation not permitted

Anything that I missed somewhere?

francis@pwmachine:~/Codes/kinvolk$ ./minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

docker@minikube:~$ logout
francis@pwmachine:~/Codes/kinvolk$ ./minikube image list
...
docker.io/library/ig-linux-amd64:latest

@alban force-pushed the alban_ig_image branch 2 times, most recently from 3d3b1fc to 4315da7 on July 10, 2023 14:47
Comment on lines +180 to +186
As of today, the `kubectl debug` command does not have a way to give enough privileges to the debugging pod to be able
to use `ig`.
This might change in the future: the Kubernetes Enhancement Proposal 1441
([KEP-1441](https://github.com/kubernetes/enhancements/tree/master/keps/sig-cli/1441-kubectl-debug))
suggests to implement Debugging Profiles (`--profile=`) to be able to give the necessary privileges.
kubectl v1.27 implements some of those profiles but not yet the "sysadmin" profile, so it is not possible to use
`--profile=` yet.

@@ -190,6 +191,15 @@ partsLoop:

l.rc = rc

hostConfig := host.Config{
AutoMount: true,
AutoWindowsWorkaround: true,
Member

Why is this set to true?

Member Author

It is set to true for ig so that it works automatically. In the gadget pod, the Windows workaround is set to false because it doesn't run on Windows anyway.

Do you prefer to switch to a flag, as was done for the systemd workaround?


I'd prefer to have all those workarounds off by default. My only concern is how to suggest that users enable them. Is there a place where we can put a warning about them?

@eiffel-fl (Member) left a comment

There are several points that I do not understand in this PR.
I would advise splitting it into two parts:

  1. One which documents how to use the ig container image, so we can test it easily and close the corresponding issue.
  2. Another which deals with the systemd workaround.

@alban (Member Author) commented Jul 11, 2023

@eiffel-fl Ok, I filed #1841 for the documentation part. Note that I needed to add more -v flags since it does not include the auto mount workaround.

Comment on lines 269 to 270
log.Infof("HostPID=%s", strconv.FormatBool(host.IsHostPidNs()))
log.Infof("HostNetwork=%s", strconv.FormatBool(host.IsHostNetNs()))


		log.Infof("HostPID=%t", host.IsHostPidNs())
		log.Infof("HostNetwork=%t", host.IsHostNetNs())

// capabilities.
//
// This is useful for the "kubectl debug node" command.
func AddAutoSdUnitRestartFlag(command *cobra.Command) {


Make this more generic, e.g. AddFlags, in case we add another flag later on?

Member

To be honest, for the moment, I would not change it as we already plan to remove it in #1842.


I don't get it: kubernetes/kubernetes#119200 isn't even merged yet. It'd take quite some time until users are running that version. Also, this is not only about that flag but about making it general in case there are more flags, like we're proposing for the "windows workaround".


// Only root can talk to the systemd socket
if os.Geteuid() != 0 {
return false, nil


Shouldn't this return an error?

// if the host does not use systemd, we cannot use this workaround
_, err = os.Stat("/host/run/systemd/private")
if err != nil {
return false, nil


ditto


// autoSdUnitRestart will automatically restart the process in a privileged
// systemd unit if the current process does not have enough capabilities.
func autoSdUnitRestart() (exit bool, err error) {


It's not clear to me when this function returns errors; sometimes it returns nil, other times it returns the error, etc. What about structuring it this way (sketched after this comment)?

  • Check IG_IN_SYSTEMD_UNIT to avoid recursive restarts
  • Check if it has CAP_SYS_ADMIN

At this point the workaround is needed, and any error should be returned.

WDYT?
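A sketch of that flow, with one possible CAP_SYS_ADMIN check based on parsing CapEff in /proc/self/status (helper names are illustrative; the real implementation may differ):

```
package host

import (
	"bufio"
	"os"
	"strconv"
	"strings"
)

const capSysAdmin = 21 // bit index of CAP_SYS_ADMIN in the capability bitmask

// haveCapSysAdmin reports whether the effective capability set contains
// CAP_SYS_ADMIN, by parsing the CapEff line of /proc/self/status.
func haveCapSysAdmin() (bool, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return false, err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		caps, err := strconv.ParseUint(strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), 16, 64)
		if err != nil {
			return false, err
		}
		return caps&(1<<capSysAdmin) != 0, nil
	}
	return false, scanner.Err()
}

// autoSdUnitRestart sketches the proposed flow: bail out early when the
// workaround is not applicable, and propagate every error once it is needed.
func autoSdUnitRestart() (exit bool, err error) {
	if os.Getenv("IG_IN_SYSTEMD_UNIT") == "1" {
		return false, nil // already re-executed once, avoid recursion
	}
	ok, err := haveCapSysAdmin()
	if err != nil || ok {
		return false, err // workaround not needed, or we cannot even tell
	}
	// From this point on the workaround is required: start the transient
	// systemd unit and return any error (omitted in this sketch).
	return true, nil
}
```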


// autoWindowsWorkaround overrides HostRoot and HostProcFs if necessary.
// Docker Desktop with WSL2 sets up host volumes with weird pidns.
func autoWindowsWorkaround() error {


Can we call it WSLWorkaround to be more specific? I was confused because of "Windows".


Comment on lines 40 to 41
// local runtime requires root to run host.Init()
utilstest.RequireRoot(t)


Is it still true if we disable the workarounds by default?

Member Author

Still true because of the way the code is written. I'll rewrite IsHostPidNs() in a cleaner way to avoid that.
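One cleaner shape for that is to compute the answer lazily, so code paths (and tests) that never ask the question do not need the privileges required to inspect the host's PID 1. A sketch, assuming namespaces are compared via their /proc/*/ns/pid links; the actual rewrite may differ:

```
package host

import (
	"os"
	"path/filepath"
	"sync"
)

// HostProcFs is a placeholder here; in ig it points at the host's procfs,
// e.g. /host/proc.
var HostProcFs = "/host/proc"

var (
	isHostPidNs     bool
	isHostPidNsErr  error
	isHostPidNsOnce sync.Once
)

// IsHostPidNs lazily compares the pid namespace of the current process with
// the pid namespace of the host's PID 1. The comparison only happens on first
// use, so callers that never need it do not require root.
func IsHostPidNs() (bool, error) {
	isHostPidNsOnce.Do(func() {
		self, err := os.Readlink("/proc/self/ns/pid")
		if err != nil {
			isHostPidNsErr = err
			return
		}
		hostInit, err := os.Readlink(filepath.Join(HostProcFs, "1", "ns", "pid"))
		if err != nil {
			isHostPidNsErr = err
			return
		}
		isHostPidNs = self == hostInit
	})
	return isHostPidNs, isHostPidNsErr
}
```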

Member Author

The prometheus test no longer needs to run as root.

@alban (Member Author) commented Jul 12, 2023

I updated the branch with better error messages. This can be tested as follows:

make ig
docker tag ig-linux-amd64:latest myrepo/ig:dev
docker push myrepo/ig:dev

The workarounds are not automatic, but the error message suggests them as appropriate:

$ kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig list-containers -w
Creating debugging pod node-debugger-minikube-docker-4pm2j with container debugger on node minikube-docker.
error: need CAP_SYS_ADMIN (did you try --auto-sd-unit-restart?)

Tested commands:

kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true list-containers -w
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true top ebpf
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true trace exec

On WSL2:

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec
error: filesystems bpf, debugfs, tracefs not mounted (did you try --auto-mount-filesystems?)

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems
error: /host/proc/self not found on WSL2 (did you try --auto-wsl-workaround?)

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems --auto-wsl-workaround
WARN[0000] /host/proc's pidns is neither the current pidns or a parent of the current pidns. Remounting.
WARN[0000] Overriding HostRoot=/proc/1296/root/ HostProcFs=/proc/1296/root/proc (lifecycle-server)
CONTAINER                         PID        PPID       COMM             RET ARGS
wizardly_franklin                 16754      14684      cat              0   /bin/cat /tmp/file

@mauriciovasquezbernal (Member) left a comment

LGTM once CI is green.

if !host.IsHostPidNs {
hostPidNs, err := host.IsHostPidNs()
if err != nil {
log.Debugf("Runcfanotify: not supported: %s", err)


Suggested change
log.Debugf("Runcfanotify: not supported: %s", err)
log.Debugf("checking if current pid namespace is host pid namespace: %s", err)

?

alban added 3 commits July 13, 2023 10:22
The 'kubectl debug node' is now supported. This patch adds documentation
for it.

Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
This can be tested as follows:

```
make ig
docker tag ig-linux-amd64:latest myrepo/ig:dev
docker push myrepo/ig:dev
```

The workarounds are not automatic, but the error message suggests them
as appropriate:

```
$ kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig list-containers -w
Creating debugging pod node-debugger-minikube-docker-4pm2j with container debugger on node minikube-docker.
error: need CAP_SYS_ADMIN (did you try --auto-sd-unit-restart?)
```

Tested commands:
```
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true list-containers -w
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true top ebpf
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true trace exec
```

On WSL2:
```
$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec
error: filesystems bpf, debugfs, tracefs not mounted (did you try --auto-mount-filesystems?)

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems
error: /host/proc/self not found on WSL2 (did you try --auto-wsl-workaround?)

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems --auto-wsl-workaround
WARN[0000] /host/proc's pidns is neither the current pidns or a parent of the current pidns. Remounting.
WARN[0000] Overriding HostRoot=/proc/1296/root/ HostProcFs=/proc/1296/root/proc (lifecycle-server)
CONTAINER                         PID        PPID       COMM             RET ARGS
wizardly_franklin                 16754      14684      cat              0   /bin/cat /tmp/file
```

Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
Execute the tests without the workarounds.

Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
@alban (Member Author) commented Jul 13, 2023

The CI failure seems unrelated to this PR. I will restart it to see if it helps.

https://github.com/inspektor-gadget/inspektor-gadget/actions/runs/5540796137/jobs/10113500123?pr=1809

        {"node":"minikube-docker-m02","namespace":"test-trace-oomkill-7478521341844805850","pod":"test-pod","container":"test-pod-container","timestamp":1689237512775921304,"type":"normal","mountnsid":4026533542,"kpid":52473,"kcomm":"tail","pages":1081343,"tpid":52473,"tuid":1000,"tgid":2000,"tcomm":"tail"}
        
    command.go:512: invalid command output(StartOomkilGadget): verifying output with custom function: unexpected output entry:
          &types.Event{
          	... // 4 identical fields
          	Pages:         0,
          	TriggeredPid:  0,
        - 	TriggeredUid:  1000,
        + 	TriggeredUid:  0,
        - 	TriggeredGid:  2000,
        + 	TriggeredGid:  0,
          	TriggeredComm: "",
          }
...
--- FAIL: TestTraceOOMKill (21.09s)

It is weird that this TestTraceOOMKill failure only happens in the core flavor on docker (but not on containerd or cri-o or with the default flavor).

@alban merged commit dbc4192 into main on Jul 13, 2023
@alban deleted the alban_ig_image branch on July 13, 2023 09:50
@alban (Member Author) commented Jul 13, 2023

The CI passed fine after a retry. The TestTraceOOMKill failure was a flake.
