
ig container image: docs and fixes for kubectl-debug and Docker Desktop on Windows #1809

Merged
merged 3 commits into main from alban_ig_image on Jul 13, 2023

Conversation

@alban (Member) commented Jul 3, 2023

Since #1787, ig is published as a container image. You can fetch it with:

docker pull ghcr.io/inspektor-gadget/ig

This PR adds documentation and fixes for kubectl-debug and Docker Desktop on Windows.

This is part of removing things from entrypoint.sh as suggested in #801. This picks up the changes in entrypoint.sh required by Docker Desktop (#1768). See also #1789 (comment).

How to use

Using ig with kubectl debug

Examples of commands:

$ kubectl debug node/minikube-docker -ti --image=ghcr.io/inspektor-gadget/ig -- ig trace exec
$ kubectl debug node/minikube-docker -ti --image=ghcr.io/inspektor-gadget/ig -- ig list-containers -o json

Using ig in a container

Example of command:

$ docker run -ti --rm \
    --privileged \
    -v /sys/fs/bpf:/sys/fs/bpf \
    -v /sys/kernel/debug:/sys/kernel/debug \
    -v /run:/run \
    -v /:/host \
    -e HOST_ROOT=/host \
    ghcr.io/inspektor-gadget/ig \
    trace exec
CONTAINER    PID        PPID       COMM  RET ARGS
cool_wright  1163565    1154511    ls    0   /bin/ls

If you use Docker Desktop on Windows, please add --pid=host to the docker command.

Testing done

I tested the commands above with albantest.azurecr.io/ig:dev-pr1 on:

  • minikube with docker driver
  • Docker Desktop on Windows with WSL2

cc @mqasimsarfraz @eiffel-fl

Fixes #1783

@alban force-pushed the alban_ig_image branch from 83032d7 to 6ddcd12 on July 3, 2023 09:53
@alban (Member Author) commented Jul 3, 2023

$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev -- ig trace exec
Creating debugging pod node-debugger-minikube-docker-9t8br with container debugger on node minikube-docker.
If you don't see a command prompt, try pressing enter.
CONTAINER                                PID        PPID       COMM             RET ARGS
k8s_shell01_shell01_default_40bfae0e-51… 3294896    3294876    ls               0   /bin/ls

Together with @blanquicet's PR #1702, we should be able to see the Kubernetes namespace and pod with the kubectl debug command above.

@eiffel-fl (Member) left a comment

The goal of this PR is cool, but I have some questions.
I will also need to test it.

}

// Docker Desktop with WSL2 sets up host volumes with weird pidns.
if HostRoot != "/" {
Member
Basically, is this used to check whether we are running on Windows, or under WSL2?
Is there any other way to do so? I mean, using GOHOSTOS?

Member Author

I didn't find any environment variable that distinguishes Windows or WSL2. GOHOSTOS would be Linux because ig is compiled for Linux and runs on the Linux kernel.

$ uname -r
5.10.102.1-microsoft-standard-WSL2

But then I thought it is better to test the actual problem rather than test the OS. The reason we need this workaround is that the procfs /host/proc is mounted in a pid namespace which is neither the pid namespace of the ig container nor a parent pid namespace (therefore it is not the host pidns either). This seems to be a recent change in Docker Desktop for WSL2. So I am testing that below with readlink("/host/proc/self"); a short sketch of the idea follows this comment.

If Docker Desktop for WSL2 changes its implementation to no longer do that, then the workaround will not be used anymore.
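To make that concrete, here is a minimal sketch of the detection (the function name is illustrative, not the exact code in this PR), assuming the host filesystem is mounted at /host: /host/proc/self only resolves when the pid namespace in which /host/proc was mounted contains the current process, i.e. when it is the current pidns or one of its ancestors.

```
package host

import "os"

// hostProcFromForeignPidNs reports whether /host/proc appears to have been
// mounted in a pid namespace that is neither the current pid namespace nor
// one of its ancestors, so the current process is not visible in it — the
// Docker Desktop with WSL2 case described above. Illustrative sketch only.
func hostProcFromForeignPidNs() bool {
	// readlink("/host/proc/self") fails (ENOENT) when the current process is
	// not visible in the pid namespace of that procfs mount.
	_, err := os.Readlink("/host/proc/self")
	return err != nil
}
```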

Comment on lines 127 to 245
_, err = os.Stat("/run/systemd/private")
if err != nil {
	err := os.Symlink("/host/run/systemd/private", "/run/systemd/private")
	if err != nil {
		return fmt.Errorf("linking /run/systemd/private: %w", err)
	}
}
Member

Do you use this to ensure the file does not exist?
If so, isn't there another way to do so? Or does os.Stat() return a specific error in case the path does not exist?

Member Author

I do that because the systemdDbus Golang package hard-codes "/run/systemd/private", so I can't tell it to use "/host/run/systemd/private" instead. Normally the gadget DaemonSet uses the host volume /run:/run, so this would not be necessary. But here, the gadget pod was started by "kubectl debug node", so /run is not mounted from the host and is just a tmpfs in the container. So I create the symlink so that the systemdDbus Golang package works fine.

Yes, I can check the specific error code from Stat with errors.Is(err, os.ErrNotExist).
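Put together, the check discussed here could look like the following sketch (the function name is illustrative and error handling is kept minimal):

```
package host

import (
	"errors"
	"fmt"
	"os"
)

// ensureSystemdPrivateSocket makes the hard-coded /run/systemd/private path
// usable when /run is not a host volume (e.g. under "kubectl debug node") by
// pointing it at the host's socket. Illustrative sketch only.
func ensureSystemdPrivateSocket() error {
	_, err := os.Stat("/run/systemd/private")
	if err == nil {
		// The socket is already reachable (e.g. /run is a host volume).
		return nil
	}
	if !errors.Is(err, os.ErrNotExist) {
		return fmt.Errorf("checking /run/systemd/private: %w", err)
	}
	if err := os.Symlink("/host/run/systemd/private", "/run/systemd/private"); err != nil {
		return fmt.Errorf("linking /run/systemd/private: %w", err)
	}
	return nil
}
```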

@eiffel-fl (Member) Jul 3, 2023

systemdDbus Golang package hard codes "/run/systemd/private"

Is there any way to open an upstream contribution to change this behavior?

@alban force-pushed the alban_ig_image branch from 6ddcd12 to c3578e3 on July 3, 2023 14:07
@eiffel-fl (Member) left a comment

I give my approval, but only for the Linux part, as that is the only part I tested and I do not have easy access to a WSL machine:

$ docker run -ti --rm \
    --privileged \
    -v /sys/fs/bpf:/sys/fs/bpf \
    -v /sys/kernel/debug:/sys/kernel/debug \
    -v /run:/run \
    -v /:/host \
    -e HOST_ROOT=/host \
    ig-linux-amd64 \
    trace exec
WARN[0000] failed to get cgroups info of container b667ecc13325c2fba398c0cafeadff5d6a8aa763abca7a2e791bb87cb81e1950 from /proc/193164/cgroup: cgroup path not found in /proc/PID/cgroup 
ERRO[0000] cgroup enricher: failed to get cgroup paths on container b667ecc13325c2fba398c0cafeadff5d6a8aa763abca7a2e791bb87cb81e1950: cgroup path not found in /proc/PID/cgroup 
CONTAINER                                               PID        PPID       COMM             RET ARGS                                                                    
intelligent_visvesvaraya                                193274     193251     ls               0   /bin/ls                                                                 
$ kubectl debug node/minikube-docker -ti --image ig-linux-amd64 --image-pull-policy=Never -- ig list-containers
Creating debugging pod node-debugger-minikube-docker-8x9pm with container debugger on node minikube-docker.
If you don't see a command prompt, try pressing enter.
RUNTIME                                        ID                                     NAME                                                                                 
docker                                         54c10eea34b28b5e27ca052d3e54e2c0ed478… k8s_debugger_node-debugger-minikube-docker-8x9pm_default_c9f68cb1-dfc8-4183-b502-247…
docker                                         18c4911d32b396df4cecfc64b021dcf417b50… k8s_etcd_etcd-minikube-docker_kube-system_3602af6795c463d205ca440218f9a4af_1         
docker                                         1dc470aee6ec2c241536d9517c89b6f54d751… k8s_kube-apiserver_kube-apiserver-minikube-docker_kube-system_4b0f5be86866fc2e09a85c…
docker                                         1a0980bf465b671b017be1d32df3957827711… k8s_kube-controller-manager_kube-controller-manager-minikube-docker_kube-system_f0da…
docker                                         b797377fa6256e774f744964792c3657040cc… k8s_kube-proxy_kube-proxy-jnr98_kube-system_26a7281e-9377-4009-ac5c-8fed3fa25792_1   
docker                                         249e18e3891f0649dfb7528703b3d57b350de… k8s_kube-scheduler_kube-scheduler-minikube-docker_kube-system_8889d4e06ea110c9727048…
docker                                         e3be405bb7001455465bb284e82c591dda6a4… k8s_storage-provisioner_storage-provisioner_kube-system_76df6fba-9121-4b0e-aebe-67b4…
INFO[0000] systemd unit "kubectl-debug-ig-c227d051.service" returned "done"

docs/ig.md Outdated
Comment on lines 169 to 174
--privileged \
-v /sys/fs/bpf:/sys/fs/bpf \
-v /sys/kernel/debug:/sys/kernel/debug \
-v /run:/run \
-v /:/host \
-e HOST_ROOT=/host \
Member

I think it would be a good idea to explain why each option is needed.

fmt.Sprintf("/proc/%d/root/usr/bin/ig", os.Getpid()),
}
cmd = append(cmd, os.Args[1:]...)
properties := []systemdDbus.Property{
Member

Do we really need to create all this systemd-related stuff?
Is it specific to kubectl debug?

@alban (Member Author) Jul 3, 2023

Documentation for "kubectl debug node":

Although the container runs in the host IPC, Network, and PID namespaces, the pod isn't privileged.

kubectl debug starts a pod with some privileges but not all. It has the /host volume, some host namespaces and some capabilities but not CAP_SYS_ADMIN. And it does not have any option like --privileged to give the missing capabilities required by Inspektor Gadget.

So as a workaround, ig re-executes itself on the host in a fully privileged systemd unit.
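For context, a rough sketch of what "re-execute in a transient systemd unit" can look like with the go-systemd dbus package (imported as systemdDbus above); the unit name, properties, and error handling below are illustrative, not this PR's exact code:

```
package main

import (
	"context"
	"fmt"
	"os"

	systemdDbus "github.com/coreos/go-systemd/v22/dbus"
)

// runInTransientUnit asks the host's systemd (reachable through
// /run/systemd/private) to run the given command in a transient, fully
// privileged unit and waits for it to finish. Illustrative sketch only.
func runInTransientUnit(ctx context.Context, cmd []string) error {
	conn, err := systemdDbus.NewSystemdConnectionContext(ctx)
	if err != nil {
		return fmt.Errorf("connecting to systemd: %w", err)
	}
	defer conn.Close()

	properties := []systemdDbus.Property{
		systemdDbus.PropDescription("ig re-executed outside the kubectl-debug pod"),
		systemdDbus.PropExecStart(cmd, false),
	}

	unitName := fmt.Sprintf("kubectl-debug-ig-%d.service", os.Getpid())
	statusChan := make(chan string, 1)
	if _, err := conn.StartTransientUnitContext(ctx, unitName, "fail", properties, statusChan); err != nil {
		return fmt.Errorf("starting transient unit %q: %w", unitName, err)
	}
	if s := <-statusChan; s != "done" {
		return fmt.Errorf("transient unit %q returned %q", unitName, s)
	}
	return nil
}
```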

I am unsure if this workaround should be automatic. Maybe it could prompt the user on the terminal unless an environment variable $IG_AUTO_REEXEC_IN_SD_UNIT is set:

$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev \
    -e IG_AUTO_REEXEC_IN_SD_UNIT=1 -- ig trace exec
$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev -- ig trace exec
Creating debugging pod node-debugger-minikube-docker-9t8br with container debugger on node minikube-docker.
If you don't see a command prompt, try pressing enter.

ig is running in a container without CAP_SYS_ADMIN but with access to systemd.
Reexecute ig in a systemd unit? [yes/no] yes

CONTAINER                                PID        PPID       COMM             RET ARGS
k8s_shell01_shell01_default_40bfae0e-51… 3294896    3294876    ls               0   /bin/ls

What do you think, should I implement this prompt / envvar?

Member

Thank you for the clarification, it makes sense.

What do you think, should I implement this prompt / envvar?

I find it complicated.
In any case, without the workaround ig would not work when run from kubectl debug?
So, I think this workaround is needed.
We should nonetheless document it and I am wondering if this should just be a comment in the code or written somewhere in the documentation.

Maybe just one important question: this workaround cannot really do any harm, can it?

Member Author

I find it complicated.

I talked about this with colleagues over lunch and the conclusion was that users would not like if ig creates a systemd unit without telling the user, so a flag like --allow-sdunit would be better.

At the moment, all this code is implemented in an init() function, i.e. before CLI flags are parsed. It would need to be moved to run after flag parsing (a sketch of that follows this comment).

In any case, without the workaround ig would not work when run from kubectl debug?

Correct.

So, I think this workaround is needed.

Correct. But we should do it in a way that is not behind the user's back.

$ kubectl debug node/... -- ig trace exec
Error: container without CAP_SYS_ADMIN detected. Use --allow-sdunit to try again in a privileged systemd unit.
$ kubectl debug node/... -- ig trace exec --allow-sdunit
CONTAINER  PID        PPID       COMM             RET ARGS
foo        3294896    3294876    ls               0   /bin/ls

What do you think?
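For illustration, a minimal sketch of gating the workaround behind an opt-in flag with cobra (which ig's CLI already uses); the flag name and wiring below are illustrative, not necessarily what this PR ends up doing:

```
package main

import "github.com/spf13/cobra"

var allowSdUnit bool

// addAllowSdUnitFlag registers the opt-in flag and runs the check in
// PersistentPreRunE, i.e. after flag parsing rather than in init().
// Illustrative sketch only.
func addAllowSdUnitFlag(cmd *cobra.Command) {
	cmd.PersistentFlags().BoolVar(&allowSdUnit, "allow-sdunit", false,
		"re-execute ig in a privileged systemd unit when CAP_SYS_ADMIN is missing")
	cmd.PersistentPreRunE = func(cmd *cobra.Command, args []string) error {
		if !allowSdUnit {
			return nil
		}
		// Decide here whether the restart is useful and possible, then do it.
		return nil
	}
}
```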

Member

I talked about this with colleagues over lunch and the conclusion was that users would not like if ig creates a systemd unit without telling the user, so a flag like --allow-sdunit would be better.

This makes sense, let's go for this solution then!

@blanquicet (Member) Jul 4, 2023

The workaround should be described in the documentation, explaining what kubectl debug does, what is missing to be able to run ig, and what we are doing to have those missing configurations.

In addition, we should add a note in the documentation like: "To be sure the user is aware of this workaround, we intentionally let the command fail if --allow-sdunit is not passed", or something similar.

@mauriciovasquezbernal (Member) left a comment

I have mixed feelings about this PR. On one side, I think the idea of using systemd on the host to escalate the capabilities of the container is brilliant; however, I also think it's too hacky as it relies on a privilege escalation to work.

I'm slightly inclined not to add this (as it relies on a hack to work), but if you think there is added value and the hack is OK, I'll accept this without issues.

Btw, an upstream solution is "Debugging Profiles"; however, I don't see any implementation of that yet.

// Initialize IsHost*Ns
IsHostPidNs = isHostNamespace("pid")
IsHostNetNs = isHostNamespace("net")
}

func workarounds() error {


nit: What about having a different function for each workaround, so it's easier to track them all?
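A tiny sketch of that split; only workaroundMounts appears in this PR, the other names are hypothetical and the bodies are placeholders:

```
package host

import "fmt"

// Each workaround lives in its own, clearly named function so they are easy
// to track. The bodies are empty placeholders; only the structure matters.
func workaroundMounts() error        { return nil } // mount bpf/debugfs/tracefs if missing
func workaroundSystemdSocket() error { return nil } // hypothetical: symlink /run/systemd/private
func workaroundWSL2PidNs() error     { return nil } // hypothetical: fix /host/proc's pidns

func workarounds() error {
	for _, w := range []func() error{workaroundMounts, workaroundSystemdSocket, workaroundWSL2PidNs} {
		if err := w(); err != nil {
			return fmt.Errorf("applying workaround: %w", err)
		}
	}
	return nil
}
```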

}
}

// Docker Desktop with WSL2 sets up host volumes with weird pidns.


Is this an issue in WSL2?

Comment on lines 219 to 223
// workaroundMounts ensures that filesystems are mounted correctly.
// Some environments (e.g. minikube) runs with a read-only /sys without bpf
// https://github.com/kubernetes/minikube/blob/99a0c91459f17ad8c83c80fc37a9ded41e34370c/deploy/kicbase/entrypoint#L76-L81
// Docker Desktop with WSL2 also has filesystems unmounted.
func workaroundMounts() error {


nit: this should be part of the previous commit to keep it organized from the beginning.


Also, what about adding similar comments to the other functions? They're very dense and it's not easy to understand what they are for.

return fmt.Errorf("stopping transient unit %q: got `%s`", unitName, s)
}
}
os.Exit(0)


Deferred calls aren't executed. It doesn't seem like a very big issue with the current code, but what about fixing it? (like https://stackoverflow.com/a/27629493)
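The usual fix (as in the linked Stack Overflow answer) is to keep os.Exit out of the function that owns the deferred calls; a sketch:

```
package main

import "os"

// run owns all deferred cleanups and returns an exit code instead of calling
// os.Exit itself, so the deferred calls actually execute.
func run() int {
	cleanup := func() { /* stop the transient unit, close connections, ... */ }
	defer cleanup()

	// ... do the real work ...
	return 0
}

func main() {
	// os.Exit skips deferred functions, so it is only called here, after run()
	// has returned and its defers have already executed.
	os.Exit(run())
}
```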


select {
case s := <-statusChan:
log.Infof("systemd unit %q returned %q", unitName, s)


I don't see any of these logs. It'd be nice to show the user that this workaround is being used.

Member Author

This one is printed if the systemd unit returns by itself first. For example, with --timeout 5. If you press ctrl-c, it will go through the second case.

I'll add a debug to tell when the workaround is used.

But I think it should be a log.Debugf (only printed with --verbose): if the command returns json data, the log would still be parsable.


But I think it should be a log.Debugf (only printed with --verbose): if the command returns json data, the log would still be parsable.

It should not be a problem when using a logger since those messages go to stderr:

$ export IG_EXPERIMENTAL=true
# json output is parseable even if a log message is printed
$ go run -exec "sudo -E" ./cmd/ig/. trace exec -o json | jq
INFO[0000] Experimental features enabled                
{
  "container": "mycontainer3",
  "timestamp": 1688584017253994000,
  "type": "normal",
  "mountnsid": 4026535441,
  "pid": 1076064,
  "ppid": 1074187,
  "comm": "cat",
  "args": [
    "/bin/cat",
    "/dev/null"
  ],
  "uid": 0,
  "gid": 0,
  "loginuid": 4294967295,
  "sessionid": 4294967295
}

Member Author

Ah true...

When using the -t option of kubectl debug, it creates a tty, so stdout and stderr are both the same file descriptor to the tty. I guess in that case users should not use -t if they want to parse the json... and we should check if that works fine.

@blanquicet (Member) left a comment

Some comments, I'll test it once the allow-sdunit flag is implemented.

docs/ig.md Outdated
-v /sys/kernel/debug:/sys/kernel/debug \
-v /run:/run \
-v /:/host \
-e HOST_ROOT=/host \
Member

Is this still needed after defining ENV HOST_ROOT=/host in ig.Dockerfile?

Member Author

Not needed, I will remove it.


### Using ig with kubectl debug

Examples of commands:
Member

Please add a link to the documentation of kubectl debug.


@alban force-pushed the alban_ig_image branch 8 times, most recently from 64d463d to a551847 on July 6, 2023 12:58
@eiffel-fl (Member) left a comment

Some comments on the code itself.
I will test it again later.

Comment on lines 216 to 220
// if the host does not use systemd, we cannot use this workaround
_, err = os.Stat("/host/run/systemd/private")
if err != nil {
	return false, nil
}
Member

Suggested change:

// if the host does not use systemd, we cannot use this workaround
_, err = os.Stat("/host/run/systemd/private")
if errors.Is(err, os.ErrNotExist) {
	return false, nil
} else {
	return false, err
}

Member Author

With this change, when ig is running without CAP_SYS_ADMIN on a Linux distro without systemd, the error message will be about the non-existing systemd socket instead of being about failing later on because of the lack of CAP_SYS_ADMIN. Is that really what we want? The lack of systemd is not something that the user can fix, whereas the lack of CAP_SYS_ADMIN can be fixed by the user (by creating the pod manually instead of using "kubectl debug node").

Member

Hiding errors is something we should really avoid.

the error message will be about the non-existing systemd socket instead of being about failing later on because of the lack of CAP_SYS_ADMIN

If you do not have systemd, you will not be able to run the workaround, so better to fail here, right?

The lack of systemd is not something that the user can fix

Indeed, then better to explode yelling: "get a systemd distro".

whereas the lack of CAP_SYS_ADMIN can be fixed by the user (by creating the pod manually instead of using "kubectl debug node").

I do not understand.
This whole code is called in the systemd-related workaround which is used by kubectl debug node.
What should we do then? Print an error message advising the user to create a privileged pod by hand?

Member Author

This function is called autoSdUnitRestart and the flag is called --auto-sd-unit-restart: it was not meant to unconditionally execute the workaround but first checks whether it is useful and possible. If it finds out it is not necessary (e.g. we have CAP_SYS_ADMIN) or not possible (non-systemd distro), then it does not try the workaround and continues on the normal path (and most likely fails, unless it is a command such as ig version, ig --help, etc.).

I could implement your suggestion to fail on a non-systemd distro, but then I should rename the flag to --restart-in-sd-unit to imply to the user that it is not conditional.

I am fine with either choice, I just want to make the flag name match its semantics.

If we later want to support non-systemd distros with Docker, we could implement another workaround that creates a privileged Docker container by talking to the dockerd socket. With my semantics, we could have two flags --auto-sd-unit-restart and --auto-docker-restart, the user could specify both, and ig would select whatever it finds useful and possible. With your semantics, we could have two flags --restart-in-sd-unit and --restart-in-docker, and the user must use one of them because using the wrong one would fail.

Member

it was not meant to unconditionally execute the workaround but first checks whether it is useful and possible. If it finds out it is not necessary (e.g. we have CAP_SYS_ADMIN) or not possible (non-systemd distro), then it does not try the workaround and continues on the normal path (and most likely fails

I am not highly convinced.
On one hand, if we are almost sure it will fail, then it is better not to run ig at all and inform the user of why it will fail (lack of CAP_SYS_ADMIN or a non-systemd distro).
On the other hand, the above would prevent users from running "simple" commands, like ig --help.
Moreover, if it fails due to lack of CAP_SYS_ADMIN and/or systemd, what would ig print?
If the error message explains the problem well, then let's keep your solution as is; otherwise, adding an explanation somewhere should help.

Member

At one point, you said this workaround might not be needed in the future due to some modifications upstream in kubectl debug.
What is the status of this modification? Could we help bring it about?
If we work upstream, we would no longer need this workaround and the whole situation would be easier for everyone.

@alban (Member Author) Jul 10, 2023

On one hand, if we are almost sure it will fail, then it is better not to run ig at all and inform the user of why it will fail (lack of CAP_SYS_ADMIN or a non-systemd distro).
On the other hand, the above would prevent users from running "simple" commands, like ig --help.

The workarounds are now done in the local runtime, so simple commands like "version" and "help" should run without attempting the workaround.

Moreover, if it fails due to lack of CAP_SYS_ADMIN and/or systemd, what would ig print?

At the moment, because of lack of capabilities:

$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev3 -- ig trace exec
Creating debugging pod node-debugger-minikube-docker-khj6w with container debugger on node minikube-docker.
Error: initializing operators: initializing operator "LocalManager": initializing manager: failed to set memlock rlimit: operation not permitted

If the error message explains the problem well, then let's keep your solution as is; otherwise, adding an explanation somewhere should help.

Ok, I'll try to update this.

At one point, you said this workaround might not be needed in the future due to some modifications upstream in kubectl debug.
What is the status of this modification? Could we help bring it about?
If we work upstream, we would no longer need this workaround and the whole situation would be easier for everyone.

KEP-1441 (issue #1441) defines security profiles and kubernetes/kubernetes#114280 implemented some of them in kubectl v1.27. Unfortunately, even in the latest version, the security profile "sysadmin" with CAP_SYS_ADMIN is not yet implemented. There are still some PRs in progress about KEP-1441, such as kubernetes/kubernetes#118647, so I assume the work in KEP-1441 on the "sysadmin" profile will be completed at some point, but it might take some time. I will update the documentation to reflect that.

Comment on lines 1 to 2
//go:build !linux
// +build !linux
Member Author

The build flags here are a workaround for #1830

@alban force-pushed the alban_ig_image branch 2 times, most recently from 85651dd to 81c581a on July 10, 2023 10:12
@eiffel-fl (Member) left a comment

For the moment, the workaround is mandatory.
It is nonetheless documented well enough that people should understand the consequences of using it.
If we decide to go further with this PR, we need it in any case.
In an ideal future, we should rely on kubectl debug --profile=sysadmin once it exists.

I renew my approval, but please wait for someone else to give their opinion before merging.

@alban mentioned this pull request Jul 10, 2023
@eiffel-fl (Member) commented
Hum...
I just tested it because I wanted to check something, and ig lacks some permissions:

$ ./kubectl debug node/minikube -ti --image-pull-policy=Never --image=ig-linux-amd64 -- ig --auto-sd-unit-restart=true trace exec
Creating debugging pod node-debugger-minikube-chh2d with container debugger on node minikube.
Error: initializing operators: initializing operator "LocalManager": initializing manager: failed to set memlock rlimit: operation not permitted

Anything that I missed somewhere?

francis@pwmachine:~/Codes/kinvolk$ ./minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured

docker@minikube:~$ logout
francis@pwmachine:~/Codes/kinvolk$ ./minikube image list
...
docker.io/library/ig-linux-amd64:latest

@alban force-pushed the alban_ig_image branch 2 times, most recently from 3d3b1fc to 4315da7 on July 10, 2023 14:47
Comment on lines +180 to +186
As of today, the `kubectl debug` command does not have a way to give enough privileges to the debugging pod to be able
to use `ig`.
This might change in the future: the Kubernetes Enhancement Proposal 1441
([KEP-1441](https://github.com/kubernetes/enhancements/tree/master/keps/sig-cli/1441-kubectl-debug))
suggests to implement Debugging Profiles (`--profile=`) to be able to give the necessary privileges.
kubectl v1.27 implements some of those profiles but not yet the "sysadmin" profile, so it is not possible to use
`--profile=` yet.

@@ -190,6 +191,15 @@ partsLoop:

l.rc = rc

hostConfig := host.Config{
AutoMount: true,
AutoWindowsWorkaround: true,
Member

Why is this set to true?

Member Author

It is set to true for ig so that it works automatically. In the gadget pod, the Windows workaround is set to false because it doesn't run on Windows anyway.

Do you prefer to switch to a flag, as was done for the systemd workaround?


I'd prefer to have all those workarounds off by default. My only concern is how to suggest that users enable them. Is there a place where we can put a warning about them?

@eiffel-fl (Member) left a comment

There are several points that I do not understand in this PR.
I would advise splitting it into two parts:

  1. One which documents how to use the ig container image, so we can test it easily and close the corresponding issue.
  2. Another which deals with the systemd workaround.

@alban (Member Author) commented Jul 11, 2023

@eiffel-fl Ok, I filed #1841 for the documentation part. Note that I needed to add more -v flags since it does not include the auto mount workaround.

Comment on lines 269 to 270
log.Infof("HostPID=%s", strconv.FormatBool(host.IsHostPidNs()))
log.Infof("HostNetwork=%s", strconv.FormatBool(host.IsHostNetNs()))


		log.Infof("HostPID=%t", host.IsHostPidNs())
		log.Infof("HostNetwork=%t", host.IsHostNetNs())

// capabilities.
//
// This is useful for the "kubectl debug node" command.
func AddAutoSdUnitRestartFlag(command *cobra.Command) {


Make this more generic, e.g. AddFlags, in case we add another flag later on?

Member

To be honest, for the moment, I would not change it as we already plan to remove it in #1842.


I don't get it: kubernetes/kubernetes#119200 isn't even merged yet. It'd take quite some time until users are running that version. Also, this is not only about that flag but about making it general in case there are more flags, like we're proposing for the "windows workaround".


// Only root can talk to the systemd socket
if os.Geteuid() != 0 {
return false, nil


Shouldn't this return an error?

// if the host does not use systemd, we cannot use this workaround
_, err = os.Stat("/host/run/systemd/private")
if err != nil {
return false, nil


ditto


// autoSdUnitRestart will automatically restart the process in a privileged
// systemd unit if the current process does not have enough capabilities.
func autoSdUnitRestart() (exit bool, err error) {


It's not clear to me when this function returns errors; sometimes it returns nil, other times it returns the error, etc. What about structuring it this way (sketched after this comment)?

  • Check IG_IN_SYSTEMD_UNIT to avoid recursive restarts
  • Check if it has CAP_SYS_ADMIN

At this point the workaround is needed, and any error should be returned.

WDYT?
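A sketch of that flow, with one possible CAP_SYS_ADMIN check based on parsing CapEff in /proc/self/status (helper names are illustrative; the real implementation may differ):

```
package host

import (
	"bufio"
	"os"
	"strconv"
	"strings"
)

const capSysAdmin = 21 // bit index of CAP_SYS_ADMIN in the capability bitmask

// haveCapSysAdmin reports whether the effective capability set contains
// CAP_SYS_ADMIN, by parsing the CapEff line of /proc/self/status.
func haveCapSysAdmin() (bool, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return false, err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		caps, err := strconv.ParseUint(strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), 16, 64)
		if err != nil {
			return false, err
		}
		return caps&(1<<capSysAdmin) != 0, nil
	}
	return false, scanner.Err()
}

// autoSdUnitRestart sketches the proposed flow: bail out early when the
// workaround is not applicable, and propagate every error once it is needed.
func autoSdUnitRestart() (exit bool, err error) {
	if os.Getenv("IG_IN_SYSTEMD_UNIT") == "1" {
		return false, nil // already re-executed once, avoid recursion
	}
	ok, err := haveCapSysAdmin()
	if err != nil || ok {
		return false, err // workaround not needed, or we cannot even tell
	}
	// From this point on the workaround is required: start the transient
	// systemd unit and return any error (omitted in this sketch).
	return true, nil
}
```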


// autoWindowsWorkaround overrides HostRoot and HostProcFs if necessary.
// Docker Desktop with WSL2 sets up host volumes with weird pidns.
func autoWindowsWorkaround() error {


Can we call it WSLWorkaround to be more specific? I was confused because of "Windows".


Comment on lines 40 to 41
// local runtime requires root to run host.Init()
utilstest.RequireRoot(t)


Is it still true if we disable the workarounds by default?

Member Author

Still true because of the way the code is written. I'll rewrite IsHostPidNs() in a cleaner way to avoid that.
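One cleaner shape for that is to compute the answer lazily, so code paths (and tests) that never ask the question do not need the privileges required to inspect the host's PID 1. A sketch, assuming namespaces are compared via their /proc/*/ns/pid links; the actual rewrite may differ:

```
package host

import (
	"os"
	"path/filepath"
	"sync"
)

// HostProcFs is a placeholder here; in ig it points at the host's procfs,
// e.g. /host/proc.
var HostProcFs = "/host/proc"

var (
	isHostPidNs     bool
	isHostPidNsErr  error
	isHostPidNsOnce sync.Once
)

// IsHostPidNs lazily compares the pid namespace of the current process with
// the pid namespace of the host's PID 1. The comparison only happens on first
// use, so callers that never need it do not require root.
func IsHostPidNs() (bool, error) {
	isHostPidNsOnce.Do(func() {
		self, err := os.Readlink("/proc/self/ns/pid")
		if err != nil {
			isHostPidNsErr = err
			return
		}
		hostInit, err := os.Readlink(filepath.Join(HostProcFs, "1", "ns", "pid"))
		if err != nil {
			isHostPidNsErr = err
			return
		}
		isHostPidNs = self == hostInit
	})
	return isHostPidNs, isHostPidNsErr
}
```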

Member Author

The prometheus test no longer needs to run as root.

@alban (Member Author) commented Jul 12, 2023

I updated the branch with better error messages. This can be tested as follows:

make ig
docker tag ig-linux-amd64:latest myrepo/ig:dev
docker push myrepo/ig:dev

The workarounds are not automatic, but the error message suggests them as appropriate:

$ kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig list-containers -w
Creating debugging pod node-debugger-minikube-docker-4pm2j with container debugger on node minikube-docker.
error: need CAP_SYS_ADMIN (did you try --auto-sd-unit-restart?)

Tested commands:

kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true list-containers -w
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true top ebpf
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true trace exec

On WSL2:

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec
error: filesystems bpf, debugfs, tracefs not mounted (did you try --auto-mount-filesystems?)

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems
error: /host/proc/self not found on WSL2 (did you try --auto-wsl-workaround?)

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems --auto-wsl-workaround
WARN[0000] /host/proc's pidns is neither the current pidns or a parent of the current pidns. Remounting.
WARN[0000] Overriding HostRoot=/proc/1296/root/ HostProcFs=/proc/1296/root/proc (lifecycle-server)
CONTAINER                         PID        PPID       COMM             RET ARGS
wizardly_franklin                 16754      14684      cat              0   /bin/cat /tmp/file

@mauriciovasquezbernal (Member) left a comment

LGTM once CI is green.

if !host.IsHostPidNs {
hostPidNs, err := host.IsHostPidNs()
if err != nil {
log.Debugf("Runcfanotify: not supported: %s", err)


Suggested change
log.Debugf("Runcfanotify: not supported: %s", err)
log.Debugf("checking if current pid namespace is host pid namespace: %s", err)

?

alban added 3 commits July 13, 2023 10:22
The 'kubectl debug node' is now supported. This patch adds documentation
for it.

Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
This can be tested as follows:

```
make ig
docker tag ig-linux-amd64:latest myrepo/ig:dev
docker push myrepo/ig:dev
```

The workarounds are not automatic, but the error message suggests them
as appropriate:

```
$ kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig list-containers -w
Creating debugging pod node-debugger-minikube-docker-4pm2j with container debugger on node minikube-docker.
error: need CAP_SYS_ADMIN (did you try --auto-sd-unit-restart?)
```

Tested commands:
```
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true list-containers -w
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true top ebpf
kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true trace exec
```

On WSL2:
```
$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec
error: filesystems bpf, debugfs, tracefs not mounted (did you try --auto-mount-filesystems?)

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems
error: /host/proc/self not found on WSL2 (did you try --auto-wsl-workaround?)

$ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems --auto-wsl-workaround
WARN[0000] /host/proc's pidns is neither the current pidns or a parent of the current pidns. Remounting.
WARN[0000] Overriding HostRoot=/proc/1296/root/ HostProcFs=/proc/1296/root/proc (lifecycle-server)
CONTAINER                         PID        PPID       COMM             RET ARGS
wizardly_franklin                 16754      14684      cat              0   /bin/cat /tmp/file
```

Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
Execute the tests without the workarounds.

Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
@alban (Member Author) commented Jul 13, 2023

The CI failure seems unrelated to this PR. I will restart it to see if it helps.

https://github.com/inspektor-gadget/inspektor-gadget/actions/runs/5540796137/jobs/10113500123?pr=1809

        {"node":"minikube-docker-m02","namespace":"test-trace-oomkill-7478521341844805850","pod":"test-pod","container":"test-pod-container","timestamp":1689237512775921304,"type":"normal","mountnsid":4026533542,"kpid":52473,"kcomm":"tail","pages":1081343,"tpid":52473,"tuid":1000,"tgid":2000,"tcomm":"tail"}
        
    command.go:512: invalid command output(StartOomkilGadget): verifying output with custom function: unexpected output entry:
          &types.Event{
          	... // 4 identical fields
          	Pages:         0,
          	TriggeredPid:  0,
        - 	TriggeredUid:  1000,
        + 	TriggeredUid:  0,
        - 	TriggeredGid:  2000,
        + 	TriggeredGid:  0,
          	TriggeredComm: "",
          }
...
--- FAIL: TestTraceOOMKill (21.09s)

It is weird that this TestTraceOOMKill failure only happens in the core flavor on docker (but not on containerd or cri-o or with the default flavor).

@alban merged commit dbc4192 into main on Jul 13, 2023
@alban deleted the alban_ig_image branch on July 13, 2023 09:50
@alban (Member Author) commented Jul 13, 2023

The CI passed fine after a retry. The TestTraceOOMKill failure was a flake.
