-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ig container image: docs and fixes for kubectl-debug and Docker Desktop on Windows #1809
Conversation
Together with @blanquicet's PR #1702, we should be able to see the Kubernetes namespace and pod with the kubectl debug command above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR goal is cool but I have some interrogations.
I will also need to test it.
pkg/utils/host/host.go
Outdated
} | ||
|
||
// Docker Desktop with WSL2 sets up host volumes with weird pidns. | ||
if HostRoot != "/" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically this is used to check if we are running on Windows? Or under WSL2?
Is there any other way to do so? I mean using GOHOSTOS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any environment variable that distinguish Windows or WSL2. GOHOSTOS would be Linux because ig
is compiled for Linux and is running on the Linux kernel.
$ uname -r
5.10.102.1-microsoft-standard-WSL2
But then I thought it is better to test the actual problem rather that test the OS. The reason we need this workaround is because the procfs /host/proc
is mounted in a pid namespace which is neither the pid namespace of the ig container or a parent pid namespace (therefore it is not the host pidns too). This seems to be a recent change in Docker Desktop for WSL2. So I am testing that below with readlink("/host/proc/self")
.
If Docker Desktop for WSL2 changes its implementation to no longer do that, then the workaround will not be used anymore.
pkg/utils/host/host.go
Outdated
_, err = os.Stat("/run/systemd/private") | ||
if err != nil { | ||
err := os.Symlink("/host/run/systemd/private", "/run/systemd/private") | ||
if err != nil { | ||
return fmt.Errorf("linking /run/systemd/private: %w", err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you use this to ensure the file does not exist?
If so, isn't there are another way to do so? Or does os.Stat()
returns a specific error in case the path does not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do that because the systemdDbus Golang package hard codes "/run/systemd/private" so I can't tell it to use "/host/run/systemd/private" instead. Normally the gadget DaemonSet uses the host volume /run:/run
so that would not be necessary. But here, the gadget pod was started by "kubectl debug node", so the /run
is not mounted from the host and is just a tmpfs in the container. So I make the symlink to make the systemdDbus Golang package work fine.
Yes, I can check the specific error code from Stat with errors.Is(err, os.ErrNotExist)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
systemdDbus Golang package hard codes "/run/systemd/private"
Any way to open a contribution to change this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I give my approval but only for the Linux part as I only tested this part and I do not have easy access to a WSL machine:
$ docker run -ti --rm \ alban_ig_image % u=
--privileged \
-v /sys/fs/bpf:/sys/fs/bpf \
-v /sys/kernel/debug:/sys/kernel/debug \
-v /run:/run \
-v /:/host \
-e HOST_ROOT=/host \
ig-linux-amd64 \
trace exec
WARN[0000] failed to get cgroups info of container b667ecc13325c2fba398c0cafeadff5d6a8aa763abca7a2e791bb87cb81e1950 from /proc/193164/cgroup: cgroup path not found in /proc/PID/cgroup
ERRO[0000] cgroup enricher: failed to get cgroup paths on container b667ecc13325c2fba398c0cafeadff5d6a8aa763abca7a2e791bb87cb81e1950: cgroup path not found in /proc/PID/cgroup
CONTAINER PID PPID COMM RET ARGS
intelligent_visvesvaraya 193274 193251 ls 0 /bin/ls
$ kubectl debug node/minikube-docker -ti --image ig-linux-amd64 --image-pull-policy=Never -- ig list-containers
Creating debugging pod node-debugger-minikube-docker-8x9pm with container debugger on node minikube-docker.
If you don't see a command prompt, try pressing enter.
RUNTIME ID NAME
docker 54c10eea34b28b5e27ca052d3e54e2c0ed478… k8s_debugger_node-debugger-minikube-docker-8x9pm_default_c9f68cb1-dfc8-4183-b502-247…
docker 18c4911d32b396df4cecfc64b021dcf417b50… k8s_etcd_etcd-minikube-docker_kube-system_3602af6795c463d205ca440218f9a4af_1
docker 1dc470aee6ec2c241536d9517c89b6f54d751… k8s_kube-apiserver_kube-apiserver-minikube-docker_kube-system_4b0f5be86866fc2e09a85c…
docker 1a0980bf465b671b017be1d32df3957827711… k8s_kube-controller-manager_kube-controller-manager-minikube-docker_kube-system_f0da…
docker b797377fa6256e774f744964792c3657040cc… k8s_kube-proxy_kube-proxy-jnr98_kube-system_26a7281e-9377-4009-ac5c-8fed3fa25792_1
docker 249e18e3891f0649dfb7528703b3d57b350de… k8s_kube-scheduler_kube-scheduler-minikube-docker_kube-system_8889d4e06ea110c9727048…
docker e3be405bb7001455465bb284e82c591dda6a4… k8s_storage-provisioner_storage-provisioner_kube-system_76df6fba-9121-4b0e-aebe-67b4…
INFO[0000] systemd unit "kubectl-debug-ig-c227d051.service" returned "done"
docs/ig.md
Outdated
--privileged \ | ||
-v /sys/fs/bpf:/sys/fs/bpf \ | ||
-v /sys/kernel/debug:/sys/kernel/debug \ | ||
-v /run:/run \ | ||
-v /:/host \ | ||
-e HOST_ROOT=/host \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would a good idea to explain why each option is needed.
pkg/utils/host/host.go
Outdated
fmt.Sprintf("/proc/%d/root/usr/bin/ig", os.Getpid()), | ||
} | ||
cmd = append(cmd, os.Args[1:]...) | ||
properties := []systemdDbus.Property{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to create all this systemd related stuff?
Is it proper to kubectl debug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation for "kubectl debug node":
Although the container runs in the host IPC, Network, and PID namespaces, the pod isn't privileged.
kubectl debug starts a pod with some privileges but not all. It has the /host
volume, some host namespaces and some capabilities but not CAP_SYS_ADMIN. And it does not have any option like --privileged
to give the missing capabilities required by Inspektor Gadget.
So as a workaround, ig
re-executes itself but on the host in a fully privileged systemd unit.
I am unsure if this workaround should be automatic. Maybe it could prompt the user on the terminal unless an environment variable $IG_AUTO_REEXEC_IN_SD_UNIT
is set:
$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev \
-e IG_AUTO_REEXEC_IN_SD_UNIT=1 -- ig trace exec
$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev -- ig trace exec
Creating debugging pod node-debugger-minikube-docker-9t8br with container debugger on node minikube-docker.
If you don't see a command prompt, try pressing enter.
ig is running in a container without CAP_SYS_ADMIN but with access to systemd.
Reexecute ig in a systemd unit? [yes/no] yes
CONTAINER PID PPID COMM RET ARGS
k8s_shell01_shell01_default_40bfae0e-51… 3294896 3294876 ls 0 /bin/ls
What do you think, should I implement this prompt / envvar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the precision, it makes sense.
What do you think, should I implement this prompt / envvar?
I find it complicated.
In any case, without the workaround ig
would not work while run from kubectl debug
?
So, I think this workaround is needed.
We should nonetheless document it and I am wondering if this should just be a comment in the code or written somewhere in the documentation.
Maybe just an important question: this workaround cannot really do harm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it complicated.
I talked about this with colleagues over lunch and the conclusion was that users would not like if ig
creates a systemd unit without telling the user, so a flag like --allow-sdunit
would be better.
At the moment, all this code is implemented in a init()
function, so before CLI flags are parsed. It would need to be moved to
In any case, without the workaround ig would not work while run from kubectl debug?
Correct.
So, I think this workaround is needed.
Correct. But we should do it in a way which is not behind the user's back.
$ kubectl debug node/... -- ig trace exec
Error: container without CAP_SYS_ADMIN detected. Use --allow-sdunit to try again in a privileged systemd unit.
$ kubectl debug node/... -- ig trace exec --allow-sdunit
CONTAINER PID PPID COMM RET ARGS
foo 3294896 3294876 ls 0 /bin/ls
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked about this with colleagues over lunch and the conclusion was that users would not like if ig creates a systemd unit without telling the user, so a flag like --allow-sdunit would be better.
This makes sense, let's go for this solution then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workaround should be described in the documentation, explaining what kubectl debug does, what is missing to be able to run ig
, and what we are doing to have those missing configurations.
In addition, we should add a note in the documentation like: "To be sure the user is aware of this workaround, we intentionally let the command fail if --allow-sdunit
is not passed", or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feelings with this PR. On one side, I think the idea to use systemd on the host to escalate the capabilities of the container is brilliant, however, I also think it's too hacky as it relies on a privilege escalation to work.
I'm slightly inclined to not add this (as this relies on a hack to work), but if think there is added value and the hack is ok, I'll accept this without issues.
Btw, an upstream solution is "Debugging profiles", however I don't see any implementation of that yet.
pkg/utils/host/host.go
Outdated
// Initialize IsHost*Ns | ||
IsHostPidNs = isHostNamespace("pid") | ||
IsHostNetNs = isHostNamespace("net") | ||
} | ||
|
||
func workarounds() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: What about having a different function for each workaround, so it's easier to track them all?
pkg/utils/host/host.go
Outdated
} | ||
} | ||
|
||
// Docker Desktop with WSL2 sets up host volumes with weird pidns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an issue in WSL2?
pkg/utils/host/host.go
Outdated
// workaroundMounts ensures that filesystems are mounted correctly. | ||
// Some environments (e.g. minikube) runs with a read-only /sys without bpf | ||
// https://github.com/kubernetes/minikube/blob/99a0c91459f17ad8c83c80fc37a9ded41e34370c/deploy/kicbase/entrypoint#L76-L81 | ||
// Docker Desktop with WSL2 also has filesystems unmounted. | ||
func workaroundMounts() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this should be part of the previous commit to keep it organized from the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what about adding similar comments to other functions, they're very dense and it's not easy to understand what they are for.
pkg/utils/host/host.go
Outdated
return fmt.Errorf("stopping transient unit %q: got `%s`", unitName, s) | ||
} | ||
} | ||
os.Exit(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defered calls aren't executed. It doesn't seems like a very big issue with the current code but what about fixing it? (like https://stackoverflow.com/a/27629493)
pkg/utils/host/host.go
Outdated
|
||
select { | ||
case s := <-statusChan: | ||
log.Infof("systemd unit %q returned %q", unitName, s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any of these logs. It'd be nice to show to the user that this workaround is being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is printed if the systemd unit returns by itself first. For example, with --timeout 5
. If you press ctrl-c, it will go through the second case
.
I'll add a debug to tell when the workaround is used.
But I think it should be a log.Debugf
(only printed with --verbose
): if the command returns json data, the log would still be parsable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think it should be a log.Debugf (only printed with --verbose): if the command returns json data, the log would still be parsable.
It should not be a problem when using a logger since those messages go to stderr:
$ export IG_EXPERIMENTAL=true
# json output is parseable even if a log message is printed
$ go run -exec "sudo -E" ./cmd/ig/. trace exec -o json | jq
INFO[0000] Experimental features enabled
{
"container": "mycontainer3",
"timestamp": 1688584017253994000,
"type": "normal",
"mountnsid": 4026535441,
"pid": 1076064,
"ppid": 1074187,
"comm": "cat",
"args": [
"/bin/cat",
"/dev/null"
],
"uid": 0,
"gid": 0,
"loginuid": 4294967295,
"sessionid": 4294967295
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah true...
When using the -t
option of kubectl debug
, it creates a tty, so stdout and stderr are both the same file descriptor to the tty. I guess in that case users should not use -t
if they want to parse the json... and we should check if that works fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, I'll test it once the allow-sdunit
flag is implemented.
docs/ig.md
Outdated
-v /sys/kernel/debug:/sys/kernel/debug \ | ||
-v /run:/run \ | ||
-v /:/host \ | ||
-e HOST_ROOT=/host \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed after defining ENV HOST_ROOT=/host
in ig.Dockerfile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed, I will remove.
|
||
### Using ig with kubectl debug | ||
|
||
Examples of commands: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a link to the documentation of kubectl debug.
pkg/utils/host/host.go
Outdated
fmt.Sprintf("/proc/%d/root/usr/bin/ig", os.Getpid()), | ||
} | ||
cmd = append(cmd, os.Args[1:]...) | ||
properties := []systemdDbus.Property{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workaround should be described in the documentation, explaining what kubectl debug does, what is missing to be able to run ig
, and what we are doing to have those missing configurations.
In addition, we should add a note in the documentation like: "To be sure the user is aware of this workaround, we intentionally let the command fail if --allow-sdunit
is not passed", or something similar.
64d463d
to
a551847
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on the code itself.
I will test it again later.
pkg/utils/host/host.go
Outdated
// if the host does not use systemd, we cannot use this workaround | ||
_, err = os.Stat("/host/run/systemd/private") | ||
if err != nil { | ||
return false, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// if the host does not use systemd, we cannot use this workaround | |
_, err = os.Stat("/host/run/systemd/private") | |
if err != nil { | |
return false, nil | |
} | |
// if the host does not use systemd, we cannot use this workaround | |
_, err = os.Stat("/host/run/systemd/private") | |
if errors.Is(err, os.ErrNotExist) { | |
return false, nil | |
} else { | |
return false, err | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change, when ig
is running without CAP_SYS_ADMIN on a Linux distro without systemd, the error message will be about the non-existing systemd socket instead of being about failing later on because of the lack of CAP_SYS_ADMIN. Is it really what we want? The lack of systemd on a Linux distro without systemd is not something that the user can fix, whereas the lack of CAP_SYS_ADMIN can be fixed by the user (by creating the pod manually instead of using "kubectl debug node").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hiding errors is something we should really avoid.
the error message will be about the non-existing systemd socket instead of being about failing later on because of the lack of CAP_SYS_ADMIN
If you do not have systemd
, you will not be able to run the workaround, so better to fail here, right?
The lack of systemd on a Linux distro without systemd is not something that the user can fix
Indeed, then better to explode yelling: "get a systemd distro".
whereas the lack of CAP_SYS_ADMIN can be fixed by the user (by creating the pod manually instead of using "kubectl debug node").
I do not understand.
This whole code is code is called in the systemd
related workaround which is used by kubectl debug node
.
What should we do then? Printing an error message advising the user to create a privileged pod by hand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is called autoSdUnitRestart
and the flag is called --auto-sd-unit-restart
: it was not meant to unconditionally execute the workaround but first checks if it is useful and possible. If it finds out it is not necessary (e.g. we have CAP_SYS_ADMIN) or not possible (non-systemd distro), then it does not try the workaround and continue on the normal path (and most likely fail, unless it is a command such as ig version
, ig --help
, etc.).
I could implement your suggestion to fail on a non-systemd distro, but then I should rename the flag to --restart-in-sd-unit
to imply to the user that it is not conditional.
I am fine with either choice, I just want to make the flag name match its semantic.
If we later want to support non-systemd distro with Docker, we could implement another workaround that creates a privileged Docker container by talking to the dockerd socket. With my semantic, we could have two flags --auto-sd-unit-restart
and --auto-docker-restart
, the user could specify both, and ig would select whatever it finds useful and possible. With your semantic, we could have two flags --restart-in-sd-unit
and --restart-in-docker
, and the user must use one of them because using the wrong one would fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was not meant to unconditionally execute the workaround but first checks if it is useful and possible. If it finds out it is not necessary (e.g. we have CAP_SYS_ADMIN) or not possible (non-systemd distro), then it does not try the workaround and continue on the normal path (and most likely fail
I am not highly convinced.
On one hand, if we almost sure it will fail, then better to not run ig
at all and inform the user of why it will fail (lack of CAP_SYS_ADMIN
or non systemd
distro).
On another hand, the above would prevent users running "simple" command, like ig --help
.
Moreover, if it fails due to lack of CAP_SYS_ADMIN
and/or systemd
, what ig
would print?
If the error message explains well the problem, then let's keep your solution as is, otherwise adding explanation somewhere should help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At one moment, you told this workaround could maybe not be needed in the future due to some modifications upstream in kubectl debug
.
What is the status of this modification? Could we help bring it?
In we work upstream, we would no more need this workaround and the whole situation would be easier for everyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On one hand, if we almost sure it will fail, then better to not run ig at all and inform the user of why it will fail (lack of CAP_SYS_ADMIN or non systemd distro).
On another hand, the above would prevent users running "simple" command, like ig --help.
The workarounds are now done in the local runtime, so simple commands like "version" and "help" should run without attempting the workaround.
Moreover, if it fails due to lack of CAP_SYS_ADMIN and/or systemd, what ig would print?
At the moment, because of lack of capabilities:
$ kubectl debug node/minikube-docker -ti --image=albantest.azurecr.io/ig:dev3 -- ig trace exec
Creating debugging pod node-debugger-minikube-docker-khj6w with container debugger on node minikube-docker.
Error: initializing operators: initializing operator "LocalManager": initializing manager: failed to set memlock rlimit: operation not permitted
If the error message explains well the problem, then let's keep your solution as is, otherwise adding explanation somewhere should help.
Ok, I'll try to update this.
At one moment, you told this workaround could maybe not be needed in the future due to some modifications upstream in kubectl debug.
What is the status of this modification? Could we help bring it?
In we work upstream, we would no more need this workaround and the whole situation would be easier for everyone.
KEP-1441 (issue #1441) defines security profiles and kubernetes/kubernetes#114280 implemented some of them in kubectl v1.27. Unfortunately, even in the last version, the security profile "sysadmin" with CAP_SYS_ADMIN is not yet implemented. There are still some PRs in progress about KEP-1441 such as kubernetes/kubernetes#118647 so I assume the work in KEP-1441 with the "sysadmin" will be completed at some point, but it might take some time. I will update the documentation to reflect that.
//go:build !linux | ||
// +build !linux |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The build flags here are a workaround for #1830
85651dd
to
81c581a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the moment, the workaround is mandatory.
It is nonetheless enough documented and people should understand the consequences of using it.
If we decide to go further with this PR, we need it in any cases.
In an ideal future, we should rely on kubectl debug --profile=sysadmin
once it exists.
I renew my approval but please wait for someone else to give its opinion before merging.
Hum... $ ./kubectl debug node/minikube -ti --image-pull-policy=Never --image=ig-linux-amd64 -- ig --auto-sd-unit-restart=true trace exec
Creating debugging pod node-debugger-minikube-chh2d with container debugger on node minikube.
Error: initializing operators: initializing operator "LocalManager": initializing manager: failed to set memlock rlimit: operation not permitted Anything that I miss somewhere?
|
3d3b1fc
to
4315da7
Compare
As of today, the `kubectl debug` command does not have a way to give enough privileges to the debugging pod to be able | ||
to use `ig`. | ||
This might change in the future: the Kubernetes Enhancement Proposal 1441 | ||
([KEP-1441](https://github.com/kubernetes/enhancements/tree/master/keps/sig-cli/1441-kubectl-debug)) | ||
suggests to implement Debugging Profiles (`--profile=`) to be able to give the necessary privileges. | ||
kubectl v1.27 implements some of those profiles but not yet the "sysadmin" profile, so it is not possible to use | ||
`--profile=` yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eiffel-fl for kubernetes/kubernetes#119200
@@ -190,6 +191,15 @@ partsLoop: | |||
|
|||
l.rc = rc | |||
|
|||
hostConfig := host.Config{ | |||
AutoMount: true, | |||
AutoWindowsWorkaround: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this set to true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is set to true for ig so that it works automatically. In the gadget pod, the windows workaround is set to false because it doesn't run on windows anyway.
Do you prefer to switch to a flag like it was done for the systemd workaround?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have all those workaround off by default. My only concern is how to suggest the users to enable them? Is there a place where we can put a warning about them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several points that I do not understand in this PR.
I would advise splitting in two parts:
- One which document how to use the
ig
container image, so we can test it easily and close the corresponding issue. - Another which deals with the
systemd
workaround.
@eiffel-fl Ok, I filed #1841 for the documentation part. Note that I needed to add more |
log.Infof("HostPID=%s", strconv.FormatBool(host.IsHostPidNs())) | ||
log.Infof("HostNetwork=%s", strconv.FormatBool(host.IsHostNetNs())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Infof("HostPID=%t", host.IsHostPidNs())
log.Infof("HostNetwork=%t", host.IsHostNetNs())
pkg/utils/host/host.go
Outdated
// capabilities. | ||
// | ||
// This is useful for the "kubectl debug node" command. | ||
func AddAutoSdUnitRestartFlag(command *cobra.Command) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this more generic: AddFlags
in case we add another flag later on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, for the moment, I would not change it as we already plan to remove it in #1842.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it, kubernetes/kubernetes#119200 isn't even. It'd take some quite some time until users are running that version. Also, this is not only about that flag but to make it general in case there are more flags, like we're proposing for the "windows workaround"
pkg/utils/host/workaround_linux.go
Outdated
|
||
// Only root can talk to the systemd socket | ||
if os.Geteuid() != 0 { | ||
return false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this return an error?
pkg/utils/host/workaround_linux.go
Outdated
// if the host does not use systemd, we cannot use this workaround | ||
_, err = os.Stat("/host/run/systemd/private") | ||
if err != nil { | ||
return false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
pkg/utils/host/workaround_linux.go
Outdated
|
||
// autoSdUnitRestart will automatically restart the process in a privileged | ||
// systemd unit if the current process does not have enough capabilities. | ||
func autoSdUnitRestart() (exit bool, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me when this function returns errors, some times it returns nil, other it returns the error, etc. What about using this way:
- Checking IG_IN_SYSTEMD_UNIT ro avoid recursive restarts
- Check if has CAP_SYS_ADMIN
At this point the workaround is needed, any error should be returned.
WDYT?
pkg/utils/host/workaround_linux.go
Outdated
|
||
// autoWindowsWorkaround overrides HostRoot and HostProcFs if necessary. | ||
// Docker Desktop with WSL2 sets up host volumes with weird pidns. | ||
func autoWindowsWorkaround() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call it WSLWorkaround
to be more specific? I was confused because of "Windows".
@@ -190,6 +191,15 @@ partsLoop: | |||
|
|||
l.rc = rc | |||
|
|||
hostConfig := host.Config{ | |||
AutoMount: true, | |||
AutoWindowsWorkaround: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have all those workaround off by default. My only concern is how to suggest the users to enable them? Is there a place where we can put a warning about them?
pkg/prometheus/prometheus_test.go
Outdated
// local runtime requires root to run host.Init() | ||
utilstest.RequireRoot(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it still true if we disable the workarounds by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still true because of the way the code is written. I'll rewrite IsHostPidNs()
in a cleaner way to avoid that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prometheus test no longer needs to run as root.
I updated the branch with better error messages. This can be tested as follows:
The workarounds are not automatic, but the error message suggests them as appropriate:
Tested commands:
On WSL2:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once CI is green.
if !host.IsHostPidNs { | ||
hostPidNs, err := host.IsHostPidNs() | ||
if err != nil { | ||
log.Debugf("Runcfanotify: not supported: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Debugf("Runcfanotify: not supported: %s", err) | |
log.Debugf("checking if current pid namespace is host pid namespace: %s", err) |
?
The 'kubectl debug node' is now supported. This patch adds documentation for it. Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
This can be tested as follows: ``` make ig docker tag ig-linux-amd64:latest myrepo/ig:dev docker push myrepo/ig:dev ``` The workarounds are not automatic, but the error message suggests them as appropriate: ``` $ kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig list-containers -w Creating debugging pod node-debugger-minikube-docker-4pm2j with container debugger on node minikube-docker. error: need CAP_SYS_ADMIN (did you try --auto-sd-unit-restart?) ``` Tested commands: ``` kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true list-containers -w kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true top ebpf kubectl debug node/minikube-docker -ti --image=myrepo/ig:dev -- ig --auto-sd-unit-restart=true trace exec ``` On WSL2: ``` $ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec error: filesystems bpf, debugfs, tracefs not mounted (did you try --auto-mount-filesystems?) $ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems error: /host/proc/self not found on WSL2 (did you try --auto-wsl-workaround?) $ docker run -ti --rm --privileged -v /run:/run -v /:/host --pid=host myrepo/ig:dev trace exec --auto-mount-filesystems --auto-wsl-workaround WARN[0000] /host/proc's pidns is neither the current pidns or a parent of the current pidns. Remounting. WARN[0000] Overriding HostRoot=/proc/1296/root/ HostProcFs=/proc/1296/root/proc (lifecycle-server) CONTAINER PID PPID COMM RET ARGS wizardly_franklin 16754 14684 cat 0 /bin/cat /tmp/file ``` Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
Execute the tests without the workarounds. Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
The CI failure seems unrelated to this PR. I will restart it to see if it helps.
It is weird that this TestTraceOOMKill failure only happens in the core flavor on docker (but not on containerd or cri-o or with the default flavor). |
The CI passed fine after a retry. The TestTraceOOMKill failure was a flake. |
Since #1787, ig is published as a container image. You can fetch it with:
This PR adds documentation and fixes for kubectl-debug and Docker Desktop on Windows.
This is part of removing things from entrypoint.sh as suggested in #801. This picks up the changes in entrypoint.sh required by Docker Desktop (#1768). See also #1789 (comment).
How to use
Using ig with kubectl debug
Examples of commands:
$ kubectl debug node/minikube-docker -ti --image=ghcr.io/inspektor-gadget/ig -- ig trace exec $ kubectl debug node/minikube-docker -ti --image=ghcr.io/inspektor-gadget/ig -- ig list-containers -o json
Using ig in a container
Example of command:
$ docker run -ti --rm \ --privileged \ -v /sys/fs/bpf:/sys/fs/bpf \ -v /sys/kernel/debug:/sys/kernel/debug \ -v /run:/run \ -v /:/host \ -e HOST_ROOT=/host \ ghcr.io/inspektor-gadget/ig \ trace exec CONTAINER PID PPID COMM RET ARGS cool_wright 1163565 1154511 ls 0 /bin/ls
If you use Docker Desktop on Windows, please add
--pid=host
in the docker command.Testing done
I tested the commands above with
albantest.azurecr.io/ig:dev-pr1
on:cc @mqasimsarfraz @eiffel-fl
Fixes #1783