Discuss introducing a debug console facility #169
/cc @mcastelino, @sboeuf, @jcvenegas.
/cc @egernst.
runV/hyperstart had a method to run a process outside any container. I agree this is useful sometimes. Adding @laijs.
@jodh-intel I think we all agree this is really needed, since we want to be able to access a VM to perform some advanced debugging. I am not sure we need to maintain an official debug image; instead, we could provide the right documentation on how a user can create their own debug image based on the official one. Unfortunately, and as you mentioned, for security purposes we should not have a dedicated way to debug from the official image itself.
If possible, we could add a flag to osbuilder to automatically add and configure the bits in an image - that would make the process of generating a debug image a little easier as well.
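To make the idea concrete, here is a rough sketch of what such an osbuilder step might do internally: drop a systemd unit into the image rootfs that runs an auto-login shell on the guest's virtio console. The unit name, console device (`hvc0`) and autologin behaviour are illustrative assumptions, not an existing osbuilder flag.

```shell
#!/bin/sh
# Sketch only: install a debug-console getty unit into a systemd-based
# image rootfs. $ROOTFS_DIR would normally be osbuilder's rootfs staging
# directory; here we default to a temporary directory for demonstration.
ROOTFS_DIR="${ROOTFS_DIR:-$(mktemp -d)}"
UNIT_DIR="$ROOTFS_DIR/etc/systemd/system"
mkdir -p "$UNIT_DIR/getty.target.wants"

cat > "$UNIT_DIR/debug-console.service" <<'EOF'
[Unit]
Description=Debug console shell on the guest hvc0 port

[Service]
# Auto-login as root on the virtio console; this is exactly the sort of
# thing that must NOT ship in a production image.
ExecStart=/sbin/agetty --autologin root --keep-baud hvc0 115200 linux
Restart=always

[Install]
WantedBy=getty.target
EOF

# "Enable" the unit by symlinking it, as `systemctl enable` would do
# (we cannot run systemctl against an offline rootfs).
ln -sf ../debug-console.service \
    "$UNIT_DIR/getty.target.wants/debug-console.service"

echo "debug console unit installed under $UNIT_DIR"
```

A real osbuilder flag would presumably also pull a shell and `agetty` into the rootfs package list, since the minimal image may not contain them.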
@mcastelino can comment further, but to be really useful, an admin should be able to debug any pod. That implies a single image type which either contains some sort of built-in but dormant debug console facility that can be enabled as desired [*], or a way to allow a debug console to be added securely by an admin to the running virtual machine. Maybe a compromise would be to provide a way to do the following (optionally after some sort of clone operation, so the debug fiddling could be performed on a copy of the real workload):
[*] - but it must be disabled by default when new containers are created.
@jodh-intel I'm not sure I understand why we need such complex machinery? The user will just need to make sure the proxy (or anything else) is not consuming the dedicated socket, but other than that, it can be joined at any time - no need to pause the VM for that.
Hi @sboeuf - oh, sorry - I hadn't read your previous comment correctly. I think the confusion here revolves around perceived requirements. I think we all agree that:
But what still doesn't seem very clear is whether we have the following requirement:
Folk with a security hat on will tend to say "no", whilst folks who want to be able to debug running container issues will tend to say "yes". Back to your corners y'all for a wet sponge and towel... and... 🔔 Round 1! 🔔
My 2p: maybe you can have a single image, if access to the debug port can be made secure enough that it does not pose a major problem during deployment.
Having the terminal + ssh in the image will make it bigger - it should not be that big, but it might be worth taking into account. Now, I agree that having the debug capability included in every image would be easier, because we would never have the problem of not being able to reproduce an issue through a specific image. Maybe the alternative would be to look at the problem upside down, and consider having an image including debug by default; only when people need to use Kata in production would they generate a "production-ready" image from it.
@sboeuf I do not agree that the user should generate the image. It is not hard for us to generate the image each time we create a release, and then the user can simply switch to the debug image by modifying the toml. @egernst and I have had to support multiple customers and developers to get this working. Making our users and customers jump through hoops just to be able to debug is not a productive use of their time or ours.
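For illustration, the "modifying the toml" switch described above could be a one-line change in the runtime's configuration file. The paths and section name below are examples of the general shape, not the exact shipped layout:

```toml
# Illustrative fragment of a runtime configuration.toml.
[hypervisor.qemu]
kernel = "/usr/share/kata-containers/vmlinuz.container"

# Standard (production) guest image:
# image = "/usr/share/kata-containers/kata-containers.img"

# Debug guest image, shipped alongside the standard one in each release;
# switching images is just repointing this path.
image = "/usr/share/kata-containers/kata-containers-debug.img"
```

This is what makes the "we build both images per release" proposal low-friction for users: no rebuilds, just a config edit.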
@mcastelino I don't have a strong opinion on that, and based on your experience with customers, I trust your opinion about this!
@mcastelino - what's your view, though, on whether we should provide a single image or two ("standard" and "debug")?
@jodh-intel two images generated by our build system would be nice, and we could choose to validate only the standard image; the debug image is just there for interactive debug convenience. As we cannot flip interactive debuggability on on the fly without opening up potential security issues, having a separate debug image may be better from a security and validation point of view. The only test we need to run on the debug image is to verify that the console is accessible. Also, the debug image is not really helpful in the cases where the pod panics or the VM crashes; that has to be debugged via kernel, proxy and shim logs. We have found the debug image useful typically when we deploy a new/customized QEMU or kernel and things do not quite work as expected. @sboeuf we just need to include a minimal shell. There is no need for ssh, as the console will not need auth - that is, unless we change the way we provide debug, or we want to provide a single image.
@mcastelino I am fine with this solution :) @jodh-intel are we planning to test only images based on
@mcastelino - you mean since theoretically if you want to debug container X, the debug facility would also be available to container Y in the same pod? Also, I'm slightly confused. I had thought that you and @egernst were in favour of having the ability to debug any container. Have I misunderstood or has that requirement now changed?
Right. All debug logs are now tested on all PRs using the
I've added tests for these things on the issue I raised for creating a "packaging CI": kata-containers/ci#10. @sboeuf - that's a good question. Given the amount of boot optimisation Clear Linux has performed, we want to ensure it remains a first class citizen here imho. However, if we have CI capacity, we should consider testing a few variants. wdyt @xu, @bergwolf, @chavafg, @jcvenegas, @grahamwhaley?
I think it primarily comes down to:
|
Yep - I think we should aim to test that. However, we can enhance our existing osbuilder tests to create a basic container using each supported image type. I've raised kata-containers/osbuilder#68 to do that.
Refreshing this -- I think one thing we can agree on, based on reading the last few comments, is that it makes sense to provide a second image which is identical to the primary but has this debug console enabled (i.e. has the shell and has the service). Is this something we can move forward with? I assume this would be included in the packages provided. @erick0z
Background
We have considered adding a debug console in Clear Containers in the past:
However, discussions stalled for two main reasons:
Security
The concern that introducing such a feature could be an attack vector.
Image size
Adding a shell (even a tiny one) is going to bloat the images slightly and that space isn't going to be used 99.99% of the time.
Rationale
It would be very useful for developers and admins to have the ability to debug a running container from the guest-side root namespace. Note that `docker exec` is NOT what we want, as that is not running in the guest root namespace and is thus constrained.
Image support
We could generate two images - one with a shell and one without. However, the general view is that this is suboptimal since:
Architecture
`runv` expects the agent to be running as `PID = 1` (init daemon). A debug console shell would either need to run as a separate thread or a child process of the agent.
`cc-runtime` (`virtcontainers`) assumes the agent to be running with `PID != 1`. A debug console shell can be launched by the init daemon (`systemd` by default) or could be handled as a separate thread / child of the agent.
From a testing (and security) perspective, it would be safer to have a single code path for a debug console.
Configuration and logging
If we introduce a debug console facility it: