Stdout buffer not being drained fast enough #2467
Comments
We'll end up throttling the gRPC channel. I'd prefer to increase the number of pages from 16 to 32; hopefully the memory footprint won't be impacted too much. @awprice have you tried changing the number of pages?
@devimc Haven't tried increasing the number of pages. I'll give that a go and report back.
@awprice Yeah, increasing the size of the pipe seems like a reasonable approach. Let us know if that works out.
After increasing the number of pages for stdio in the Kata agent, we've found that the internal workload runs successfully under Kata. I used the following Kata agent patch to get this working:
and the before/after:
@amshinde / @devimc What are your thoughts on making this change to the upstream Kata agent?
@awprice thanks for taking the time to debug this. Since not all workloads require increasing the size of the pipe, maybe the kata-agent should have a kernel command line option to allow users to change the size of the pipe:
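For illustration, a minimal sketch of how such a kernel command line parameter might be parsed. The parameter name `agent.container_pipe_size` matches the annotation eventually adopted; the default value and the function itself are hypothetical (the real agent is written in Go), shown here in Python for brevity:

```python
def parse_pipe_size(cmdline: str, default: int = 0) -> int:
    """Extract a hypothetical agent.container_pipe_size=<bytes> parameter
    from a kernel command line string (e.g. the contents of /proc/cmdline)."""
    for field in cmdline.split():
        key, _, value = field.partition("=")
        if key == "agent.container_pipe_size":
            try:
                return int(value)
            except ValueError:
                return default  # malformed value: fall back to the default
    return default

print(parse_pipe_size("console=ttyS0 agent.container_pipe_size=262144"))  # → 262144
```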
Haha! I love the shirt size / "OMG are those Roman numerals?" idea, but I'd vote for option 1 as it's explicit. We can always provide some "reasonable" values for the shirt sizes in the comments. Crikey! It's well past XV o'clock, must be time for a cuppa...
@devimc @jodh-intel I realise this is not required for all workloads, but I am afraid adding yet another configuration option is just going to add to configuration bloat. @awprice Was increasing the size of the stdout pipe enough for running the workload? Or did you need to increase the stdin and stderr pipes as well?
@amshinde Agree with the configuration bloat concern. Is there another way we can resize this without configuration? We only needed to increase the size of stdout and stderr.
@devimc No, annotations are not supported with Docker, but we can document that. @awprice One approach would be to detect when we are running out of space on the pipe and automatically increase the pipe size once the buffered data goes beyond a certain limit. I am not entirely sure if it is doable though, as I haven't tried it out myself.
@amshinde I would prefer an annotation to configure the pipe size. Dynamically resizing it sounds overly complicated and has the potential for whatever is doing the resizing to not react fast enough.
@awprice Let's go with adding an annotation.
@amshinde 👍 Thanks!
@amshinde Does the option need to be configurable through the config file as well?
Just annotations I think - generally we are trying to reduce the number of options in the config file :-) It's an interesting conflict: I like having the global flexibility, but documenting all the items in the config file ends up confusing end users, as there are sooo many options and they interact in moderately complex ways :-(
This adds the `agent.container_pipe_size` annotation which allows configuration of the size of the pipes for stdout/stderr for containers inside the guest. fixes kata-containers#2467 Signed-off-by: Alex Price <aprice@atlassian.com>
Description of problem
This is a bit of a weird one.
We have an internal workload that uses a specific version of a NodeJS Docker image, `node:10.16.3-stretch`. This workload also emits a large amount of logs to stdout. The combination of this node version and heavy logging causes the workload to fail on Kata.

This is due to that specific version of NodeJS changing stdout from blocking to non-blocking mode. When the workload later emits logs to stdout (in a separate, non-node command), the write fails because the stdout pipe is full.
You can find more information on NodeJS changing the mode of stdout here:
I've confirmed that the mode of stdout is changed after node has been run:
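The original check command isn't captured above; one way to inspect the mode of a descriptor is to read its status flags via `fcntl(2)` and test for `O_NONBLOCK`. A small Python sketch (illustrative, not the command used in the issue):

```python
import fcntl
import os

def is_nonblocking(fd: int) -> bool:
    """Report whether O_NONBLOCK is set on a file descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)

# In a normal shell this prints False; after the affected node version has
# touched the same stdout, it would print True.
print(is_nonblocking(1))
```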
I've developed the following script that can be used to replicate:
And can be run with:
The problem with this script is that it fails the same way on both Kata and runc. The real workload, however, doesn't log 70,000 bytes at once, and the buffer is drained faster on runc.
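The failure mode itself is easy to demonstrate in isolation: once a writer's end of a pipe is non-blocking and the reader isn't draining, writes fail as soon as the kernel buffer fills. A self-contained Python sketch (the pipe here stands in for the container's stdout; the 4,096-byte chunk size matches `PIPE_BUF`, so each write is atomic):

```python
import os

def fill_nonblocking_pipe(chunk: int = 4096) -> int:
    """Write to a non-blocking pipe until the kernel buffer is full,
    mimicking a writer whose reader is not draining fast enough."""
    r, w = os.pipe()            # default Linux capacity: 16 pages = 65,536 bytes
    os.set_blocking(w, False)   # the state the affected node version leaves stdout in
    total = 0
    try:
        while True:
            total += os.write(w, b"x" * chunk)
    except BlockingIOError:     # write(2) returned EAGAIN: the buffer is full
        pass
    finally:
        os.close(r)
        os.close(w)
    return total

print(fill_nonblocking_pipe())  # 65536 on a default Linux configuration
```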
Expected result
When running the workload on non-Kata, i.e. runc, the workload completes successfully.
Actual result
When running the workload on Kata, we get the following error:
The workload uses `tee` to redirect the output of the command to both a file and stdout.

Solution
I've considered a bunch of different solutions for this issue on the Kata side:
Can we increase the size of the os.Pipe for the stdout buffer?
The stdin/stdout/stderr pipes are created here: https://github.com/kata-containers/agent/blob/7c2d8ab303085d0c59dcba784c22bdf168ff8961/grpc.go#L347
It sounds like the default pipe size in the Kata guest is 16 pages of 4,096 bytes, i.e. 65,536 bytes. See http://man7.org/linux/man-pages/man7/pipe.7.html, section "Pipe Capacity":

> ...the capacity can be queried and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ operations.
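A quick sketch of those two `fcntl(2)` operations in Python (the constant values come from `<linux/fcntl.h>`; the `resize_pipe` helper is illustrative, and the printed sizes assume a default Linux configuration):

```python
import fcntl
import os

# Pipe-resize fcntl commands from <linux/fcntl.h>; the Python fcntl module
# only exports these from 3.10 onward, so define them explicitly.
F_SETPIPE_SZ = 1031
F_GETPIPE_SZ = 1032

def resize_pipe(fd: int, nbytes: int) -> int:
    """Grow a pipe's capacity and return the size the kernel actually granted
    (rounded up to a power of two, capped by /proc/sys/fs/pipe-max-size)."""
    fcntl.fcntl(fd, F_SETPIPE_SZ, nbytes)
    return fcntl.fcntl(fd, F_GETPIPE_SZ)

r, w = os.pipe()
print(fcntl.fcntl(w, F_GETPIPE_SZ))  # default capacity: 65536 bytes (16 pages)
print(resize_pipe(w, 32 * 4096))     # 131072 bytes (32 pages)
```

Note that unprivileged processes can only grow a pipe up to `/proc/sys/fs/pipe-max-size` (1 MiB by default), which bounds how far this approach can be pushed without extra privileges.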
Can we increase the rate at which the shim/runtime polls the agent for logs, thus not letting the buffer fill up as fast?
If there isn't a way to fix this on the Kata side, I have some solutions on the workload side that I am considering:
Could I get some feedback/thoughts on the above?
Additional Details

kata-collect-data.sh details

Meta details

Running `kata-collect-data.sh` version `1.10.0 (commit ebe9677f23b574c5defacf57456d221d8ce901f2)` at `2020-02-18.06:08:47.580899329+0000`.

Runtime is `/opt/kata/bin/kata-runtime`.

kata-env

Output of `/opt/kata/bin/kata-runtime kata-env`:

Runtime config files

Runtime default config files

Runtime config file contents

Output of `cat "/etc/kata-containers/configuration.toml"`:

Output of `cat "/opt/kata/share/defaults/kata-containers/configuration.toml"`:

Config file `/usr/share/defaults/kata-containers/configuration.toml` not found

KSM throttler

version

Output of `--version`:
":systemd service
Image details
Initrd details
No initrd
Logfiles
Runtime logs
No recent runtime problems found in system journal.
Proxy logs
No recent proxy problems found in system journal.
Shim logs
No recent shim problems found in system journal.
Throttler logs
No recent throttler problems found in system journal.
Container manager details

Have `docker`

Docker

Output of `docker version`:

Output of `docker info`:

Output of `systemctl show docker`:

No `kubectl`

No `crio`

Have `containerd`

containerd

Output of `containerd --version`:

Output of `systemctl show containerd`:

Output of `cat /etc/containerd/config.toml`:

Packages

No `dpkg`

No `rpm`