-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve nsexec logging #2487
Improve nsexec logging #2487
Conversation
6b06797
to
22896bc
Compare
Current theory: added debug logging revealed some kind of a race in the code, which is now failing. |
2cf1faa
to
8f872aa
Compare
This comment has been minimized.
This comment has been minimized.
8f872aa
to
707d9db
Compare
OK, apparently the golang runtime just ends earlier than the nsexec part, so some part of logs are not printed. Perhaps we need to wait for |
yes, basically it's a race between the logger goroutine and the main thread. |
707d9db
to
9a1afdf
Compare
OK, it looks like I have fixed the tests. Let me split the last commit into smaller pieces and I guess this can go in. |
634317d
to
b03b723
Compare
05a3cf5
to
9bae011
Compare
factory, err := libcontainer.New("") | ||
if err != nil { | ||
logrus.Fatalf("unable to initialize for container: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logrus
is replaced by panic, because we just closed the logrus file descriptor. This is a test case so doesn't matter much.
@@ -19,12 +23,40 @@ func init() { | |||
} | |||
runtime.GOMAXPROCS(1) | |||
runtime.LockOSThread() | |||
|
|||
logLevel := logrus.InfoLevel | |||
if level := os.Getenv("_LIBCONTAINER_LOGLEVEL"); level != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was made optional because most (or all) of the test cases do not set Process.LogLevel and it is easier to have the default here rather than to amend all the tests.
Alternatively, the default can be moved somewhere to Process.Start
TODO: close the init's |
9bae011
to
f3c34ce
Compare
cd4dca6
to
f9b4cf0
Compare
f9b4cf0
to
b79c8f5
Compare
Rebased on top of the current master. Don't remember what was I stuck as last time I was working on it. |
At least one failure;
|
703bef9
to
4eaf907
Compare
I was running in circles around it but it looks like I finally made it working. The fix itself (for the flakes in debug.bats, i.e. #2460) is actually pretty simple:
Before this PR, the logpipe fd was only closed upon successful So, it all works now but I'm keeping it as a draft because I have yet to convert this to a set of nice and clean commits. |
It does not make sense to have a string for a numeric type. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Remove redundant "echo $output" from the first test case, as "runc" helper function already logs the output. 2. Show the contents of log.out to stderr, so it case of error we can see what is going on. 3. Remove the check that `log.out` file exists. This check is redundant, since right after it we do `cat log.out` and check its exit code. 4. Factor out common checks into check_debug. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Error handling is slightly cleaner this way. While at it, do minor refactoring and fix error logging in processEntry. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This fixes a big red warning in my vim. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
I have noticed that ConfigureLogs do not return an error in case logging was already configured -- instead it just warns about it. So I went ahead and changed the warning to the actual error... ... only to discover I broke things badly, because in case of runc init logging is configured twice. The fix is to not configure logging in case we are init. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Sometimes debug.bats test cases are failing like this: > not ok 27 global --debug to --log --log-format 'json' > # (in test file tests/integration/debug.bats, line 77) > # `[[ "${output}" == *"child process in init()"* ]]' failed It happens more when writing to disk. This issue is caused by the fact that runc spawns log forwarding goroutine (ForwardLogs) but does not wait for it to finish, resulting in missing debug lines from nsexec. ForwardLogs itself, though, never finishes, because it reads from a reading side of a pipe which writing side is not closed. This is especially true in case of runc create, which spawns runc init and exits; meanwhile runc init waits on exec fifo for some undetermined long time before doing execve. So, to fix the failure described above, we need to: 1. Make runc create/run/exec wait for ForwardLogs to finish; 2. Make runc init close its log pipe file descriptor (i.e. the one which value is passed in _LIBCONTAINER_LOGPIPE environment variable). This is exactly what this commit does. In addition, it - adds logrus debug to late stages of runc init, and amends the integration tests to check for those messages; - adds runc --debug exec test cases, similar to those in debug.bats but for runc exec eather than runc run; - refactors libcotainer/logs unit tests to simplify and update for the new ForwardLogs. PS I have to admit I still do not understand why closing log pipe fd is required in e.g. (*linuxSetnsInit).Init, right before the execve which (thanks to CLOEXEC) closes the fd anyway. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In order to make 'runc --debug' actually useful for debugging nsexec bugs, provide information about all the internal operations when in debug mode. [@kolyshkin: auto-fix formatting using indent for make validate to pass] Signed-off-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Some strings logged by write_log() contain \n, which leads to errors like this one: > # time="2020-06-07T15:41:37Z" level=error msg="failed to decode \"{\\\"level\\\":\\\"debug\\\", \\\"msg\\\": \\\"nsexec-0[2265]: update /proc/2266/uid_map to '0 1000 1\\n\" to json: invalid character '\\n' in string literal" The fix is to escape such characters. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
4eaf907
to
fce08a9
Compare
will open another one |
This is a carry of #2460 with a few fixes on top:
make validate
;Keeping this one a draft until #2835 is merged.
Closes: #2460