Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsexec logging race #3119

Closed
kolyshkin opened this issue Jul 29, 2021 · 3 comments · Fixed by #3120
Closed

nsexec logging race #3119

kolyshkin opened this issue Jul 29, 2021 · 3 comments · Fixed by #3120

Comments

@kolyshkin
Copy link
Contributor

kolyshkin commented Jul 29, 2021

As reported in https://kubernetes.slack.com/archives/CAZH62UR1/p1627584431224800

[20:46] nb3:apps% kubectl logs nextcloud-nextcloud-846d6bbcf-9mmp8 nextcloud
time="2021-07-29T20:45:35+02:00" level=error msg="failed to decode \"{\\\"level\\\":\\\"debug\\\", \\\"msg\\\": \\\"nsexec-0[7972]: forward stage-1 (7973) and stage-2 (7974) pids to runc{\\\"level\\\":\\\"debug\\\", \\\"msg\\\": \\\"nsexec-1[7973]: signal completion to stage-0\\\"}\" to json: invalid character 'l' after object key:value pair"
time="2021-07-29T20:45:35+02:00" level=error msg="failed to decode \"\\\"}\" to json: unexpected end of JSON input"
Initializing nextcloud 20.0.11.1 ...
Initializing finished
New nextcloud instance
[29-Jul-2021 18:45:40] NOTICE: fpm is running, pid 1
[29-Jul-2021 18:45:40] NOTICE: ready to handle connections

What we see here is runc failed to decode JSON log message from runc init's nsexec.c. The messages are:

{"level":"debug", "msg": "nsexec-0[7972]: forward stage-1 (7973) and stage-2 (7974) pids to runc{"level":"debug", "msg": "nsexec-1[7973]: signal completion to stage-0"}
}

Looks like the code hit a jackpot of a debug log race between the parent at

write_log(DEBUG, "forward stage-1 (%d) and stage-2 (%d) pids to runc",
stage1_pid, stage2_pid);

and the first child at

write_log(DEBUG, "signal completion to stage-0");

Cc @cyphar

@kolyshkin
Copy link
Contributor Author

As the production rarely uses debug, in practice this is fixed by b1bc762

@kolyshkin
Copy link
Contributor Author

Another possible solution is something like this

diff --git a/libcontainer/nsenter/nsexec.c b/libcontainer/nsenter/nsexec.c
index 8199faf3..0c9eca00 100644
--- a/libcontainer/nsenter/nsexec.c
+++ b/libcontainer/nsenter/nsexec.c
@@ -142,7 +142,7 @@ int setns(int fd, int nstype)
 
 static void write_log(const char *level, const char *format, ...)
 {
-       char *message = NULL, *stage = NULL;
+       char *message = NULL, *stage = NULL, *json = NULL;
        va_list args;
        int ret;
 
@@ -164,11 +164,16 @@ static void write_log(const char *level, const char *format, ...)
        if (ret < 0)
                goto out;
 
-       dprintf(logfd, "{\"level\":\"%s\", \"msg\": \"%s[%d]: %s\"}\n", level, stage, getpid(), message);
+       ret = asprintf(&json, "{\"level\":\"%s\", \"msg\": \"%s[%d]: %s\"}\n", level, stage, getpid(), message);
+       if (ret < 0)
+               goto out;
+
+       write(logfd, json, ret);
 
 out:
        free(message);
        free(stage);
+       free(json);
 }
 
 /* XXX: This is ugly. */

@kolyshkin
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant