From 4bcc595ccd80decb4245096e3d1258989c50ed41 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 8 Oct 2016 20:32:40 -0700 Subject: [PATCH] printk: reinstate KERN_CONT for printing continuation lines Long long ago the kernel log buffer was a buffered stream of bytes, very much like stdio in user space. It supported log levels by scanning the stream and noticing the log level markers at the beginning of each line, but if you wanted to print a partial line in multiple chunks, you just did multiple printk() calls, and it just automatically worked. Except when it didn't, and you had very confusing output when different lines got all mixed up with each other. Then you got fragment lines mixing with each other, or with non-fragment lines, because it was traditionally impossible to tell whether a printk() call was a continuation or not. To at least help clarify the issue of continuation lines, we added a KERN_CONT marker back in 2007 to mark continuation lines: 474925277671 ("printk: add KERN_CONT annotation"). That continuation marker was initially an empty string, and didn't actuall make any semantic difference. But it at least made it possible to annotate the source code, and have check-patch notice that a printk() didn't need or want a log level marker, because it was a continuation of a previous line. To avoid the ambiguity between a continuation line that had that KERN_CONT marker, and a printk with no level information at all, we then in 2009 made KERN_CONT be a real log level marker which meant that we could now reliably tell the difference between the two cases. 5fd29d6ccbc9 ("printk: clean up handling of log-levels and newlines") and we could take advantage of that to make sure we didn't mix up continuation lines with lines that just didn't have any loglevel at all. Then, in 2012, the kernel log buffer was changed to be a "record" based log, where each line was a record that has a loglevel and a timestamp. You can see the beginning of that conversion in commits e11fea92e13f ("kmsg: export printk records to the /dev/kmsg interface") 7ff9554bb578 ("printk: convert byte-buffer to variable-length record buffer") with a number of follow-up commits to fix some painful fallout from that conversion. Over all, it took a couple of months to sort out most of it. But the upside was that you could have concurrent readers (and writers) of the kernel log and not have lines with mixed output in them. And one particular pain-point for the record-based kernel logging was exactly the fragmentary lines that are generated in smaller chunks. In order to still log them as one recrod, the continuation lines need to be attached to the previous record properly. However the explicit continuation record marker that is actually useful for this exact case was actually removed in aroundm the same time by commit 61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT") due to the incorrect belief that KERN_CONT wasn't meaningful. The ambiguity between "is this a continuation line" or "is this a plain printk with no log level information" was reintroduced, and in fact became an even bigger pain point because there was now the whole record-level merging of kernel messages going on. This patch reinstates the KERN_CONT as a real non-empty string marker, so that the ambiguity is fixed once again. But it's not a plain revert of that original removal: in the four years since we made KERN_CONT an empty string again, not only has the format of the log level markers changed, we've also had some usage changes in this area. For example, some ACPI code seems to use KERN_CONT _together_ with a log level, and now uses both the KERN_CONT marker and (for example) a KERN_INFO marker to show that it's an informational continuation of a line. Which is actually not a bad idea - if the continuation line cannot be attached to its predecessor, without the log level information we don't know what log level to assign to it (and we traditionally just assigned it the default loglevel). So having both a log level and the KERN_CONT marker is not necessarily a bad idea, but it does mean that we need to actually iterate over potentially multiple markers, rather than just a single one. Also, since KERN_CONT was still conceptually needed, and encouraged, but didn't actually _do_ anything, we've also had the reverse problem: rather than having too many annotations it has too few, and there is bit rot with code that no longer marks the continuation lines with the KERN_CONT marker. So this patch not only re-instates the non-empty KERN_CONT marker, it also fixes up the cases of bit-rot I noticed in my own logs. There are probably other cases where KERN_CONT will be needed to be added, either because it is new code that never dealt with the need for KERN_CONT, or old code that has bitrotted without anybody noticing. That said, we should strive to avoid the need for KERN_CONT. It does result in real problems for logging, and should generally not be seen as a good feature. If we some day can get rid of the feature entirely, because nobody does any fragmented printk calls, that would be lovely. But until that point, let's at mark the code that relies on the hacky multi-fragment kernel printk's. Not only does it avoid the ambiguity, it also annotates code as "maybe this would be good to fix some day". (That said, particularly during single-threaded bootup, the downsides of KERN_CONT are very limited. Things get much hairier when you have multiple threads going on and user level reading and writing logs too). Signed-off-by: Linus Torvalds --- drivers/tty/vt/vt.c | 6 +++--- include/linux/kern_levels.h | 2 +- include/linux/printk.h | 1 + kernel/printk/printk.c | 29 ++++++++++++----------------- security/selinux/ss/policydb.c | 4 ++-- 5 files changed, 19 insertions(+), 23 deletions(-) diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index 2705ca960e92ea..145a75864e44b6 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -3182,11 +3182,11 @@ static int do_bind_con_driver(const struct consw *csw, int first, int last, pr_info("Console: switching "); if (!deflt) - printk("consoles %d-%d ", first+1, last+1); + printk(KERN_CONT "consoles %d-%d ", first+1, last+1); if (j >= 0) { struct vc_data *vc = vc_cons[j].d; - printk("to %s %s %dx%d\n", + printk(KERN_CONT "to %s %s %dx%d\n", vc->vc_can_do_color ? "colour" : "mono", desc, vc->vc_cols, vc->vc_rows); @@ -3195,7 +3195,7 @@ static int do_bind_con_driver(const struct consw *csw, int first, int last, update_screen(vc); } } else - printk("to %s\n", desc); + printk(KERN_CONT "to %s\n", desc); retval = 0; err: diff --git a/include/linux/kern_levels.h b/include/linux/kern_levels.h index c2ce155d83ccb5..f282d4e87258bc 100644 --- a/include/linux/kern_levels.h +++ b/include/linux/kern_levels.h @@ -20,7 +20,7 @@ * line that had no enclosing \n). Only to be used by core/arch code * during early bootup (a continued line is not SMP-safe otherwise). */ -#define KERN_CONT "" +#define KERN_CONT KERN_SOH "c" /* integer equivalents of KERN_ */ #define LOGLEVEL_SCHED -2 /* Deferred messages from sched code diff --git a/include/linux/printk.h b/include/linux/printk.h index 696a56be7d3e2b..eac1af8502bbff 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -16,6 +16,7 @@ static inline int printk_get_level(const char *buffer) switch (buffer[1]) { case '0' ... '7': case 'd': /* KERN_DEFAULT */ + case 'c': /* KERN_CONT */ return buffer[1]; } } diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index eea6dbc2d8cf6f..7b449802089afa 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -655,11 +655,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size, * better readable output. 'c' in the record flags mark the first * fragment of a line, '+' the following. */ - if (msg->flags & LOG_CONT && !(prev_flags & LOG_CONT)) - cont = 'c'; - else if ((msg->flags & LOG_CONT) || - ((prev_flags & LOG_CONT) && !(msg->flags & LOG_PREFIX))) - cont = '+'; + if (msg->flags & LOG_CONT) + cont = (prev_flags & LOG_CONT) ? '+' : 'c'; return scnprintf(buf, size, "%u,%llu,%llu,%c;", (msg->facility << 3) | msg->level, seq, ts_usec, cont); @@ -1819,10 +1816,9 @@ asmlinkage int vprintk_emit(int facility, int level, /* strip kernel syslog prefix and extract log level or control flags */ if (facility == 0) { - int kern_level = printk_get_level(text); + int kern_level; - if (kern_level) { - const char *end_of_header = printk_skip_level(text); + while ((kern_level = printk_get_level(text)) != 0) { switch (kern_level) { case '0' ... '7': if (level == LOGLEVEL_DEFAULT) @@ -1830,14 +1826,13 @@ asmlinkage int vprintk_emit(int facility, int level, /* fallthrough */ case 'd': /* KERN_DEFAULT */ lflags |= LOG_PREFIX; + break; + case 'c': /* KERN_CONT */ + lflags |= LOG_CONT; } - /* - * No need to check length here because vscnprintf - * put '\0' at the end of the string. Only valid and - * newly printed level is detected. - */ - text_len -= end_of_header - text; - text = (char *)end_of_header; + + text_len -= 2; + text += 2; } } @@ -1852,7 +1847,7 @@ asmlinkage int vprintk_emit(int facility, int level, * Flush the conflicting buffer. An earlier newline was missing, * or another task also prints continuation lines. */ - if (cont.len && (lflags & LOG_PREFIX || cont.owner != current)) + if (cont.len && (!(lflags & LOG_CONT) || cont.owner != current)) cont_flush(LOG_NEWLINE); /* buffer line if possible, otherwise store it right away */ @@ -1874,7 +1869,7 @@ asmlinkage int vprintk_emit(int facility, int level, * a newline, flush and append the newline. */ if (cont.len) { - if (cont.owner == current && !(lflags & LOG_PREFIX)) + if (cont.owner == current && (lflags & LOG_CONT)) stored = cont_add(facility, level, text, text_len); cont_flush(LOG_NEWLINE); diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c index 992a3153082587..bb9720b96c30c6 100644 --- a/security/selinux/ss/policydb.c +++ b/security/selinux/ss/policydb.c @@ -527,9 +527,9 @@ static int policydb_index(struct policydb *p) printk(KERN_DEBUG "SELinux: %d users, %d roles, %d types, %d bools", p->p_users.nprim, p->p_roles.nprim, p->p_types.nprim, p->p_bools.nprim); if (p->mls_enabled) - printk(", %d sens, %d cats", p->p_levels.nprim, + printk(KERN_CONT ", %d sens, %d cats", p->p_levels.nprim, p->p_cats.nprim); - printk("\n"); + printk(KERN_CONT "\n"); printk(KERN_DEBUG "SELinux: %d classes, %d rules\n", p->p_classes.nprim, p->te_avtab.nel);