-
Notifications
You must be signed in to change notification settings - Fork 143
Improve kernel argument parsing #1476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new, more robust kernel command line parser, kernel::Cmdline
, which correctly handles quoted values. The new parser is used throughout the codebase, replacing the previous simpler implementation. The changes are well-tested and improve correctness. I've provided a few suggestions in kernel.rs
to improve code conciseness and use more idiomatic Rust patterns.
e021bb8
to
f56e9d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks sane just some style nits. Thanks for working on this!
crates/lib/src/kernel.rs
Outdated
impl<'a, T: AsRef<[u8]> + ?Sized> From<&'a T> for Parameter<'a> { | ||
fn from(input: &'a T) -> Self { | ||
let input = input.as_ref(); | ||
let equals = input.iter().position(|b| *b == EQUALS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a lot cleaner to use split_once
for this. There are several other examples in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split_once
is nightly-only for slices though
https://doc.rust-lang.org/std/primitive.slice.html#method.split_once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, annoying.
You know I think the bigger picture thing here is that almost everyone doing parsing of u8
is using a higher level library and so there's not much motivation to improve std here. There's a lot of those.
One most relevant for this use case is BStr
which has https://docs.rs/bstr/latest/bstr/trait.ByteSlice.html#method.split_once_str
bstr is a pretty good fit for this, it's not in our depchain today but that's just because we don't use regex
among other things.
OK as is but perhaps worth a comment like
// https://doc.rust-lang.org/std/primitive.slice.html#method.split_once is nightly only right now
// We might consider using bstr in the future too
or so?
Not a blocker but something to discuss I think raised by this that we will want to address is that we still have overlap/duplication with at least the kargs parsing in composefs-boot. So one option is to move this there (and also make it |
f56e9d4
to
d650114
Compare
This adds a new `kernel::Cmdline` struct, which is populated either via `Cmdline::from` (borrowed) or `Cmdline::from_proc` (owned). This attempts to follow the same behavior as the kernel, which is mostly covered in: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/cmdline.c?id=e8d780dcd957d80725ad5dd00bab53b856429bc0#n227 The algorithm is basically: - Scan forward until you find the first unquoted isspace() byte. That's the end of the param. - If you encounter an `=` along the way, note where. That's where it will terminate the key and split for the value. Any future `=` are not treated as special. - The value can be quoted to allow spaces, but is unquoted only in as much as `"` is removed from the first or last byte. You can still have `"` in the middle of the value. This operates on `&[u8]` because the kernel does not enforce any particular encoding for the cmdline. Iterating using `Cmdline::iter()` will emit the `Parameter` type, which has helper methods `key_lossy()` and `value_lossy()` to convert potentially-non-UTF8 data into `String`s. Resolves: bootc-dev#1425 Signed-off-by: John Eckersberg <jeckersb@redhat.com>
d650114
to
cf7f150
Compare
This adds a new
kernel::Cmdline
struct, which is populated eithervia
Cmdline::from
(borrowed) orCmdline::from_proc
(owned).This attempts to follow the same behavior as the kernel, which is
mostly covered in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/cmdline.c?id=e8d780dcd957d80725ad5dd00bab53b856429bc0#n227
The algorithm is basically:
byte. That's the end of the param.
=
along the way, note where. That's where itwill terminate the key and split for the value. Any future
=
arenot treated as special.
much as
"
is removed from the first or last byte. You can stillhave
"
in the middle of the value.This operates on
&[u8]
because the kernel does not enforce anyparticular encoding for the cmdline. Iterating using
Cmdline::iter()
will emit theParameter
type, which has helpermethods
key_lossy()
andvalue_lossy()
to convertpotentially-non-UTF8 data into
String
s.Resolves: #1425
Signed-off-by: John Eckersberg jeckersb@redhat.com