Skip to content

Conversation

@jeckersb
Copy link
Collaborator

@jeckersb jeckersb commented Sep 9, 2025

Split the kernel command line parsing functionality into two focused
modules. The bytes module handles raw byte parsing without UTF-8
requirements, matching kernel behavior for arbitrary byte
sequences. The utf8 module provides string-based parsing for cases
where UTF-8 validation is needed. The utf8 module reuses the
bytes module primitives where possible, and uses the fact that
utf8::Cmdline can only be constructed from valid UTF-8 to do
unchecked conversions between the two.

Signed-off-by: John Eckersberg jeckersb@redhat.com

@bootc-bot bootc-bot bot requested a review from cgwalters September 9, 2025 20:19
@jeckersb
Copy link
Collaborator Author

jeckersb commented Sep 9, 2025

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the kernel_cmdline crate into bytes and utf8 modules, which is a significant improvement in code structure and clarity. The bytes module handles raw byte parsing, while the utf8 module provides a safe, string-based API on top. The use of unsafe in the utf8 module is well-justified with SAFETY comments. The changes are well-implemented. I have one minor suggestion to clean up a test case.

@jeckersb jeckersb force-pushed the cmdline-improvements branch from a6d11bd to 7051af9 Compare September 9, 2025 20:26
@cgwalters cgwalters added the area/unsafe-code Changes `unsafe` code label Sep 9, 2025
Split the kernel command line parsing functionality into two focused
modules. The `bytes` module handles raw byte parsing without UTF-8
requirements, matching kernel behavior for arbitrary byte
sequences. The `utf8` module provides string-based parsing for cases
where UTF-8 validation is needed.  The `utf8` module reuses the
`bytes` module primitives where possible, and uses the fact that
`utf8::Cmdline` can only be constructed from valid UTF-8 to do
unchecked conversions between the two.

Signed-off-by: John Eckersberg <jeckersb@redhat.com>
@jeckersb jeckersb force-pushed the cmdline-improvements branch from 7051af9 to 5454cec Compare September 9, 2025 20:58
@jeckersb
Copy link
Collaborator Author

jeckersb commented Sep 9, 2025

I replaced all of the unsafe usage with checked + expect. It's not like this is performance-critical, the extra checks aren't going to hurt anything. I'm as certain as I can be that it will never panic (but if it does... better than UB).

@jeckersb jeckersb marked this pull request as ready for review September 9, 2025 21:12
@bootc-bot bootc-bot bot requested a review from cgwalters September 9, 2025 21:12
@jeckersb
Copy link
Collaborator Author

Ok I think this new commit works the way we want it to. Gracefully handles non-UTF-8 data but errors out if we have non-UTF-8 data in the specific parameters we're interested in.

@jeckersb
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is a good refactoring of the kernel command line parsing logic. It splits the functionality into a bytes module for raw byte slice parsing and a utf8 module for validated UTF-8 string parsing. This separation improves clarity and robustness, especially in handling potentially non-UTF-8 command line arguments from the kernel. The utf8 module correctly builds upon the bytes module, ensuring code reuse and maintaining safety invariants. The changes in install.rs to use the new modules are also well-done and improve error handling around non-UTF-8 data.

I've found one critical bug in the quote stripping logic and a minor performance issue. Apart from those, the changes look great.

@jeckersb jeckersb force-pushed the cmdline-improvements branch from 4413c56 to 621ea8f Compare September 11, 2025 13:46
- Removed `From<bytes::Parameter>` implementation for
  `utf8::Parameter` and similar for `utf8::ParameterKey`.  This was
  public and would allow end-users to construct utf8 parameters from
  non-utf8 data.  Replaced internally with `from_bytes` in the places
  where we know we can safely convert known-UTF-8 data.

- Added `TryFrom<bytes::Paramter>` implementation for
  `utf8::Parameter` to allow checked conversions, plus tests.

- Added `iter_utf8` and `find_utf8` to `bytes::Cmdline`, plus tests.

- Updated `find_root_args_to_inherit` in bootc to use these
  improvements.  Notably bootc will now allow non-UTF8 data in the
  kernel cmdline, *unless* it occurs in parameters that bootc is
  explicitly looking for.

- Added more tests to `find_root_args_to_inherit` to validate expected
  functionality with non-UTF-8 data.

- Fixed a parser bug that gemini pointed out with unmatched quotes,
  plus tests to check for that.

Signed-off-by: John Eckersberg <jeckersb@redhat.com>
@jeckersb jeckersb force-pushed the cmdline-improvements branch from 621ea8f to 6cede27 Compare September 11, 2025 14:24
@cgwalters cgwalters merged commit d947ed1 into bootc-dev:main Sep 11, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/unsafe-code Changes `unsafe` code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants