
ACP: Add nul-terminated version of core::panic::Location::file #466

Closed
@Darksonn

Description


Proposal

Problem statement

When using #[track_caller] in codebases that mix C and Rust, you often want to pass the caller's filename to a C API. However, C APIs usually require a nul-terminated string.

Motivating examples or use cases

I would like to utilize this in the Linux kernel to implement a Rust equivalent of the following utility:

```c
/**
 * might_sleep - annotation for functions that can sleep
 *
 * this macro will print a stack trace if it is executed in an atomic
 * context (spinlock, irq-handler, ...). Additional sections where blocking is
 * not allowed can be annotated with non_block_start() and non_block_end()
 * pairs.
 *
 * This is a useful debugging help to be able to catch problems early and not
 * be bitten later when the calling function happens to sleep when it is not
 * supposed to.
 */
#define might_sleep() do { __might_sleep(__FILE__, __LINE__); might_resched(); } while (0)
```

It's essentially an assertion that crashes the kernel if a function is used in the wrong context. The filename and line number are used in the error message when it fails. Unfortunately, the __might_sleep function requires the filename to be a nul-terminated string.

Note that, unlike with the file!() macro, it's impossible for us to nul-terminate the filename ourselves at compile time. Copying the filename into another string at runtime to nul-terminate it is not a great solution either: because the assertion is checked on the C side, we must build the string on every call, even when the assertion doesn't fail.
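For illustration, the runtime workaround described above might look like the following sketch. It assumes a hosted std environment (the kernel has its own allocation APIs), and caller_file_cstring is a hypothetical helper name. The point it shows is that the allocation and copy happen on every call, even though the filename is only needed if the C-side check fails.

```rust
use std::ffi::CString;
use std::panic::Location;

/// Hypothetical helper: copies the caller's filename into a freshly
/// allocated, nul-terminated buffer.
#[track_caller]
pub fn caller_file_cstring() -> CString {
    // Because this function is #[track_caller], Location::caller()
    // reports the location of whoever called it.
    let file = Location::caller().file();
    // Allocates and copies on every call, whether or not the C side
    // ever reads the string.
    CString::new(file).expect("file path contains an interior nul byte")
}
```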

Solution sketch

Add a new function core::panic::Location::file_with_nul that returns a &CStr instead of a &str.

This implies that the compiler must always store a nul byte after the filename when generating the string constants.

Alternatives

It could make sense to return *const c_char instead of &CStr, to avoid computing the length when all you need is a pointer to pass into C code. This could become important because possible future work involves shrinking Location by removing the stored length. In that case, the existing core::panic::Location::file function would be updated to compute the length from the nul terminator. As things stand, returning &CStr forces us to compute the length even when we don't need it.
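To make this tradeoff concrete, here is a sketch of a hypothetical thin location type (ThinLocation and FatLocation are illustrative names, not std types). ThinLocation stores only a pointer to a 'static, nul-terminated filename: handing that pointer to C costs nothing, while recovering a &str requires a strlen-style scan via CStr::from_ptr.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical thin location: no length field, only a pointer to a
/// 'static, nul-terminated, UTF-8 filename.
#[derive(Clone, Copy)]
pub struct ThinLocation {
    file: *const c_char,
    line: u32,
    column: u32,
}

/// For comparison: a location that stores the filename as a fat &str.
#[allow(dead_code)]
pub struct FatLocation {
    file: &'static str,
    line: u32,
    column: u32,
}

impl ThinLocation {
    pub const fn new(file: &'static CStr, line: u32, column: u32) -> Self {
        ThinLocation { file: file.as_ptr(), line, column }
    }

    /// What a C API wants: no length computation needed.
    pub fn file_ptr(&self) -> *const c_char {
        self.file
    }

    /// Recovering a &str means scanning for the nul terminator (strlen).
    pub fn file(&self) -> &'static str {
        // SAFETY: `new` only accepts a &'static CStr, so the pointer is
        // valid for 'static and points at a nul-terminated string.
        let c = unsafe { CStr::from_ptr(self.file) };
        c.to_str().expect("filename is not valid UTF-8")
    }

    pub fn line(&self) -> u32 {
        self.line
    }

    pub fn column(&self) -> u32 {
        self.column
    }
}
```

On a typical 64-bit target, dropping the length shrinks this struct from 24 to 16 bytes; the price is moved to every call of file().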

Links and related work

An implementation can be found at rust-lang/rust#131828.

For more context, please see Zulip and the Linux kernel mailing list. This is one of the features Rust for Linux (RfL) wants in core.

Adding a nul-terminator to the Location string has been tried before in rust-lang/rust#117431. However, back then, it was motivated by reducing the size of Location, and the previous PR did not actually expose the c string in the API.

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capacity becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

cc @ojeda @Noratrieb

Activity

pitaj commented on Oct 17, 2024

IIRC string constants are always null terminated in the binary anyways.

Noratrieb commented on Oct 17, 2024

Member

I don't think they are

programmerjake commented on Oct 17, 2024

Member

it's a constant with a known length, so I wouldn't consider "computing the length" to be a problem.
e.g.:

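```rust
use core::ffi::CStr;
use core::str;

pub struct Location<'a> {
    // Filename bytes: valid UTF-8, ending in exactly one nul byte.
    file_with_nul: &'a [u8],
    line: u32,
    column: u32,
}

impl<'a> Location<'a> {
    /// The filename without its trailing nul.
    pub fn file(&self) -> &'a str {
        // SAFETY: the invariant above guarantees UTF-8 and a non-empty slice,
        // so dropping the last (nul) byte is in bounds.
        unsafe { str::from_utf8_unchecked(self.file_with_nul.get_unchecked(..self.file_with_nul.len() - 1)) }
    }

    /// The filename as a C string, including the trailing nul.
    pub fn file_cstr(&self) -> &'a CStr {
        // SAFETY: the invariant guarantees exactly one nul, at the end.
        unsafe { CStr::from_bytes_with_nul_unchecked(self.file_with_nul) }
    }
}
```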

programmerjake commented on Oct 17, 2024

Member

one thing that does change, though: if we get an API to set the implicit #[track_caller] argument, we'd have to pass in a &'static [u8] that both ends in a terminating nul and is valid UTF-8, instead of the much more common &str.
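As a sketch of that extra burden, a caller building such a value by hand would have to uphold two invariants at once. The hypothetical helper below (not a proposed API) validates a byte slice with CStr::from_bytes_with_nul and std::str::from_utf8, and returns both views only when the bytes satisfy both.

```rust
use std::ffi::CStr;
use std::str;

/// Hypothetical validator for a filename usable by a nul-terminated
/// Location: valid UTF-8 and exactly one nul byte, at the end.
pub fn as_location_file(bytes: &'static [u8]) -> Option<(&'static str, &'static CStr)> {
    // Rejects a missing trailing nul as well as interior nul bytes.
    let c = CStr::from_bytes_with_nul(bytes).ok()?;
    // The bytes before the nul must also be UTF-8 so that file()
    // can keep returning &str.
    let s = str::from_utf8(c.to_bytes()).ok()?;
    Some((s, c))
}
```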

Darksonn commented on Oct 17, 2024

Author

> it's a constant with known length, I wouldn't consider "computing the length" to be a problem.

But it becomes a problem if we later go through with the size optimization from rust-lang/rust#117431. Then, the length is no longer known, so it really does have to be computed by calling strlen or similar.

scottmcm commented on Oct 17, 2024

Member

It seems unfortunate if all the track_caller data in the binary needs to be bigger for everyone just because some people want to pass it to a random C API sometimes.

Could we have file_cstr!() instead, so only people dealing in C strings need to deal with it? Yes, that's not as nice as track_caller, but oh well?

programmerjake commented on Oct 17, 2024

Member

if combined with the Location size optimization, nul-terminated strings are probably smaller overall: each filename string is stored once per file and needs only one extra byte, whereas the length field is duplicated in every tracked location and costs 4 or 8 extra bytes in each one.
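The arithmetic behind this estimate can be sketched as follows. extra_bytes is a hypothetical helper, the counts passed to it are made-up inputs, and the model deliberately ignores alignment padding and string deduplication: a length field costs size_of::<usize>() bytes (4 or 8) per tracked location, while a nul terminator costs one byte per distinct filename.

```rust
use std::mem::size_of;

/// Rough extra bytes for each representation (simplified model):
/// - a length field stored in every tracked location,
/// - a nul terminator stored once per distinct filename string.
pub fn extra_bytes(tracked_locations: usize, distinct_files: usize) -> (usize, usize) {
    let length_fields = tracked_locations * size_of::<usize>();
    let nul_terminators = distinct_files;
    (length_fields, nul_terminators)
}
```

With, say, 10,000 tracked locations spread over 300 files on a 64-bit target, this model gives 80,000 bytes of length fields versus 300 nul bytes.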

Noratrieb commented on Oct 18, 2024

Member

Using null-terminated strings may also unlock linker string-merging optimizations, which could further decrease binary size.
It seems unlikely to me that anyone cares about the tiny size increase; those who really, really care about location size are going to use something like -Zlocation-detail=none anyway, which removes all of this info.

Darksonn commented on Oct 18, 2024

Author

> Could we have file_cstr!() instead, so only people dealing in C strings need to deal with it? Yes, that's not as nice as track_caller, but oh well?

It would make a lot of things that could otherwise be function calls into macros. :(
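A sketch of the macro alternative shows both why it works and what the objection is. file_cstr! is a hypothetical macro, not an existing std item; it appends the nul at compile time with concat!(file!(), "\0"). Because a macro expands where it is written, using it inside a #[track_caller] function yields that function's own file rather than the caller's, so every layer that needs the caller's filename would itself have to become a macro.

```rust
use std::ffi::CStr;

/// Hypothetical macro: the current source file as a &'static CStr.
macro_rules! file_cstr {
    () => {
        // concat! adds the nul at compile time; the check should not fail
        // because source paths do not contain nul bytes.
        match ::std::ffi::CStr::from_bytes_with_nul(concat!(file!(), "\0").as_bytes()) {
            Ok(c) => c,
            Err(_) => panic!("file path contains a nul byte"),
        }
    };
}

/// #[track_caller] has no effect on macro expansion: this always returns
/// the file in which `file_in_helper` itself is defined, never the caller's.
#[track_caller]
pub fn file_in_helper() -> &'static CStr {
    file_cstr!()
}
```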

traviscross commented on Oct 22, 2024

The libs-api team talked about this today on a short-staffed call. Those on the call had a question:

Looking at the motivating example, why not write a version of __might_sleep that takes a pointer and a length? What are the drawbacks to that?
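For comparison, here is a sketch of the alternative raised on the call: a C-side function that accepts a pointer and a length, both of which Location::file() can already supply. might_sleep_with_len is a hypothetical name; it is written here as a Rust extern "C" stub purely so the sketch runs, whereas in the kernel it would be a C function declared in an extern "C" block.

```rust
use std::ffi::{c_char, c_int};
use std::panic::Location;

/// Hypothetical stand-in for a C function such as
/// `void might_sleep_with_len(const char *file, size_t len, int line)`.
/// Returns the number of filename bytes it read, for demonstration only.
extern "C" fn might_sleep_with_len(file: *const c_char, len: usize, line: c_int) -> usize {
    // SAFETY: the caller passes a pointer/length pair taken from a &str.
    let bytes = unsafe { std::slice::from_raw_parts(file.cast::<u8>(), len) };
    let _ = line; // a real implementation would report file:line on failure
    bytes.len()
}

/// Rust-side wrapper: no nul terminator, no copy, no allocation.
#[track_caller]
pub fn might_sleep() -> usize {
    let loc = Location::caller();
    let file = loc.file();
    might_sleep_with_len(file.as_ptr().cast::<c_char>(), file.len(), loc.line() as c_int)
}
```

The cost here moves to the C side, which has to grow a length parameter instead of Rust growing a nul-terminated accessor.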

As context, the feeling on the call was that this represents a tradeoff of whether to make the C codebase more Rust-like or Rust more C-like, and people weren't sure it was worth making Rust more C-like, and paying any costs here for all users, in this case.

It was noted on the call that this PR...

...had been closed as not being worth it. Though, reading the comments here more closely, such as the one from @Noratrieb (who was the author of that PR), I gather that there may be some interest in trying this again.

If there is a way to do this that does in fact result in a worthwhile improvement for all Rust users, then my own feeling is that this probably would have affected the mood on the call about this proposal.

workingjubilee commented on Oct 23, 2024

Member

I am not sure that there is in fact a "cost" here.

Every file path already has a de facto terminator: it is suffixed by ".rs", and this makes the set of paths "prefix-free": https://en.wikipedia.org/wiki/Prefix_code

This is the same property possessed by NUL-terminated CStrs. Each has "\0" at the end, which means no CStr can be a prefix of another CStr. Thus the argument about the cost seems wildly speculative, unless we wish to introduce a very curious new state of affairs, like not providing the ".rs" suffix!

Meanwhile, these file paths also could benefit from linker-driven deduplication (which revolves around the fact that CStrs can share a suffix).
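To illustrate the kind of suffix sharing meant here, the sketch below performs tail merging over a set of nul-terminated strings. tail_merge is a hypothetical helper, not any linker's actual implementation; it only shows why a shared suffix, ending in the shared nul, lets one string live inside another.

```rust
/// Hypothetical sketch of suffix ("tail") merging for nul-terminated strings.
/// Returns one byte blob plus the offset of each input string within it.
/// A string that is a suffix of another reuses the other's tail, including
/// its single trailing nul byte.
pub fn tail_merge(strings: &[&str]) -> (Vec<u8>, Vec<usize>) {
    // Sort in descending order of the byte-reversed strings. With this order,
    // a string that is a suffix of some other string always directly follows
    // a string it is a suffix of.
    let mut order: Vec<usize> = (0..strings.len()).collect();
    order.sort_by(|&a, &b| {
        Iterator::cmp(strings[b].bytes().rev(), strings[a].bytes().rev())
    });

    let mut blob = Vec::new();
    let mut offsets = vec![0; strings.len()];
    let mut prev: Option<usize> = None;
    for &i in &order {
        let s = strings[i];
        match prev {
            // Shared tail: point into the previous string's bytes.
            Some(p) if strings[p].ends_with(s) => {
                offsets[i] = offsets[p] + strings[p].len() - s.len();
            }
            // Otherwise emit the string followed by its nul terminator.
            _ => {
                offsets[i] = blob.len();
                blob.extend_from_slice(s.as_bytes());
                blob.push(0);
            }
        }
        prev = Some(i);
    }
    (blob, offsets)
}
```

For the paths "src/lib.rs", "kernel/src/lib.rs" and "src/main.rs", the merged blob is 30 bytes instead of 41, because "src/lib.rs" lives inside the tail of "kernel/src/lib.rs".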



Metadata

Assignees

No one assigned

    Labels

    ACP-accepted: API Change Proposal is accepted (seconded with no objections); T-libs-api; api-change-proposal: A proposal to add or alter unstable APIs in the standard libraries

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @bonzini @shepmaster @Amanieu @RalfJung @BurntSushi
