Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lint / ImproperCTypes: better handling of indirections, take 2 #134697

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

niacdoial
Copy link
Contributor

This PR tries to re-add the changes in #131669 (which were reverted in #134064 after one (1) nightly),
and adds better coverage of ty_kinds:

  • in the take-1-added TypeSizedness enum and its construction
  • in a new test file

The changes in the original PR aim to make ImproperCTypes/ImproperCTypesDefinitions produce better warnings when dealing with indirections (Box, &T, *T), especially for those to DSTs.

r? workingjubilee (because you reviewed the first attempt)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 23, 2024
@niacdoial
Copy link
Contributor Author

(
hi Jubilee, I'm back at this again!
I know this is not the best time of year to add PRs, so I'm fine with postponing this if you don't feel like tackling it these upcoming weeks.
In any case, have some nice end-of-year festivities, if you celebrate any!
)

@niacdoial
Copy link
Contributor Author

ah, and before I forget: a small part of the new test file is commented out because it hits ICE #134587, but there should be more than decent coverage anyway

@workingjubilee
Copy link
Member

unfortunately the lint needs to be gutted and rewritten.

@workingjubilee
Copy link
Member

workingjubilee commented Dec 24, 2024

Also while I was possibly having a mild case of get-there-itis and thus mostly tried to just make sure things were coherent, I would prefer all new code for the lint be in compiler/rustc_lint/src/types/improper_ctypes.rs.

@workingjubilee workingjubilee added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 4, 2025
@workingjubilee
Copy link
Member

The version cut happened so there will be less time pressure now.

@niacdoial
Copy link
Contributor Author

I have a first version for compiler/rustc_lint/src/types/improper_ctypes.rs if you want.
it's less of a from-the-ground-up rewrite as it is scrapping the original for parts, if the analogy makes sense.

you probably have things to say about its architecture, even if the whole thing still have a bunch of TODO comments
the progress so far looks like this:

  • completely separate the type-checking and reporting systems
  • (part of the way there) remove some of the special cases and integrate them to the "main logic"
    • check_for_opaque_types is still a "special case" part of the checking logic
    • Cstr and Cstring are also somewhat special-cased because the advice for them depends on the type around them, if any
    • the unit type is handled in multiple places, see if this can be fixed
  • (almost complete) compile, pass existing tests
    • only failed tests are for Cstring, due to different error messages
    • one unrelated test had to have a second "#[allow(improper_ctypes)]" added, but it makes more sense for it to need that anyway
  • better separation of the different checks in different visit_* methods of ImproperCTypesVisitor
  • better tracking of how the currently-checked type is used (static, function argument, function return's inner type, etc...)
    • raises questions about the separation of improper_ctypes and improper_ctypes_definitions versus declared/defined functions, especially when FnPtr:s are involved
  • allow single argument check to emit multiple errors (for fnptr:s, structures with multiple FFI-unsafe fields, etc)
  • review what is considered FFI-safe or not (once everything else is complete)

If you want to take a look in this state, should I just commit it here? (possibly put the PR in draft mode while I'm at it?)
or send you the files in a different way?

@bors
Copy link
Collaborator

bors commented Jan 15, 2025

☔ The latest upstream changes (presumably #135525) made this pull request unmergeable. Please resolve the merge conflicts.

@niacdoial
Copy link
Contributor Author

niacdoial commented Jan 18, 2025

aaaand I think I have something that's "first draft" material! (should I put this pull in the "draft" state?)
There's still a bunch of TODOs (...well, they were changed to FIXMEs to be pushed here) for questions I didn't manage to answer (and a bunch of failing tests because I don't know if the error should be here), but yeah.

here's a list of some of my remaining questions and concerns:

  • visit_numeric seems too x86_64-specific

  • should we revisit the distinction between ImproperCTypes and ImproperCTypesDefinitions?

    • part 1: the output ("external fn" or vs "external block" vs other possibilities)
    • part 2: handling opaque types (there's a high correlation between ImproperCTypesDefinitions and places where we allow FFI-opaque types to be fully specified. do we want this correlation to be 1?)
  • more on FFI-opaque types: how do we handle that in the context of the "context switch" between functions and possible FnPtr arguments? The answer that seems correct currently prevents a stage1 compiler from being built

    • should we introduce a std::ffi::FfiOpaquePtr type? (which would be a *const c_void and some phantomdata, on first approximation)
  • for indirections whose values may be supplied by non-rust code: do we only allow pointers (and Optionstd::ptr::NonNull), or do we also allow Option<&T> and Option<Box<T>>?

  • not sure if the new error messages are intelligible in all cases (especially if there's a type param like Self or <Self as ::std::ops::Add<Self>>::Output that gets resolved in the error message).

  • it feels like the current handling of CStr/Cstring and Option-like enums uses special casing, since those are tested for in multiple places.

  • if we deny references and boxes in defined functions, what of &self in methods? We don't allow *const Self, last I checked.

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Collaborator

bors commented Jan 23, 2025

☔ The latest upstream changes (presumably #135921) made this pull request unmergeable. Please resolve the merge conflicts.

- removed special-case logic for a few cases (including the unit type,
  which is now only checked for in one place)
- moved a lot of type-checking code in their dedicated visit_* methods
- reworked FfiResult type to handle multiple diagnostics per type
  (currently imperfect due to type caching)
- reworked the messages around CStr and CString
@rust-log-analyzer
Copy link
Collaborator

The job x86_64-gnu-tools failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
   Compiling num-conv v0.1.0
   Compiling powerfmt v0.2.0
   Compiling time-core v0.1.2
   Compiling nu-ansi-term v0.50.1
error: `extern` fn uses type `&RustString`, which is not FFI-safe
   |
68 |     buf: &RustString,
   |          ^^^^^^^^^^^ not FFI-safe
   |

@workingjubilee
Copy link
Member

hmm.

@workingjubilee workingjubilee added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 31, 2025
@workingjubilee workingjubilee self-requested a review January 31, 2025 05:14
@workingjubilee workingjubilee marked this pull request as draft January 31, 2025 05:15
@workingjubilee
Copy link
Member

aaaand I think I have something that's "first draft" material! (should I put this pull in the "draft" state?)

Yes, it's a good marker for "I don't want this merged yet, even if it looks done".

@workingjubilee
Copy link
Member

visit_numeric seems too x86_64-specific

It probably is.

should we revisit the distinction between ImproperCTypes and ImproperCTypesDefinitions?

Yes, but in particular, not to just repartition them between: I think breaking them into as many conceptually-smaller lints as possible is good, as long as each one is a distinct idea (no splitting just for the sake of splitting!).

more on FFI-opaque types: how do we handle that in the context of the "context switch" between functions and possible FnPtr arguments? The answer that seems correct currently prevents a stage1 compiler from being built

I'm not sure what you mean?

for indirections whose values may be supplied by non-rust code: do we only allow pointers (and Option<std::ptr::NonNull>), or do we also allow Option<&T> and Option<Box<T>>?

We must allow Rust code to declare a pointer in a C signature to be Option<&T> or a number of things about our FFI story fall apart.

it feels like the current handling of CStr/CString and Option-like enums uses special casing, since those are tested for in multiple places.

Yes, probably.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial nits on a first pass

Comment on lines +170 to +219
// match (self, other) {
// (Self::FfiUnsafe(_), _) => {
// // nothing to do
// },
// (_, Self::FfiUnsafe(_)) => {
// *self = other;
// },
// (Self::FfiPhantom(ref ty1),Self::FfiPhantom(ty2)) => {
// println!("whoops, both FfiPhantom: self({:?}) += other({:?})", ty1, ty2);
// },
// (Self::FfiSafe,Self::FfiPhantom(_)) => {
// *self = other;
// },
// (_, Self::FfiSafe) => {
// // nothing to do
// },
// }

let s_disc = std::mem::discriminant(self);
let o_disc = std::mem::discriminant(&other);
if s_disc == o_disc {
match (self, &mut other) {
(Self::FfiUnsafe(ref mut s_inner), Self::FfiUnsafe(ref mut o_inner)) => {
s_inner.append(o_inner);
}
(Self::FfiPhantom(ref ty1), Self::FfiPhantom(ty2)) => {
debug!("whoops: both FfiPhantom, self({:?}) += other({:?})", ty1, ty2);
}
(Self::FfiSafe, Self::FfiSafe) => {}
_ => unreachable!(),
}
} else {
if let Self::FfiUnsafe(_) = self {
return;
}
match other {
Self::FfiUnsafe(o_inner) => {
// self is Safe or Phantom: Unsafe wins
*self = Self::FfiUnsafe(o_inner);
}
Self::FfiSafe => {
// self is always "wins"
return;
}
Self::FfiPhantom(o_inner) => {
// self is Safe: Phantom wins
*self = Self::FfiPhantom(o_inner);
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following compiles:

Suggested change
// match (self, other) {
// (Self::FfiUnsafe(_), _) => {
// // nothing to do
// },
// (_, Self::FfiUnsafe(_)) => {
// *self = other;
// },
// (Self::FfiPhantom(ref ty1),Self::FfiPhantom(ty2)) => {
// println!("whoops, both FfiPhantom: self({:?}) += other({:?})", ty1, ty2);
// },
// (Self::FfiSafe,Self::FfiPhantom(_)) => {
// *self = other;
// },
// (_, Self::FfiSafe) => {
// // nothing to do
// },
// }
let s_disc = std::mem::discriminant(self);
let o_disc = std::mem::discriminant(&other);
if s_disc == o_disc {
match (self, &mut other) {
(Self::FfiUnsafe(ref mut s_inner), Self::FfiUnsafe(ref mut o_inner)) => {
s_inner.append(o_inner);
}
(Self::FfiPhantom(ref ty1), Self::FfiPhantom(ty2)) => {
debug!("whoops: both FfiPhantom, self({:?}) += other({:?})", ty1, ty2);
}
(Self::FfiSafe, Self::FfiSafe) => {}
_ => unreachable!(),
}
} else {
if let Self::FfiUnsafe(_) = self {
return;
}
match other {
Self::FfiUnsafe(o_inner) => {
// self is Safe or Phantom: Unsafe wins
*self = Self::FfiUnsafe(o_inner);
}
Self::FfiSafe => {
// self is always "wins"
return;
}
Self::FfiPhantom(o_inner) => {
// self is Safe: Phantom wins
*self = Self::FfiPhantom(o_inner);
}
}
}
match (self, other) {
(Self::FfiUnsafe(_), _) => {
// nothing to do
}
(myself, other @ Self::FfiUnsafe(_)) => {
*myself = other;
}
(Self::FfiPhantom(ref ty1), Self::FfiPhantom(ty2)) => {
println!("whoops, both FfiPhantom: self({:?}) += other({:?})", ty1, ty2);
}
(myself @ Self::FfiSafe, other @ Self::FfiPhantom(_)) => {
*myself = other;
}
(_, Self::FfiSafe) => {
// nothing to do
}
}

}

impl CTypesVisitorState {
/// wether the type is used (directly or not) in a static variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// wether the type is used (directly or not) in a static variable
/// whether the type is used (directly or not) in a static variable

fn is_in_static(self) -> bool {
((self as u8) & 0b0010) != 0
}
/// wether the type is used (directly or not) in a function, in return position
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// wether the type is used (directly or not) in a function, in return position
/// whether the type is used (directly or not) in a function, in return position

Comment on lines +397 to +398
/// wether the type is used (directly or not) in a defined function
/// in other words, wether or not we allow non-FFI-safe types behind a C pointer,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// wether the type is used (directly or not) in a defined function
/// in other words, wether or not we allow non-FFI-safe types behind a C pointer,
/// whether the type is used (directly or not) in a defined function
/// in other words, whether or not we allow non-FFI-safe types behind a C pointer,

ret
}

/// wether the value for that type might come from the non-rust side of a FFI boundary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// wether the value for that type might come from the non-rust side of a FFI boundary
/// whether the value for that type might come from the non-rust side of a FFI boundary

Comment on lines +872 to +873
// but for some reason one can just go and write function *pointers* like that:
// `type Foo = extern "C" fn(::std::ffi::CStr);`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Because unsized function parameters are something we may want to support.
  2. The code may not be well-formed: as you may have noticed at some point, you get warnings even if you get errors (usually), and this is because we lint even on "bad" code. This is because rustc didn't use to, once upon a time, and it was a bad debugging experience.

Comment on lines +893 to +895
// you would think that int-range pattern types that exclude 0 would have Option layout optimisation
// they don't (see tests/ui/type/pattern_types/range_patterns.stderr)
// so there's no need to allow Option<pattern_type!(u32 in 1..)>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I should fix that probably


type Sig<'tcx> = Binder<TyCtxt<'tcx>, FnSig<TyCtxt<'tcx>>>;

/// for a given `extern "ABI"`, tell wether that ABI is *not* considered a FFI boundary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// for a given `extern "ABI"`, tell wether that ABI is *not* considered a FFI boundary
/// for a given `extern "ABI"`, tell whether that ABI is *not* considered a FFI boundary

}
}

/// Determine if a type is sized or not, and wether it affects references/pointers/boxes to it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Determine if a type is sized or not, and wether it affects references/pointers/boxes to it
/// Determine if a type is sized or not, and whether it affects references/pointers/boxes to it

@workingjubilee workingjubilee added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 20, 2025
impl CTypesVisitorState {
/// wether the type is used (directly or not) in a static variable
fn is_in_static(self) -> bool {
((self as u8) & 0b0010) != 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
((self as u8) & 0b0010) != 0
((self as u8) & (Self::StaticTy as u8) != 0

}
/// wether the type is used (directly or not) in a function, in return position
fn is_in_function_return(self) -> bool {
let ret = ((self as u8) & 0b0100) != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let ret = ((self as u8) & 0b0100) != 0;
let ret = ((self as u8) & (Self::ReturnTyInDeclaration as u8)) != 0;

/// in other words, wether or not we allow non-FFI-safe types behind a C pointer,
/// to be treated as an opaque type on the other side of the FFI boundary
fn is_in_defined_function(self) -> bool {
let ret = ((self as u8) & 0b1000) != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

etc... please encase the magic numbers.

}

impl<'a, 'tcx> ImproperCTypesVisitor<'a, 'tcx> {
/// Checks wether an `extern "ABI" fn` function pointer is indeed FFI-safe to call
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Checks wether an `extern "ABI" fn` function pointer is indeed FFI-safe to call
/// Checks whether an `extern "ABI" fn` function pointer is indeed FFI-safe to call

// 'fake' declarations (in traits, needed to be implemented elsewhere), and definitions.
// (for instance, definitions should worry about &self with Self:?Sized, but fake declarations shouldn't)

// wether they are FFI-safe or not does not depend on the indirections involved (&Self, &T, Box<impl Trait>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler spellcheckers cannot catch this because "wether" is in fact a word (though definitely not the one you meant), and more complex ones are infuriating because they are never quite smart enough to understand when you meant to use a word that way.

Suggested change
// wether they are FFI-safe or not does not depend on the indirections involved (&Self, &T, Box<impl Trait>),
// whether they are FFI-safe or not does not depend on the indirections involved (&Self, &T, Box<impl Trait>),

Comment on lines +772 to +776
if def.variants().is_empty() {
// Empty enums are implicitely handled as the never type:
// FIXME think about the FFI-safety of functions that use that
return FfiSafe;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! is never a valid argument, but it is a valid return value.

@niacdoial
Copy link
Contributor Author

alright, sorry for taking a while!

I'm currently planning what changes I'll do in terms of splitting the lint(s)
my current idea is to separate based on the nature of the thing being (presumably) propped up against a FFI boundary

  • improper_ctypes: what's definitely an interface to an outside library (extern statics, extern function declarations)
  • improper_ctypes_fn_definitions: functions written in rust intended to be exported
  • improper_ctypes_callbacks: FnPtr arguments, no matter in what function they are being used
  • improper_ctypes_ty_definitions: repr(C) structs/enums(/unions)?

more on FFI-opaque types: how do we handle that in the context of the "context switch" between functions and possible FnPtr arguments? The answer that seems correct currently prevents a stage1 compiler from being built

I'm not sure what you mean?

well, this is more or less answered in what I said before that, but my question was about how to deal with "switching" from checking arguments for, say, a function definition, to checking the arguments of a FnPtr argument?

  1. should the nature of the lint change?
    (temptative answer: yes)
  2. how should FFI-Safe-pointers-to-FFI-Unsafe-pointees work in FnPtr arguments? Should it be the rules for extern fn declarations? (throw the lint because one should use *const c_void, an extern type declaration, etc...) or the rules for extern fn definitions? (allow that, the function's body needs the full type even if it's opaque to the other side of the FFI boundary)
    (temptative answer: it should be the former, but parts of the rustc codebase doesn't follow this rule, so I can't get a stage1 compiler if I make that the rule)

// you would think that int-range pattern types that exclude 0 would have Option layout optimisation
// they don't (see tests/ui/type/pattern_types/range_patterns.stderr)
// so there's no need to allow Option<pattern_type!(u32 in 1..)>.

oh, I should fix that probably

I... maybe? I can't for the life of me find the link to that again but I think I saw a discussion about that and type covariance/contravariance,
where i32 is 1.. is a subtype of i32 (well that was under consideration), meaning fn(Option<i32>) is a type of fn(Option<i32 is 1..>) and it might have impacts on whether there should be an optimisation because of transmutation?

Though you'll definitely know more than me on all the moving parts.
Especially assuming you might have looked at this more in the past week.


As for the rest of your advice, I already took all this in!
thanks for shedding light on my code, one nit at a time!

@workingjubilee
Copy link
Member

workingjubilee commented Mar 2, 2025

honestly, I think that initially we might be better off just cutting out linting on thin pointers based on their pointees. then we can spend a bit of time rethinking the justification for the lint and rewriting the lint for that case, I think, and having it be specifically a lint for an improper_ctype_pointee.

@niacdoial
Copy link
Contributor Author

this sounds like we might have combinatorics on our hands with that. (one lint for repr(C) structures, one for pointees within, one for extern function definitions, one for pointees in their arguments, etc.)
still, OK for leaving this for later.

@workingjubilee
Copy link
Member

Hm... I think it's possible to have a case where a lint fires only when two lints are enabled.

@workingjubilee
Copy link
Member

...Regardless, I think any reworking should start with scaling as far as possible back to "this is definitely inappropriate" versus "this is trying to catch a pattern that can be used correctly but usually isn't".

And I think that anything that assumes that a C-ABI-compatible pointer with an ABI-incompatible pointee is not merely being used opaquely is an example of the latter, of course.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants