Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UB in *_offset functions #2450

Merged
merged 4 commits into from
Jun 18, 2022
Merged

Fix UB in *_offset functions #2450

merged 4 commits into from
Jun 18, 2022

Conversation

mejrs
Copy link
Member

@mejrs mejrs commented Jun 12, 2022

If the addr_of macro is not available, this created a zeroed PyCell<T>, but arbitrary T's do not permit zero-initialization.

@adamreichold
Copy link
Member

I suspect it had been considered when this code was written, but isn't memoffset the central place where these workarounds are collected?

@mejrs
Copy link
Member Author

mejrs commented Jun 12, 2022

I suspect it had been considered when this code was written, but isn't memoffset the central place where these workarounds are collected?

memoffset just chooses the least UB way to do this, and it cannot avoid UB if addr_of! is unavailable. The fallback as written here is not UB.

FWIW I suspect the real reason that memoffset was not used is to avoid adding a dependency.

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Jun 12, 2022

memoffset shall never reach exploited UB using cargo + rustc:

  • before addr_of!, such fixed list of rustc implementations did not exploit its UB;
  • after addr_of!, it no longer has UB.

It is thus as fine as it could possibly be.

The current PR duplicates a structure's layout, which means these could theoretically end up out of sync as the codebase is updated, which in a vacuum (I don't know the constraints of PyO3), seems orders of magnitude more error-prone than using an unofficial Rust implementation with no addr_of! which would exploit that UB


One approach to make the code robust to original struct updates, while keeping the current interesting strategy, would be:

type Apply<Wrapper, T> = <Wrapper as CanWrap<T>>::T;

trait CanWrap<T> : 'static { type T; }

/// For any `T`, `Apply<Identity, T> = T`.
enum Identity {}
impl<T> CanWrap<T> for Identity { type T = T; }

/// For any `T`, `Apply<InManuallyDrop, T> = ManuallyDrop<T>`
enum InManuallyDrop {}
impl<T> CanWrap<T> for InManuallyDrop { type T = ManuallyDrop<T>; }

#[repr(C)]
pub(crate) struct PyCellContents<T: PyClassImpl, Wrapper = Identity>
where
    // this would not be needed if we had (type) GATs
    Wrapper : CanWrap< ManuallyDrop<UnsafeCell<T>> >,
    Wrapper : CanWrap< <T::PyClassMutability as PyClassMutability>::Storage >,
    Wrapper : CanWrap< T::ThreadChecker >,
    Wrapper : CanWrap< T::Dict >,
    Wrapper : CanWrap< T::WeakRef >,
{
    pub(crate) value: Apply<Wrapper, ManuallyDrop<UnsafeCell<T> >,
    pub(crate) borrow_checker: Apply<Wrapper, <T::PyClassMutability as PyClassMutability>::Storage >,
    pub(crate) thread_checker: Apply<Wrapper, T::ThreadChecker >,
    pub(crate) dict: Apply<Wrapper, T::Dict >,
    pub(crate) weakref: Apply<Wrapper, T::WeakRef >,
}

that way, the MaybeUninit-projected struct could be queried by using PyCellContents<T, InManuallyDrop>, and would be guaranteed to be kept in sync.

  • (Sadly, without (type) GATs, this is a mouthful, so I still think that using memoffset (or inlining its logic) is the least error-prone approach)

  • (another approach would be to define a helper derive to get the …Dummy def auto-generated alongside the non-Dummy one, which with a macro_rules! macro is not that annoying to write)

@mejrs
Copy link
Member Author

mejrs commented Jun 12, 2022

Hmm, I like simple, and just using memoffset sounds simple.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, using memoffset sounds reasonable to me. Please add a CHANGELOG entry. Not sure whether this classes as Packaging or Fixed category.

Thanks for cleaning up the tests with flaky type inference issues too! 😁

@@ -17,6 +17,7 @@ edition = "2018"
cfg-if = "1.0"
libc = "0.2.62"
parking_lot = ">= 0.11, < 0.13"
memoffset = "0.6.5"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a note here that it's to support safe offset calculations on versions predating addr_of? I imagine in the future we might consider removing again.

Copy link
Member

@adamreichold adamreichold Jun 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are using it to compute offsets in all cases since even when addr_of is available, using memoffset enables us to avoid writing out its usage ourselves.

Personally, I think as it takes care of what is available where and how to use it, memoffset does carry its weight as a dependency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with this, I was thinking more that after a future MSRV bump that may change as addr_of would be trivial for us to implement safely. I'll worry about that in the future, perhaps 😄

@davidhewitt davidhewitt merged commit 517f4a8 into PyO3:main Jun 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants