Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UB in *_offset functions #2450

Merged
merged 4 commits into from
Jun 18, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ edition = "2018"
cfg-if = "1.0"
libc = "0.2.62"
parking_lot = ">= 0.11, < 0.13"
memoffset = "0.6.5"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a note here that it's to support safe offset calculations on versions predating addr_of? I imagine in the future we might consider removing again.

Copy link
Member

@adamreichold adamreichold Jun 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are using it to compute offsets in all cases since even when addr_of is available, using memoffset enables us to avoid writing out its usage ourselves.

Personally, I think as it takes care of what is available where and how to use it, memoffset does carry its weight as a dependency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with this, I was thinking more that after a future MSRV bump that may change as addr_of would be trivial for us to implement safely. I'll worry about that in the future, perhaps 😄


# ffi bindings to the python interpreter, split into a separate crate so they can be used independently
pyo3-ffi = { path = "pyo3-ffi", version = "=0.16.5" }
Expand Down
97 changes: 4 additions & 93 deletions src/pycell.rs
Original file line number Diff line number Diff line change
Expand Up @@ -433,8 +433,6 @@ pub struct PyCell<T: PyClassImpl> {
contents: PyCellContents<T>,
}

// This struct has mirrored definitions in `weaklist_offset` and `dict_offset`,
// who need the same struct layout to function correctly.
#[repr(C)]
pub(crate) struct PyCellContents<T: PyClassImpl> {
pub(crate) value: ManuallyDrop<UnsafeCell<T>>,
Expand Down Expand Up @@ -627,110 +625,23 @@ impl<T: PyClass> PyCell<T> {

/// Gets the offset of the dictionary from the start of the struct in bytes.
pub(crate) fn dict_offset() -> ffi::Py_ssize_t {
use memoffset::offset_of;
use std::convert::TryInto;
use std::mem::MaybeUninit;

#[cfg(addr_of)]
let offset = unsafe {
// The much simpler way, using addr_of!
let cell = MaybeUninit::<Self>::uninit();
let base_ptr = cell.as_ptr();
let dict_ptr = std::ptr::addr_of!((*base_ptr).contents.dict);
(dict_ptr.cast::<u8>()).offset_from(base_ptr.cast::<u8>())
};
#[cfg(not(addr_of))]
let offset = unsafe {
// The annoying way using a dummy struct because there's no way to get
// a pointer without going through a reference, which is insta-UB
// if it is not initialized.

use std::mem::size_of;

#[repr(C)]
struct PyCellDummy<T: PyClassImpl> {
ob_base: MaybeUninit<<T::BaseType as PyClassBaseType>::LayoutAsBase>,
contents: PyCellContentsDummy<T>,
}

#[repr(C)]
struct PyCellContentsDummy<T: PyClassImpl> {
value: MaybeUninit<T>,
borrow_checker: MaybeUninit<<T::PyClassMutability as PyClassMutability>::Storage>,
thread_checker: MaybeUninit<T::ThreadChecker>,
dict: MaybeUninit<T::Dict>,
weakref: MaybeUninit<T::WeakRef>,
}

assert_eq!(size_of::<PyCell<T>>(), size_of::<PyCellDummy<T>>());
assert_eq!(
size_of::<PyCellContents<T>>(),
size_of::<PyCellContentsDummy<T>>()
);

// All of the pycelldummy's fields are maybeuninit, which require no initialization
let cell = MaybeUninit::<PyCellDummy<T>>::uninit().assume_init();

let base_ptr: *const PyCellDummy<T> = &cell;
let dict_ptr: *const MaybeUninit<T::Dict> = &(*base_ptr).contents.dict;
let offset = offset_of!(PyCell<T>, contents) + offset_of!(PyCellContents<T>, dict);

(dict_ptr.cast::<u8>()).offset_from(base_ptr.cast::<u8>())
};
// Py_ssize_t may not be equal to isize on all platforms
#[allow(clippy::useless_conversion)]
offset.try_into().expect("offset should fit in Py_ssize_t")
}

/// Gets the offset of the weakref list from the start of the struct in bytes.
pub(crate) fn weaklist_offset() -> ffi::Py_ssize_t {
use memoffset::offset_of;
use std::convert::TryInto;
use std::mem::MaybeUninit;

#[cfg(addr_of)]
let offset = unsafe {
// The much simpler way, using addr_of!
let cell = MaybeUninit::<Self>::uninit();
let base_ptr = cell.as_ptr();
let weaklist_ptr = std::ptr::addr_of!((*base_ptr).contents.weakref);
(weaklist_ptr.cast::<u8>()).offset_from(base_ptr.cast::<u8>())
};
#[cfg(not(addr_of))]
let offset = unsafe {
// The annoying way using a dummy struct because there's no way to get
// a pointer without going through a reference, which is insta-UB
// if it is not initialized.

use std::mem::size_of;

#[repr(C)]
struct PyCellDummy<T: PyClassImpl> {
ob_base: MaybeUninit<<T::BaseType as PyClassBaseType>::LayoutAsBase>,
contents: PyCellContentsDummy<T>,
}

#[repr(C)]
struct PyCellContentsDummy<T: PyClassImpl> {
value: MaybeUninit<T>,
borrow_checker: MaybeUninit<<T::PyClassMutability as PyClassMutability>::Storage>,
thread_checker: MaybeUninit<T::ThreadChecker>,
dict: MaybeUninit<T::Dict>,
weakref: MaybeUninit<T::WeakRef>,
}

assert_eq!(size_of::<PyCell<T>>(), size_of::<PyCellDummy<T>>());
assert_eq!(
size_of::<PyCellContents<T>>(),
size_of::<PyCellContentsDummy<T>>()
);

// All of the pycelldummy's fields are maybeuninit, which require no initialization
let cell = MaybeUninit::<PyCellDummy<T>>::uninit().assume_init();

let base_ptr: *const PyCellDummy<T> = &cell;
let weaklist_ptr: *const MaybeUninit<<T as PyClassImpl>::WeakRef> =
&(*base_ptr).contents.weakref;
let offset = offset_of!(PyCell<T>, contents) + offset_of!(PyCellContents<T>, weakref);

(weaklist_ptr.cast::<u8>()).offset_from(base_ptr.cast::<u8>())
};
// Py_ssize_t may not be equal to isize on all platforms
#[allow(clippy::useless_conversion)]
offset.try_into().expect("offset should fit in Py_ssize_t")
Expand Down