Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add align_offset intrinsic and [T]::align_to function #2043

Merged
merged 8 commits into from
Sep 11, 2017

Conversation

oli-obk
Copy link
Contributor

@oli-obk oli-obk commented Jun 26, 2017

@withoutboats withoutboats added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jun 27, 2017
@Screwtapello
Copy link

Under "Motivation":

The standard library (and most likely many crates) [...] check whether a pointer is aligned in order to perform optimizations like reading multiple bytes at once. Not only is this code which is easy to get wrong, and which is hard to read (and thus increasing the chance of future breakage) but it also makes it impossible for miri to evaluate such statements. This means that miri cannot do utf8-checking, since that code contains such optimizations.

I assume it does something like go byte-by-byte up until the alignment of a u64, then the SIMD accelleration kicks in?

But later, under "How We Teach This":

This means that while !is_aligned(ptr) { ... } should be considered a potential infinite loop.

...so this intrinsic could not be relied on for things like the UTF-8 validation mentioned in the motivation section?

@scottmcm
Copy link
Member

I'm not certain that this is the right public API. For example, I was looking at the mem::swap loop recently, and wondering whether it'd be worth aligning one of the three copies. But the way I'd default to writing that would be to mod by alignment to memcpy the prefix first, not a loop while !is_aligned. It also feels a bit odd that this requires creating an improperly-aligned pointer first, then checking if it was ok. Maybe there's a way that this could instead produce the properly-aligned pointer from the uncertain one? Or maybe even the desired API is more like &[T] -> (&[T], &[U], &[T]), to explicitly separate off the unaligned prefix and suffix...

And a minor thing: the title of this makes me think, "intrinsics are internal implementation details; do they even need RFCs?". I'd like the "this is proposed for stability in core&std" to be more visible, such as at least in the summary (not just a line at the bottom of the design).

@nagisa
Copy link
Member

nagisa commented Jun 28, 2017 via email

@nagisa
Copy link
Member

nagisa commented Jun 28, 2017 via email

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 28, 2017

  1. Code that checks for alignment that's not natural to the type? (I do not remember there being an argument for the size of alignment and can't check ATM);

You can always cast your pointer to a type supporting your desired alignment.

  1. Something like corrode converted code/code written by a C programmer who would do the test the old way;

Well... right now we already don't support pointer arithmetic inside constants, so adding this feature will actually allow more code to work in const eval. You can always write code that won't be part of const eval.

Okay so I find it kinda funny that MIRI is not expected to support checking alignment the old way

This feature is needed for constant evaluation. CTFE cannot ever treat a pointer into any position of [u8; N] as aligned to an alignment other than 1, because it could either be &arr[0] which is aligned, or &arr[1] or... you don't know before you run the program and the array is placed at some arbitrary memory location. I mean we could decide that any const eval type is aligned like usize, which is what C does (malloc guarantees that you get at least the same alignment that the pointer type has). Rust doesn't guarantee this, even if jemalloc does it (tested by allocating a few u8s on the heap).

But even if we guarantee pointer alignment, we'd have the same issue for alignments above pointer alignment that we have right now for alignments bigger than 1.

...so this intrinsic could not be relied on for things like the UTF-8 validation mentioned in the motivation section?

Well... the utf8-validation does not rely on the optimization to ever trigger. It would just keep on doing the unoptimized stuff.

fn alignto(&[u8]) -> (&[u8], &[T], &[u8])
the desired API is more like &[T] -> (&[T], &[U], &[T]),

That would work, as long as I'm allowed to implement it as

fn alignto<T>(slice: &[u8]) -> (&[u8], &[T], &[u8]) {
    (slice, &[], &[])
}

Otherwise we have gained nothing. This API has the advantage over is_aligned of being hard to use wrongly.

@nagisa
Copy link
Member

nagisa commented Jun 28, 2017 via email

@oli-obk
Copy link
Contributor Author

oli-obk commented Jun 28, 2017

Ok, I agree that is_aligned has too many problems to be useful, and it's not the most elegant solution to the actual problem we're trying to solve. I'll change the rfc to suggest an intrinsic for splitting a slice into aligned and nonaligned parts

@nagisa the function splitting slices also doesn't support your 4KB alignment use case. But it would be addressed once #[repr(align(4096))] (rust-lang/rust#33626) is through, since you can just create a wrapper type with the correct alignment.

@oli-obk oli-obk changed the title Add is_aligned intrinsic Add alignto intrinsic and mem::alignto function Jul 4, 2017
@strega-nil
Copy link

@nagisa "Furthermore it seems to me that MIRI is in a pecular spot where it has to
just support unaligned access for all types" I don't get why this is? I feel like, on UB, miri should just error, not continue working.

@scottmcm
Copy link
Member

Thanks for the update, @oli-obk! Sorry for being slow to respond.

A few thoughts:

  • I think this could benefit from the RFC "Guide-level" and "Reference-level" explanations to replace how we teach and detailed design sections #2059 split. The reference/implementation information here is good, but I think it could use a bit more of "here's how you use this". Maybe show how the examples in the motivation change?
  • Is it possible that the intrinsic and what's exposed should be different? As a strawman, it might be fine for the intrinsic to be fn unaligned_prefix_len<T>(p: *const (), n: usize) -> usize, with the method in core for eventual stabilization providing the nice slice-oriented interface atop it. Or the intrinsic could take the desired alignment as a parameter even if the core function uses type alignment, etc.
  • Why always u8? It seems reasonable that, say, a sum function would want to use fn (&[f32]) -> (&[f32], &[simd::f32x8], &[f32]).
  • nitpick: You mention the function is unsafe, so the code blocks and signatures should probably have unsafe in them.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jul 16, 2017

I think this could benefit from the RFC #2059 split. The reference/implementation information here is good, but I think it could use a bit more of "here's how you use this". Maybe show how the examples in the motivation change?

Done. Plus added a note that we should extend the align documentation to mention the alignto functions and intrinsics

Is it possible that the intrinsic and what's exposed should be different? As a strawman, it might be fine for the intrinsic to be fn unaligned_prefix_len(p: *const (), n: usize) -> usize, with the method in core for eventual stabilization providing the nice slice-oriented interface atop it. Or the intrinsic could take the desired alignment as a parameter even if the core function uses type alignment, etc.

Not just possible, but necessary. The optimizations in the stdlib can be very frickly (you can't always use iterators or loops). I added the most basic intrinsic that covers all use cases (basically the intrinsic is a more advanced form of the originally suggested is_aligned intrinsic), but kept the library functions.

Why always u8? It seems reasonable that, say, a sum function would want to use fn (&[f32]) -> (&[f32], &[simd::f32x8], &[f32]).

Just an oversight. Fixed.

nitpick: You mention the function is unsafe, so the code blocks and signatures should probably have unsafe in them.

Done.

Copy link
Member

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! Added a bunch more comments, but they're relatively minor.

Add a new intrinsic

```rust
fn alignto(ptr: *const (), align: usize) -> usize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a backend implementing this allowed to make any assumptions about the value of align?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an intrinsic, so I'd guess invalid alignments are UB. I'd assume a non-power of two alignment or an alignment bigger than 2^32 would be considered invalid.

If that is ok, I'll add it as documentation. That's the rules that the compiler internal code has around alignment.

The alternative would be to add a generic argument for the pointer, but that would be confusing, since you have to cast the pointer to the target type, and the pointee type would change behaviour. Pointee types are easy to get wrong in casts.

cannot be aligned. Since the caller needs to check whether the returned offset
would be in-bounds of the allocation that the pointer points into, returning
`usize::max_value()` will never be in-bounds of the allocation and therefor
the caller cannot act upon the returned offset.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me thinks that this is more of a None case, since it's fundamentally never actually needed to go more than align-1 bytes to find the desired alignment. But then it's a never-stable intrinsic, and the "needs to check the offset anyway" argument is a good one...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I considered this, but found that documenting that the result always needs to be checked much more realistic, especially since you might not want matches or map + closures in optimization critical code.


```rust
fn alignto(ptr: *const (), align: usize) -> usize {
align - (ptr as usize % align)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are implementations expected to return align for an already-aligned pointer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops... no.

under `core::mem` and `std::mem` that simplifies the common use case, returning
the unaligned prefix, the aligned center part and the unaligned trailing elements.
The function is unsafe because it essentially contains a `transmute<[U; N], T>`
if one reads from the aligned slice.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is pointer-based type punning, not transmute-based (memcpy) type punning.


Add an intrinsic (`fn alignto(ptr: *const (), align: usize) -> usize`)
which returns the number of bytes that need to be skipped in order to correctly align the
pointer `ptr` to `align`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bikeshed: now that this doesn't return a pointer, maybe it should have a different name? Would help make it easier to talk about the two things, too, if the intrinsic and the eventually-stable methods had different names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I chose align_offset for now, although byte_offset_to_aligned might be clearer if wordier.

// due to `size_of::<T>() % size_of::<U>() == 0`,
// the fact that `size_of::<U>() > align_of::<U>()`,
// and the fact that `align_of::<T>() > align_of::<U>()` if `offset != 0` we know
// that `offset % source_size == 0`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: technically we don't know this if the intrinsic returns usize::max(), right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but in that case we get a value way bigger than slice.len() and thus result in the case where head == slice and mid and tail are empty. Which is the correct behaviour.

let head_count = offset / source_size;
let split_position = core::cmp::max(slice.len(), head_count);
let (head, tail) = slice.split_at(split_position);

But I just noticed that for zsts this function panics due to division by zero. Not sure what to do about that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just a ZST check at the beginning that just returns (slice, &[], &[])? It's not like aligned reads of nothing are better than unaligned reads of nothing 😆

on all current platforms. `alignto_mut` is expanded accordingly.

Any backend may choose to always produce the following implementation, no matter
the given arguments, since the intrinsic is purely an optimization aid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this text is left-over from when the slice version was the intrinsic? Seems like now it'll be something more like "if the backend compiles the intrinsic as return usize::max_value(), this method will optimize down to the following"


```rust
let ptr = v.as_ptr();
let align = ptr.offset(index).alignto(usize_bytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's surprising to me that the text above says "Once can clearly see the three slice parts", but this code doesn't use the slice method. Also, consider linking to the full code -- I think, from the other issue, that it's https://github.com/rust-lang/rust/blob/f09576c4a41727a8d10bbfd8fd3fb2e10e1be3b3/src/libcore/str/mod.rs#L1433?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, yea, that's actually where I noticed the slice function is not sufficient

## Example 2 (slices)

The `memchr` impl in the standard library explicitly uses the three phases of
the `align_to` functions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bikeshed/typo: align_to here, alignto in other places.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be called align_to in general? alignto feels unidiomatic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept mistyping it as align_to, too xD.

#[repr(align = 64)]
struct TwoWord([usize; 2]);

let (head, mid, tail) = std::mem::align_to<TwoWord>(text);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the extra struct here necessary? I'd have thought that just align_to::<[usize;2], _>(text) would be fine, since the original code only uses usize-alignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I misread that code. It aligns to usize, but steps in two usize chunks.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jul 17, 2017

Bikeshed: Upvote this comment if you want align_to, downvote if you want to stay with alignto.

@nagisa
Copy link
Member

nagisa commented Jul 17, 2017

One way to ensure only powers of two are allowed in the align_to function is to take the power, not the alignment itself.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jul 17, 2017

@nagisa that would make it riddiculous to use... because you'd have to figure out the power of the result of align_of

@est31
Copy link
Member

est31 commented Jul 17, 2017

you'd have to figure out the power of the result of align_of

Easy:

0usize.leading_zeros() - (std::mem::size_of::<T>()).leading_zeros()

@scottmcm scottmcm added T-libs-api Relevant to the library API team, which will review and decide on the RFC. T-lang Relevant to the language team, which will review and decide on the RFC. and removed T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Jul 20, 2017
@scottmcm
Copy link
Member

Retagging as libs help would be good on the method to expose in core, but doesn't really affect lang.

You could also consider a T-compiler issue for just the intrinsic to start, and an RFC for the public API later.

@scottmcm scottmcm removed their assignment Jul 20, 2017
@scottmcm
Copy link
Member

For the align parameter on the intrinsic, maybe it should just have the same rules as an align in Layout from Allocator? https://github.com/rust-lang/rfcs/blob/master/text/1398-kinds-of-allocators.md#layout-api

Using the power is neat, since aligning can be done with lshr+shl nuw, but I think it's too inconsistent with how alignments are represented elsewhere in Rust and in LLVM.

@eddyb
Copy link
Member

eddyb commented Jul 21, 2017

Why would shifts be faster than let mask = align - 1; (x + mask) & mask?

@KodrAus
Copy link
Contributor

KodrAus commented Aug 21, 2017

That's a good question. There are a few unsafe inherent methods scattered around slice types, plenty on pointers since there's not much you can do with them that isn't unsafe. So I think there is actually a bit of precedent for inherent unsafe methods with unchecked indexing and slicing and whatnot.

From an ergonomics perspective it seems like a win to me.

@scottmcm
Copy link
Member

Aspirationally, It'd be great to have a safe function that could do the &[T] => (&[T], &[Chunk<T>], &[T]) split (maybe with an N on Chunk?). That's probably impractical now, but it might affect the bikeshed.

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 23, 2017

@scottmcm I think the only way to get that to be safe would be to have a runtime check for matching sizes and some form of #[transparent], but it should be a trivial addition once this RFC is through and #[transparent] exists.

@KodrAus
Copy link
Contributor

KodrAus commented Aug 24, 2017

Hmm, typically unsafe variants would have a prefix/suffix and the safe variant wouldn't, but align_to_unsafe_mut is a pretty ugly colour for a bikeshed... Assuming you would want both the safe and unsafe variants in that case.

I'm tempted to ignore a potential safe variant as far as this RFC is concerned and we can figure it out in the future when all the pieces are available to make such a thing possible.

Does that seem reasonable, or would you rather have some degree of future-proofing built in here?

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 24, 2017

Does that seem reasonable, or would you rather have some degree of future-proofing built in here?

I don't think we can create a safe variant without some form for unsafe marker traits and a canonical transparent aligning wrapper type. I'd think that the actual alignment function would not take a generic argument anymore and would probably be on the wrapper type instead of on slices. But in any case, I definitely do not think that it should be named align_to, since it'll do something slighty more complex than that.

# Unresolved questions
[unresolved]: #unresolved-questions

* produce a lint in case `sizeof<T>() % sizeof<U>() != 0` and in case the expansion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be another useful clippy lint to have. Otherwise you'll just end up with something like (&[T], &[], &[]) which isn't useful, but also isn't the end of the world.

@aturon
Copy link
Member

aturon commented Aug 25, 2017

@KodrAus, who has been shepherding this RFC, believes it's ready for consideration by the broader libs team. Therefore:

@rfcbot fcp merge

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 25, 2017

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Aug 25, 2017
@scottmcm
Copy link
Member

@KodrAus The unsafe version of this has all the same gotchas as transmute_copy, which is one of the most unsafe and discouraged functions in all of std. And it might even have more traps because it can lead to non-byte-oriented reads that could be undef because of padding bytes. I understand that it needs to behave the way it does, and it's certainly useful, so I have no objections to adding it, I just think it deserves a scarier name. Maybe something along the lines of aligned_transmute?

@KodrAus
Copy link
Contributor

KodrAus commented Aug 26, 2017

@scottmcm Personally I think the fact that the function is unsafe should be enough to make users and future readers go read the scarily worded docs it will have and understand those edges better.

But at the same time, I think it's definitely worthwhile to look at ways to make the function communicate its edges better, especially given it'll live on a very commonly used type. If everyone is on-board with communicating the fact that this is transmuting in the method name itself then we can do another round of thinking about the name.

@strega-nil
Copy link

@scottmcm imo, the issue with transmute is not what it does - there are many ways to get transmute. The issue is with how easy it is. This is not easy, and people don't turn to it just because it's the easiest solution at the time to get their programs typechecking.

bors added a commit to rust-lang/rust that referenced this pull request Aug 30, 2017
Add align_offset intrinsic

see rust-lang/rfcs#2043 for details and the plan towards stabilization (reexport in `core::mem` via various convenience functions)

as per @scottmcm 's [comment](rust-lang/rfcs#2043 (comment)), this is just the intrinsic (which is obviously unstable).
@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Sep 1, 2017
@rfcbot
Copy link
Collaborator

rfcbot commented Sep 1, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Sep 1, 2017
@oli-obk oli-obk changed the title Add alignto intrinsic and mem::alignto function Add align_offset intrinsic and [T]::align_to function Sep 1, 2017
@rfcbot
Copy link
Collaborator

rfcbot commented Sep 11, 2017

The final comment period is now complete.

@aturon aturon merged commit 5752662 into rust-lang:master Sep 11, 2017
@aturon
Copy link
Member

aturon commented Sep 11, 2017

This RFC has been merged! Tracking issue.

@oli-obk oli-obk deleted the is_aligned branch September 11, 2017 16:49
@Centril Centril added A-intrinsic Proposals relating to intrinsics. A-alignment Proposals relating to alignment. labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-alignment Proposals relating to alignment. A-intrinsic Proposals relating to intrinsics. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.