Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add std::num::NonZeroU32 and friends, deprecate core::nonzero #2307

Merged
merged 5 commits into from
Mar 18, 2018

Conversation

SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Jan 21, 2018

Add std::num::NonZeroU32 and eleven other concrete types (one for each primitive integer type) to replace and deprecate core::nonzero::NonZero<T>. (Non-zero/non-null raw pointers are available through std::ptr::NonNull<U>.)

Rendered
Tracking issue

@SimonSapin
Copy link
Contributor Author

Not featured in the RFC because I couldn’t find a way to express it more objectively: I think the Zeroable trait is ugly and the fact that is exists it all is not good API design.

@eddyb
Copy link
Member

eddyb commented Jan 21, 2018

A more general approach would use const generics to encode the fact that an arbitrary range of primitive values (at some offset) is being removed from a type.

@SimonSapin
Copy link
Contributor Author

@eddyb Is this an argument against adding non-zero integer types? I’d like to keep details of a more general feature out of scope for this PR, thanks :)

@eddyb
Copy link
Member

eddyb commented Jan 21, 2018

Yes, if const generics were further along, I'd be more against this RFC. As it stands, a mention in alternatives is probably enough.

@Amanieu
Copy link
Member

Amanieu commented Jan 21, 2018

I would argue that -1 or !0 is much more likely to be an "invalid" value in the case of integers, for example:

  • File descriptors: -1 is often used to indicate "no file".
  • Generally any situation where an integer is used to represent an index into an array, -1 is often used to indicate None.

Are non-zero integers actually used in practice to justify stabilizing these types? I have only ever seen NonZero used for pointers.

@eddyb
Copy link
Member

eddyb commented Jan 21, 2018

I think @stoklund also used the max value in Cranelift, and the Rust compiler itself has a few types for which that's a dummy. So maybe we can defer this until a more general approach can be tried out.

@SimonSapin
Copy link
Contributor Author

Servo has a few types like pub struct FooId(NonZero<u32>); with values that start at 1, but as far as I understand they do so in order to use NonZero. The value themselves are not meaningful, only their equality comparison is. Some of these are sometimes used in Option<_>, but I don’t know how important the layout optimization really is. (Maybe we manipulate few enough of these values rarely enough that it doesn’t make a big difference.)

These could likely just as well have !0 be the forbidden value or even the whole upper half of the unsigned range (values with the high bit set), if such a type was available.

@SimonSapin
Copy link
Contributor Author

If this is an argument for rejecting this RFC one question is: who’s volunteering for championing the designing and RFC and implementation for that more general feature?

@bstrie
Copy link
Contributor

bstrie commented Jan 24, 2018

Agree that hardcoding the sentinel value to zero seems limiting. Also agree that waiting for const generics seems like a showstopper (and might also end up overengineered).

Is there any reason this has to be in core/std? Looking at the example implementation in the RFC, it looks like all of this is using stable features and could easily exist in a third-party crate. If possible, that seems like the way to go.

@eddyb
Copy link
Member

eddyb commented Jan 24, 2018

@bstrie The compiler has to be told what the invalid values are - hence const generics (alternatively, associated const might also be doable, although it could be clunkier).

@SimonSapin
Copy link
Contributor Author

@bstrie A wrapper type that maintains invariant in order to help with some algorithm’s correctness can be outside of std. For memory layout optimizations such as Option<_> to take advantage of those invariants though the compiler needs to know about them.

@bstrie
Copy link
Contributor

bstrie commented Jan 24, 2018

Ah, I glossed over the non_zero lang item there... figured that this implementation was taking advantage of some of the new enum niche optimizations somehow. Is a new lang item really preferable to stabilizing the Zeroable trait? (I see a lot of dislike for Zeroable in rust-lang/rust#27730, but not much explicit discussion of its downsides.)

@SimonSapin
Copy link
Contributor Author

SimonSapin commented Jan 24, 2018

@bstrie The non_zero lang item already exists. In the RFC as proposed it or the compiler doesn’t need to change at all, the NonZero<T> type that has the lang item becomes a private implementation detail of some public types like pub struct NonNull<T: ?Sized>(NonZero<*const T>); and pub struct NonZeroU32(NonZero<u32>);.

I’d forgotten to list the stabilizing NonZero<T: Zeroable> as an alternative, fixed. But the Motivation section already explained that it could be used with non-primitive types. What are the forbidden values of NonZero<Foo> with for example struct Foo(u8, u16);? With struct Foo(u8, &'static str);? Zeroable is an unsafe trait, what invariant do its impls need to maintain? This RFC’s proposal avoids those question entirely.

@Fleshgrinder
Copy link

Fleshgrinder commented Jan 28, 2018

I am coming from this Reddit where I basically asked for unsigned integer types without zero. I am using the term natural number to refer to ℕ⁺ = {1, 2, …}, hence, excluding zero, which I refer to as whole numbers ℕ⁰ = {0, 1, 2, …}.

I see great value in the addition of these types to Rust and have many use cases where they are useful. I do not know how const generics could replace natural numbers but if they would enable us to define range subtypes on the fly than that would definitely be the best thing (something similar to what tangledSpaghetti mentioned here).

@SimonSapin
Copy link
Contributor Author

@Fleshgrinder const generics would allow having in the standard library a type like this:

struct BoundedU32<const LOWER_BOUND: u32, const UPPER_BOUND: u32>(/* … */);

Then natural integers (up to bit storage capacity) become just a special case that you could define in your crates:

type NonZeroU32 = BoundedU32<1, u32::MAX>;

@Fleshgrinder
Copy link

I figured, well, that would definitely be better than having dedicated types due to the flexibility they offer.

@matthieu-m
Copy link

It seems to me that there are two different requests here:

  • Expressing the invariants of a type (0 is invalid),
  • Leveraging the absence of some values to optimize the layout of enums.

I am not sure that mixing them is the right approach. For example, imagine that I create two types: Odd<T> and Even<T> which contain, respectively, only odd and even integers.

I would not be able to use a Bounded<T, min: T, max: T> type to express the invariants, nor could I express to the enum layout optimizations which values are unused via a range.

It seems that for the case of enum layout optimization, relying on const fn applied to a bit-pattern may be a more flexible alternative. Maybe using:

trait Niche {
     const fn capacity() -> usize;
     const fn get_index(bits: Bits<Self>) -> Option<usize>;
     const fn set_index(bits: Bits<Self>, index: usize) -> Bits<Self>;
}

The nice thing about functions being that they are easily composed. For example, the implementation for Option<T> can query its underlying T for the presence of niche patterns.

@Fleshgrinder
Copy link

So I was playing a little with the nightly NonZero and there are a couple of oddities that I actually hoped would not be there if Rust provides these types of the box like we have it with normal integer types. Namely: usage in constants. The problem is that the zero check happens at runtime exactly as it would be if I implement this type totally on my own.

The situation would be the same with const generics. Still a good thing all in all since it cuts down on boilerplate a little …

@eddyb
Copy link
Member

eddyb commented Jan 28, 2018

@SimonSapin "Bounded" is the opposite of what you want (unless you make it wrap around, which would be confusing but equivalent in power) i.e. your example would be WithInvalid<u32, 0..=0>.

@matthieu-m Using anything other than a single range per enum requires potentially expensive rewrites when the enum discriminant is ever operated on.

We could be tracking multiple ranges and pick a contiguous piece of them for each enum, such that Option<Option<WithInvalid<WithInvalid<_, 1..=1>, 3..=3>>> uses 1 for the inner Option::None and 3 for the outer Option::None.

Even/odd is more interesting problem, pretty similar to taking advantage of pointer alignment.
If we're willing to evaluate const fns for this, it'd have to be an API that returns a byte offset into the type, a primitive integer size, and a range for that integer, given some starting point, e.g.:

struct InvalidRange {
    offset: usize,
    size: usize,
    range: RangeInclusive<u128>
}
trait HasInvalid {
    const fn next_invalid(previous: Option<InvalidRange>) -> Option<InvalidRange>;
}

@SimonSapin
Copy link
Contributor Author

I started it, but this is getting into the details of designing a feature that is not the one proposed in this RFC. Please let’s keep this thread about this proposal. We can discuss the existence of a potential more general feature as an argument for rejecting this RFC, without going into the details of how exactly that other feature works.

(Though once again I am not making that argument. We can do both.)

If someone does want to follow through discussing these details, please consider making a separate RFC or forum thread.

@fstirlitz
Copy link

The non-orthogonality of this proposal seems somewhat unsatisfying. What if some day someone wishes to have an AtomicNonZeroU32 somewhere? (Yes, I know AtomicU32 doesn't implement Zeroable now anyway, but...)

@SimonSapin
Copy link
Contributor Author

On AtomicU32 you can do foo.fetch_sub(1, Ordering::SeqCst) for example, which might set the value to zero. There is no way to maintain the full generality of available atomic operations while at the same time maintaining invariants such as non-zero.

Even if we made AtomicU32: Zeroable, today’s NonZero<T> requires T: Copy (to support the get method and most trait impls). AtomicU32 does not implement Copy because implicit copying like this is non-atomic.

@fstirlitz
Copy link

Okay, atomics are a bad example. My point was a little more abstract, though: eliminating genericity may make it harder/impossible to simultaneously forbid zero values and use other features. It may be the case that this genericity is not worth the trouble (I'm not especially aware of all the trade-offs involved), but I think it's at least worth thinking about.

@Fleshgrinder
Copy link

Maybe it would be better to implement the non-zero types as primitives, like I expected them to be, in that case, atomics would work like the work right now with the others. I honestly never thought about an integer without zero and fail to see the use case, however, for real natural numbers it should be quite easy to define such a primitive type.

@SimonSapin
Copy link
Contributor Author

SimonSapin commented Jan 29, 2018

I honestly never thought about an integer without zero and fail to see the use case

The main use case is using struct SomethingId(NonZeroU32); and having Option<SomethingId> take only 4 bytes on memory instead of 8. A 4 bytes difference is not much by itself, but when you need to store a couple IDs for example inside every DOM node of every document in a browser, it adds up.

Maybe it would be better to implement the non-zero types as primitives

Why?

I don’t think having NonZeroU32 be a primitive type instead of a standard library type makes much a difference for users (except that you wouldn’t need to import and we can achieve that by adding standard library types to the prelude, but I’m not convinced we want to in this case). It seems like it would mostly affect compiler implementation details.

in that case, atomics would work like the work right now with the others

I don’t understand, you explain some more what you mean here? (At the moment we have AtomicPtr, AtomicU32, AtomicBool and others which are standard library types, not primitives.)

@Fleshgrinder
Copy link

Integer: {-∞, …, -2, -1, 0, 1, 2, …, ∞}
Unsigned Integer (Whole Number): {0, 1, 2, …, ∞}

What I said is that I fail to see the use case for non-zero integers (hence {-∞, …, -2, -1, 1, 2, …, ∞}) but that I see the use case for non-zero unsigned integers (Natural Number; hence {1, 2, …, ∞}). The latter is what applies to identifiers (usually) and is the use case I am coming from.

With primitives I meant u32 gets is equivalent n32 and afterwards it should be possible to simply write const x: n32 = 1;. I also figured (and here I could be wrong) that afterwards it is possible to have an AtomicN32 since it is possible to have an AtomicU32 for a u32. As I see it (and I could be wrong here as well) the problem for NonZero + AtomicU32 is that some checks are only possible at runtime.

@SimonSapin
Copy link
Contributor Author

SimonSapin commented Jan 29, 2018

Ah ok. Yes, the NonZeroI* types are included just because they’re super easy to have once we have NonZeroU* but I’m not particularly attached to them. If there’s consensus that way I don’t mind removing them from the RFC and only keeping the unsigned one.

@aturon
Copy link
Member

aturon commented Mar 7, 2018

@joshtriplett Agreed! I think it should be listed as an unresolved question and hence recorded on the tracking issue.

@aturon
Copy link
Member

aturon commented Mar 8, 2018

@rfcbot fcp merge

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 8, 2018

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Mar 8, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Mar 8, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

SimonSapin added a commit to SimonSapin/rust that referenced this pull request Mar 17, 2018
@scottmcm
Copy link
Member

scottmcm commented Mar 18, 2018

Bikeshed: how about Positive32 and friends? https://en.wikipedia.org/wiki/Positive_number http://mathworld.wolfram.com/PositiveInteger.html

Edit: I also like PositiveU32 as est31 suggests below.

@SimonSapin
Copy link
Contributor Author

SimonSapin commented Mar 18, 2018

The fact that “positive” does not include zero in English is something that I’ve learned a couple years ago but it is still subtle in my mind. This is not the case in French, “nombre positif” correctly translates to “non-negative number”. Relying on such a subtle (at least to some parts of the world) aspect of what “positive” means to express the core property of these types doesn’t sound great to me.

@scottmcm
Copy link
Member

scottmcm commented Mar 18, 2018

This is the usual meaning in programming, though. For example,

And overall, u32 existing means there's only one interpretation of num::Positive32 that makes sense.

@est31
Copy link
Member

est31 commented Mar 18, 2018

NonZeroU32 is a fine name IMO. Maybe PositiveU32 is better, but the u is important, as otherwise you might not know the representation: Positive32 might have a two's complement representation. This obviously affects the maximum value.

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 18, 2018

The final comment period is now complete.

@Centril Centril merged commit b728bf6 into rust-lang:master Mar 18, 2018
@Centril
Copy link
Contributor

Centril commented Mar 18, 2018

Huzzah! This RFC has been accepted.

Tracking issue: rust-lang/rust#49137

@SimonSapin SimonSapin deleted the concrete-nonzero branch March 18, 2018 15:54
@Ixrec
Copy link
Contributor

Ixrec commented Mar 19, 2018

FWIW, even though I'm aware that "positive" technically excludes zero, whenever a non-mathematician says "positive" I don't feel safe assuming that they really meant to exclude zero, because so many people use positive/negative colloquially without thinking about the zero case. Even if I was reading a language spec that used the word "positive", unless there was an explicit mention of zero or non-negative/non-positive very close by (which is the case in all of @scottmcm's links), I'd feel the need to double-check that I hadn't discovered an unintentional underspecification. So I think NonZero* is a lot clearer even in English.

@SimonSapin
Copy link
Contributor Author

@Ixrec I agree, but since this RFC is merged further discussion should probably go into the tracking issue: rust-lang/rust#49137

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Mar 23, 2018
Add 12 num::NonZero* types for primitive integers, deprecate core::nonzero

RFC: rust-lang/rfcs#2307
Tracking issue: ~~rust-lang#27730 rust-lang#49137
Fixes rust-lang#27730
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Mar 23, 2018
Add 12 num::NonZero* types for primitive integers, deprecate core::nonzero

RFC: rust-lang/rfcs#2307
Tracking issue: ~~rust-lang#27730 rust-lang#49137
Fixes rust-lang#27730
@Centril Centril added the A-types-libstd Proposals & ideas introducing new types to the standard library. label Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-types-libstd Proposals & ideas introducing new types to the standard library. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.