Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'Contains' method for ranges #1434

Merged
merged 2 commits into from
Mar 17, 2016
Merged

'Contains' method for ranges #1434

merged 2 commits into from
Mar 17, 2016

Conversation

ticki
Copy link
Contributor

@ticki ticki commented Dec 28, 2015


This trait provides the method `.contains()` and implements it for all the Range types.

## Add a `.contains(item: Self::Item)` iterator method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.contains<I: PartialEq<Self::Item>>(i: I).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@nagisa
Copy link
Member

nagisa commented Dec 28, 2015

Well, I’m not sure this is very necessary. What are usecases for this method? Would it be used oftenly (I didn’t find myself needing that, ever)? Since the start and end bounds are public fields, I guess this could be prototyped out-of-std.

@seanmonstar
Copy link
Contributor

Ideally, the answer to that question is the "Motivation" section of the RFC. Some examples of desired use should be there as well.

@ticki
Copy link
Contributor Author

ticki commented Dec 29, 2015

Will extend the motivation section.

@Zarathustra30
Copy link

Should this apply to RangeFull as well, for completeness's sake? It could be part of RangeArgument.

@nrc nrc added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jan 4, 2016
@ticki
Copy link
Contributor Author

ticki commented Jan 4, 2016

Should this apply to RangeFull as well, for completeness's sake? It could be part of RangeArgument.

I don't really see why that's useful, but it makes sense for the sake of consistency.

@Emerentius
Copy link

I was still waiting for specialization to hit but I wanted to propose this as well as part of a greater proposal to get x in y checks into Rust with a Contains trait (or whatever other name). It's very clear and concise syntax and we already have the in keyword due to for-loops. Might as well use it.
The trait would be implemented for everything that has a contains() method right now.

I understand there's a certain apprehension around simple syntax changes like this that don't solve major problems. It's good to be careful. However, in this case, I think the risks and costs are very low as we don't need to add any keywords and the concept that x in y is a check that tells you whether y contains x carries minor mental overhead because it's close to natural language. In exchange one gains a small bump in expressiveness with terse containment checks. Many people have probably also seen this before in Python. Then again, some have also seen this in Javascript which has rather strange behaviour.

The advantage for adding this method (with or without the added syntax) on ranges is basically not having to repeat yourself what variable should be within the boundaries given by the range as opposed to two separate checks. It's also not very useful on ranges other than Range and InclusiveRange (and their variants exclusive-exclusive, exclusive-inclusive if we ever get those) as I see it. Maybe I'm missing some generic function where it's elegant to have the other ones.
Coupled with x in y syntax it would make for nice to read a in start..end.

@BurntSushi BurntSushi self-assigned this Jan 6, 2016
@ehiggs
Copy link

ehiggs commented Jan 11, 2016

The trait Contains from the alternatives section is a good idea. Range should implement a Contains trait and maybe get containers to implement it too. This would be part of work to get more traits into std so programs can rely on those without worrying about where the implementation is coming from.

Where would trait Contains go? std::ops? That seems like the best place for it to also be implemented as an in operator as @Emerentius suggested.

@BurntSushi
Copy link
Member

@ehiggs Ever since collections reform, there's been a sort-of moratorium on traits for collections. The primary reasoning, if I remember correctly, was to hold off until HKT. If this RFC were to move forward, I would expect contains to be an inherent method just like it is on the various containers.

@ticki
Copy link
Contributor Author

ticki commented Jan 11, 2016

I don't really think a in syntax is a good idea. It seems like a relatively infrequent method, so a plain method or trait would be more fitting.

I am, personally, in favor of a method for the Iterator trait, since this pattern is common when dealing with iterators, and using any for checking if an iterator contains a given items is bad style, since it 80% of the time carry some form of overhead.

@BurntSushi
Copy link
Member

I tend to like remaining conservative on adding new methods just because they "make sense" without a clear pressing desire to actually use them. In this case, the lack of a contains method is so simple and easy to work around that I doubt there will ever be much pressing desire for a method like this.

With that said, a contains method is an established idiom, so I see its addition as very low cost. I think I would personally be OK with adding it, but frankly, I don't have a strong opinion.

The contains iterator adapter method is interesting, but I'd like to see it fleshed out more. In particular, how would other implementations of Iterator implement it? Under what circumstances would it be useful?

@ticki
Copy link
Contributor Author

ticki commented Jan 11, 2016

tend to like remaining conservative on adding new methods just because they "make sense" without a clear pressing desire to actually use them.

That makes sense (couldn't resist). Sorry if my wording wasn't clear: The reason why I think these methods should be added is that they're very common, and they look much better than just plain "compare and check" way. Let me give an example:

fn is_teenager(age: u32) -> bool {
    (13..20).contains(age)
}

instead of

fn is_teenager(age: u32) -> bool {
    age >= 13 && age < 20
}

I think we can agree that the first is more readable and less verbose than the second. It's (in my experience) also less prone to off-by-one bugs (because you don't forget the =). The gain is not huge, but I think we can agree, that it's there.

In particular, how would other implementations of Iterator implement it?

In many cases, it can be specialised. A small list of iteratables that can have this specialized for better performance:

Aside from the trivial onces (Range, RangeTo, RangeFrom, RangeFull, Repeat, Once) there are:

  • StepBy
  • EscapeUnicode
  • EscapeDefault
  • Keys<'a, K, V>
  • Values<'a, K, V>
  • collections::btree_map::Range<'a, K, V>

I'm sure there are more (these just came to my mind).

Under what circumstances would it be useful?

When you want to know if your iterator contains some value without searching naiively through the whole iterator. The main advantage of such a method is not ergonomics, but performance.

@BurntSushi
Copy link
Member

I think we can agree that the first is more readable and less verbose than the second. It's (in my experience) also less prone to off-by-one bugs (because you don't forget the =). The gain is not huge, but I think we can agree, that it's there.

Somewhat agree.

When you want to know if your iterator contains some value without searching naiively through the whole iterator. The main advantage of such a method is not ergonomics, but performance.

I understand that. But under what circumstances do you have an iterator to a container and need to check that for membership as opposed to the original container? This is what I mean by use cases. Can you point out real code that would benefit from this?

More importantly, I'm not actually convinced that performance can be reliably achieved. For example, if you have an iterator over the keys of a HashSet, how do you test membership quickly if part of the iterator has already been exhausted? (It should be possible to do this for a BTreeSet.) I guess it probably depends on the implementation details of HashSet, which I'm not familiar with. But it doesn't seem obvious to me that it should work.

@ticki
Copy link
Contributor Author

ticki commented Jan 11, 2016

Can you point out real code that would benefit from this?

One word: Generics.

Whenever you write a function or a trait that takes an iterable, you cannot rely on the underlying collection.

More importantly, I'm not actually convinced that performance can be reliably achieved. For example, if you have an iterator over the keys of a HashSet, how do you test membership quickly if part of the iterator has already been exhausted?

HashSet's iterable will just iterate over the elements in the buckets. Thus you know, at any point, what bound your contained item should satisfy. This gives O(1) performance.

@BurntSushi
Copy link
Member

I'm not personally a fan of hand-waving "generics" as a compelling use case on its own, but I understand what you mean. Without something concrete in front of me, it's hard for me to see whether contains is a useful addition to std or not. One thing I will note is that I don't think it can be reasonably prototyped out of std, since I'm guessing most of the specializations will require access to implementation details. That, to me, is fairly compelling. (Unless I'm mistaken.)

HashSet's iterable will just iterate over the buckets consuming them. Thus you know, at any point, what bound your contained item should satisfy. This gives O(1) performance.

I think I buy it.

@ticki
Copy link
Contributor Author

ticki commented Jan 11, 2016

A quick grep from crates I've cloned on my computer a little over 4% of all iterator methods called are of the form, \.any\(\|\&?x\| ?\*?x ?==, of which approximately 40-60% of has unnecessary overhead (that is, could achieve better performance through specialization). 2.4% is no small amount.

An example is if I want my function to take a list, in which case I let my function take T: IntoIterator<Item = T, IntoIter = I>. Say I want to abort my function, if some element, e, isn't found. If the caller pass an efficient set data structure, this can be done in O(1), if using .contains().

since I'm guessing most of the specializations will require access to implementation details. That, to me, is fairly compelling. (Unless I'm mistaken.)

That's true. A problem which itertools, for example, has is the lack of specialisation, hence worse performance.

@raindev
Copy link

raindev commented Feb 21, 2016

One more vote for this proposal.

Here is why:

let number = some_computation();
number >= low && number < high
// vs
(low..high).contains(some_computation())

Actually I was surprised by the lack of contains function. It is more expressive and less error prone. Why repeat a binding name three times when I need a single mental operation? By the same line of reasoning we could get rid of ranges all together. They are not that useful after all. The same behaviour could be achieved by a while loop and counting. Oh, even for loops could be ditched as well. I'm a little bit disappointed that expressiveness isn't considered high priority in Rust. By the way I'd like contains to rather stay out of the language as an ordinary function, we have enough special cases already.

@kbknapp
Copy link

kbknapp commented Feb 22, 2016

I like this idea a lot. I use any([..]) quite a lot, and there's also quite a bit of x <= y || z >= w code that could be made more readable or at least less verbose.

@alexcrichton
Copy link
Member

🔔 This RFC is now entering its week-long final comment period 🔔

@alexcrichton alexcrichton added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Mar 3, 2016
@ticki
Copy link
Contributor Author

ticki commented Mar 13, 2016

@alexcrichton The final comment period is over? Is a decision made?

@alexcrichton
Copy link
Member

Unfortunately the libs team didn't get a chance to discuss this last week, but we should get around to it this coming week!

@ticki
Copy link
Contributor Author

ticki commented Mar 13, 2016

Sure!

@nodakai
Copy link

nodakai commented Mar 15, 2016

+1 for the Contains trait though I found some difficulty with RangeFull which doesn't take a type parameter.

Quick and dirty search for possible use cases in the rustc code:

$ ag --ignore-dir llvm --rust '<=.*&&.*<' src
src/libgraphviz/lib.rs
428:            low as usize <= c as usize && c as usize <= high as usize

src/libsyntax/parse/mod.rs
436:        s[1..].chars().all(|c| '0' <= c && c <= '9')

src/libsyntax/parse/lexer/mod.rs
1608:        Some(c) => lo <= c && c <= hi,
1638:    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || (c > '\x7f' && c.is_xid_start())
1647:    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_' ||

src/libsyntax/codemap.rs
158:        self.lo <= other.lo && other.hi <= self.hi

src/librand/distributions/range.rs
178:                            assert!(low <= v && v < high);
180:                            assert!(low <= v && v < high);
203:                            assert!(low <= v && v < high);
205:                            assert!(low <= v && v < high);

src/libcore/num/flt2dec/strategy/grisu.rs
134:    debug_assert!(alpha <= e && e <= gamma);
422:        if 2 * ulp <= plus1w && plus1w <= threshold - 4 * ulp {

src/libcore/num/flt2dec/mod.rs
490:            parts[0] = if dec_bounds.0 <= 0 && 0 < dec_bounds.1 {
500:            let parts = if dec_bounds.0 as i32 <= vis_exp && vis_exp < dec_bounds.1 as i32 {

src/libcore/num/dec2flt/rawfp.rs
287:    debug_assert!(T::min_sig() <= x.sig && x.sig <= T::max_sig(),

src/libcore/num/dec2flt/parse.rs
94:    while i < s.len() && b'0' <= s[i] && s[i] <= b'9' {

src/libcore/num/dec2flt/mod.rs
257:    let exponent_in_range = table::MIN_E <= e && e <= table::MAX_E;

src/librbml/lib.rs
663:            let r = if first_tag as usize <= r_tag && r_tag <= last_tag as usize {
976:        } else if 0x100 <= n && n < NUM_TAGS {

src/libcore/ops.rs
1518:        (*self.start <= *x) && (*x < *self.end)

src/librustc_mir/build/matches/test.rs
188:                // Test `val` by computing `lo <= val && val <= hi`, using primitive comparisons.

src/libcollections/string.rs
1874:            if self.start <= self.end && self.end <= self_vec.len() {

src/libstd/ascii.rs
585:            let upper = if 'a' as u32 <= i && i <= 'z' as u32 { i + 'A' as u32 - 'a' as u32 }
599:            let lower = if 'A' as u32 <= i && i <= 'Z' as u32 { i + 'a' as u32 - 'A' as u32 }
666:            let lower = if 'A' as u32 <= i && i <= 'Z' as u32 { i + 'a' as u32 - 'A' as u32 }

src/libstd/sync/mpsc/mod.rs
86://!     assert!(0 <= j && j < 10);

src/libstd/sys/unix/stack_overflow.rs
110:        if guard != 0 && guard - PAGE_SIZE <= addr && addr < guard {

src/librustc_metadata/index.rs
114:        assert!((end-start)%4 == 0 && start <= end && end <= buf.len());

src/test/run-pass/simd-intrinsic-generic-cast.rs
57:    fn in_range(x: i32) -> bool { -128 <= x && x < 128 }
63:    fn in_range(x: i32) -> bool { 0 <= x && x < 128 }

src/libcoretest/num/flt2dec/strategy/grisu.rs
25:        assert!(low <= cached.e && cached.e <= high,

@alexcrichton
Copy link
Member

The libs team discussed this RFC during triage yesterday and the decision was to merge. It sounded like there's some interesting possibilities with iterators here as well, but we concluded that in any case we'd like to have these methods regardless.

Thanks again for the RFC @ticki!

@alexcrichton alexcrichton merged commit a57d6c4 into rust-lang:master Mar 17, 2016
@alexcrichton
Copy link
Member

Tracking issue: rust-lang/rust#32311

@Centril Centril added the A-ranges Proposals relating to ranges. label Nov 23, 2018
AndrewKvalheim added a commit to AndrewKvalheim/rust-exercises that referenced this pull request May 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ranges Proposals relating to ranges. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.