Introduce the partition_dedup/by/by_key methods for slices #54058
Conversation
r? @shepmaster (rust_highfive has picked a reviewer for you, use r? to override)
Internal code is allowed to use unstable features. I'd expect that switching
🤔
Thinking on this... I really want to call the two returned slices "buckets" because I think of this algorithm as "assigning values in the slice to one bucket or the other", based on the predicate of "is this equal to the previous one".
Having said that, I wonder if this function could/should more accurately be referred to as a type of partition. The return type certainly matches much more closely with `Iterator::partition`. Something like `partition_dedup`?
This feels uncontroversial enough to me. You are really just moving some code from `Vec` down to the slice type.
Whoops, was it up to me to open this tracking issue?
I think that's fine.
The last thing I'd suggest before merging is squashing the commits to remove the back and forth of the renaming and doc tweaks, etc. My suggestion would be to squash everything except for "Make the
Could you please show use cases for the "duplicates" part of the result? They should justify the added complexity of not just returning the deduped part. Also, in most cases what I need is to dedup sorted iterables and get an iterable as the result (as the `uniq` function in the D language standard library does: https://dlang.org/phobos/std_algorithm_iteration.html#uniq ).
I don't know that I'd agree that there's much complexity added by doing so. The algorithm naturally results in the two pieces existing. That being said, it could be argued that returning both pieces may restrict which alternative algorithms we could use for the implementation later.
You are looking for
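To illustrate the point above that the algorithm naturally produces the two pieces: here is a safe, simplified sketch of the in-place partitioning. This is not the implementation in the PR (which reuses the pointer-based `Vec::dedup_by` loop); the helper name is made up for illustration.

```rust
// Simplified, safe sketch: walk the slice with a read index and swap every
// element that differs from the last kept one into the "deduplicated" prefix.
// Everything after `write` naturally ends up in the "duplicates" bucket,
// so both pieces fall out of the loop for free.
fn partition_dedup_sketch<T: PartialEq>(slice: &mut [T]) -> (&mut [T], &mut [T]) {
    let len = slice.len();
    if len <= 1 {
        return slice.split_at_mut(len);
    }
    let mut write = 1;
    for read in 1..len {
        if slice[read] != slice[write - 1] {
            slice.swap(read, write);
            write += 1;
        }
    }
    slice.split_at_mut(write)
}

fn main() {
    let mut slice = [1, 2, 2, 3, 3, 2];
    let (dedup, duplicates) = partition_dedup_sketch(&mut slice);
    assert_eq!(dedup, [1, 2, 3, 2]);
    assert_eq!(duplicates, [3, 2]);
}
```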
Future Rust programmers who want to use this function will need to remember, look in the docs, or ask the IDE to know which of the two results is the one they want, and they will have to write the destructuring shown in the first post of this thread.

Every small thing like this adds up and makes writing code slower. That's why you need good use cases to justify increased API complexity. Python's designers know this well.

I think that such a lazy iterator is more fundamental and more commonly useful than a slice dedup, and I'd like to see it in the std library more than the slice dedup.
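For reference, the iterator-flavoured deduplication described here already exists outside std in the third-party `itertools` crate. The snippet below is a sketch of that style; the crate dependency and version are assumptions and not part of this PR.

```rust
// Cargo.toml (assumed): itertools = "0.10"
use itertools::Itertools;

fn main() {
    // Lazily drops consecutive duplicates, much like D's std.algorithm.iteration.uniq.
    let deduped: Vec<i32> = [1, 2, 2, 3, 3, 2].iter().copied().dedup().collect();
    assert_eq!(deduped, [1, 2, 3, 2]);
}
```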
First I asked myself the same thing: "Why didn't we return only the deduplicated slice part?". I think that if we did that, the function would have to be named differently.

Secondly, what happens to users who want to retrieve the duplicates? They will have to compute the slice part that contains all the duplicates themselves: this is error prone.

```rust
let mut slice = [1, 2, 2, 3, 3, 2];
let dedup = slice.partition_dedup();
assert_eq!(dedup, [1, 2, 3, 2]);
let duplicates = &slice[dedup.len()..];
assert_eq!(duplicates, &[3, 2]);
```

A solution could be to give the function a different name. Another solution could be to drop the end (i.e. the duplicated elements) by changing the bounds of the slice itself:

```rust
let slice = &mut [1, 2, 2, 3, 3, 2];
let slice2 = slice;
slice.dedup();
assert_eq!(slice, &mut [1, 2, 3, 2]);
let duplicates = &slice2[slice.len()..];
assert_eq!(duplicates, &[3, 2]);
```

EDIT: The last solution is not something that is possible, the
They aren't exclusive propositions; I see no reason you shouldn't propose inclusion of such an iterator as well. I'd argue that the proposed method would be used at a lower level than the iterator one because it allows getting a collection of unique values without allocation.
Ah, you are referring to complexity of usage, not complexity of implementation. Such a distinction is important to highlight up front. That being said, I think that by using

Since these are unstable methods, I'd be in favor of merging them such that they return the tuple. This allows users to use the maximum possible functionality. I think that deciding whether it should be stabilized as such would be an excellent question for the tracking issue, and if no one uses the second value, then we simplify the signature.
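On the usage-complexity point, callers who only want the deduplicated part can simply discard the second slice with a `_` pattern, which is exactly what the proposed `Vec::dedup_by` replacement in the PR description does (this snippet assumes the unstable API from this PR):

```rust
let mut slice = [1, 2, 2, 3, 3, 2];
// Only the deduplicated prefix is of interest here; the duplicates are ignored.
let (dedup, _) = slice.partition_dedup();
assert_eq!(dedup, [1, 2, 3, 2]);
```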
@bors r+ rollup

📌 Commit 6240a699260b78ead5fb5204ae1a88007922e9ab has been approved by

@leonardo-m to be clear, I think that you should make sure that your point of view is represented on the tracking issue as you are the best person to make that comment. 😉
Rollup of 16 pull requests

Successful merges:

- #53652 (define copy_within on slices)
- #54261 (Make `dyn` a keyword in the 2018 edition)
- #54280 (remove (more) CAS API from Atomic* types where not natively supported)
- #54323 (rustbuild: drop color handling)
- #54350 (Support specifying edition in doc test)
- #54370 (Improve handling of type bounds in `bit_set.rs`.)
- #54371 (add -Zui-testing to rustdoc)
- #54374 (Make 'proc_macro::MultiSpan' public.)
- #54402 (Use no_default_libraries for all NetBSD flavors)
- #54409 (Detect `for _ in in bar {}` typo)
- #54412 (add applicability to span_suggestion call)
- #54413 (Add UI test for deref recursion limit printing twice)
- #54415 (parser: Tweak function parameter parsing to avoid rollback on succesfull path)
- #54420 (Compress `Liveness` data some more.)
- #54422 (Simplify slice's first(_mut) and last(_mut) with get)
- #54446 (Unify christianpoveda's emails)

Failed merges:

- #54058 (Introduce the partition_dedup/by/by_key methods for slices)

r? @ghost
☔ The latest upstream changes (presumably #54457) made this pull request unmergeable. Please resolve the merge conflicts.

☔ The latest upstream changes (presumably #53508) made this pull request unmergeable. Please resolve the merge conflicts.
@bors r+ rollup

📌 Commit d560292 has been approved by

⌛ Testing commit d560292 with merge cd6812e1d19fa95ac27c8936158ed3a447367579...

💔 Test failed - status-travis
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact

@bors retry
Introduce the partition_dedup/by/by_key methods for slices

This PR proposes to add three methods to the slice type: `partition_dedup`, `partition_dedup_by` and `partition_dedup_by_key`. The two other methods are based on `slice::partition_dedup_by`.

These methods take a mutable slice, deduplicate it and move all duplicates to its end, returning two mutable slices: the first contains the deduplicated elements and the second all the duplicates, unordered.

```rust
let mut slice = [1, 2, 2, 3, 3, 2];

let (dedup, duplicates) = slice.partition_dedup();

assert_eq!(dedup, [1, 2, 3, 2]);
assert_eq!(duplicates, [3, 2]);
```

The benefits of adding these methods are that it is now possible to:

- deduplicate a slice without having to allocate and possibly clone elements on the heap, which is really useful for embedded code that can't allocate, for example.
- not lose the duplicate elements, because, when using `Vec::dedup`, the duplicate elements are dropped. These methods give the user more flexibility.

Note that this is nearly a copy/paste of the `Vec::dedup_by` function; once this method is stable the goal is to replace the algorithm in `Vec` with the following.

```rust
impl<T> Vec<T> {
    pub fn dedup_by<F>(&mut self, same_bucket: F)
    where
        F: FnMut(&mut T, &mut T) -> bool,
    {
        let (dedup, _) = self.as_mut_slice().partition_dedup_by(same_bucket);
        let len = dedup.len();
        self.truncate(len);
    }
}
```
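For completeness, here is a sketch of how the two closure-taking variants might look in use, following the conventions of the existing `Vec::dedup_by` and `Vec::dedup_by_key`. The exact contents of the `duplicates` slice are not asserted below since the PR describes them as unordered, and the snippet assumes the unstable API proposed here.

```rust
fn main() {
    // partition_dedup_by: the closure decides whether two neighbouring elements
    // belong to the same "bucket" (here: case-insensitive string equality).
    let mut words = ["foo", "FOO", "bar", "baz", "BAZ"];
    let (dedup, duplicates) = words.partition_dedup_by(|a, b| a.eq_ignore_ascii_case(b));
    assert_eq!(dedup, ["foo", "bar", "baz"]);
    assert_eq!(duplicates.len(), 2);

    // partition_dedup_by_key: neighbours that map to the same key count as duplicates.
    let mut nums = [10, 16, 15, 54, 21];
    let (dedup, duplicates) = nums.partition_dedup_by_key(|n| *n / 10);
    assert_eq!(dedup, [10, 54, 21]);
    assert_eq!(duplicates.len(), 2);
}
```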
Rollup of 12 pull requests

Successful merges:

- #53518 (Add doc for impl From in char_convert)
- #54058 (Introduce the partition_dedup/by/by_key methods for slices)
- #54281 (Search box)
- #54368 (Reduce code block sides padding)
- #54498 (The project moved under the Mozilla umbrella)
- #54518 (resolve: Do not block derive helper resolutions on single import resolutions)
- #54522 (Fixed three small typos.)
- #54529 (aarch64-pc-windows-msvc: Don't link libpanic_unwind to libtest.)
- #54537 (Rename slice::exact_chunks() to slice::chunks_exact())
- #54539 (Fix js error)
- #54557 (incr.comp.: Don't automatically enable -Zshare-generics for incr. comp. builds.)
- #54558 (Improvements to finding LLVM's FileCheck)

Failed merges:

r? @ghost