-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
group_by
is weird
#374
Comments
I agree that we could make the examples in the documentation a bit clearer. You may be looking for |
It groups consecutive runs, just like python's |
This is an angry post about the contracts and expectations that users expect from methods with some name. I write in Rust not so long ago and CLion automatically advised me to import @bluss Hm, I have no experience with python and its itertools and I think it is also misleading. But I have a lot of Java/Scala/C#/.../SQL experience and it is the different. I understand that now you can't rename the method for backward compatibility, but at least you can add the result of an example to the documentation. Or maybe you can rename it in the next major version update? |
I just had the exact same problem of expecting group_by to work like in most other languages, and I think I have an idea: the current group_by function should be renamed chunk_by. Don't you think it would be a better name for its current behavior? |
I agree, |
The documentation states pretty clearly that it groups "consecutive elements". The example also shows this, though still I think #375 is a good change anyway. |
375: Clarify documentation for group_by r=jswrenn a=phimuemue This refers to #374 (comment). I tried to make the docs a bit clearer by `collect`ing into a vector, showing the groups explicitly. Co-authored-by: philipp <descpl@yahoo.de>
375: Clarify documentation for group_by r=jswrenn a=phimuemue This refers to #374 (comment). I tried to make the docs a bit clearer by `collect`ing into a vector, showing the groups explicitly. Co-authored-by: philipp <descpl@yahoo.de>
just want to start off by saying this is one of my favorite crates and makes it a lot easier to write functional Rust! that said, I have made this the current unfortunately though, in all of these cases it's easy to forget. worse, it's easy to pass unit tests and only run into this issue in production, once you're iterating through a larger set of data that exposes the bug due to lack of sorting. some potential improvements that could help here, which unfortunately all introduce breaking changes to varying degrees:
|
Adding another anecdote supporting the "I'm confused" position. I acknowledge that upon rereading "Consecutive elements that map to the same key (“runs”), are assigned to the same group." it lines up with the behavior, but as someone with a preconceived notion of what I expected (and unfortunately for the first set of data it lined up) I just glossed over that statement because it didn't make sense to me. It could be argued that users should understand what they're doing, but I'd counter that a quality of a strong interface is that it's hard to misuse and I think
While acknowledging that this may be strictly true, the existence of SQL's I'm a strong +1 for suggesting the function name be deprecated and moved to something else. That it can be broadly misunderstood is one thing, but that it can look like it's working as expected (when misunderstood) and then act "differently" with a different set of data is pretty troubling. |
Is |
I don't feel like I understand why |
I get that people are not familiar with python (where this came from) but its own documentation is interesting. So I'm quoting it here:
|
I think I understand the implementation and that it can theoretically be more efficient if the data is sorted or the aggregated groups aren't actually needed, I'm just not sure where this would be used. From the python docs:
Does anyone have some examples of where this would be useful outside of this case? That one string compression interview question perhaps (aaabbcccc -> a3b2c4) but surely that's not a compelling use case. It feels like maybe leaning in to interpreting this as is an optimized version of group_by for pre-sorted data makes sense. My totally biased and likely somewhat ignorant opinion is that this version of Support above seems to be for This result on StackOverflow is pretty damning for the same choice made by Python's itertools: https://stackoverflow.com/questions/773/how-do-i-use-itertools-groupby. (Bonus points for the extra StackOverflow toxicity of "The example on the Python docs is quite straightforward"). |
I used it mostly when using python and each time because I knew something about the iterable data (that elements were consecutively grouped for some reason) or I wanted to know things about groups in the data. I certainly see that group elements non-consecutively has more usage. I'm personally not opposed to changing the method name (I would say |
My instinctual reaction is "no, this can't effectively be fixed by documentation". I think the name "group by" means "SQL-style group by" to significantly more people than it means "Python-itertools-style group by". The StackOverflow question with a huge number of people publicly confused by how it's intended to be used supports that position. To me, the issue is the method name itself not aligning with the expected functionality - documentation won't fix that for the majority of consumers. Even within Python, it looks like the Panda GroupBy does not align with the itertools interpretation. Put another way, if it's feeling like the first line of the documentation should be "this method likely doesn't do what you think it does", I'd argue that's not a great result and a breaking change is a better result. A deprecation and replacement feels pretty palatable from that perspective - renames are relatively easy to accommodate in Rust. |
@jswrenn @phimuemue It's not ideal but I think we should consider deprecate |
I'm not opposed, but I'd like to wait for the naming debate in rust-lang/rust#80552 to settle. There's a strong case to be made for consistency with the Rust standard library. |
I think that naming debate has settled? The PR is up for standard library to use |
I stumbled yet again on this confusion after 2 hours of debugging while re-reading and missing consecutive multiple times. :) |
Previously, we were assembling very naively an attribute set composed of context we saw. But it was forgetting that `"${drv}${drv.drvPath}"` would contain 2 contexts with the same key, but with different values, one with `outputs = [ "out" ];` and `allOutputs = true;`. Following this reasoning and comparing with what Nix does, we ought to merge underlying values systematically. Hence, I bring `itertools` to perform a group by on the key and merge everything on the fly, it's not beautiful but it's the best I could find, notice that I don't use `group_by` but I talk about group by, that is, because `group_by` is a `group_by_consecutive`, see rust-itertools/itertools#374. Initially, I tried to do it without a `into_grouping_map_by`, it was akin to assemble the final `NixAttrs` directly, it was less readable and harder to pull out because we don't have a lot of in-place mutable functions on our data structures. Change-Id: I9933c9bd88ffe04de50dda14f21879b60d8b8cd4 Reviewed-on: https://cl.tvl.fyi/c/depot/+/10620 Tested-by: BuildkiteCI Reviewed-by: tazjin <tazjin@tvl.su>
Oh... The naming debate has settled and rust-lang/rust#117678 was merged four days ago (and added to the 1.77.0 milestone). PR done and merged, it will be released in our next release 0.13.0. |
Summary: # Motivation And if there is one is one design mistake Rust ecosystem made, it is making [`group_by` weird](rust-itertools/itertools#374) and possible for users to, quote: > I spent several hours debugging my code written using the group_by function. Luckily for internal users, the build tooling responds accordingly and fails build for all usages of deprecated API. Thus, there is a need to adjust all of them manually. # [Release notes](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md#0130) ### Breaking - Removed implementation of `DoubleEndedIterator` for `ConsTuples` (#853) - Made `MultiProduct` fused and fixed on an empty iterator (#835, #834) - Changed `iproduct!` to return tuples for maxi one iterator too (#870) - Changed `PutBack::put_back` to return the old value (#880) - Removed deprecated `repeat_call, Itertools::{foreach, step, map_results, fold_results}` (#878) - Removed `TakeWhileInclusive::new` (#912) NOTE: Quick search didn't tell me anything related to breaking changes above, CI will tell. And, of course, scream to me if it breaks your personal build. Reviewed By: anps77 Differential Revision: D64306014 fbshipit-source-id: 881ac716e1dc23968d4a28000fdaccdbf9097ec2
Summary: # Motivation And if there is one is one design mistake Rust ecosystem made, it is making [`group_by` weird](rust-itertools/itertools#374) and possible for users to, quote: > I spent several hours debugging my code written using the group_by function. Luckily for internal users, the build tooling responds accordingly and fails build for all usages of deprecated API. Thus, there is a need to adjust all of them manually. # [Release notes](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md#0130) ### Breaking - Removed implementation of `DoubleEndedIterator` for `ConsTuples` (#853) - Made `MultiProduct` fused and fixed on an empty iterator (#835, #834) - Changed `iproduct!` to return tuples for maxi one iterator too (#870) - Changed `PutBack::put_back` to return the old value (#880) - Removed deprecated `repeat_call, Itertools::{foreach, step, map_results, fold_results}` (#878) - Removed `TakeWhileInclusive::new` (#912) NOTE: Quick search didn't tell me anything related to breaking changes above, CI will tell. And, of course, scream to me if it breaks your personal build. Reviewed By: anps77 Differential Revision: D64306014 fbshipit-source-id: 881ac716e1dc23968d4a28000fdaccdbf9097ec2
Summary: # Motivation And if there is one is one design mistake Rust ecosystem made, it is making [`group_by` weird](rust-itertools/itertools#374) and possible for users to, quote: > I spent several hours debugging my code written using the group_by function. Luckily for internal users, the build tooling responds accordingly and fails build for all usages of deprecated API. Thus, there is a need to adjust all of them manually. # [Release notes](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md#0130) ### Breaking - Removed implementation of `DoubleEndedIterator` for `ConsTuples` (#853) - Made `MultiProduct` fused and fixed on an empty iterator (#835, #834) - Changed `iproduct!` to return tuples for maxi one iterator too (#870) - Changed `PutBack::put_back` to return the old value (#880) - Removed deprecated `repeat_call, Itertools::{foreach, step, map_results, fold_results}` (#878) - Removed `TakeWhileInclusive::new` (#912) NOTE: Quick search didn't tell me anything related to breaking changes above, CI will tell. And, of course, scream to me if it breaks your personal build. Reviewed By: anps77 Differential Revision: D64306014 fbshipit-source-id: 881ac716e1dc23968d4a28000fdaccdbf9097ec2
Summary: # Motivation And if there is one is one design mistake Rust ecosystem made, it is making [`group_by` weird](rust-itertools/itertools#374) and possible for users to, quote: > I spent several hours debugging my code written using the group_by function. Luckily for internal users, the build tooling responds accordingly and fails build for all usages of deprecated API. Thus, there is a need to adjust all of them manually. # [Release notes](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md#0130) ### Breaking - Removed implementation of `DoubleEndedIterator` for `ConsTuples` (#853) - Made `MultiProduct` fused and fixed on an empty iterator (#835, #834) - Changed `iproduct!` to return tuples for maxi one iterator too (#870) - Changed `PutBack::put_back` to return the old value (#880) - Removed deprecated `repeat_call, Itertools::{foreach, step, map_results, fold_results}` (#878) - Removed `TakeWhileInclusive::new` (#912) NOTE: Quick search didn't tell me anything related to breaking changes above, CI will tell. And, of course, scream to me if it breaks your personal build. Reviewed By: anps77 Differential Revision: D64306014 fbshipit-source-id: 881ac716e1dc23968d4a28000fdaccdbf9097ec2
Summary: # Motivation And if there is one is one design mistake Rust ecosystem made, it is making [`group_by` weird](rust-itertools/itertools#374) and possible for users to, quote: > I spent several hours debugging my code written using the group_by function. Luckily for internal users, the build tooling responds accordingly and fails build for all usages of deprecated API. Thus, there is a need to adjust all of them manually. # [Release notes](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md#0130) ### Breaking - Removed implementation of `DoubleEndedIterator` for `ConsTuples` (#853) - Made `MultiProduct` fused and fixed on an empty iterator (#835, #834) - Changed `iproduct!` to return tuples for maxi one iterator too (#870) - Changed `PutBack::put_back` to return the old value (#880) - Removed deprecated `repeat_call, Itertools::{foreach, step, map_results, fold_results}` (#878) - Removed `TakeWhileInclusive::new` (#912) NOTE: Quick search didn't tell me anything related to breaking changes above, CI will tell. And, of course, scream to me if it breaks your personal build. Reviewed By: anps77 Differential Revision: D64306014 fbshipit-source-id: 881ac716e1dc23968d4a28000fdaccdbf9097ec2
Summary: # Motivation And if there is one is one design mistake Rust ecosystem made, it is making [`group_by` weird](rust-itertools/itertools#374) and possible for users to, quote: > I spent several hours debugging my code written using the group_by function. Luckily for internal users, the build tooling responds accordingly and fails build for all usages of deprecated API. Thus, there is a need to adjust all of them manually. # [Release notes](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md#0130) ### Breaking - Removed implementation of `DoubleEndedIterator` for `ConsTuples` (#853) - Made `MultiProduct` fused and fixed on an empty iterator (#835, #834) - Changed `iproduct!` to return tuples for maxi one iterator too (#870) - Changed `PutBack::put_back` to return the old value (#880) - Removed deprecated `repeat_call, Itertools::{foreach, step, map_results, fold_results}` (#878) - Removed `TakeWhileInclusive::new` (#912) NOTE: Quick search didn't tell me anything related to breaking changes above, CI will tell. And, of course, scream to me if it breaks your personal build. Reviewed By: anps77 Differential Revision: D64306014 fbshipit-source-id: 881ac716e1dc23968d4a28000fdaccdbf9097ec2
I spent several hours debugging my code written using the
group_by
function.I expected from this function the logic of grouping the same objects by a key, how surprised I was that he groups only some nearby elements while the group key has the same value.
In the documentation it is also indicated somehow strange and until the last, I was not sure if this is a bug or feature. I carefully read the documentation 5 times and it turned out to be a feature! Awful incomprehensible feature!
I do not think that this is the behavior that many expect for this method name. I demand to rename this method to something more suitable like
cons_group_by
orlazy_group_by
not to mislead users.Example
I'm not the only one to think so
The text was updated successfully, but these errors were encountered: