-
Notifications
You must be signed in to change notification settings - Fork 87
Fix issue 355 added get frequency getter to iterable #357
Fix issue 355 added get frequency getter to iterable #357
Conversation
Co-authored-by: Mateus Felipe C. C. Pinto <mateusfccp@gmail.com>
@mateusfccp Hey Mateus, can you approve? |
@kevmoo can you please review this one? |
I can't, I'm not a member of the team. |
lib/src/iterable_extensions.dart
Outdated
/// | ||
/// For example, `['a', 'b', 'b', 'c', 'c', 'c'].countFrequency()` | ||
/// returns `{'a': 1, 'b': 2, 'c': 3}`. | ||
Map<T, int> countFrequency() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would probably make it get frequencies
instead of a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, I agree this way is better, please check my changes
lib/src/iterable_extensions.dart
Outdated
Map<T, int> countFrequency() { | ||
var frequencyMap = <T, int>{}; | ||
for (var item in this) { | ||
frequencyMap[item] = (frequencyMap[item] ?? 0) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can use update
, which avoids doing two lookups per item.
frequencyMap.update(item, _increment, ifAbsent: _one);
with top-level/static helpers:
int _increment(int value) => value + 1;
int _one() => 1;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, I agree this way is better, please check my changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How expensive is a lookup? Shouldn't it O(1)
? So we are basically avoiding O(2)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are avoiding O(2) and lowering the constant factor. I hope, that assumes the map is actually implemented efficiently, but it's at least possible.
Also, for degenerate cases, hash table lookup is linear. And quick-sort is quadratic. We always hope to not hit those cases, but when we do, it's better to have a low constant.
For library code that other people will use, especially low-level code like this, efficiency is important.
If all your helper libraries are 10% slower than necessary, then your entire application is at least 10% slower than necessary. (If helper libraries use helper libraries, it may compound.)
And there is nothing you can do about it.
So when you provide code for others to depend on, to build their algorithms and applications on, you should never say "a constant factor doesn't matter", because to end users, the difference between 15ms and 17ms may be a missed animation frame, and that matters.
(So, that's my pocket philosophy about writing code for others: The only person who can say whether performance is "good enough" is the application writer. A helper library doesn't know that. It may choose how it optimizes, for speed or size or a tradeoff, but it shouldn't say "it's good enough" and leave easy optimizations on the table. Every level below the application has to do their best and not be part of the problem.)
The one thing we should actually do is to measure. If operator[]
and operator[]=
are so heavily optimizied that they beat update
even if they do the same lookup twice, then we need to optimize update
too. I want to choose the operation that has the best possible performance, rather than using workarounds that are faster right now, and never come back and check if they stay that way. If update
isn't faster today, we should file an issue, so that it is tomorrow.
(I can see that I flip between pragmatic and ideological reasoning - be fast, because it matters, but don't be fast by doing it the wrong way, even if it is faster today. Tradeoffs. It's tradeoffs all the way down. So reasonable people can disagree on where to trade what. )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not disagreeing, just wanted to understand the reasoning.
Thanks for the explanation.
(I actually didn't even know .update
could be potentially faster, I never considered that it would avoid a lookup.)
…com/pvlKryu/collection into fix-issue-355-countFrequency-method
This reverts commit 1fe3d16.
lib/src/iterable_extensions.dart
Outdated
@@ -601,6 +601,22 @@ extension IterableExtension<T> on Iterable<T> { | |||
yield slice; | |||
} | |||
} | |||
|
|||
/// Returns a map where the keys are the unique elements of the iterable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single line first paragraph. Format as sentences.
Don't use Returns
for a getter. (Don't start any DartDoc with it, there is always a better word).
I'd say:
/// The count of occurrences of each element.
///
/// The map has an entry for each distinct element of this iterable,
/// as determined by `==`, where the value is the number of eelements
/// in this iterable which are equal to the key.
/// If there are elements that are equal, but not identical, its unspecified
/// which of the elements is used as the key.
///
/// For example `['a', 'b', 'c', 'b', 'c', 'c'].frequencies`
/// is a map with entries like `{'a': , 'b': 2, 'c': 3}`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
expect(frequencyMap, {42: 1}); | ||
}); | ||
|
||
test('should return correct frequency map for Queue of integers', () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably overkill to go throguh all kinds of iterables.
I'd rather have more diversity in the elements.
For example having elements that are equal, but not identical, or having elements that are records.
It's the building of the map that is tricky, iterating the iterable will probably work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added get frequencies tests extended
group
@natebosch WDYT. Is this worth adding? |
Yes, I think it is worth adding. It's not as general as |
/// ``` | ||
/// | ||
/// Note: This method uses `==` to compare elements. | ||
/// For collections # `List`, `Set`, or `Map`, deep equality is not checked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... where not even shallow equality is checked, two lists are equal only if they are the same instance.
If that is a use-case you want to support, maybe we should to add a parameter for the equality, to allow you to ask for const ListEquality()
for the elements:
Map<T, int> frequencies([Equality<T>? equality]) {
var frequenceyMap = equality == null ? <T, int>{} :
equality is IdentityEquality ? LinkedHashMap<T, int>.identity() :
LinkedHashMap<T, int>(equality.equals, equality.hash);
// ...
}
or have a parallel ... frequenciesBy(Equality<T> equality) ...
.
Then you can do nested list equality as .frequenciesBy(const ListEquality(ListEquality()))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is "shallow" as it relates to this
, not as it relates to the elements of this
that are also collection typed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK. In that case the wording is at least ambiguous.
Consider:
/// For elements that are collections, like `List`, `Set` or `Map`, the `==` comparison only
/// checks the identity of the collection, not its contents.
/// but not identical (`identical`), | ||
/// it is unspecified which of the elements is used as the key in the map. | ||
/// For example, if there are multiple lists with the same content, | ||
/// the map will only keep one of them as a key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a good example, though, since list equality is identity. Same below ...
|
||
group('get frequencies tests extended', () { | ||
test('list of equal but not identical strings', () { | ||
var list = ['apple', String.fromCharCodes('apple'.codeUnits)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String is not a good choice for this test because those will be identical when compiled for web.
I'd drop this one, and just use the MyObject
test below.
(Or keep this, but document that identity may differ between platforms.)
expect(frequencyMap, {const MyObject(1): 2, const MyObject(2): 1}); | ||
}); | ||
|
||
test('list with equal numbers but different types', () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good test. There is no way around web considering them all identical here, but different classes being equal is an important test for native numbers.
(I was about to suggest testing with double.nan
, but ... don't. You can't look it up in the map afterwards, and the default implementation of Map.entries
actually looks up the value, so you can't assume to be able to even inspect the returned map.)
}); | ||
|
||
test('list with elements that are objects', () { | ||
var list = [const MyObject(1), const MyObject(1), const MyObject(2)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use const
here, because that makes the objects actually identical, so it doesn't check equality being different from identitiy.
Or have one const
and one not.
Or add a second field to MyObject
, where only one, say the first one is used for equality: [const MyObject(1, "a"), const MyObject(1, "b") ...
would be equal but not identical.
test('should handle empty List', () { | ||
var list = []; | ||
var frequencyMap = list.frequencies; | ||
expect(frequencyMap, {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider isEmpty
as matcher. I think it works for maps too, and it should a more precise an error message like "is not empty".
@@ -1352,6 +1353,122 @@ void main() { | |||
expect(l3.toList(), [4, 5]); | |||
}); | |||
}); | |||
group('get frequencies tests', () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just 'frequenices'
, it's cleaner.
The names of groups and tests are concatenated with a space between them, so they can rely on context. Things like "should be correct" is a given for tests, so it doesn't add value. Consider:
group('frequencies', () {
test('of integers', () { ... });
test('of strings', () { ... });
test('of empty iterable', () { ... });
test('of empty single element iterable', () { ... });
group('(when equality is not identity)', () {
test('of records', () { ... });
/// ...
});
});
These names are only important when a test fails.
At that point, they should give enough context to make the error message meaningful.
Take "frequencies of single element iterable" with an error (if the result was empty) of
Expected: {42: 1}
Actual: {}
Which: has different length and is missing map key <42>
With that headline, it's fairly clear where and what the problem is.
expect(frequencyMap, {'a': 1, 'b': 1, 'c': 1}); | ||
}); | ||
|
||
test('should handle empty Set', () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would drop the tests for sets and queues unless there is a reason to believe that the implementation treats them differently (which it doesn't).
We're not testing iteration, that is assumed to work, so all that matters is the sequence of values we get by iterating, and a List
is fine for that.
(If we start trying to detect lists or sets, to do more efficient iteration or maybe assuming that sets cannot have duplicates so we don't need to count - which isn't true because sets can have non-standard equality - then there is reason to test with, and without, those types specifically.)
expect(frequencyMap, {'apple': 2}); | ||
}); | ||
|
||
test('list of records', () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(So for naming, I'd drop the "list of" here, what is important is that it's records being compared, and it being a list isn't important to the test.
Maybe just test('of records', () { ... });
with the changes above.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, but if doing shallow comparison of lists is a goal, we need to parameterize with an equality.
@lrhn Thank you very much for the quick feedback and for the detailed and helpful comments. I need some time to review everything you mentioned and correct it. I will let you know soon! |
Closing as the dart-lang/collection repository is merged into the dart-lang/core monorepo. Please re-open this PR there! |
[yes] I’ve reviewed the contributor guide and applied the relevant portions to this PR.
Description:
This Pull Request introduces a new extension getter frequencies for the Iterable class. The countFrequency method returns a map where the keys are the unique elements of the iterable and the values are the counts of those elements. This addition provides an easy and efficient way to count the frequency of elements in any iterable.
Changes:
Relevant Issues:
This PR addresses Issue dart-lang/core#679.
Thank you!