-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #31. Add \iter\unique which returns an iterator with unique va… #51
base: master
Are you sure you want to change the base?
Conversation
src/iter.php
Outdated
$hash = $hashFunction($value); | ||
} | ||
|
||
if (\in_array($hash, $hashSet, $strict)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be way more efficient with a keyed lookup and isset
or array_key_exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would work with hashes, but what if we use raw values and values are arrays, objects or nulls, and what about types checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this implementation has quadratic time complexity and as such is a no-go. As @staabm said, the correct way to do this is to use the value as a key. Objects and other values are exactly why there is the $hashFunction (more typically $keyFunction). This also obviates the need for the $strict parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a good idea to use serialize() to get a key for the default implementation (if hash function is not provided) to keep the ability for strict comparison?
$hashSet[serialize($value)] = '';
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serialize
is probably not the best solution, it could be costy for objects with large object graph, and the objects could change during the iteration making it fail to deduplicate them.
For objects I would do something like $hashSet[spl_object_hash($value)] = $value
(storing value in the array prevents the object from being reassigned making it unique). The downside is that the objects will stay in memory for the duration of the iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@staabm why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you dont need it and its slow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant you shouldnt serialize scalars, sry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@staabm how will you check the types then? $arr[5] and $arr['5'] will be the same keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm good point 🤔
* uniqueness of the element | ||
* @return \Iterator | ||
*/ | ||
function unique($iterable, callable $hashFunction = null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can set here the default value for $hasFunction callable $hashFunction = 'serialize'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks
_assertIterable($iterable, 'First argument'); | ||
$hashSet = []; | ||
foreach ($iterable as $key => $value) { | ||
if ($hashFunction === null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test could be moved out of the foreach
loop.
Something like this:
$hashFunction = $hashFunction ?? 'serialize';
$hash = $hashFunction($value); | ||
} | ||
|
||
if (isset($hashSet[$hash])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use array_key_exists()
here.
Tell me if I'm wrong but this implementation goes against all advantages of using iterator (memory safe). Because in order to check the unicity we build up a big array in memory.... I don't see an implementation that would be valid though.... |
This specific operation is indeed not very optimal because it needs to go through the whole iterator. |
Closes #31 . Add \iter\unique which returns an iterator with unique values.