Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #31. Add \iter\unique which returns an iterator with unique va… #51

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

alexkart
Copy link
Contributor

@alexkart alexkart commented Nov 22, 2017

Closes #31 . Add \iter\unique which returns an iterator with unique values.

@alexkart alexkart changed the title Closes #37. Add \iter\unique which returns an iterator with unique va… Closes #31. Add \iter\unique which returns an iterator with unique va… Nov 22, 2017
@alexkart
Copy link
Contributor Author

Sorry, specified wrong issue in the commit message (#37 instead of #31)

src/iter.php Outdated
$hash = $hashFunction($value);
}

if (\in_array($hash, $hashSet, $strict)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be way more efficient with a keyed lookup and isset or array_key_exists

Copy link
Contributor Author

@alexkart alexkart Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would work with hashes, but what if we use raw values and values are arrays, objects or nulls, and what about types checking?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this implementation has quadratic time complexity and as such is a no-go. As @staabm said, the correct way to do this is to use the value as a key. Objects and other values are exactly why there is the $hashFunction (more typically $keyFunction). This also obviates the need for the $strict parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a good idea to use serialize() to get a key for the default implementation (if hash function is not provided) to keep the ability for strict comparison?

$hashSet[serialize($value)] = '';

Copy link
Contributor

@jvasseur jvasseur Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serialize is probably not the best solution, it could be costy for objects with large object graph, and the objects could change during the iteration making it fail to deduplicate them.

For objects I would do something like $hashSet[spl_object_hash($value)] = $value (storing value in the array prevents the object from being reassigned making it unique). The downside is that the objects will stay in memory for the duration of the iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staabm why?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you dont need it and its slow

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant you shouldnt serialize scalars, sry

Copy link
Contributor Author

@alexkart alexkart Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staabm how will you check the types then? $arr[5] and $arr['5'] will be the same keys.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm good point 🤔

* uniqueness of the element
* @return \Iterator
*/
function unique($iterable, callable $hashFunction = null) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can set here the default value for $hasFunction callable $hashFunction = 'serialize'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks

_assertIterable($iterable, 'First argument');
$hashSet = [];
foreach ($iterable as $key => $value) {
if ($hashFunction === null) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test could be moved out of the foreach loop.

Something like this:

$hashFunction = $hashFunction ?? 'serialize';

$hash = $hashFunction($value);
}

if (isset($hashSet[$hash])) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use array_key_exists() here.

@Neirda24
Copy link

Tell me if I'm wrong but this implementation goes against all advantages of using iterator (memory safe). Because in order to check the unicity we build up a big array in memory.... I don't see an implementation that would be valid though....

@drupol
Copy link

drupol commented Jun 13, 2021

This specific operation is indeed not very optimal because it needs to go through the whole iterator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

\iter\unique support
7 participants