Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #31. Add \iter\unique which returns an iterator with unique va… #51

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 35 additions & 0 deletions src/iter.php
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,41 @@ function map(callable $function, $iterable) {
}
}

/**
* Leaves only unique occurrences by using a provided hash function.
*
* If hash function is not provided values of the iterable will be used for comparison. Storing values instead of hashes
* can require more memory but it prevents possible false positives if there are hash collisions.
*
* @param array|Traversable $iterable Iterable to remove duplicates from
* @param callable|null $hashFunction Hash function that returns the value which will be used to determine
* uniqueness of the element
* @param bool $strict If is set to true the types of the values from hash function will also be checked
* @return \Iterator
*/
function unique($iterable, callable $hashFunction = null, $strict = false) {
_assertIterable($iterable, 'First argument');

$hashSet = [];

foreach ($iterable as $key => $value) {

if ($hashFunction === null) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test could be moved out of the foreach loop.

Something like this:

$hashFunction = $hashFunction ?? 'serialize';

$hash = $value;
} else {
$hash = $hashFunction($value);
}

if (\in_array($hash, $hashSet, $strict)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be way more efficient with a keyed lookup and isset or array_key_exists

Copy link
Contributor Author

@alexkart alexkart Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would work with hashes, but what if we use raw values and values are arrays, objects or nulls, and what about types checking?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this implementation has quadratic time complexity and as such is a no-go. As @staabm said, the correct way to do this is to use the value as a key. Objects and other values are exactly why there is the $hashFunction (more typically $keyFunction). This also obviates the need for the $strict parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a good idea to use serialize() to get a key for the default implementation (if hash function is not provided) to keep the ability for strict comparison?

$hashSet[serialize($value)] = '';

Copy link
Contributor

@jvasseur jvasseur Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serialize is probably not the best solution, it could be costy for objects with large object graph, and the objects could change during the iteration making it fail to deduplicate them.

For objects I would do something like $hashSet[spl_object_hash($value)] = $value (storing value in the array prevents the object from being reassigned making it unique). The downside is that the objects will stay in memory for the duration of the iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staabm why?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you dont need it and its slow

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant you shouldnt serialize scalars, sry

Copy link
Contributor Author

@alexkart alexkart Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staabm how will you check the types then? $arr[5] and $arr['5'] will be the same keys.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm good point 🤔

continue;
}

$hashSet[] = $hash;

yield $key => $value;
}
}

/**
* Applies a mapping function to all keys of an iterator.
*
Expand Down
43 changes: 43 additions & 0 deletions test/iterTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,49 @@ public function testMap() {
$this->assertSame([0, 3, 6, 9, 12, 15], toArray($mapped));
}

public function testUniqueWithoutHashFunction() {
$iterable = [1, 2, '2', '2', 3, 4, 4, null, null, 5, '', '', [1], [1], [2]];
$expected = [1, 2, '2', 3, 4, null, 5, '', [1], [2]];
$unique = unique($iterable, null, true);
$this->assertSame($expected, toArray($unique));
}

public function testUniqueWithoutHashFunctionNotStrict() {
$iterable = [null, '', 0, 1, '1', 1.0, true];
$expected = [null, 1];
$unique = unique($iterable, null);
$this->assertSame($expected, toArray($unique));
}

public function testUniqueStringsWithHashFunction() {
$iterable = [
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
'Proin tincidunt mollis dui id efficitur. Vivamus vitae tortor vitae velit imperdiet finibus vel eu lacus.',
];
$expected = [
'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
'Proin tincidunt mollis dui id efficitur. Vivamus vitae tortor vitae velit imperdiet finibus vel eu lacus.',
];
$unique = unique($iterable, function ($v) {
return crc32($v);
}, true);
$this->assertSame($expected, toArray($unique));
}

public function testUniqueObjectsWithHashFunction() {
$obj1 = new \stdClass();
$obj1->a = 1;
$obj2 = new \stdClass();
$obj2->a = 2;
$iterable = [$obj1, $obj1, $obj2];
$expected = [$obj1, $obj2];
$unique = unique($iterable, function ($v) {
return $v->a;
}, true);
$this->assertSame($expected, toArray($unique));
}

public function testMapKeys() {
$range = range(0, 5);
$mapped = mapKeys(function($n) { return $n * 3; }, $range);
Expand Down