Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.8] Add duplicates method to collection #28181

Merged
merged 2 commits into from
Apr 11, 2019
Merged

[5.8] Add duplicates method to collection #28181

merged 2 commits into from
Apr 11, 2019

Conversation

timacdonald
Copy link
Member

@timacdonald timacdonald commented Apr 11, 2019

This PR introduces the ability to get all the duplicate values from an Illuminate\Support\Collection.

Usage

Screen Shot 2019-04-11 at 5 39 11 pm

Loose and Strict

Screen Shot 2019-04-11 at 5 09 23 pm

I've come at this from a bunch of different angles. I am more than happy to hear ways you think this could be improved - but please confirm it works with all the test cases (or I can for you), as I've found a lot of variations on this do not pass all the tests e.g. methods in the array_diff family don't play nice with multidimensional arrays.

I'd also like to hear if anyone thinks there are additional test cases I should account for.

Does not work with integers and objects in loose comparison

Loose comparison between integers and objects will trigger a notice and fail. Good news is that the unique method actually triggers the notice when we call it from the duplicates method, so the issue already exists.

Eloquent keys

If you are happy with this idea / approach for the base collection I will update this PR or do another to make it take model keys into account on the Illuminate\Database\Eloquent\Collection


Fingers crossed I never have to google "PHP how to find duplicates in an array" again 🤞

@akiyamaSM
Copy link
Contributor

Thanks for the PR first. I'd love to know more about what the key represents. because in the second example the value "g'day" has as key 2 (which I understand two occurrences) but in the first example the value (1) has as key 2 which should have 3? if so what if the are two entries with the same count of duplicate?

@alberto1el
Copy link

@akiyamaSM the keys on the "duplicates" resulting collection are a reference to the keys of the original collection, the first ocurrence of each "item" is not taken as duplicate only second and so on, that is why on the first example we get the 1, and 2 keys back (second and third ocurrence of number 1 in the collection) and 2 in the gday example since gday is duplicated on the 2nd position of the collection.

To me it looks great @timacdonald

@timacdonald
Copy link
Member Author

timacdonald commented Apr 11, 2019

Hey - so the $key represents the index of the item in the original collection - not the number of occurrences of the item.

Here is a more explicit version...

$items = collect([
    0 => 1,
    1 => 1,
    2 => 1,
    3 => 2,
]);

$items->duplicates();

// [1 => 1, 2 => 1];

$items[0] is 1, but not a duplicate.
$items[1] is 1, and a duplicate.
$items[2] is 1, and a duplicate.
$items[3] is 2, and not a duplicate.

Which gives our final value [1 => 1, 2 => 1]

I umm'd and arrr'd about whether or not I should retain the key or just return a re-indexed collection. I thought it doesn't hurt to have it retain the original index, so I left it, and in some cases it might be handy to know the index.

Does that implementation seem right to you @akiyamaSM ? Would love to know if you think it works okay with that explanation.

I guess the idea was to replicate this plain PHP for collections.

Screen Shot 2019-04-11 at 6 24 30 pm

@akiyamaSM
Copy link
Contributor

Thanks @timacdonald & @alberto1el Its more clear now. Good luck for the proposal 💃

@shane-smith
Copy link

I would definitely make use of this.

I question whether the first occurrence should also be returned, though I can see that's not consistent with the plain PHP array example.

I would probably use this to get a list of indexes, and then use those to remove the duplicates from the original collection.

However, there's a chance that the first occurrence of the item isn't actually the one I want to keep. Perhaps I want to keep the second occurrence instead?

In a simple example (checking on integers), it makes no difference whatsoever which one is kept.

But in a more complex example (checking a specific key or property like 'framework'), it might be that other keys/properties are not duplicated. Let's say that each element had a 'framework' and a 'version' key.

I could use duplicates() to get a list of repeated frameworks, then I'd pass this list into some other logic that checks the 'version' of each duplicate and deletes all but the most recent one. In that case, I'd want the first index of that framework as well!

Sure, there are plenty of ways to get around this and end up with the result I'd be looking for. Just thought I'd throw another potential use case in here that may not have been considered.

@taylorotwell taylorotwell merged commit 79a1cf5 into laravel:5.8 Apr 11, 2019
@timacdonald
Copy link
Member Author

Hey Shane, thanks for the feedback on this. Seen as though this has been merged as is, hopefully it will still work for your use case.

I'm just glad to have something in there to be able to find duplicates! One less google search for me haha

@ahinkle
Copy link
Contributor

ahinkle commented Apr 11, 2019

Thanks for this contribution. Would you be able to submit a PR to the docs?

@timacdonald
Copy link
Member Author

timacdonald commented Apr 12, 2019

It is on my list. I just wanted to get the collection PR (#28193) out first. Will try and PR the docs tomorrow.

P.S. Thanks for reminding me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants