Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assertThat(collection).doesNotHaveDuplicates() is not sorted, even if the collection is #3331

Closed
Bananeweizen opened this issue Jan 10, 2024 · 12 comments

Comments

@Bananeweizen
Copy link
Contributor

The error message of assertThat(collection).doesNotHaveDuplicates() lists the duplicates in a different order than the elements of the collection actually have. That is not an error per-se, but can make it harder to find the root cause of the failing assertion. So this might also be labeled improvement instead of bug.

  • assertj core version: 3.25.1

Test case reproducing the bug

    @Test
    void sortOrder() throws Exception {
        var list = List.of("org.objectweb.asm.source",
                "org.eclipse.emf.compare.diagram.gmf.source.feature.group",
                "org.eclipse.emf.compare.rcp.ui.source.feature.group",
                "org.objectweb.asm",
                "org.eclipse.jgit.feature.group",
                "org.eclipse.emf.compare.diagram.gmf.source.feature.group",
                "org.objectweb.asm",
                "org.apache.httpcomponents.httpclient",
                "org.eclipse.emf.compare.rcp.ui.source.feature.group",
                "org.eclipse.jgit.feature.group",
                "org.eclipse.emf.compare.ide.ui.source.feature.group");
        assertThat(list.stream().sorted()).doesNotHaveDuplicates();
    }

Note that the list gets sorted in the assertion. Still the output is not sorted, because the duplicates are collected in a set which is not insertion order stable.

Found duplicate(s):
  ["org.eclipse.emf.compare.diagram.gmf.source.feature.group",
    "org.eclipse.emf.compare.rcp.ui.source.feature.group",
    "org.objectweb.asm",
    "org.eclipse.jgit.feature.group"]
in:
  ["org.apache.httpcomponents.httpclient",
    "org.eclipse.emf.compare.diagram.gmf.source.feature.group",
    "org.eclipse.emf.compare.diagram.gmf.source.feature.group",
    "org.eclipse.emf.compare.ide.ui.source.feature.group",
    "org.eclipse.emf.compare.rcp.ui.source.feature.group",
    "org.eclipse.emf.compare.rcp.ui.source.feature.group",
    "org.eclipse.jgit.feature.group",
    "org.eclipse.jgit.feature.group",
    "org.objectweb.asm",
    "org.objectweb.asm",
    "org.objectweb.asm.source"]
@joel-costigliola
Copy link
Member

Fair enough, if I understand this correctly the fix should consist of using a linked hash set to store the duplicates.

@Bananeweizen do you want to contribute this one?

@Bananeweizen
Copy link
Contributor Author

Yes, I can do that. I still have to understand if the currently used comparator has some useful meaning for another (unknown to me) aspect. If not, I'd also expect this to be a trivial replacement of the TreeSet by a LinkedHashSet.

@joel-costigliola
Copy link
Member

If memory serves, I think we used a treeset to have consistent ordering in our tests, this can be achieved better with a linked hash set though.

@ranjitshinde91
Copy link

ranjitshinde91 commented Jan 10, 2024

@Bananeweizen can I take this for my first contribution, would help me in getting familiar with the PR process

@Bananeweizen
Copy link
Contributor Author

sure

@ranjitshinde91
Copy link

ranjitshinde91 commented Jan 10, 2024

TreeSet with custom comparator is used to compare objects in Set collections. Hence simply changing TreeSet to LinkedHashSet does not work.

Added fix as part of this PR #3333

@kulgan
Copy link

kulgan commented Jan 13, 2024

@Bananeweizen quick question regarding the expected ordering. Given the following input which is found in the test ["Merry", "Frodo", null, null, "Merry", "Sam", "Frodo"], there are two possible ways to return the duplicates in order:

  1. ["Merry", "Frodo", null]
  2. [null, "Merry", "Frodo"]
    Option 1 starts with the first elements that is a duplicate, while the second starts with the first duplicated element. I think they are both valid in their own way, like for debugging purpose, option 2 tells me null is the first duplicated element, which I think is helpful, while option one seems like the natural way.
    I know @ranjitshinde91 implementation is based on 1. above, checking if option 2 can be considered.

@Bananeweizen
Copy link
Contributor Author

Quite honestly, I had not even thought about that until you asked. I think I would be fine with both. The question of which one would be more nice for setting a breakpoint afterwards also depends, on whether I as the developer consider the first duplicate wrong (e.g. the first Merry should not have been added to begin with), or the second one. More typical might be to consider the second duplicate wrong, so I can understand that you would eventually prefer option 2.
As I said, I would be fine with either variant.

@joel-costigliola
Copy link
Member

@kulgan good point, I agree with @Bananeweizen that there isn't much difference between the two options, so I would go with the simplest implementation on this one

@ranjitshinde91
Copy link

I agree with @Bananeweizen when I started implementation i considered second Merry to be wrong and added in duplicates.
Also 2nd Implementation is simpler and does not require second traversal of Interable done in private method.

I will make the necessary changes to switch to option 2, if that's okay ?

@joel-costigliola
Copy link
Member

Go for it @ranjitshinde91, thanks!

@ranjitshinde91
Copy link

updated the same PR with these changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants