
Should we use datatypes with more appropriate semantics at the cost of worse runtime performance and some code bloat? #2669

Closed
sjakobi opened this issue Oct 3, 2016 · 3 comments

Comments

@sjakobi
Member

sjakobi commented Oct 3, 2016

In many functions, such as Stack.Package.findCandidates, we currently use lists for parameters that should semantically be sets:

  • We don't care about the order of the elements.
  • Duplicate elements don't change the return value.
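For example (a hypothetical sketch, not the actual findCandidates signature), the same parameter expressed as a list versus as a Set might look like this; the Set type documents that order and duplicates don't matter, and deduplicates at construction time:

import           Data.Set (Set)
import qualified Data.Set as Set

-- List version: nothing stops a caller from passing ["pkg", "pkg"].
candidateFilesList :: [FilePath] -> [FilePath]
candidateFilesList dirs = [ dir ++ "/package.yaml" | dir <- dirs ]

-- Set version: duplicates collapse when the Set is built.
candidateFilesSet :: Set FilePath -> Set FilePath
candidateFilesSet = Set.map (++ "/package.yaml")

main :: IO ()
main = do
  print (candidateFilesList ["pkg", "pkg"])               -- two identical results
  print (candidateFilesSet (Set.fromList ["pkg", "pkg"])) -- one result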

I think this practice is quite common and not without benefits: lists have nice notation and can often be fused away during compilation, leading to fast code.

I also find that this practice has some downsides:

  • Reading the code, I'm often unsure if a parameter has list semantics or set semantics. What happens when there are duplicates? Is the order important?
  • In the particular case of the findCandidates function, a set parameter could have prevented a performance bug where duplicate elements caused a lot of unnecessary parsing and IO.

A similar argument can probably be made for other types too.

I currently think that we should put correctness and readability above performance considerations and use the most "correct" datatypes possible.

How does everyone else feel about this tradeoff?

@mgsloan mgsloan added this to the P3: Optional milestone Oct 4, 2016
@mgsloan
Contributor

mgsloan commented Oct 4, 2016

One option would be to introduce newtype ListSet a = ListSet [a], and delay the duplicate check until the main place it is relevant: toList :: Ord a => ListSet a -> [a]. We could also have a variant that performs the check in debug mode (dev mode) but otherwise skips it. I'm thinking:

toListUnsafe :: Ord a => ListSet a -> [a]
#ifdef DEBUG_MODE
-- Debug builds check for duplicates before unwrapping.
toListUnsafe (ListSet xs) =
    case getFirstDuplicate xs of
        Just _ -> error "toListUnsafe encountered a list with duplicates"
        Nothing -> xs
#else
-- Release builds skip the check entirely.
toListUnsafe (ListSet xs) = xs
#endif
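For completeness, here is a sketch of the supporting definitions that the snippet above assumes (the ListSet newtype, toList, and getFirstDuplicate); the names come from the suggestion above, but the bodies are just one possible implementation:

import qualified Data.Set as Set

newtype ListSet a = ListSet [a]

-- Deduplicating conversion, for the places where duplicates actually matter.
toList :: Ord a => ListSet a -> [a]
toList (ListSet xs) = Set.toList (Set.fromList xs)

-- Returns the first element that occurs more than once, if any.
getFirstDuplicate :: Ord a => [a] -> Maybe a
getFirstDuplicate = go Set.empty
  where
    go _ [] = Nothing
    go seen (x:xs)
      | x `Set.member` seen = Just x
      | otherwise           = go (Set.insert x seen) xs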

@sjakobi
Member Author

sjakobi commented Oct 5, 2016

Hm, I'm not sure that something like ListSet would be a good solution. While it would add some defense against (performance) bugs stemming from accidental duplicates, I think it would make the code even harder to understand: readers would have to look up the definition of ListSet to understand how it behaves. Also, we'd rely on beta-testers to run into the problematic cases where the bugs pop up, while users in the wild may still hit performance bugs without being able to easily diagnose the problem.

I have converted a bit of Stack.Packages to use Set parameters instead of lists and benchmarked that modified version of stack by running null builds in the path, stack, and amazonka projects using tekmo's bench tool. I couldn't find a significant slowdown compared to master.

Of course that doesn't mean that we couldn't run into a case where a conversion along these lines causes a performance regression. But I think that we should be able to catch such cases by running a few benchmarks if in doubt.
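If we ever want a quick sanity check at the data-structure level, separate from the whole-program null-build timings above, a minimal criterion sketch along these lines would do; the collection size and names here are made up for illustration:

import           Criterion.Main
import qualified Data.Set as Set

main :: IO ()
main = do
  let names   = [ "package-" ++ show (i :: Int) | i <- [1 .. 200] ]
      nameSet = Set.fromList names
  defaultMain
    [ bench "list elem"  (whnf (`elem` names) "package-200")
    , bench "Set member" (whnf (`Set.member` nameSet) "package-200")
    ]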

@mgsloan
Contributor

mgsloan commented Oct 6, 2016

Considering that the size of the data we're working on tends to be rather low, it seems fine to me to use Set. So I'm backtracking on my original concern, which was about the performance impact, particularly since you have done the work to benchmark!

In summary, I am in favor of using Set in more places, when appropriate.

@mgsloan mgsloan closed this as completed Oct 6, 2016