In many functions, such as Stack.Package.findCandidates, we currently use lists for parameters that should semantically be sets:
We don't care about the order of the elements.
Duplicate elements don't change the return value.
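To make these two properties concrete: Data.Set from the containers package normalizes both order and duplicates, so lists that differ only in those respects become equal sets. This is a minimal illustration with arbitrary example values; nothing here is taken from Stack itself.

```haskell
import qualified Data.Set as Set

-- Two lists that differ only in element order and in duplicates.
xs, ys :: [String]
xs = ["base", "text", "base"]
ys = ["text", "base"]

-- As lists they are different values...
listsEqual :: Bool
listsEqual = xs == ys                          -- False

-- ...but as sets they are the same value, so a function that takes a
-- Set parameter cannot observe the difference.
setsEqual :: Bool
setsEqual = Set.fromList xs == Set.fromList ys -- True
```

A list-typed parameter leaves it to the reader to guess whether xs and ys are supposed to behave the same; a Set-typed parameter states it in the signature.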
I think this practice is quite common and not without benefits: lists have nice notation and can often be fused away during compilation, leading to fast code.
I also find that this practice has some downsides:
Reading the code, I'm often unsure if a parameter has list semantics or set semantics. What happens when there are duplicates? Is the order important?
In the particular case of the findCandidates function, a set parameter could have prevented a performance bug where duplicate elements caused a lot of unnecessary parsing and IO.
A similar argument can probably be made for other types too.
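A sketch of how a duplicate in a list parameter turns into repeated work, and how a Set parameter rules it out. loadCandidate is a hypothetical stand-in for the per-element parsing and IO (it is not Stack's code); it only counts how often it runs, and the file names are made up.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical stand-in for expensive per-candidate work (parsing, IO).
-- It just bumps a counter so the number of calls can be observed.
loadCandidate :: IORef Int -> FilePath -> IO ()
loadCandidate counter _path = modifyIORef' counter (+ 1)

-- List parameter: every duplicate triggers the expensive work again.
processList :: IORef Int -> [FilePath] -> IO ()
processList counter = mapM_ (loadCandidate counter)

-- Set parameter: duplicates cannot exist, so each distinct path is
-- processed exactly once (Set is Foldable, so mapM_ works directly).
processSet :: IORef Int -> Set FilePath -> IO ()
processSet counter = mapM_ (loadCandidate counter)

-- Run one variant with a fresh counter and return the number of calls.
countCalls :: (IORef Int -> IO ()) -> IO Int
countCalls action = do
  counter <- newIORef 0
  action counter
  readIORef counter
```

With paths = ["a.cabal", "b.cabal", "a.cabal"], processList does the work three times, while processSet on Set.fromList paths does it twice. The function bodies are identical; only the parameter type differs.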
I currently think that we should put correctness and readability above performance considerations and use the most "correct" datatypes possible.
How does everyone else feel about this tradeoff?
One option would be to introduce newtype ListSet a = ListSet [a] and delay the duplicate check until the one point where it is relevant: toList :: Ord a => ListSet a -> [a]. We could also have a variant that does the check only in debug (dev) mode and skips it otherwise. I'm thinking:
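toListUnsafe :: Ord a => ListSet a -> [a]
#ifdef DEBUG_MODE
-- Debug builds: fail loudly if the list contains duplicates.
toListUnsafe (ListSet xs) = case getFirstDuplicate xs of
  Just _  -> error "toListUnsafe encountered a list with duplicates"
  Nothing -> xs
#else
-- Release builds: skip the check and return the list as-is.
toListUnsafe (ListSet xs) = xs
#endif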
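One way the pieces around toListUnsafe could be filled in, as a minimal sketch under assumptions: ListSet is the hypothetical type proposed above, getFirstDuplicate is a hypothetical helper (the name comes from the snippet, the implementation is a guess), and the checked toList is assumed to drop duplicates while keeping the first occurrence, rather than to fail. That is only one reading of the proposal.

```haskell
import qualified Data.Set as Set

-- Hypothetical wrapper from the proposal: a plain list that is meant to
-- have set semantics (order irrelevant, no duplicates).
newtype ListSet a = ListSet [a]

-- Hypothetical helper: the first element that occurs more than once,
-- found by scanning left to right while remembering the elements seen.
getFirstDuplicate :: Ord a => [a] -> Maybe a
getFirstDuplicate = go Set.empty
  where
    go _ [] = Nothing
    go seen (x : rest)
      | x `Set.member` seen = Just x
      | otherwise = go (Set.insert x seen) rest

-- Checked conversion (assumed behaviour): drop duplicates, keeping the
-- first occurrence of each element in its original position.
toList :: Ord a => ListSet a -> [a]
toList (ListSet xs) = go Set.empty xs
  where
    go _ [] = []
    go seen (y : rest)
      | y `Set.member` seen = go seen rest
      | otherwise = y : go (Set.insert y seen) rest
```

Both functions need Ord only for the Set of seen elements. The wrapper itself places no constraint on a, which is what lets the check be deferred until a list is actually extracted.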
Hm, I'm not sure that something like ListSet would be a good solution. While it would add some defense against (performance) bugs stemming from accidental duplicates, I think it would make the code even harder to understand: readers would have to look up the definition of ListSet to understand how it behaves. Also, we'd be relying on beta-testers to hit the problematic cases that trigger the debug check, while users in the wild, running release builds, could still run into performance bugs without an easy way to diagnose them.
Of course that doesn't mean that we couldn't run into a case where a conversion along these lines causes a performance regression. But I think that we should be able to catch such cases by running a few benchmarks if in doubt.
Considering that the data we're working with tends to be rather small, it seems fine to me to use Set. So actually, I'm backtracking on my original concern about the performance impact, particularly since you have done the work to benchmark!
In summary, I am in favor of using Set in more places, when appropriate.