Conversation

@JordanMartinez
Contributor

No description provided.

@JordanMartinez
Contributor Author

Benchmark results:

Array
===
mapMaybe
---------------
mapMaybe (101)
mean   = 14.41 μs
stddev = 32.39 μs
min    = 4.48 μs
max    = 551.21 μs
mapMaybe (10001)
mean   = 489.23 μs
stddev = 661.88 μs
min    = 336.78 μs
max    = 6.35 ms

nubEq
---------------
nubEq (101)
mean   = 85.85 μs
stddev = 75.58 μs
min    = 60.53 μs
max    = 1.11 ms
nubEq (10001)
mean   = 420.09 ms
stddev = 46.84 ms
min    = 370.88 ms
max    = 577.13 ms

union
---------------
union (101)
mean   = 127.70 μs
stddev = 101.86 μs
min    = 85.04 μs
max    = 1.71 ms
union (10001)
mean   = 476.17 ms
stddev = 48.16 ms
min    = 437.69 ms
max    = 779.99 ms

intersect
---------------
intersectBy (101)
mean   = 49.29 μs
stddev = 35.80 μs
min    = 41.31 μs
max    = 595.78 μs
intersectBy (10001)
mean   = 373.92 ms
stddev = 19.77 ms
min    = 357.94 ms
max    = 459.93 ms

difference
---------------
difference (101)
mean   = 70.58 μs
stddev = 52.80 μs
min    = 54.25 μs
max    = 642.70 μs
difference (10001)
mean   = 443.65 ms
stddev = 16.29 ms
min    = 423.03 ms
max    = 539.40 ms

@hdgarrood
Contributor

Since these functions only behave interestingly in arrays which have some duplicates, we should probably ensure that the arrays we're benchmarking with also include some duplicates. If we only test with arrays in which every element is unique, and there's a performance issue that makes these functions perform especially poorly on arrays with lots of duplicates, we won't catch it with these benchmarks.

@JordanMartinez
Contributor Author

I've updated the test arrays so that half of their elements are unique and the other half contain duplicates based on the element's value (e.g. 3 means there are 3 duplicates of 3). I'm not sure whether the second half of the array should instead be nothing but duplicates of a single number (e.g. 50 duplicates of 3) for shortNatsDup.
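A minimal sketch of the kind of input described above, assuming an array size of at least 2; the module and function names here are my own illustration, not the PR's actual code:

```purescript
module Bench.Input where

import Prelude
import Data.Array (concatMap, range, replicate, take)

-- | Build an input array of roughly length `n` whose first half is unique
-- | values and whose second half contains duplicates: the value k contributes
-- | k copies of itself, so 3 appears 3 times, 4 appears 4 times, and so on.
natsWithDups :: Int -> Array Int
natsWithDups n =
  let
    half = n / 2
    -- each value k is repeated k times; truncate to fill half the array
    dups = take half (concatMap (\k -> replicate k k) (range 1 n))
    -- values chosen above n so they cannot collide with the duplicated half
    uniques = range (n + 1) (n + half)
  in
    uniques <> dups
```

Starting the unique half above `n` keeps it disjoint from the duplicated half, so exactly half of the result is guaranteed duplicate-free.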

@milesfrain
Contributor

milesfrain commented Jan 8, 2021

The array creation in this PR currently involves a lot of magic numbers, which will make it tricky to modify the sizes later. It would also be good to do some more shuffling of the data.

Here's another way to generate the input data:
https://github.com/milesfrain/bench-array-demo/blob/main/test/Main.purs

Unfortunately, this uses quickcheck's shuffle, which won't work here due to a circular dependency. It might be possible to break this circular dependency with the following steps:


Edit: On second thought, maybe just reversing the first half of each input array is a reasonable enough approximation of a shuffle for most sorting algorithms to deal with.


Edit2: Here's a version that shuffles by interleaving the first half of the array with the reversed other half. https://github.com/milesfrain/bench-array-demo/blob/no-quickcheck/test/Main.purs
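The interleaving approach from Edit2 can be sketched as follows; this is my own rendering of the idea, not the code from the linked branch:

```purescript
module Bench.Shuffle where

import Prelude
import Data.Array (concatMap, drop, length, reverse, take, zip)
import Data.Tuple (Tuple(..))

-- | Pseudo-shuffle: interleave the first half of the array with the
-- | reversed second half, e.g. [1,2,3,4,5,6] becomes [1,6,2,5,3,4].
interleaveShuffle :: forall a. Array a -> Array a
interleaveShuffle xs =
  let
    half = length xs / 2
    front = take half xs
    back = reverse (drop half xs)
    -- pair each front element with one from the reversed back half,
    -- then flatten the pairs in order
    pairs = zip front back
  in
    concatMap (\(Tuple a b) -> [ a, b ]) pairs <> drop half back
```

For odd-length input, `zip` truncates to the shorter half, so the leftover middle element is appended at the end (`drop half back`) rather than dropped. This is deterministic, which is arguably a feature for benchmarks: every run sees the same "shuffled" input.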

@milesfrain
Contributor

Since there's no rush on #203, it might make sense to tackle things in this order:
