parallel-letter-frequency: Add benchmark #542
Conversation
I'm not certain whether I should use
Thanks for the PR! I think we
I don't know what the other maintainers think about it, but I definitely think it would be a better solution.
If it were not inside the test suite, but in a separate benchmark suite in
Because we still need to decide where the benchmark code belongs, I'll wait for the other maintainers' position and not review the benchmark code at the moment.
@rbasso, I get what you're saying. My reasoning for having it as a test instead of a separate benchmark is: since the exercise is to write parallel code, the tests should check that it actually runs in parallel. Sequential code shouldn't pass the tests, because users might think their solution is correct when it actually is not. A benchmark is the only way to test that in Haskell, as far as I know. I agree that we should wait for other people's opinions on it.
For any other exercise I would suggest the benchmarks go in
This is really problematic, because there are multiple situations where the tests could be run on a single core:
Even if the number of cores were not a problem, I think that testing for speed improvements misses the point:
While I'm willing to accept a benchmark suite, I strongly oppose using the test suite to check for speed or parallel execution.
measuredMetric = B.estPoint . anMean . reportAnalysis

makeBench :: [Text] -> Int -> Benchmarkable
makeBench texts n = nf (frequency n) texts -- am I right that I should use nf and not whnf?
nf seems preferable, but it accepts a smaller set of types than whnf. In this case, because the returned value is a lazy Map, I guess that using whnf could give artificially low and incorrect benchmark values. I haven't tested it, though.
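A minimal sketch of the distinction being discussed, using criterion's API. The `frequency` here is a sequential stand-in for the exercise's function, and `texts` is made-up input; only `nf`/`whnf` come from the discussion above:

```haskell
import Criterion.Main (bench, bgroup, defaultMain, nf, whnf)
import Data.Map (Map)
import qualified Data.Map as M

-- Hypothetical stand-in for the exercise's frequency function.
frequency :: Int -> [String] -> Map Char Int
frequency _workers = foldr (\t m -> foldr bump m t) M.empty
  where bump c = M.insertWith (+) c 1

main :: IO ()
main = defaultMain
  [ bgroup "frequency"
      -- whnf evaluates only to the outermost constructor, so a
      -- lazily built Map may be timed as almost free.
      [ bench "whnf" $ whnf (frequency 1) texts
      -- nf uses NFData to force the entire Map, timing the real work.
      , bench "nf"   $ nf   (frequency 1) texts
      ]
  ]
  where texts = replicate 100 "the quick brown fox jumps over the lazy dog"
```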
Force-pushed from 222d359 to a976182
I've added a benchmark with cases for running it on 4 and 500 anthems. Each number of anthems would run with a single worker and with physical-number-of-cores workers. All cases run first on a single core and then on all available cores, as I found that the results are often different. It runs for about 45 seconds on my machine. It's probably too many cases, but I'm not sure which cases we could get rid of. Here are the results of the sample solution. And here is my solution.
Thanks again for all the work in this PR @unmanbearpig. Performance is a hard subject in Haskell, so we really appreciate the effort. 👍
Sorry for taking so long to review it. I'm not being a very constant maintainer here 😁, and I have very little experience with the criterion package.
The code seems great, but some fixes are still needed. I got some really interesting results playing with it here. 😄
I wrote a lot of comments, but please try not to take them too hard. I know very little about criterion, so my comments should be read more as questions than as criticism.
@exercism/haskell, we need to decide if we should compile - not run - the test suites as part of the Travis-CI check. This PR fails when building the benchmarks, but it passes Travis because we are ignoring them.
@@ -1,4 +1,5 @@
name: parallel-letter-frequency
version: 0.1.0.3
I guess we haven't documented it anywhere yet, but maybe we should avoid adding the version to the example's package.yaml. Doing that will reduce the number of files to update when bumping a version.
For now I think it is fine to leave it like that. This will probably be checked after we add the documentation in #538.
I recommend not adding it. You never know who might look upon this as an example, so we should be consistent if we're not adding it.
benchmarks:
  bench:
    main: Benchmarks.hs
Cabal-simple...: can't find source for Benchmarks in ...
See comments below in package.yaml.
benchmarks:
  bench:
    main: Benchmarks.hs
The filename here doesn't match the Benchmark.hs file in bench/. This causes an error when running the benchmark:
Cabal-simple...: can't find source for Benchmarks in ...
benchmarks:
  bench:
    main: Benchmarks.hs
    source-dirs: bench
See comments below in package.yaml concerning ghc-options.
benchmarks:
  bench:
    main: Benchmarks.hs
    source-dirs: bench
Here I had to add...
ghc-options: -threaded -with-rtsopts=-N
... to make it run multi-threaded and with multiple processors.
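Putting those pieces together, the benchmark stanza might look like this hypothetical package.yaml fragment (the stanza and file names are taken from the diff above; the dependencies list is an assumption):

```yaml
benchmarks:
  bench:
    main: Benchmark.hs          # must match the file name in bench/
    source-dirs: bench
    dependencies:
      - parallel-letter-frequency
      - criterion
    # -threaded plus -with-rtsopts=-N lets the benchmark use all cores
    ghc-options: -threaded -with-rtsopts=-N -O2
```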
-- run on 1 core
setNumCapabilities 1
defaultMain $ benchGroup 1 numsOfWorkers <$> numsOfAnthems
It runs about 45 seconds on my machine. It's probably too many cases, but I'm not sure which cases we could get rid of.

I would drop this block for two reasons:
- The second defaultMain would overwrite the HTML output file from the first one. I guess Criterion was not designed to be run like that.
- IMHO, the performance variation caused by numOfWorkers and numsOfAnthems should give us enough information to see how performance changes depending on the size of the task and the number of threads.

Also, consider these ideas:
- Instead of nub $ sort [1, processors], consider [1..processors]. This would increase the number of benchmarks a lot, but would give a nice curve showing how performance changes for each additional processor. The last one is usually an interesting case.
- Do we get any additional information from the 4 anthems cases?
defaultMain $ benchGroup 1 numsOfWorkers <$> numsOfAnthems

-- run on all cores
setNumCapabilities processors
I have almost no experience with criterion, so I'm not sure of it, but I guess that, if you decide to remove the first defaultMain group, we wouldn't need this line anymore, right?
Just as a reference to anyone interested in reviewing this or another PR with benchmarks, the following commands may help:
Clean the cache
Build everything with
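The exact commands were not captured above; as an assumption about the usual stack workflow, they would be something like:

```shell
# Clean the local build cache
stack clean

# Build the library, tests, and benchmarks without running them
stack build --test --bench --no-run-tests --no-run-benchmarks

# Run the benchmark suite
stack bench
```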
Force-pushed from a8705b8 to d00ad33
bin/test-example (Outdated)
runstack "test"
if ! runstack "test"; then
  exit 1
fi
@petertseng, could you please take a look at these changes to the test scripts when you get some time? I have some kind of fever right now, so my brain isn't at 100%, and I don't want to merge anything that could break the test scripts you wrote.
Well, it has preserved the intent, but in that case the comments above need to be removed since now it's no longer an implicit exit value.
bin/test-example (Outdated)
@@ -50,7 +58,9 @@ if [ "$exampletype" = "success" ]; then
# Implicit exit value: this is last thing in this path,
These comments now must be removed, since they are otherwise inaccurate.
I think I've fixed all of the issues. I've also added a note about the benchmark to
I'm personally thinking things are OK. At least I don't see anything that I would ask to change. I haven't run it yet to try it out; that'll have to happen tomorrow. But because of this, no need to wait for me if someone beats me to it.
Seems great!
I'll just ask for a final change... 😁
I was having problems with Travis-CI while reviewing this PR, so I took some time to fix a few things with #550 and #551. While reading the logs, I noticed that you didn't change test-stub to check whether the benchmarks would compile with it, but it is desirable to assert that, so that we can be more confident that users will be able to compile and run the benchmarks.
Considering that enabling the tests in Travis-CI wasn't the primary objective of this PR, I created #552 to solve that, and now we don't need more changes to the scripts in this PR.
Sorry for all the trouble, but I'll have to ask you to rebase this PR on master, removing any changes to test-example. Besides that, I think we are good to go! 👍
bin/test-example (Outdated)
else
  BENCH=false
fi
The changes to enable running the benchmarks were generating the following warnings when testing the stub solution for parallel-letter-frequency in Travis-CI:
WARNING: Specified source-dir "bench" does not exist
Warning: Directory listed in parallel-letter-frequency.cabal file does not exist: bench
This was happening because the test-stub shell script wasn't changed to check the stub solutions against the benchmarks.
### Benchmark

Check how changing the number of workers affects the performance of your solution by running the benchmark. Use `stack bench` to run it. Feel free to modify `bench/Benchmark.hs` to explore your solution's performance on different inputs.
Seems great! 👍
@@ -1,5 +1,7 @@
name: parallel-letter-frequency

ghc-options: -threaded -with-rtsopts=-N -O2
See comment for package.yaml.
version: 0.1.0.2
version: 0.1.0.3

ghc-options: -threaded -with-rtsopts=-N -O2
This line causes the following warnings:
Configuring parallel-letter-frequency-0.1.0.3...
Warning: 'ghc-options: -threaded' has no effect for libraries. It should only be used for executables.
Warning: 'ghc-options: -with-rtsopts' has no effect for libraries. It should only be used for executables.
Besides the warnings, the only real side effect of this is enabling -threaded -with-rtsopts=-N -O2 for the test suite, which doesn't need it, but the slower compilation from -O2 probably has an insignificant effect on the overall build time.
I've added cases for running it on 4 and 500 anthems. Each number of anthems would run with a single worker and physical-number-of-cores number of workers. All cases run first on a single core and then on all available cores, as I found that results are often different.
Force-pushed from b5b452c to 145b7b0
@rbasso, I've just rebased it on top of master, deleted my script changes, and moved
We usually don't care about a PR's history, and leave only a few commits to keep the repository's history nice and clean. In this case, I guess all the changes belong together, so a single commit seems like a good idea. To avoid giving you more trouble, I'll squash-merge it here. 👍 Thanks a lot!
Fixes #519
I've added a benchmark test that makes sure the frequency function runs at least 10% faster (an arbitrary number) with multiple workers than with a single worker. It increases the time it takes to run the tests by about 5 seconds on my machine; I'm not sure if that's acceptable. It might be possible to make it faster by tweaking Criterion settings.
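A rough sketch of how such a speedup assertion could be written, assuming criterion's benchmark' and the estPoint/anMean/reportAnalysis accessors seen elsewhere in this PR. The import of `frequency` from the exercise's source is omitted, and the worker counts are illustrative:

```haskell
import Control.DeepSeq (NFData)
import Criterion (benchmark', nf)
import Criterion.Types (Report (reportAnalysis), SampleAnalysis (anMean))
import Statistics.Types (estPoint)

-- Mean estimated run time, in seconds, of fully evaluating `f x`.
meanTime :: NFData b => (a -> b) -> a -> IO Double
meanTime f x = estPoint . anMean . reportAnalysis <$> benchmark' (nf f x)

-- True when running with 4 workers is at least 10% faster than with 1.
-- `frequency :: Int -> [String] -> Map Char Int` comes from the exercise.
runsInParallel :: [String] -> IO Bool
runsInParallel texts = do
  single <- meanTime (frequency 1) texts
  multi  <- meanTime (frequency 4) texts
  pure (multi < 0.9 * single)
```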
Another concern is that the benchmark test would only work correctly on a machine with at least 2 cores.
It prints the benchmark results into the console:
I guess another option would be to add a separate benchmark and have users check the speed themselves; I'm not sure which option is better. If it looks good I'm going to add some information about it into HINTS.md. Maybe we could make it optional somehow?