
parallel-letter-frequency: Add benchmark #542

Merged

Conversation

@unmanbearpig (Contributor) commented May 11, 2017

Fixes #519

I've added a benchmark test that makes sure the frequency function runs at least 10% faster (an arbitrary number) with multiple workers than with a single worker.

It increases the time it takes to run the tests by about 5 seconds on my machine; I'm not sure if that's acceptable. It might be possible to make it faster by tweaking Criterion settings.

Another concern is that the benchmark test would only work correctly on a machine with at least 2 cores.
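
One way to sidestep that (a minimal sketch, not part of this PR; the actual 10% comparison is passed in as check) would be to skip the assertion when fewer than 2 cores are available:

    import GHC.Conc (getNumProcessors)
    import Test.Hspec

    -- Skip the speed assertion on machines that cannot run in parallel.
    speedSpec :: Expectation -> Spec
    speedSpec check = it "multiple workers run faster than 1" $ do
      cores <- getNumProcessors
      if cores < 2
        then pendingWith "this check needs at least 2 cores"
        else check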

It prints the benchmark results into the console:

frequency
  no texts mean no letters
  one letter
  case insensitivity
  many empty texts still mean no letters
  many times the same text gives a predictable result
  punctuation doesn't count
  numbers don't count
  all three anthems, together, 1 worker
  all three anthems, together, 4 workers
Run with a single worker
benchmarking...
time                 4.487 ms   (4.398 ms .. 4.544 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 4.566 ms   (4.557 ms .. 4.577 ms)
std dev              17.05 μs   (11.71 μs .. 25.01 μs)

Run with 4 workers
benchmarking...
time                 1.425 ms   (1.345 ms .. 1.490 ms)
                     0.977 R²   (0.951 R² .. 0.992 R²)
mean                 1.556 ms   (1.539 ms .. 1.585 ms)
std dev              50.19 μs   (33.35 μs .. 71.29 μs)
variance introduced by outliers: 13% (moderately inflated)

  multiple workers run faster than 1

Finished in 4.9143 seconds
10 examples, 0 failures

I guess another option would be to add a separate benchmark and have users check the speed themselves; I'm not sure which option is better. If this looks good, I'm going to add some information about it to HINTS.md.

Maybe we could make it optional somehow?
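
The core of that test, for reference (a sketch mirroring the PR's snippet; it assumes criterion's benchmark' runner, the exercise's frequency :: Int -> [Text] -> Map Char Int from the Frequency module, and that estPoint lives in Statistics.Types, which varies between statistics versions):

    import Criterion.Main (benchmark', nf)
    import Criterion.Types (Report(..), SampleAnalysis(..))
    import qualified Statistics.Types as B (estPoint)
    import Data.Text (Text)
    import Test.Hspec

    import Frequency (frequency)  -- the exercise solution

    -- the mean run time, extracted from criterion's report
    measuredMetric = B.estPoint . anMean . reportAnalysis

    speedSpec :: [Text] -> Spec
    speedSpec texts = it "multiple workers run faster than 1" $ do
      one  <- measuredMetric <$> benchmark' (nf (frequency 1) texts)
      four <- measuredMetric <$> benchmark' (nf (frequency 4) texts)
      four `shouldSatisfy` (< one * 0.9)  -- at least 10% faster (arbitrary margin)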

@unmanbearpig (Contributor, Author) commented:

I'm not certain whether I should use nf or whnf here: https://github.com/exercism/xhaskell/pull/542/files#diff-63fa4ff5bfd229a3aed31a4232a2f3f8R110

@rbasso (Contributor) commented May 11, 2017

Thanks for the PR!

I think we should make a clear distinction here between what belongs to tests and what belongs to benchmarks. I usually divide it like this:

  • Tests give us a reasonable assurance that our functions have the correct behavior.
  • Benchmarks give feedback about how fast the code runs.

I guess another option would be to add a separate benchmark and have users check the speed themselves, not sure which option is better.

I don't know what the other maintainers think about it, but I definitely think it would be a better solution.

Maybe we could make it optional somehow?

If it were not inside the test suite, but in a separate benchmark suite in package.yaml, it would be automatically optional.

Because we still need to decide where the benchmark code belongs, I'll wait for the other maintainers' positions and not review the benchmark code at the moment.

@unmanbearpig (Contributor, Author) commented:

@rbasso, I get what you're saying. My reasoning for having it as a test instead of a separate benchmark is: since the exercise is to write parallel code, the tests should check that it runs in parallel. Sequential code shouldn't pass the tests, because users might think that their solution is correct when it actually is not. A benchmark is the only way to test that in Haskell, as far as I know.
It might also be more straightforward for users, as they wouldn't have to think about running a separate stack bench command. Instead they would be able to keep running stack test until it passes, like in any other exercise. That seems like a bit less friction and a slightly better user experience.

I agree that we should wait for other people's opinions on it.

@petertseng (Member) commented:

For any other exercise I would suggest the benchmarks go in stack bench. Convince me why it should be otherwise here! Ah, because the exercise is specifically about parallelism? That does in fact make the exercise slightly unique. I have noted it as such in x-common.

@rbasso (Contributor) commented May 16, 2017

Another concern is the benchmark test would only work correctly on a machine with at least 2 cores.

This is really problematic, because there are multiple situations where the tests could be run on a single core:

  • Users with single-core machines (uncommon nowadays)
  • Development environments inside a virtual machine (some developers like that)
  • Continuous Integration on single-core machines (I guess Travis-CI gives us two cores at the moment)

Even if the number of cores were not a problem, I think that testing for speed improvements misses the point:

  • It is in general impossible, because parallel execution isn't a property of the code. Haskell is non-strict, so it is up to the compiler and run-time system to decide when/how to evaluate expressions.
  • Testing speed is a proxy for testing parallel execution, not the same thing.
  • A busy host/guest OS could make a normally faster parallel benchmark be scored as slower than the non-parallel version.
  • A slower, parallel solution is correct IMHO, so it shouldn't be refused in any circumstance.
  • It is a really smart hack to measure the variation in speed, but hacks tend to compound into projects that are exponentially harder to manage.

While I'm willing to accept a benchmark suite, I strongly oppose using the tests suite to check for speed or parallel execution.

-- the mean run time, extracted from criterion's report
measuredMetric = B.estPoint . anMean . reportAnalysis

makeBench :: [Text] -> Int -> Benchmarkable
makeBench texts n = nf (frequency n) texts -- am I right that I should use nf and not whnf?

A Contributor replied:

nf seems preferable, but it accepts a smaller set of types than whnf.

In this case, because the returned value is a lazy Map, I guess that using whnf could give artificially lower and incorrect benchmark values.

I didn't test it, though.
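
A minimal standalone illustration of the difference (not the PR's code; with a lazy Map, whnf stops at the outermost constructor while nf forces every key and value via NFData):

    import Criterion.Main (bench, defaultMain, nf, whnf)
    import qualified Data.Map as Map  -- the lazy Map

    letterCounts :: String -> Map.Map Char Int
    letterCounts s = Map.fromListWith (+) [(c, 1) | c <- s]

    main :: IO ()
    main = defaultMain
      [ bench "whnf" $ whnf letterCounts text  -- counts stay as unevaluated thunks
      , bench "nf"   $ nf   letterCounts text  -- forces every count as well
      ]
      where text = concat (replicate 1000 "the quick brown fox")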

@unmanbearpig force-pushed the parallel-letter-frequency-benchmark branch 4 times, most recently from 222d359 to a976182 on May 17, 2017

@unmanbearpig (Contributor, Author) commented May 17, 2017

I've added a benchmark with cases for running it on 4 and 500 anthems. Each number of anthems runs with a single worker and with as many workers as there are physical cores. All cases run first on a single core and then on all available cores, as I found that the results are often different.

It takes about 45 seconds on my machine. It's probably too many cases, but I'm not sure which cases we could get rid of.

Here are the results of the sample solution.
Note that it runs faster on a single core in all cases except for 500 anthems with 8 workers.

And here are the results of my solution.
I've added a case with 256 workers, as it's sometimes faster than even the best case of the sample solution, even on 1 core, which is really surprising to me. The situation is similar with 1 core being faster in some cases regardless of the number of workers.

@rbasso (Contributor) left a comment:

Thanks again for all the work in this PR @unmanbearpig. Performance is a hard subject in Haskell, so we really appreciate the effort. 👍

Sorry for taking so long to review it. I'm not being a very consistent maintainer here 😁, and I have very little experience with the criterion package.

The code seems great, but some fixes are still needed. I got some really interesting results playing with it here. 😄

I wrote a lot of comments, but please try not to take them too hard. I know very little about criterion, so my comments should be read more as questions than as criticism.


@exercism/haskell, we need to decide if we should compile - not run - the test suites as part of the Travis-CI check. This PR fails when building the benchmarks, but it passes Travis because we are ignoring them.

@@ -1,4 +1,5 @@
name: parallel-letter-frequency
version: 0.1.0.3

A Contributor commented:

I guess we haven't documented it anywhere yet, but maybe we should avoid adding the version to the example's package.yaml. Doing that would reduce the number of file updates needed when bumping a version.

For now I think it is fine to leave it like that. This will probably be checked after we add the documentation in #538.

A Member commented:

I recommend not adding it. You never know who might look upon this as an example, so we should be consistent if we're not adding it.


benchmarks:
  bench:
    main: Benchmarks.hs

@rbasso (Contributor) commented May 21, 2017:

Cabal-simple...: can't find source for Benchmarks in ...

See the comments below in package.yaml.


benchmarks:
  bench:
    main: Benchmarks.hs

A Contributor commented:

The filename here doesn't match the Benchmark.hs file in bench/. This causes an error when running the benchmark:

Cabal-simple...: can't find source for Benchmarks in ...

benchmarks:
  bench:
    main: Benchmarks.hs
    source-dirs: bench

A Contributor commented:

See the comments below in package.yaml concerning ghc-options.

benchmarks:
  bench:
    main: Benchmarks.hs
    source-dirs: bench

A Contributor commented:

Here I had to add...

    ghc-options: -threaded -with-rtsopts=-N

... to make it run multi-threaded and with multiple processors.
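
For reference, the benchmarks stanza might then look something like this (a sketch; the dependencies list is assumed, not taken from the PR):

    benchmarks:
      bench:
        main: Benchmark.hs
        source-dirs: bench
        ghc-options: -threaded -with-rtsopts=-N
        dependencies:
          - parallel-letter-frequency
          - criterion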


-- run on 1 core
setNumCapabilities 1
defaultMain $ benchGroup 1 numsOfWorkers <$> numsOfAnthems

@rbasso (Contributor) commented May 21, 2017:

It runs about 45 seconds on my machine. It's probably too many cases, but I'm not sure which we cases could get rid of.

I would drop this block for two reasons:

  1. The second defaultMain would overwrite the HTML output file from the first one. I guess Criterion was not designed to be run like that.
  2. IMHO, the performance variation caused by numOfWorkers and numsOfAnthems should give us enough information to see how performance changes depending on the size of the task and the number of threads.

Also, consider these ideas:

  • Instead of nub $ sort [1, processors], consider [1..processors]. This would increase the number of benchmarks a lot, but would give a nice curve showing how performance changes with each additional processor (see the sketch after this list). The last one is usually an interesting case.
  • Do we get any additional information from the 4 anthems cases?
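
Something along those lines, reusing the PR's benchGroup helper (a sketch under that assumption; getNumProcessors comes from GHC.Conc):

    import Control.Concurrent (setNumCapabilities)
    import GHC.Conc (getNumProcessors)
    import Criterion.Main (defaultMain)

    main :: IO ()
    main = do
      processors <- getNumProcessors
      setNumCapabilities processors          -- one run, on all available cores
      let numsOfWorkers = [1 .. processors]  -- instead of nub $ sort [1, processors]
          numsOfAnthems = [500]              -- possibly dropping the 4-anthem cases
      defaultMain $ benchGroup processors numsOfWorkers <$> numsOfAnthems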

defaultMain $ benchGroup 1 numsOfWorkers <$> numsOfAnthems

-- run on all cores
setNumCapabilities processors

A Contributor commented:

I have almost no experience with criterion, so I'm not sure of this, but I guess that, if you decide to remove the first defaultMain group, we wouldn't need this line anymore, right?

@rbasso (Contributor) commented May 21, 2017

Just as a reference for anyone interested in reviewing this or another PR with benchmarks, the following commands may help:

    # Clean the cache
    stack clean

    # Build everything with -Wall -Werror
    stack build --pedantic --test --no-run-tests --bench --no-run-benchmarks

    # Run the tests
    stack test

    # Run the benchmarks and generate an HTML report
    stack bench --benchmark-arguments '--output=benchmark.html'

    # Run hlint on all the sources
    hlint .

If you are using Stack with Nix, you may also need to add --no-nix-pure to the stack bench command.

@unmanbearpig force-pushed the parallel-letter-frequency-benchmark branch 2 times, most recently from a8705b8 to d00ad33 on May 22, 2017
bin/test-example Outdated
runstack "test"
if ! runstack "test"; then
  exit 1
fi

@rbasso (Contributor) commented May 24, 2017:

@petertseng, could you please take a look at these changes in the test scripts when you get some time? I have some kind of fever right now, so my brain isn't at 100%, and I don't want to merge anything that could break the test scripts you wrote.

@petertseng (Member) commented May 24, 2017:

Well, it has preserved the intent, but in that case the comments above need to be removed since now it's no longer an implicit exit value.

bin/test-example Outdated
@@ -50,7 +58,9 @@ if [ "$exampletype" = "success" ]; then
# Implicit exit value: this is last thing in this path,

A Member commented:

These comments now must be removed, since they are otherwise inaccurate.

@unmanbearpig (Contributor, Author) commented:

I think I've fixed all of the issues. I've also added a note about the benchmark to HINTS.md. What do you think?

@petertseng (Member) commented:

I'm personally thinking things are OK. At least I don't see anything that I would ask to change. Haven't run it yet to try it out, that'll have to happen tomorrow. But because of this, no need to wait for me if someone beats me to it.

@rbasso (Contributor) left a comment:

Seems great!

I'll just ask for a final change... 😁

I was having problems with Travis-CI while reviewing this PR, so I took some time to fix a few things with #550 and #551. While reading the logs, I noticed that you didn't change test-stub to check whether the benchmarks would compile with it, but it is desirable to assert that, so that we can be more confident that users will be able to compile and run the benchmarks.

Considering that enabling the tests in Travis-CI wasn't the primary objective of this PR, I created #552 to solve that, and now we don't need more changes to the scripts in this PR.

Sorry for all the trouble, but I'll have to ask you to rebase this PR on master, removing any changes to test-example. Besides that, I think we are good to go! 👍

bin/test-example Outdated
else
  BENCH=false
fi
@rbasso (Contributor) commented May 25, 2017:

The changes to enable running the benchmarks were generating the following warnings when testing the stub solution for parallel-letter-frequency in Travis-CI:

WARNING: Specified source-dir "bench" does not exist
Warning: Directory listed in parallel-letter-frequency.cabal file does not exist: bench

This was happening because the test-stub shell script wasn't changed to check the stub solutions against the benchmarks.


### Benchmark

Check how changing the number of workers affects the performance of your solution by running the benchmark. Use `stack bench` to run it. Feel free to modify `bench/Benchmark.hs` to explore your solution's performance on different inputs.

A Contributor commented:

Seems great! 👍

@@ -1,5 +1,7 @@
name: parallel-letter-frequency

ghc-options: -threaded -with-rtsopts=-N -O2

A Contributor commented:

See the comment for package.yaml.

version: 0.1.0.2
version: 0.1.0.3

ghc-options: -threaded -with-rtsopts=-N -O2

@rbasso (Contributor) commented May 25, 2017:

This line causes the following warnings:

Configuring parallel-letter-frequency-0.1.0.3...
Warning: 'ghc-options: -threaded' has no effect for libraries. It should only be used for executables.
Warning: 'ghc-options: -with-rtsopts' has no effect for libraries. It should only be used for executables.

Besides the warnings, the only real side effect of this is enabling -threaded -with-rtsopts=-N -O2 for the test suite, which doesn't need it, but the slower compilation from -O2 probably has an insignificant effect on the overall build time.

@rbasso changed the title from "WIP: parallel-letter-frequency: Test parallelism speed increase" to "parallel-letter-frequency: Add benchmark" on May 25, 2017
I've added cases for running it on 4 and 500 anthems. Each number of
anthems would run with a single worker and physical-number-of-cores
number of workers. All cases run first on a single core and then on
all available cores, as I found that results are often different.
@unmanbearpig force-pushed the parallel-letter-frequency-benchmark branch from b5b452c to 145b7b0 on May 25, 2017

@unmanbearpig (Contributor, Author) commented:

@rbasso, I've just rebased it on top of master, deleted my script changes, and moved ghc-options into the bench section.
Should I clean up and squash some commits before we do the merge into master or should we keep the history as it is?

@rbasso (Contributor) commented May 26, 2017

Should I clean up and squash some commits before we do the merge into master or should we keep the history as it is?

We usually don't care about a PR's history, and leave only a few commits to keep the repository's history nice and clean. In this case, I guess all the changes belong together, so a single commit seems a good idea.

To avoid giving you more trouble, I'll squash-merge it here. 👍

Thanks a lot!

@rbasso merged commit af5f05b into exercism:master on May 26, 2017