Project benchmarks #312

jrogov · 2019-10-10T15:22:21Z

Sometimes it's necessary for a project to measure it's performance, and however easy to use, Benchee.run is usually a one-shot solution.

I propose ExUnit-like task 'mix benchmark' and DSL to create benchmarks for a project.

These benchmarks are plain .exs Elixir scripts placed in the ./benchmarks directory of a project, and whenever 'mix benchmark' is run, it compiles all the modules within these scripts and runs benchmarks specified within them. It's also possible to pass individual files and directories as arguments to the task to run contained benchmarks, e.g. 'mix benchmark dir1 file1 file2 dir2'.

Changes:

Separated print.configuration flag for printing system specification to print.system flag. (I think it's not neccessary to print spec over and over as benchmarks are run). Tests are fixed correspondingly.
./lib/mix/tasks contains benchmark task
./lib/benchee/project contains macro for benchmark DSL.

Syntax for DSL is better described with an example. One is contained in the new './example' folder and can be found here: https://github.com/bencheeorg/benchee/blob/94979ee88a784664c2ee3c716e32c0f4fadb06be/example/project_benchmark.exs

Notes:

README is not updated yet. Will do as soon as all the changes are approved.
No tests for this DSL were created. (see 1)
Task doesn't support any of Benchee options. Comments for if it's really required are welcome.

P.S. Kind of a wish: I would strongly recommend to propose this library as a part of standard Elixir distribution, along with ExUnit, since this functionality might be useful for projects of any kind.

devonestes · 2019-10-10T15:44:54Z

Hey @arikai, thanks for putting this together!

We've had discussions about this kind of thing before, like here: #236

One of the things that I think Tobi and I both really love is that Benchee's API is "just a function." This both makes it really easy for folks to use, and makes interop with Erlang much easier (which is something we both care about as well).

We do the same kind of thing you're talking about in this library already, but we do it with a bash script instead of a mix task.

If what you're looking for is a way to run all your benchmarks with one command, I'd recommend going with a bash script (or a Makefile!). Let's see what @PragTob says about this, though, before making any decision. I'm ok with having this if he is, but my initial feeling is that it's a lot of rather tricky code to maintain and it's only purpose is to replace the need for a bash script.

Of course, if you want to build something on top of benchee that offers this sort of DSL, you're welcome to do so!

And I love that you would want this in Elixir core, but they've made it really clear that the only thing going in Elixir core is what's needed to develop the language itself, and this doesn't fall into that category, so I kind of agree with them on keeping this as a separate package.

OvermindDL1 · 2019-10-10T20:49:18Z

For note, I already have a mix bench ... functionality in my sandbox project, it's something I've been meaning to pull out sometime, if it would actually be useful?

jrogov · 2019-10-11T09:35:49Z

@devonestes

From what I've read on the issue linked above, you do approve this idea in general, but have concerns on the maintenance and usecases of this addition.

So here are my two cents.

1. Why not shell scripts?

Of course you can do this with shell scripts. Hell, what can't you do with them?

However, there are cons.

Bash is basically restricted to commands provided by mix, its plugin tasks, and some one-liners for iex. Something more complex - write .exs. But why do we need to use 2 languages to just use convenient devtools? Why do we need to reimplement it over and over again (or copy) for this functionality?

I believe, this is exactly the reason dev tools exist and mix in particular, and just prettty much libraries and frameworks in general: to encapsulate a pattern used over and over again. And to use bash in projects like this seems veeery hacky and not convenient at all. Bash is more like "use available tools for your particular case", not "add functionality to tools". And project-wide benchmarks is a very common pattern.

2. Maintaining second API seems like another burden.

But please look closely at what this DSL and task do.
DSL simply creates a bunch of functions (setup, teardown, run*) and sets attributes for the latter.
Task simply compiles benchmarks, searches for this attribute in compiled modules and simply calls "run*" functions (with call to optional module-wide setup and teardown)

But what run* functions actually do? They simply call Benchee.run with options defined with the DSL and options passed with the task (although there is none for now). And options defined with the DSL are exactly the same you would usually pass to Benchee.run!

So when we talk about maintaining this "second API" we talk about just adding some switches for OptionParser in task module.

That's it.

3. What are the pros of the DSL to simple API of one function?

Let's talk about tests. It's surely possible to make them as simple .exs scripts. It's surely possible to run them via bash script. But why? Is it easier to use? No. Is it cleaner to use? No. Is it easier to write them? Heck no, DSL looks way cleaner and easier to read. Tests are easier to change and maintain. DSL is usually closer to the natural language (although, more domain specific) than tests written in two programming languages, one of them being some quirky shell script.

I do not deny that single-function-API is way too easier to use for one-shot benchmarks. But when we're talking about libraries changes in which may degrade performance, and, what's by orders of magnitude worse, may cascade into projects that use them, causing to degrade them as well.

Still waiting for the @PragTob 's thoughts about this.

jrogov · 2019-10-11T09:36:49Z

@OvermindDL1 Sure, especially if you have some interesting ideas and features that I missed to implement.

jrogov · 2019-10-11T13:38:32Z

I still can't get around this strange error from build with 1.6.6 + OTP21 (second build) that I can't even reproduce on my machine, so help needed.

OvermindDL1 · 2019-10-11T14:08:41Z

Mine is very specific to my sandbox project, but it's an easy start for something bigger.

First my addons to mix.exs (this should become it's own mix task instead):

  defp aliases do
    [
      bench: &bench/1
    ]
  end

  defp bench(arglist) do
    Mix.Tasks.Run.run(["bench/bench.exs" | arglist])
  end

Then bench/bench.exs (would be part of that mix task):

System.argv()
|> Enum.each(fn bench ->
  try do
    Code.require_file("bench/#{bench}_bench.exs")
    mod = String.to_atom("Elixir.#{Macro.camelize(bench)}Bench")

    if :erlang.function_exported(mod, :module_info, 0) do
      if(:erlang.function_exported(mod, :classifiers, 0), do: mod.classifiers(), else: [nil])
      |> case do
        nil -> [nil]
        [] -> [nil]
        clas when is_list(clas) -> clas
      end
      |> Enum.each(fn cla ->
        if cla do
          title = "Benchmarking Classifier: #{cla}"
          sep = String.duplicate("=", String.length(title))
          IO.puts("\n#{title}\n#{sep}\n")
        end

        setup = if(:erlang.function_exported(mod, :setup, 1), do: mod.setup(cla), else: nil)

        m =
          cond do
            :erlang.function_exported(mod, :time, 2) -> mod.time(cla, setup)
            :erlang.function_exported(mod, :time, 1) -> mod.time(cla)
            true -> 5
          end

        inputs =
          cond do
            :erlang.function_exported(mod, :inputs, 2) -> mod.inputs(cla, setup)
            :erlang.function_exported(mod, :inputs, 1) -> mod.inputs(cla)
            true -> nil
          end

        actions =
          cond do
            :erlang.function_exported(mod, :actions, 2) -> mod.actions(cla, setup)
            true -> mod.actions(cla)
          end

        Benchee.run(
          actions,
          [
            time: m,
            warmup: m,
            memory_time: m,
            print: %{fast_warning: false}
          ] ++ if(inputs, do: [inputs: inputs], else: [])
        )

        if(:erlang.function_exported(mod, :teardown, 2), do: mod.teardown(cla, setup))
      end)
    end
  rescue
    r -> IO.puts("Unknown exception: #{Exception.format(:error, r, __STACKTRACE__)}")
  catch
    {type, reason} when type in [:throw, :exit] -> IO.puts("Unknown error: #{Exception.format(type, reason, __STACKTRACE__)}")
    e -> IO.puts("Unknown error: #{Exception.format(:throw, e, __STACKTRACE__)}")
  end
end)

And basically it can run bench benchmark modules (the naming pattern should probably also be changed, as stated this is my naming convention and is not at all generic). It can also run with a variety of input types as well with each being it's own benchee run, among other features.

But for example here is my most recent benchmark module, it is not using all of the features even remotely, and I could leave out some of the no-op functions, but I like to keep them in for documentation:

defmodule MapLookupOrThrowBench do
  def classifiers(), do: nil

  def time(_), do: 2

  def inputs(_cla),
    do: %{
          "small-2-existing" => {2, 1..2 |> Map.new(&{&1, to_string(&1)})},
          "medium-10-existing" => {10, 1..10 |> Map.new(&{&1, to_string(&1)})},
          "large-10000-existing" => {10000, 1..10000 |> Map.new(&{&1, to_string(&1)})},
          "small-1-not-existing" => {2, 1..1 |> Map.new(&{&1, to_string(&1)})},
          "medium-9-not-existing" => {10, 1..9 |> Map.new(&{&1, to_string(&1)})},
          "large-9999-not-existing" => {10000, 1..9999 |> Map.new(&{&1, to_string(&1)})},
    }

  def setup(_cla), do: nil

  def teardown(_cla, _setup), do: nil

  def actions(_cla, _setup),
    do: %{
          "Map.fetch" => fn {k, m} ->
            Map.fetch(m, k)
          end,
          "catch-Map.fetch!" => fn {k, m} ->
            try do Map.fetch!(m, k)
            rescue _ -> :error
            catch _ -> :error
            end
          end,
          "Access" => fn {k, m} ->
            m[k]
          end,
          "catch-direct-always-fail" => fn {_k, m} ->
            try do m.a
            rescue _ -> :error
            catch _ -> :error
            end
          end,
          "case" => fn {k, m} ->
            case m do
              %{^k => v} -> v
              _ -> :error
            end
          end,
    }
end

Honestly, for a benchmark scaffolding thing I'd look more into how Rust's Criterion API does it, it's very clean and nice, especially with a few macro's tossed in.

PragTob · 2019-10-12T17:43:17Z

👋

Hi everyone and thanks for your contributions first and foremost 🎉 🙏

I mostly agree with @devonestes - I love it that benchee uses simple functions to do its job. I don't want to use macros where functions can do the job. That there's no default way to run all benchmarks I can see, however for that I'd rather suggest or include a mix task or an easy bash script. I also don't see that the requirement to run all of them nicely is enough to introduce a DSL.

However, I see and know that people want a DSL (although I still don't really understand why - I don't believe that taking benchmarks more seriously is bound to there being a DSL, if you care you run them and if not you don't), and we build tools and I believe that they should also cater to the people who use them. And if a DSL makes people happier/somewhat likelier to benchmark I'm happy to provide one (at a certain cost).

My opinion/solution for this is to not include it in "core" but as a separate package (initial name idea: benchee_dsl) - benchee is designed in a way so that it's easy to add stuff around it, why not a DSL? I always wanted to build this myself for funzies after reading the meta programming book but I haven't yet gotten around to it 😁
I'm also happy to have this extension be in the bencheeorg/official.

In that project I'd have a few remarks though (likely in the README):

Usage of the simple function API is encouraged
features might not be implemented/added in step with "core"

The last one is important to me, it's sometimes already not so great to upgrade all plugins. It's also another reason why I don't want this in "core" because in "core" everything should always be up to date and fully working.

The DSL itself is remarkably similar to what I had in mind looking at it 👌 (mostly the benchmark <--> scenario relationship and how hooks are incorporated) I have some minor points but we can discuss those another time.

I haven't looked at the implementation yet, but it'd be important to me that it's easy to add new options etc (highest goal: if we add a new option to benchee I don't even need to touch it, it just works).

How does that sound for everyone?

(gotta run now, sorry, here have a bunny)

jrogov · 2019-10-14T09:51:55Z

@PragTob Then I am really looking forward for you to check the implementation.

And, as I already said, it's not some all-new custom API that changes the way Benchee works, the Benchee as it exists (single-function-API) is barely touched (on option is divided in two, you can barely call this a change).

It is simply a wrapper around Benchee.run/2 with custom interface for passing options and benchmarks itself.

As for the

(highest goal: if we add a new option to benchee I don't even need to touch it, it just works)

It's easy to implement really. Even though I'd advocate for just a subset of them (for starters, I can imagine NO case where one might want to pass before_scenario callback as a COMMAND-LINE OPTION)

PragTob · 2019-10-14T18:32:23Z

@arikai I didn't mean command line options (I haven't checked the PR yet) but more about the DSL in the example file. Like how you specify time 5 if it was able to do some_new_option 44 without me telling it about it I'd accept a bit more implementation complexity.

I'll see when I'll have time to review it that week, due to my opinion that macros should only be used when functions absolutely don't work my macro isn't as fluent as it should be 😅

jrogov · 2019-10-15T09:27:00Z

@PragTob
If I understand you correctly, it's important to you to be able to pass the very SAME options with DSL as you would with Benchee.run.
As I already mentioned in my previous message

It is simply a wrapper around Benchee.run/2 with custom interface for passing options and benchmarks itself.

And basically that's it. Surely it wraps some blocks into functions and what else, but the options are exactly the same you are to support for Benchee.run options and per-scenario options.

About your assumption

due to my opinion that macros should only be used when functions absolutely don't work my macro isn't as fluent as it should be

It's true for languages with rather simple macro preprocessing system (think C family, Erlang), but Elixir has more of a Lisp-like macro. It means that macro return somewhat of Elixir Intermidiate Representation that it works with during the actual compilation of the files.
And what really distinguishes Lisp-like macro system is that you can use any function you would use in the actual code, with difference that (if not quoted) it will be executed during compilation. It means that we can do whatever complex stuff we want, and the only thing we will be (hardly) slowing down is compilation itself: the parsing of our DSL or what else does not happen in the runtime. In fact, all we have after complete compilation of the modules in this Elixir DSL (or pretty much most of them) is a simple module with some functions and attributes.

Please check the implementation as soon as you can.

PragTob · 2019-10-15T10:33:30Z

@arikai my main point wasn't that it should be the same options (that's a given) but that when I add new one to benchee I don't need to add it manually. So kinda that it builds the available DSL methods automatically from the available keys in Benchee.Configuration at compilation time or something like this (maybe this is how it's implemented, as I said, haven't checked yet).

So that when I introduce a new key in benchee like reduction_time automagically benchee_dsl also had a new dsl function reduction_time

jrogov · 2019-10-15T12:14:09Z

Please refer to my previous answer

2. Maintaining second API seems like another burden.

But please look closely at what this DSL and task do.
DSL simply creates a bunch of functions (setup, teardown, run*) and sets attributes for the latter.
Task simply compiles benchmarks, searches for this attribute in compiled modules and simply calls "run*" functions (with call to optional module-wide setup and teardown)

But what run* functions actually do? They simply call Benchee.run with options defined with the DSL and options passed with the task (although there is none for now). And options defined with the DSL are exactly the same you would usually pass to Benchee.run!

So when we talk about maintaining this "second API" we talk about just adding some switches for OptionParser in task module.

That's it.

The only thing is that it's just a parameter to a macro, not a function: exactly the same way Benchee.run/2 accepts options as an optional second argument.

PragTob

Hey,

thanks for all your work. I'd have preferred it though if we talked about the idea and the DSL design before we got in here.

Not sure where to best continue this conversation as I still don't want to have it in core but rather a benchee_dsl project that still needs to be setup :) That also has the advantage that not everything has to work/be ready for release from the get go.

There are lots of linline comments, but the biggest one is probably the options in the DSL. I'd rather have them all in the same place:

benchmark "...", option1: value1, option2: value2 do

end

# or

benchmark "..." do
  option1 value1
  option2 value2

  # ...
end

Right now input breaks from the all in the option thing.

Implementation wise I'd like to split it up a bit more to help with readability and maintainability.

Thanks!
Tobi

PragTob · 2019-10-15T19:37:11Z

example/project_benchmark.exs

+  #
+  # If defined, must return {:ok, state} for benchmarks to run
+  setup do
+    {:ok, nil}


why do we need {:ok, state} ? I know ExUnit does it, but why can't we just return state ? If something goes wrong it can just raise or what am I missing?

It's rather a matter of convention. raise means some serious runtime error that should have never happened if the code were write.
Surely, we can, just throw things, but it will be more implicit I think: if you just look on the result type of the setup, you already know what it should return to work correctly. To tell programmers to just throw their error would be less obvious IMO than that.

I just never quite understood it. In what scenario will my setup not work? Usually the code I call will already raise at some point. It might just be me, I've never seen setup implementation that returns anything but {:ok, state}

Well, the code that returns ok/error pair usually should never throw (although might raise), at least with any other reason then MatchError, but that's on the caller side, so not function's problem, but programmer that's calling it :)

Surely it might be easier to write happy-path-code all the way, just a matter of taste I guess.

Share your final thoughts on this :)

I'm fine with it for ExUnits sake 🤷‍♂️

example/project_benchmark.exs

lib/mix/tasks/benchmark.ex

PragTob · 2019-10-15T20:22:07Z

lib/mix/tasks/benchmark.ex

+          |> Path.join()
+          |> Path.wildcard()
+        else
+          [file]


just file should be enough here shouldn't it?

The function must return list of files found for the further use in iterator functions.

Sure? It's being flat_maped, so the wrapped list should be removed anyhow.

Uh, not sure if it was a question, but here's a piece of iex session:

iex(1)> [:a, :b, :c] |> Enum.flat_map(&[&1]) [:a, :b, :c] iex(2)> [:a, :b, :c] |> Enum.flat_map(& &1) ** (Protocol.UndefinedError) protocol Enumerable not implemented for :a of type Atom

I think I meant more because we return [file] here and alter flat_map why can't we just return file and then map instead of flat_map?

lib/mix/tasks/benchmark.ex

PragTob · 2019-10-16T18:25:47Z

@arikai where do we want to go from here, shall I set up a new benchee_dsl repo for us to put this in and then PR against this one? PR the printing of system against this one?

Thanks!

jrogov · 2019-12-23T12:57:40Z

@PragTob
Sorry for sudden disappearance!
I addressed most of your remarks and the rest are waiting for your opinion!

About separate repo: I do believe that these two rather small modules belong to Benchee, since there are no drawback for such decision. I would've considered separate repo if the project were to be included into e.g. final mix release and therefore a size of a final archive would be important, but this isn't the case.

Please do inform me if there are any drawbacks.

PragTob · 2020-02-01T20:09:09Z

@arikai right back at you :) Sorry, I was busy with different stuff as well. Thanks for all the work 💚

There are some errors here, but I believe they can be fixed/figured out. Some of them seem to be formatter incompatibilities between versions which we could fix, along with some dialyzer warnings that also seem questionable.

There might be no observable code/artifact drawbacks for such a decision, but there are others:

@devonestes and I want to keep the benchee "core" macro free as we believe benchmarks don't inherently need them (at least last I talked to Devon about it)
having it in core means that every upgrade/addition anyone ever does needs to be added to the DSL which makes the barrier bigger and increases maintenance overhead
having it in here means the documentation should also live here, I'm happier to have it in a separate space where it can have its separate docs
I like to keep benchee "core" as small as possible, for separation of concerns and so that other things can build on top of it
foremost, benchee was built to be extensible, treating the DSL as a separate project enforces this and might show us where we're missing official APIs that a project like this might need

Anyhow, so long 👋

devonestes · 2020-02-02T07:21:17Z

Just to chime in here - I agree with all of the points @PragTob mentions above, but the primary reason I don’t want to have macros in Benchee is that they can’t be used in Erlang/Gleam/LFE/etc. We’ve tried hard to make sure this library is useful for the entire BEAM community, and unfortunately macros - since they’re specific to Elixir - don’t play nicely with those other languages.

I’m all for folks building language-specific libraries on top of the core we provide in Benchee if there are things (such as this DSL) that are possible in one language but not in another, but I would really like to see Benchee itself remain useful to any language on the BEAM and not to have sections of this library that are specific to one language or another.

If one day Elixir macros become usable in those other languages, I’d be all for revisiting this, actually, but until that happens I would strongly prefer to see this functionality in a separate package.

PragTob · 2020-02-02T10:11:09Z

@devonestes ah I actually forgot about that, great point thank you 💚

I'm fine having the extra package in the benchee org and maintaining it. I just won't promise that it'll always support all the newest options/things in benchee "core" :)

PragTob · 2022-07-08T15:15:52Z

I recon our friends from @hrzndhrn have a dsl package out there for folks. I'll review it and then maybe start a convo with them if they'd want to move it to the main benchee org.

https://github.com/hrzndhrn/benchee_dsl

Yaroslav Rogov added 3 commits October 10, 2019 18:03

Separate system spec printing from print.configuration flag

56f7f67

Add elixir_ls to gitignore

003ee51

Add project benchmarks macro and task

94979ee

Yaroslav Rogov added 3 commits October 11, 2019 15:07

Fix Elixir 1.6.6. Fix dialyzer and credo warnings.

3143ca7

Fix formatting

238b978

Try to fix this Elixir 1.6.6 dialyzer warning

b38d7cf

PragTob reviewed Oct 15, 2019

View reviewed changes

Address @PragTob remarks during bencheeorg#312

349f7d3

NickNeck mentioned this pull request May 2, 2020

Bench modules (ExUnit-style benchmarks definitions) (DSL) #236

Open

PragTob changed the base branch from master to main July 5, 2020 09:41

PragTob closed this Jul 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Project benchmarks #312

Project benchmarks #312

jrogov commented Oct 10, 2019

devonestes commented Oct 10, 2019

OvermindDL1 commented Oct 10, 2019

jrogov commented Oct 11, 2019

jrogov commented Oct 11, 2019

jrogov commented Oct 11, 2019

OvermindDL1 commented Oct 11, 2019 •

edited

Loading

PragTob commented Oct 12, 2019

jrogov commented Oct 14, 2019 •

edited

Loading

PragTob commented Oct 14, 2019

jrogov commented Oct 15, 2019

PragTob commented Oct 15, 2019

jrogov commented Oct 15, 2019

2. Maintaining second API seems like another burden.

PragTob left a comment

PragTob Oct 15, 2019

jrogov Oct 16, 2019

PragTob Oct 16, 2019

jrogov Dec 23, 2019

PragTob Feb 1, 2020

PragTob Oct 15, 2019

jrogov Oct 16, 2019

PragTob Oct 16, 2019

jrogov Dec 23, 2019

PragTob Feb 1, 2020

PragTob commented Oct 16, 2019

jrogov commented Dec 23, 2019

PragTob commented Feb 1, 2020

devonestes commented Feb 2, 2020

PragTob commented Feb 2, 2020

PragTob commented Jul 8, 2022

Project benchmarks #312

Project benchmarks #312

Conversation

jrogov commented Oct 10, 2019

devonestes commented Oct 10, 2019

OvermindDL1 commented Oct 10, 2019

jrogov commented Oct 11, 2019

1. Why not shell scripts?

2. Maintaining second API seems like another burden.

3. What are the pros of the DSL to simple API of one function?

jrogov commented Oct 11, 2019

jrogov commented Oct 11, 2019

OvermindDL1 commented Oct 11, 2019 • edited Loading

PragTob commented Oct 12, 2019

jrogov commented Oct 14, 2019 • edited Loading

PragTob commented Oct 14, 2019

jrogov commented Oct 15, 2019

PragTob commented Oct 15, 2019

jrogov commented Oct 15, 2019

2. Maintaining second API seems like another burden.

PragTob left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PragTob commented Oct 16, 2019

jrogov commented Dec 23, 2019

PragTob commented Feb 1, 2020

devonestes commented Feb 2, 2020

PragTob commented Feb 2, 2020

PragTob commented Jul 8, 2022

OvermindDL1 commented Oct 11, 2019 •

edited

Loading

jrogov commented Oct 14, 2019 •

edited

Loading