
Benchmark the production code rather than some arbitrary thing #5200

Merged (3 commits) on Mar 13, 2023

Conversation

effectfully (Contributor)

Hopefully fixes the issue described here

@effectfully (Contributor Author)

Not running the benchmarks yet, because the machine is busy apparently.

@effectfully force-pushed the effectfully/benchmarking/stop-the-irrelevance branch from 23270c8 to d3b87a3 on March 9, 2023 16:05
@effectfully (Contributor Author)

/benchmark plutus-benchmark:validation

github-actions bot commented Mar 9, 2023

Click here to check the status of your benchmark.

github-actions bot commented Mar 9, 2023

Comparing benchmark results of 'plutus-benchmark:validation' on '9611fd59f' (base) and 'd3b87a355' (PR)

Results table
Script 9611fd5 d3b87a3 Change
auction_1-1 151.1 μs 165.8 μs +9.7%
auction_1-2 653.6 μs 684.6 μs +4.7%
auction_1-3 646.7 μs 669.9 μs +3.6%
auction_1-4 200.0 μs 215.4 μs +7.7%
auction_2-1 154.3 μs 168.2 μs +9.0%
auction_2-2 653.2 μs 675.2 μs +3.4%
auction_2-3 854.0 μs 878.8 μs +2.9%
auction_2-4 644.0 μs 659.8 μs +2.5%
auction_2-5 200.1 μs 213.6 μs +6.7%
crowdfunding-success-1 185.2 μs 195.0 μs +5.3%
crowdfunding-success-2 185.4 μs 194.9 μs +5.1%
crowdfunding-success-3 185.7 μs 195.8 μs +5.4%
currency-1 237.0 μs 242.5 μs +2.3%
escrow-redeem_1-1 328.9 μs 344.8 μs +4.8%
escrow-redeem_1-2 329.3 μs 345.2 μs +4.8%
escrow-redeem_2-1 393.2 μs 406.3 μs +3.3%
escrow-redeem_2-2 395.7 μs 404.7 μs +2.3%
escrow-redeem_2-3 395.2 μs 404.8 μs +2.4%
escrow-refund-1 138.0 μs 144.4 μs +4.6%
future-increase-margin-1 236.7 μs 242.2 μs +2.3%
future-increase-margin-2 522.8 μs 546.2 μs +4.5%
future-increase-margin-3 521.1 μs 543.7 μs +4.3%
future-increase-margin-4 495.1 μs 501.6 μs +1.3%
future-increase-margin-5 861.6 μs 873.3 μs +1.4%
future-pay-out-1 236.5 μs 241.2 μs +2.0%
future-pay-out-2 521.5 μs 546.9 μs +4.9%
future-pay-out-3 520.8 μs 544.1 μs +4.5%
future-pay-out-4 854.0 μs 870.9 μs +2.0%
future-settle-early-1 236.8 μs 241.5 μs +2.0%
future-settle-early-2 521.1 μs 543.9 μs +4.4%
future-settle-early-3 521.5 μs 543.7 μs +4.3%
future-settle-early-4 645.3 μs 653.0 μs +1.2%
game-sm-success_1-1 383.8 μs 398.4 μs +3.8%
game-sm-success_1-2 172.5 μs 185.1 μs +7.3%
game-sm-success_1-3 648.5 μs 664.9 μs +2.5%
game-sm-success_1-4 198.7 μs 215.5 μs +8.5%
game-sm-success_2-1 381.0 μs 392.7 μs +3.1%
game-sm-success_2-2 173.0 μs 185.0 μs +6.9%
game-sm-success_2-3 648.4 μs 665.7 μs +2.7%
game-sm-success_2-4 199.2 μs 216.5 μs +8.7%
game-sm-success_2-5 650.4 μs 664.9 μs +2.2%
game-sm-success_2-6 199.0 μs 215.1 μs +8.1%
multisig-sm-1 394.9 μs 407.7 μs +3.2%
multisig-sm-2 387.7 μs 400.0 μs +3.2%
multisig-sm-3 385.1 μs 404.5 μs +5.0%
multisig-sm-4 395.5 μs 410.3 μs +3.7%
multisig-sm-5 572.3 μs 579.5 μs +1.3%
multisig-sm-6 393.6 μs 405.9 μs +3.1%
multisig-sm-7 388.2 μs 401.3 μs +3.4%
multisig-sm-8 386.6 μs 405.3 μs +4.8%
multisig-sm-9 394.4 μs 409.4 μs +3.8%
multisig-sm-10 572.1 μs 580.9 μs +1.5%
ping-pong-1 324.7 μs 330.3 μs +1.7%
ping-pong-2 325.5 μs 330.7 μs +1.6%
ping-pong_2-1 186.6 μs 193.9 μs +3.9%
prism-1 144.6 μs 155.4 μs +7.5%
prism-2 403.9 μs 418.9 μs +3.7%
prism-3 343.9 μs 353.2 μs +2.7%
pubkey-1 121.1 μs 131.7 μs +8.8%
stablecoin_1-1 963.9 μs 982.1 μs +1.9%
stablecoin_1-2 168.0 μs 181.4 μs +8.0%
stablecoin_1-3 1.103 ms 1.123 ms +1.8%
stablecoin_1-4 176.0 μs 190.1 μs +8.0%
stablecoin_1-5 1.391 ms 1.421 ms +2.2%
stablecoin_1-6 219.6 μs 236.1 μs +7.5%
stablecoin_2-1 963.6 μs 991.1 μs +2.9%
stablecoin_2-2 167.7 μs 179.5 μs +7.0%
stablecoin_2-3 1.099 ms 1.131 ms +2.9%
stablecoin_2-4 175.5 μs 190.8 μs +8.7%
token-account-1 173.0 μs 180.7 μs +4.5%
token-account-2 313.5 μs 330.2 μs +5.3%
uniswap-1 397.3 μs 418.1 μs +5.2%
uniswap-2 203.3 μs 215.0 μs +5.8%
uniswap-3 1.783 ms 1.837 ms +3.0%
uniswap-4 292.9 μs 312.4 μs +6.7%
uniswap-5 1.154 ms 1.185 ms +2.7%
uniswap-6 280.8 μs 301.0 μs +7.2%
vesting-1 346.5 μs 356.8 μs +3.0%

@effectfully (Contributor Author)

/benchmark plutus-benchmark:validation

github-actions bot commented Mar 9, 2023

Click here to check the status of your benchmark.

github-actions bot commented Mar 9, 2023

Comparing benchmark results of 'plutus-benchmark:validation' on '9611fd59f' (base) and 'd3b87a355' (PR)

Results table
Script 9611fd5 d3b87a3 Change
auction_1-1 150.7 μs 164.6 μs +9.2%
auction_1-2 651.8 μs 678.6 μs +4.1%
auction_1-3 640.5 μs 661.1 μs +3.2%
auction_1-4 200.0 μs 214.5 μs +7.2%
auction_2-1 154.2 μs 167.4 μs +8.6%
auction_2-2 651.6 μs 676.1 μs +3.8%
auction_2-3 853.7 μs 880.7 μs +3.2%
auction_2-4 647.0 μs 655.8 μs +1.4%
auction_2-5 200.2 μs 213.3 μs +6.5%
crowdfunding-success-1 185.5 μs 194.7 μs +5.0%
crowdfunding-success-2 185.0 μs 195.0 μs +5.4%
crowdfunding-success-3 184.3 μs 195.4 μs +6.0%
currency-1 234.3 μs 241.5 μs +3.1%
escrow-redeem_1-1 326.1 μs 345.1 μs +5.8%
escrow-redeem_1-2 326.9 μs 344.4 μs +5.4%
escrow-redeem_2-1 389.4 μs 404.0 μs +3.7%
escrow-redeem_2-2 392.6 μs 403.2 μs +2.7%
escrow-redeem_2-3 392.8 μs 405.0 μs +3.1%
escrow-refund-1 137.7 μs 143.9 μs +4.5%
future-increase-margin-1 234.2 μs 241.2 μs +3.0%
future-increase-margin-2 521.0 μs 544.3 μs +4.5%
future-increase-margin-3 517.2 μs 543.7 μs +5.1%
future-increase-margin-4 490.5 μs 499.4 μs +1.8%
future-increase-margin-5 849.1 μs 872.6 μs +2.8%
future-pay-out-1 235.7 μs 241.8 μs +2.6%
future-pay-out-2 517.7 μs 543.3 μs +4.9%
future-pay-out-3 517.7 μs 544.3 μs +5.1%
future-pay-out-4 846.8 μs 869.1 μs +2.6%
future-settle-early-1 235.1 μs 241.3 μs +2.6%
future-settle-early-2 517.3 μs 543.1 μs +5.0%
future-settle-early-3 517.3 μs 542.4 μs +4.9%
future-settle-early-4 639.4 μs 649.5 μs +1.6%
game-sm-success_1-1 380.8 μs 395.0 μs +3.7%
game-sm-success_1-2 172.8 μs 185.2 μs +7.2%
game-sm-success_1-3 647.3 μs 666.6 μs +3.0%
game-sm-success_1-4 198.6 μs 214.8 μs +8.2%
game-sm-success_2-1 378.5 μs 395.0 μs +4.4%
game-sm-success_2-2 172.1 μs 185.9 μs +8.0%
game-sm-success_2-3 644.0 μs 666.9 μs +3.6%
game-sm-success_2-4 197.6 μs 215.7 μs +9.2%
game-sm-success_2-5 646.1 μs 670.9 μs +3.8%
game-sm-success_2-6 198.1 μs 215.1 μs +8.6%
multisig-sm-1 393.2 μs 409.6 μs +4.2%
multisig-sm-2 383.9 μs 402.7 μs +4.9%
multisig-sm-3 383.6 μs 406.9 μs +6.1%
multisig-sm-4 387.8 μs 413.0 μs +6.5%
multisig-sm-5 559.8 μs 584.0 μs +4.3%
multisig-sm-6 388.7 μs 410.7 μs +5.7%
multisig-sm-7 381.1 μs 402.7 μs +5.7%
multisig-sm-8 381.3 μs 407.1 μs +6.8%
multisig-sm-9 391.3 μs 413.5 μs +5.7%
multisig-sm-10 567.0 μs 585.4 μs +3.2%
ping-pong-1 322.0 μs 331.1 μs +2.8%
ping-pong-2 323.1 μs 335.0 μs +3.7%
ping-pong_2-1 185.0 μs 194.7 μs +5.2%
prism-1 143.7 μs 154.8 μs +7.7%
prism-2 401.0 μs 422.1 μs +5.3%
prism-3 343.6 μs 354.9 μs +3.3%
pubkey-1 121.3 μs 133.4 μs +10.0%
stablecoin_1-1 964.1 μs 980.8 μs +1.7%
stablecoin_1-2 167.4 μs 178.1 μs +6.4%
stablecoin_1-3 1.098 ms 1.118 ms +1.8%
stablecoin_1-4 175.3 μs 189.6 μs +8.2%
stablecoin_1-5 1.387 ms 1.417 ms +2.2%
stablecoin_1-6 219.6 μs 234.4 μs +6.7%
stablecoin_2-1 964.1 μs 982.0 μs +1.9%
stablecoin_2-2 167.8 μs 178.3 μs +6.3%
stablecoin_2-3 1.101 ms 1.122 ms +1.9%
stablecoin_2-4 176.0 μs 189.7 μs +7.8%
token-account-1 173.1 μs 180.0 μs +4.0%
token-account-2 314.6 μs 327.5 μs +4.1%
uniswap-1 397.9 μs 416.0 μs +4.5%
uniswap-2 203.2 μs 213.4 μs +5.0%
uniswap-3 1.780 ms 1.820 ms +2.2%
uniswap-4 290.0 μs 309.3 μs +6.7%
uniswap-5 1.144 ms 1.180 ms +3.1%
uniswap-6 279.9 μs 298.7 μs +6.7%
vesting-1 342.1 μs 354.1 μs +3.5%

@effectfully (Contributor Author) commented Mar 10, 2023

OK, in addition to the issues referenced in the description of the PR, we have more problems:

  1. Benchmarking on master is pessimistic, because unsafeEvaluateCekNoEmit' returns an EvaluationResult rather than an Either, and the former is strict in its payload, meaning that forcing the result also performs discharging. I'm not sure whether it actually matters in our specific benchmarks (if they all compute to, say, a constant, then there's basically no difference).
  2. The actual production evaluator appears to be 4-5% slower on average than whatever we benchmark on master currently. Not good, but it could be much worse.
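The strictness point can be illustrated with a self-contained toy. The types below merely mirror the shape of the real ones and are not the plutus API: a strict field means that forcing the result to WHNF already forces the payload, which for the CEK machine corresponds to performing discharging.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Toy analogue of plutus's EvaluationResult: the success constructor is
-- strict in its field, unlike Either's Right.
data EvaluationResult a = EvaluationFailure | EvaluationSuccess !a

toEvaluationResult :: Either e a -> EvaluationResult a
toEvaluationResult (Left _)  = EvaluationFailure
toEvaluationResult (Right x) = EvaluationSuccess x

main :: IO ()
main = do
  let payload = error "payload forced" :: Int  -- stands in for an expensive discharge
  -- Forcing the Either to WHNF stops at the Right constructor:
  _ <- evaluate (Right payload :: Either () Int)
  putStrLn "Either forced to WHNF: payload untouched"
  -- Forcing the EvaluationResult to WHNF forces the payload (strict field):
  r <- try (evaluate (toEvaluationResult (Right payload :: Either () Int)))
         :: IO (Either SomeException (EvaluationResult Int))
  case r of
    Left _  -> putStrLn "EvaluationResult forced to WHNF: payload was forced"
    Right _ -> putStrLn "payload not forced"
```

So a benchmark that forces an EvaluationResult times discharging as well, while one that forces an Either to WHNF does not.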

So why the 4-5% difference? I've checked the Core; this is what we have on master:

-- RHS size: {terms: 7, types: 6, coercions: 63, joins: 0/0}
unsafeEvaluateCekNoEmit'2
  :: forall {s}.
     State# s
     -> (# State# s,
           ExBudgetInfo RestrictingSt DefaultUni DefaultFun s #)
unsafeEvaluateCekNoEmit'2
  = \ (@s_adFA) (eta_adFB :: State# s_adFA) ->
      $wrestricting
        (unsafeEvaluateCekNoEmit'3 `cast` <Co:63>)
        9223372036854775807#
        9223372036854775807#
        eta_adFB

-- RHS size: {terms: 26, types: 114, coercions: 98, joins: 0/0}
unsafeEvaluateCekNoEmit'
  :: Term NamedDeBruijn DefaultUni DefaultFun ()
     -> EvaluationResult (Term NamedDeBruijn DefaultUni DefaultFun ())
unsafeEvaluateCekNoEmit'
  = \ (x_abKi :: Term NamedDeBruijn DefaultUni DefaultFun ()) ->
      case runCekDeBruijn
             (unsafeEvaluateCekNoEmit'3 `cast` <Co:63>)
             defaultCekParameters1
             (unsafeEvaluateCekNoEmit'2 `cast` <Co:16>)
             (noEmitter1 `cast` <Co:19>)
             x_abKi
      of
      { (e_a9Z4, ds_dbMx, ds1_dbMy) ->
      case e_a9Z4 of {
        Left ds2_abKw ->
          case ds2_abKw of { ErrorWithCause evalErr_abKB cause_abKC ->
          case evalErr_abKB of {
            InternalEvaluationError err_abMs ->
              unsafeEvaluateCekNoEmit'1 err_abMs cause_abKC;
            UserEvaluationError ds3_adLv -> EvaluationFailure
          }
          };
        Right term1_abMv -> $WEvaluationSuccess term1_abMv
      }
      }

and this is what we have in this branch:

-- RHS size: {terms: 7, types: 6, coercions: 63, joins: 0/0}
evaluateCekLikeInProd1
  :: forall {s}.
     State# s
     -> (# State# s,
           ExBudgetInfo RestrictingSt DefaultUni DefaultFun s #)
evaluateCekLikeInProd1
  = \ (@s_ahLF) (eta_ahLG :: State# s_ahLF) ->
      $wrestricting
        (evaluateCekLikeInProd2 `cast` <Co:63>)
        9223372036854775807#
        9223372036854775807#
        eta_ahLG

-- RHS size: {terms: 16, types: 76, coercions: 99, joins: 0/0}
evaluateCekLikeInProd
  :: Term NamedDeBruijn DefaultUni DefaultFun ()
     -> Either
          (CekEvaluationException NamedDeBruijn DefaultUni DefaultFun)
          (Term NamedDeBruijn DefaultUni DefaultFun ())
evaluateCekLikeInProd
  = \ (term_acKg :: Term NamedDeBruijn DefaultUni DefaultFun ()) ->
      case getEvalCtx of {
        Left l_ahIy -> Left l_ahIy;
        Right r_ahIA ->
          case runCekDeBruijn
                 (evaluateCekLikeInProd17 `cast` <Co:63>)
                 (r_ahIA `cast` <Co:1>)
                 (evaluateCekLikeInProd1 `cast` <Co:16>)
                 (noEmitter1 `cast` <Co:19>)
                 term_acKg
          of
          { (getRes_adJt, ds_dfRE, ds1_dfRF) ->
          getRes_adJt
          }
      }

I can't pinpoint what makes this evaluator slower than the one we have on master. The only differences I'm able to observe are:

  1. the Either vs EvaluationResult issue mentioned above
  2. master's evaluator shares the pretty-printing, Typeable, etc. instances across the budgeter and the CEK machine runner, while the evaluator in this branch doesn't. I can't imagine this making a noticeable difference, but it's an odd detail
  3. builtins are instantiated differently. This can easily make the difference, because builtins are heavily optimized and one failed INLINE can affect evaluation times enormously. In the evaluator used on master all BuiltinRuntimes are pulled to the top level and turned into CAFs (last time I checked), while the production evaluator encapsulates all the thunks under a lambda that later gets instantiated. Such a structural difference can have a huge impact on evaluation times, but it would be very hard to tell what exactly causes it

Overall, we should just finish off this PR, i.e. document it and test that the evaluator used for benchmarking doesn't perform discharging (like we test for the production evaluator in plutus-ledger-api), and create a ticket for investigating the performance gap further.

@@ -16,11 +17,10 @@ import UntypedPlutusCore as UPLC
`cabal bench -- plutus-benchmark:validation --benchmark-options crowdfunding`.
-}
main :: IO ()
-main = benchWith mkCekBM
+main = evaluate getEvalCtx *> benchWith mkCekBM
Contributor

Does this definitely ensure that this is shared across the runs? It seems like it would be simpler to use https://hackage.haskell.org/package/criterion-1.6.0.0/docs/Criterion-Types.html#v:env

Contributor

Ah, or perhaps it wasn't clear: the creation of the evaluation context has non-trivial cost, but is shared by the ledger between script evaluations. So it shouldn't be part of the benchmarked code.

Contributor Author

Does this definitely ensure that this is shared across the runs?

getEvalCtx is a CAF and doesn't get inlined (it has a NOINLINE pragma and I've verified that it's not getting inlined by looking at the Core anyway). I can't imagine it not being reused, why would GHC ever recompute a non-functional monomorphic top-level value?

Maybe it should be evaluate (force getEvalCtx), though.

It seems like it would be simpler to use https://hackage.haskell.org/package/criterion-1.6.0.0/docs/Criterion-Types.html#v:env

Did you read the description? The function seems completely irrelevant.

Contributor Author

Lol, yes, of course it should be evaluate (force getEvalCtx), there's that lazy Either.
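The difference between the two forcings can be sketched with base plus deepseq (getCtx below is a stand-in, not the real getEvalCtx): evaluate alone stops at the Right constructor, while evaluate (force ...) builds the whole value before any benchmark runs.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

-- Stand-in for getEvalCtx: a lazy Either whose payload is itself lazy.
getCtx :: Either String [Int]
getCtx = Right [1 .. 3]

main :: IO ()
main = do
  -- `evaluate getCtx` forces only the Right/Left constructor (WHNF);
  -- the context inside stays a thunk and would be built during the runs.
  _ <- evaluate getCtx
  -- `evaluate (force getCtx)` forces the whole structure to normal form
  -- up front, so no construction cost leaks into the timed code.
  ctx <- evaluate (force getCtx)
  print ctx
```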

Contributor

Did you read the description? The function seems completely irrelevant.

Yes, and I've used it before. I thought the point of it was to clearly compute a value (it lets you use IO so you can do evaluate) before the benchmarking begins, and feed it into the individual benchmarks as a function argument. We could even share it across all the benchmarking runs, which would save us some time.

getEvalCtx is a CAF and doesn't get inlined (it has a NOINLINE pragma and I've verified that it's not getting inlined by looking at the Core anyway). I can't imagine it not being reused, why would GHC ever recompute a non-functional monomorphic top-level value?

This seems like far more complex reasoning than using a function argument and env? That guarantees it'll only be computed once.

Contributor Author

I thought the point of it was to clearly compute a value (it lets you use IO so you can do evaluate) before the benchmarking begins, and feed it into the individual benchmarks as a function argument.

But it's not what this function is for!

Motivation. In earlier versions of criterion, all benchmark inputs were always created when a program started running. By deferring the creation of an environment until its associated benchmarks need it, we avoid two problems that this strategy caused:

  • Memory pressure distorted the results of unrelated benchmarks. If one benchmark needed e.g. a gigabyte-sized input, it would force the garbage collector to do extra work when running some other benchmark that had no use for that input. Since the data created by an environment is only available when it is in scope, it should be garbage collected before other benchmarks are run.
  • The time cost of generating all needed inputs could be significant in cases where no inputs (or just a few) were really needed. This occurred often, for instance when just one out of a large suite of benchmarks was run, or when a user would list the collection of benchmarks without running any.

OK, forget about CAF etc. We only care about getEvalCtx being forced before benchmarking starts. evaluate does exactly that. Once it's forced, it can't be unforced and whatever they do with env isn't more reliable than that.

And getEvalCtx isn't an input to the benchmarks to even consider env. It's a value that the function being benchmarked uses under the hood; you can't feed the value to the function, it's already inside!

If you want to change evaluateCekLikeInProd to accept an EvaluationContext as an input, I don't mind that; in that case it would make more sense to call env, although as I said, whatever lazy magic it does, we don't need or care about it, it's completely irrelevant for our use case, so even in that case it would make sense to force getEvalCtx manually.

Contributor

If you want to change evaluateCekLikeInProd to accept an EvaluationContext as an input

Yes, sorry, that's what I was suggesting. evaluateCekLikeInProd just calls getEvalCtx and then calls evaluateTerm. Note that that's actually less like the production version, which really does take it as a parameter, and it's cached externally. This way we're relying on some complicated reasoning: we force it here, then apparently "recompute" it inside a function, and count on that not repeating the work. It seems much clearer to just... pass it as an argument!
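The suggested shape can be sketched with toy stand-ins (EvaluationContext, getEvalCtx and evalWithCtx below are illustrative, not the real plutus-benchmark definitions; this assumes the criterion and deepseq packages): the context is computed once via env, to normal form, outside the timed runs, and sharing follows from ordinary argument passing rather than from reasoning about CAFs.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Criterion.Main (Benchmark, bench, defaultMain, env, whnf)

type EvaluationContext = [Int]              -- stand-in for the real context

getEvalCtx :: Either String EvaluationContext
getEvalCtx = Right [1 .. 10]                -- stand-in for the real construction

evalWithCtx :: EvaluationContext -> Int -> Int  -- stand-in for the evaluator
evalWithCtx ctx x = sum ctx + x

benchmarks :: Benchmark
benchmarks =
  -- env runs the IO action once before the runs and forces the result to
  -- normal form, then hands it to each benchmark as a plain argument.
  env (either fail (evaluate . force) getEvalCtx) $ \ctx ->
    bench "toy-validation" $ whnf (evalWithCtx ctx) 42

main :: IO ()
main = defaultMain [benchmarks]
```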

Contributor Author

OK then, sounds reasonable.

Feel like doing it yourself? I'm trying to give away as much non-costing-related work as possible.

Contributor

Sure.

Contributor Author

Thanks a lot!

  :: Either
       (UPLC.CekEvaluationException UPLC.NamedDeBruijn UPLC.DefaultUni UPLC.DefaultFun)
       EvaluationContext
getEvalCtx = do
Contributor

I think it would be legit to just do mkDynEvaluationContext <version> PLC.defaultCostModelParams. Since we're trying to not include this in the benchmark run.

This is also fine, though.

Contributor Author

Since we're trying to not include this in the benchmark run.

What "this"?

In any case, I wanted to stay as close to the production evaluator as possible and not do any "reasoning". getEvalCtx is a CAF that is forced before benchmarking starts and is used across all benchmarks, hence I don't see how it would distort the results (assuming I've fixed the forcing to go to NF rather than WHNF, but even the latter can't distort the results in a meaningful manner).

Contributor

What "this"?

This function goes through the dance of computing the integers-only version of the cost model, and then feeding it back into the function that converts it into the original form. That's good, insofar as it mirrors the production version, except that this happens in the shared "compute the evaluation context" block, and so shouldn't be included in the benchmarks anyway. So it's fine to use the faster version that just creates the evaluation context without jumping through the hoops.

plutus-benchmark/validation/Common.hs (review threads resolved)
EvaluationContext
getEvalCtx = do
costParams <-
maybe
Contributor

@bezirg do we not have a function somewhere for going from CostModelParams to the list of integers?

Contributor

We probably should even just so we can test that they round-trip.

effectfully and others added 2 commits March 10, 2023 18:14
Co-authored-by: Michael Peyton Jones <michael.peyton-jones@iohk.io>
@effectfully (Contributor Author)

/benchmark plutus-benchmark:validation

github-actions bot

Click here to check the status of your benchmark.

github-actions bot

Comparing benchmark results of 'plutus-benchmark:validation' on '4497b2f7d' (base) and '098aed22f' (PR)

Results table
Script 4497b2f 098aed2 Change
auction_1-1 151.1 μs 166.7 μs +10.3%
auction_1-2 654.7 μs 689.2 μs +5.3%
auction_1-3 647.2 μs 673.1 μs +4.0%
auction_1-4 200.2 μs 218.3 μs +9.0%
auction_2-1 154.7 μs 169.7 μs +9.7%
auction_2-2 658.0 μs 687.9 μs +4.5%
auction_2-3 858.2 μs 902.2 μs +5.1%
auction_2-4 642.3 μs 674.6 μs +5.0%
auction_2-5 199.9 μs 218.4 μs +9.3%
crowdfunding-success-1 184.7 μs 198.3 μs +7.4%
crowdfunding-success-2 184.5 μs 197.9 μs +7.3%
crowdfunding-success-3 185.3 μs 197.9 μs +6.8%
currency-1 235.1 μs 245.4 μs +4.4%
escrow-redeem_1-1 327.1 μs 350.9 μs +7.3%
escrow-redeem_1-2 327.7 μs 350.2 μs +6.9%
escrow-redeem_2-1 391.2 μs 414.9 μs +6.1%
escrow-redeem_2-2 393.2 μs 413.0 μs +5.0%
escrow-redeem_2-3 393.2 μs 410.9 μs +4.5%
escrow-refund-1 137.8 μs 147.1 μs +6.7%
future-increase-margin-1 235.5 μs 245.1 μs +4.1%
future-increase-margin-2 519.4 μs 551.1 μs +6.1%
future-increase-margin-3 518.7 μs 549.7 μs +6.0%
future-increase-margin-4 492.7 μs 507.2 μs +2.9%
future-increase-margin-5 856.1 μs 885.0 μs +3.4%
future-pay-out-1 236.4 μs 245.3 μs +3.8%
future-pay-out-2 520.3 μs 549.6 μs +5.6%
future-pay-out-3 521.3 μs 549.6 μs +5.4%
future-pay-out-4 853.8 μs 882.2 μs +3.3%
future-settle-early-1 236.1 μs 246.0 μs +4.2%
future-settle-early-2 518.7 μs 551.4 μs +6.3%
future-settle-early-3 519.5 μs 552.5 μs +6.4%
future-settle-early-4 640.9 μs 666.3 μs +4.0%
game-sm-success_1-1 382.8 μs 401.7 μs +4.9%
game-sm-success_1-2 173.1 μs 187.5 μs +8.3%
game-sm-success_1-3 647.1 μs 674.2 μs +4.2%
game-sm-success_1-4 198.8 μs 218.6 μs +10.0%
game-sm-success_2-1 380.7 μs 399.3 μs +4.9%
game-sm-success_2-2 172.1 μs 188.2 μs +9.4%
game-sm-success_2-3 642.2 μs 673.3 μs +4.8%
game-sm-success_2-4 197.8 μs 217.8 μs +10.1%
game-sm-success_2-5 646.5 μs 669.4 μs +3.5%
game-sm-success_2-6 198.5 μs 218.1 μs +9.9%
multisig-sm-1 394.0 μs 413.8 μs +5.0%
multisig-sm-2 385.0 μs 406.4 μs +5.6%
multisig-sm-3 382.4 μs 410.5 μs +7.3%
multisig-sm-4 390.4 μs 415.9 μs +6.5%
multisig-sm-5 564.8 μs 589.5 μs +4.4%
multisig-sm-6 388.3 μs 411.3 μs +5.9%
multisig-sm-7 379.6 μs 405.8 μs +6.9%
multisig-sm-8 379.9 μs 410.8 μs +8.1%
multisig-sm-9 389.6 μs 412.0 μs +5.7%
multisig-sm-10 561.4 μs 586.9 μs +4.5%
ping-pong-1 319.9 μs 336.5 μs +5.2%
ping-pong-2 321.3 μs 333.5 μs +3.8%
ping-pong_2-1 184.4 μs 196.9 μs +6.8%
prism-1 143.4 μs 157.1 μs +9.6%
prism-2 399.3 μs 431.6 μs +8.1%
prism-3 341.6 μs 356.2 μs +4.3%
pubkey-1 120.9 μs 132.8 μs +9.8%
stablecoin_1-1 964.0 μs 1.007 ms +4.5%
stablecoin_1-2 167.4 μs 181.7 μs +8.5%
stablecoin_1-3 1.096 ms 1.148 ms +4.7%
stablecoin_1-4 174.6 μs 194.1 μs +11.2%
stablecoin_1-5 1.381 ms 1.458 ms +5.6%
stablecoin_1-6 218.6 μs 240.1 μs +9.8%
stablecoin_2-1 958.5 μs 1.006 ms +5.0%
stablecoin_2-2 167.5 μs 181.3 μs +8.2%
stablecoin_2-3 1.092 ms 1.148 ms +5.1%
stablecoin_2-4 175.1 μs 193.6 μs +10.6%
token-account-1 172.8 μs 182.9 μs +5.8%
token-account-2 313.2 μs 331.8 μs +5.9%
uniswap-1 396.8 μs 422.8 μs +6.6%
uniswap-2 202.5 μs 220.1 μs +8.7%
uniswap-3 1.776 ms 1.857 ms +4.6%
uniswap-4 290.9 μs 318.7 μs +9.6%
uniswap-5 1.141 ms 1.201 ms +5.3%
uniswap-6 278.7 μs 305.1 μs +9.5%
vesting-1 342.0 μs 361.1 μs +5.6%

@effectfully (Contributor Author)

Hm, it's +6.39% now. Let's try again.

@effectfully (Contributor Author)

/benchmark plutus-benchmark:validation

github-actions bot

Click here to check the status of your benchmark.

github-actions bot

Comparing benchmark results of 'plutus-benchmark:validation' on '4497b2f7d' (base) and '098aed22f' (PR)

Results table
Script 4497b2f 098aed2 Change
auction_1-1 150.9 μs 166.9 μs +10.6%
auction_1-2 652.0 μs 685.7 μs +5.2%
auction_1-3 646.3 μs 669.5 μs +3.6%
auction_1-4 199.2 μs 218.0 μs +9.4%
auction_2-1 154.5 μs 169.6 μs +9.8%
auction_2-2 655.8 μs 684.3 μs +4.3%
auction_2-3 849.4 μs 882.5 μs +3.9%
auction_2-4 647.9 μs 665.1 μs +2.7%
auction_2-5 200.6 μs 216.9 μs +8.1%
crowdfunding-success-1 184.4 μs 198.0 μs +7.4%
crowdfunding-success-2 185.5 μs 197.1 μs +6.3%
crowdfunding-success-3 184.6 μs 197.6 μs +7.0%
currency-1 235.5 μs 243.9 μs +3.6%
escrow-redeem_1-1 327.6 μs 345.8 μs +5.6%
escrow-redeem_1-2 327.8 μs 346.4 μs +5.7%
escrow-redeem_2-1 391.0 μs 406.0 μs +3.8%
escrow-redeem_2-2 391.6 μs 407.0 μs +3.9%
escrow-redeem_2-3 390.8 μs 411.6 μs +5.3%
escrow-refund-1 138.3 μs 146.1 μs +5.6%
future-increase-margin-1 235.8 μs 243.4 μs +3.2%
future-increase-margin-2 518.8 μs 544.8 μs +5.0%
future-increase-margin-3 517.7 μs 545.4 μs +5.4%
future-increase-margin-4 490.5 μs 502.4 μs +2.4%
future-increase-margin-5 855.0 μs 871.7 μs +2.0%
future-pay-out-1 235.5 μs 242.0 μs +2.8%
future-pay-out-2 518.7 μs 543.4 μs +4.8%
future-pay-out-3 517.9 μs 543.6 μs +5.0%
future-pay-out-4 846.4 μs 869.0 μs +2.7%
future-settle-early-1 235.7 μs 243.2 μs +3.2%
future-settle-early-2 517.9 μs 544.4 μs +5.1%
future-settle-early-3 517.6 μs 545.0 μs +5.3%
future-settle-early-4 640.7 μs 653.6 μs +2.0%
game-sm-success_1-1 380.4 μs 395.3 μs +3.9%
game-sm-success_1-2 171.8 μs 186.2 μs +8.4%
game-sm-success_1-3 644.4 μs 667.2 μs +3.5%
game-sm-success_1-4 197.8 μs 216.4 μs +9.4%
game-sm-success_2-1 379.5 μs 393.5 μs +3.7%
game-sm-success_2-2 171.9 μs 186.8 μs +8.7%
game-sm-success_2-3 644.8 μs 667.0 μs +3.4%
game-sm-success_2-4 198.5 μs 216.7 μs +9.2%
game-sm-success_2-5 648.9 μs 660.4 μs +1.8%
game-sm-success_2-6 198.9 μs 216.1 μs +8.6%
multisig-sm-1 392.6 μs 407.1 μs +3.7%
multisig-sm-2 385.6 μs 399.2 μs +3.5%
multisig-sm-3 382.5 μs 405.3 μs +6.0%
multisig-sm-4 394.8 μs 408.5 μs +3.5%
multisig-sm-5 564.9 μs 580.2 μs +2.7%
multisig-sm-6 391.9 μs 405.2 μs +3.4%
multisig-sm-7 384.9 μs 401.2 μs +4.2%
multisig-sm-8 382.7 μs 405.4 μs +5.9%
multisig-sm-9 392.7 μs 409.7 μs +4.3%
multisig-sm-10 567.0 μs 580.2 μs +2.3%
ping-pong-1 322.1 μs 330.4 μs +2.6%
ping-pong-2 323.0 μs 334.5 μs +3.6%
ping-pong_2-1 185.3 μs 195.3 μs +5.4%
prism-1 144.5 μs 156.6 μs +8.4%
prism-2 398.0 μs 418.3 μs +5.1%
prism-3 341.6 μs 354.0 μs +3.6%
pubkey-1 121.0 μs 132.3 μs +9.3%
stablecoin_1-1 960.4 μs 994.9 μs +3.6%
stablecoin_1-2 167.6 μs 184.6 μs +10.1%
stablecoin_1-3 1.109 ms 1.133 ms +2.2%
stablecoin_1-4 175.9 μs 191.7 μs +9.0%
stablecoin_1-5 1.398 ms 1.432 ms +2.4%
stablecoin_1-6 220.6 μs 237.2 μs +7.5%
stablecoin_2-1 967.4 μs 989.8 μs +2.3%
stablecoin_2-2 168.5 μs 179.8 μs +6.7%
stablecoin_2-3 1.107 ms 1.134 ms +2.4%
stablecoin_2-4 176.4 μs 191.9 μs +8.8%
token-account-1 174.8 μs 181.3 μs +3.7%
token-account-2 315.9 μs 328.6 μs +4.0%
uniswap-1 399.9 μs 417.0 μs +4.3%
uniswap-2 203.9 μs 215.5 μs +5.7%
uniswap-3 1.793 ms 1.836 ms +2.4%
uniswap-4 293.0 μs 313.3 μs +6.9%
uniswap-5 1.154 ms 1.181 ms +2.3%
uniswap-6 282.3 μs 303.8 μs +7.6%
vesting-1 344.6 μs 360.1 μs +4.5%

@effectfully (Contributor Author) commented Mar 10, 2023

+5.07%. That's a pretty big gap in benchmarking results TBH.

@effectfully (Contributor Author)

Comments addressed, @michaelpj merge if you're OK with what's in here.

@michaelpj (Contributor)

I'm happy modulo the thing about how we ensure that the evaluation context is shared. I just think we could establish it more simply.

@michaelpj michaelpj merged commit 9ab93f4 into master Mar 13, 2023
@michaelpj michaelpj deleted the effectfully/benchmarking/stop-the-irrelevance branch March 13, 2023 11:24
@bezirg (Contributor) commented Mar 14, 2023

The only difference I can see is that:

  • before, in every benchmark run, runCekDeBruijn was called with the CAF defaultCekParameters :: MachineParameters
  • now, in every benchmark run, runCekDeBruijn evaluates (toMachineParameters pv ectx)

I don't know if this is the culprit of the 5%; it shouldn't account for such a big difference.

@effectfully (Contributor Author)

now, in every benchmark run, the runCekDeBruijn evaluates (toMachineParameters pv ectx)

It's basically zero-cost:

newtype EvaluationContext = EvaluationContext
    { machineParameters :: DefaultMachineParameters
    }

toMachineParameters :: ProtocolVersion -> EvaluationContext -> DefaultMachineParameters
toMachineParameters _ = machineParameters

before, in every benchmark run, the runCekDebruijn was called with the CAF defaultCekParameters :: MachineParameters

Now we have a different CAF: getEvalCtx.

@effectfully effectfully added No Changelog Required Add this to skip the Changelog Check and removed No Changelog Required Add this to skip the Changelog Check labels Jun 28, 2023
v0d1ch pushed a commit to v0d1ch/plutus that referenced this pull request Dec 6, 2024
…sectMBO#5200)

* Benchmark the production code rather than some arbitrary thing

* Apply suggestions from code review

Co-authored-by: Michael Peyton Jones <michael.peyton-jones@iohk.io>

* Address comments

---------

Co-authored-by: Michael Peyton Jones <michael.peyton-jones@iohk.io>