Benchmark the production code rather than some arbitrary thing #5200
Conversation
Not running the benchmarks yet, because the machine is busy apparently.
Branch updated from 23270c8 to d3b87a3.
/benchmark plutus-benchmark:validation

Comparing benchmark results of 'plutus-benchmark:validation' on '9611fd59f' (base) and 'd3b87a355' (PR). Results table
/benchmark plutus-benchmark:validation

Comparing benchmark results of 'plutus-benchmark:validation' on '9611fd59f' (base) and 'd3b87a355' (PR). Results table
OK, in addition to the issues referenced in the description of the PR, we have more problems:
So why the 4-5% difference? I've checked the Core. This is what we have on the base branch:

-- RHS size: {terms: 7, types: 6, coercions: 63, joins: 0/0}
unsafeEvaluateCekNoEmit'2
:: forall {s}.
State# s
-> (# State# s,
ExBudgetInfo RestrictingSt DefaultUni DefaultFun s #)
unsafeEvaluateCekNoEmit'2
= \ (@s_adFA) (eta_adFB :: State# s_adFA) ->
$wrestricting
(unsafeEvaluateCekNoEmit'3 `cast` <Co:63>)
9223372036854775807#
9223372036854775807#
eta_adFB
-- RHS size: {terms: 26, types: 114, coercions: 98, joins: 0/0}
unsafeEvaluateCekNoEmit'
:: Term NamedDeBruijn DefaultUni DefaultFun ()
-> EvaluationResult (Term NamedDeBruijn DefaultUni DefaultFun ())
unsafeEvaluateCekNoEmit'
= \ (x_abKi :: Term NamedDeBruijn DefaultUni DefaultFun ()) ->
case runCekDeBruijn
(unsafeEvaluateCekNoEmit'3 `cast` <Co:63>)
defaultCekParameters1
(unsafeEvaluateCekNoEmit'2 `cast` <Co:16>)
(noEmitter1 `cast` <Co:19>)
x_abKi
of
{ (e_a9Z4, ds_dbMx, ds1_dbMy) ->
case e_a9Z4 of {
Left ds2_abKw ->
case ds2_abKw of { ErrorWithCause evalErr_abKB cause_abKC ->
case evalErr_abKB of {
InternalEvaluationError err_abMs ->
unsafeEvaluateCekNoEmit'1 err_abMs cause_abKC;
UserEvaluationError ds3_adLv -> EvaluationFailure
}
};
Right term1_abMv -> $WEvaluationSuccess term1_abMv
}
}

and this is what we have in this branch:

-- RHS size: {terms: 7, types: 6, coercions: 63, joins: 0/0}
evaluateCekLikeInProd1
:: forall {s}.
State# s
-> (# State# s,
ExBudgetInfo RestrictingSt DefaultUni DefaultFun s #)
evaluateCekLikeInProd1
= \ (@s_ahLF) (eta_ahLG :: State# s_ahLF) ->
$wrestricting
(evaluateCekLikeInProd2 `cast` <Co:63>)
9223372036854775807#
9223372036854775807#
eta_ahLG
-- RHS size: {terms: 16, types: 76, coercions: 99, joins: 0/0}
evaluateCekLikeInProd
:: Term NamedDeBruijn DefaultUni DefaultFun ()
-> Either
(CekEvaluationException NamedDeBruijn DefaultUni DefaultFun)
(Term NamedDeBruijn DefaultUni DefaultFun ())
evaluateCekLikeInProd
= \ (term_acKg :: Term NamedDeBruijn DefaultUni DefaultFun ()) ->
case getEvalCtx of {
Left l_ahIy -> Left l_ahIy;
Right r_ahIA ->
case runCekDeBruijn
(evaluateCekLikeInProd17 `cast` <Co:63>)
(r_ahIA `cast` <Co:1>)
(evaluateCekLikeInProd1 `cast` <Co:16>)
(noEmitter1 `cast` <Co:19>)
term_acKg
of
{ (getRes_adJt, ds_dfRE, ds1_dfRF) ->
getRes_adJt
}
}

I can't pinpoint what makes this evaluator slower than the one we have on the base branch.
Overall, we should just finish off this PR, i.e. document it and test that the evaluator used for benchmarking doesn't perform discharging (like we test for the production evaluator elsewhere).
@@ -16,11 +17,10 @@ import UntypedPlutusCore as UPLC
 `cabal bench -- plutus-benchmark:validation --benchmark-options crowdfunding`.
 -}
 main :: IO ()
-main = benchWith mkCekBM
+main = evaluate getEvalCtx *> benchWith mkCekBM
Does this definitely ensure that this is shared across the runs? It seems like it would be simpler to use https://hackage.haskell.org/package/criterion-1.6.0.0/docs/Criterion-Types.html#v:env
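For reference, criterion's env has the type env :: NFData env => IO env -> (env -> Benchmark) -> Benchmark. A minimal sketch of how it is typically used; the setup value here is a made-up stand-in, not anything from this PR:

```haskell
-- Generic illustration of criterion's `env`; `setup` stands in for whatever
-- shared, costly-to-build input the benchmarks need.
import Criterion.Main (Benchmark, bench, defaultMain, env, nf)

setup :: IO [Int]                -- any costly, NFData-able input
setup = pure [1 .. 1000000]

sharedInputBench :: Benchmark
sharedInputBench =
  env setup $ \xs ->             -- built once, forced to normal form, outside the timed code
    bench "sum" (nf sum xs)

main :: IO ()
main = defaultMain [sharedInputBench]
```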
Ah, or perhaps it wasn't clear: the creation of the evaluation context has non-trivial cost, but is shared by the ledger between script evaluations. So it shouldn't be part of the benchmarked code.
Does this definitely ensure that this is shared across the runs?

getEvalCtx is a CAF and doesn't get inlined (it has a NOINLINE pragma, and I've verified that it's not getting inlined by looking at the Core anyway). I can't imagine it not being reused; why would GHC ever recompute a non-functional monomorphic top-level value?

Maybe it should be evaluate (force getEvalCtx), though.
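Schematically, the claim about getEvalCtx amounts to a top-level definition like this (stand-in types, not the real one):

```haskell
-- Stand-in for getEvalCtx: a top-level, non-function value (a CAF).
-- GHC allocates it once and every use site shares the same heap object;
-- the NOINLINE pragma stops the definition from being duplicated into call sites.
sharedCtx :: Either String [Integer]
sharedCtx = Right (map (* 2) [1 .. 1000000])
{-# NOINLINE sharedCtx #-}

main :: IO ()
main = do
  print (either length length sharedCtx)  -- first use: the list is computed here
  print (either length length sharedCtx)  -- second use: the evaluated value is reused
```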
It seems like it would be simpler to use https://hackage.haskell.org/package/criterion-1.6.0.0/docs/Criterion-Types.html#v:env
Did you read the description? The function seems completely irrelevant.
Lol, yes, of course it should be evaluate (force getEvalCtx), there's that lazy Either.
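A standalone sketch of that WHNF-vs-NF point, with stand-in types (getEvalCtx itself returns an Either whose payload is the evaluation context):

```haskell
-- Illustrative only: shows why `evaluate` alone is not enough for an Either.
import Control.DeepSeq (force)
import Control.Exception (evaluate)

-- Stand-in for getEvalCtx: the payload under the constructor is still a thunk.
sharedValue :: Either String Int
sharedValue = Right (sum [1 .. 1000000 :: Int])

main :: IO ()
main = do
  _ <- evaluate sharedValue          -- WHNF only: forces the Right constructor,
                                     -- the Int inside stays unevaluated
  _ <- evaluate (force sharedValue)  -- NF: the payload is computed here, before
                                     -- any benchmark starts timing
  pure ()
```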
Did you read the description? The function seems completely irrelevant.
Yes, and I've used it before. I thought the point of it was to clearly compute a value (it lets you use IO so you can do evaluate) before the benchmarking begins, and feed it into the individual benchmarks as a function argument. We could even share it across all the benchmarking runs, which would save us some time.

getEvalCtx is a CAF and doesn't get inlined (it has a NOINLINE pragma and I've verified that it's not getting inlined by looking at the Core anyway). I can't imagine it not being reused, why would GHC ever recompute a non-functional monomorphic top-level value?

This seems like far more complex reasoning than using a function argument and env? That guarantees it'll only be computed once.
I thought the point of it was to clearly compute a value (it lets you use IO so you can do evaluate) before the benchmarking begins, and feed it into the individual benchmarks as a function argument.
But it's not what this function is for!
Motivation. In earlier versions of criterion, all benchmark inputs were always created when a program started running. By deferring the creation of an environment until its associated benchmarks need it, we avoid two problems that this strategy caused:
- Memory pressure distorted the results of unrelated benchmarks. If one benchmark needed e.g. a gigabyte-sized input, it would force the garbage collector to do extra work when running some other benchmark that had no use for that input. Since the data created by an environment is only available when it is in scope, it should be garbage collected before other benchmarks are run.
- The time cost of generating all needed inputs could be significant in cases where no inputs (or just a few) were really needed. This occurred often, for instance when just one out of a large suite of benchmarks was run, or when a user would list the collection of benchmarks without running any.
OK, forget about CAF etc. We only care about getEvalCtx being forced before benchmarking starts. evaluate does exactly that. Once it's forced, it can't be unforced, and whatever they do with env isn't more reliable than that.

And getEvalCtx isn't an input to the benchmarks to even consider env. It's a value that the function being benchmarked uses under the hood; you can't feed the value to the function, it's already inside!

If you want to change evaluateCekLikeInProd to accept an EvaluationContext as an input, I don't mind that; in that case it would make more sense to call env, although as I said, whatever lazy magic it does, we don't need or care about it, it's completely irrelevant for our use case, so even in that case it would make sense to force getEvalCtx manually.
If you want to change evaluateCekLikeInProd to accept an EvaluationContext as an input
Yes, sorry, that's what I was suggesting. evaluateCekLikeInProd just calls getEvalCtx and then calls evaluateTerm. Note that that's actually less like the production version, which really does take it as a parameter, and it's cached externally. This way we're relying on some complicated logic that we can force it here and then apparently recompute it inside a function and have it not repeat work. It seems much clearer to just... pass it as an argument!
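Roughly the shape being suggested, with simplified stand-in types so the sketch is self-contained; the real EvaluationContext, evaluateCekLikeInProd, and benchmark driver are of course more involved than this:

```haskell
import Criterion.Main (bench, defaultMain, env, nf)

-- Stand-ins for the real types and functions; only the overall shape matters here.
type EvaluationContext = [Integer]
type Term = Int

-- Stand-in for the real getEvalCtx.
getEvalCtx :: Either String EvaluationContext
getEvalCtx = Right [1 .. 100000]

-- The evaluation context is an ordinary argument, as in the production evaluator,
-- rather than being fetched via getEvalCtx inside the function.
evaluateCekLikeInProd :: EvaluationContext -> Term -> Either String Term
evaluateCekLikeInProd ctx t = Right (t + fromIntegral (sum ctx))

main :: IO ()
main = defaultMain
  [ env (either fail pure getEvalCtx) $ \ctx ->
      -- the context is built and forced once here, outside the timed code,
      -- and shared by every run of the benchmark below
      bench "script" $ nf (evaluateCekLikeInProd ctx) 42
  ]
```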
OK then, sounds reasonable.
Feel like doing it yourself? I'm trying to give away as much non-costing-related work as possible.
Sure.
Thanks a lot!
  :: Either
       (UPLC.CekEvaluationException UPLC.NamedDeBruijn UPLC.DefaultUni UPLC.DefaultFun)
       EvaluationContext
getEvalCtx = do
I think it would be legit to just do mkDynEvaluationContext <version> PLC.defaultCostModelParams, since we're trying to not include this in the benchmark run.

This is also fine, though.
Since we're trying to not include this in the benchmark run.
What "this"?
In any case, I wanted to stay as close to the production evaluator as possible and not do any "reasoning". getEvalCtx is a CAF that is forced before benchmarking starts and is used across all benchmarks, hence I don't see how it would distort the results (assuming I've fixed the forcing to go to NF rather than WHNF, but even the latter can't distort the results in a meaningful manner).
What "this"?
This function goes through the dance of computing the integers-only version of the cost model, and then feeding it back into the function that converts it into the original form. That's good, insofar as it mirrors the production version, except that this happens in the shared "compute the evaluation context" block, and so shouldn't be included in the benchmarks anyway. So it's fine to use the faster version that just creates the evaluation context without jumping through the hoops.
       EvaluationContext
getEvalCtx = do
  costParams <-
    maybe
@bezirg do we not have a function somewhere for going from CostModelParams to the list of integers?
We probably should, even just so we can test that they round-trip.
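A sketch of what such a round-trip test could look like; toIntegerList and fromIntegerList are hypothetical names for the conversions being discussed, and CostModelParams is simplified to String keys here to keep the example self-contained:

```haskell
import qualified Data.Map.Strict as Map
import Test.QuickCheck (Property, quickCheck, (===))

-- Simplified stand-in: the real CostModelParams is (roughly) a map from
-- textual parameter names to Integers.
type CostModelParams = Map.Map String Integer

-- Hypothetical: flatten the parameters in a fixed (key-sorted) order.
toIntegerList :: CostModelParams -> [Integer]
toIntegerList = Map.elems

-- Hypothetical: rebuild the map from the known parameter names and the flat list.
fromIntegerList :: [String] -> [Integer] -> CostModelParams
fromIntegerList names ints = Map.fromList (zip names ints)

-- Round-trip property: flattening and rebuilding gives back the original parameters.
prop_roundTrip :: [(String, Integer)] -> Property
prop_roundTrip kvs =
  let params = Map.fromList kvs
  in fromIntegerList (Map.keys params) (toIntegerList params) === params

main :: IO ()
main = quickCheck prop_roundTrip
```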
/benchmark plutus-benchmark:validation

Comparing benchmark results of 'plutus-benchmark:validation' on '4497b2f7d' (base) and '098aed22f' (PR). Results table
Hm, it's +6.39% now. Let's try again.
/benchmark plutus-benchmark:validation

Comparing benchmark results of 'plutus-benchmark:validation' on '4497b2f7d' (base) and '098aed22f' (PR). Results table
+5.07%. That's a pretty big gap in benchmarking results TBH.
Comments addressed, @michaelpj merge if you're OK with what's in here.
I'm happy modulo the thing about how we ensure that the evaluation context is shared. I just think we could establish it more simply.
The only difference I can see is that:

I don't know if this is the culprit of the 5%. It shouldn't account for such a big difference.
It's basically zero-cost:

newtype EvaluationContext = EvaluationContext
  { machineParameters :: DefaultMachineParameters
  }

toMachineParameters :: ProtocolVersion -> EvaluationContext -> DefaultMachineParameters
toMachineParameters _ = machineParameters

Now we have a different CAF:
…sectMBO#5200)

* Benchmark the production code rather than some arbitrary thing
* Apply suggestions from code review
Co-authored-by: Michael Peyton Jones <michael.peyton-jones@iohk.io>
* Address comments
---------
Co-authored-by: Michael Peyton Jones <michael.peyton-jones@iohk.io>
Hopefully fixes the issue described here