Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark the production code rather than some arbitrary thing #5200

Merged
merged 3 commits into from
Mar 13, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Fixed
-----

- Made the `validation` benchmarks use the actual production evaluator (#5200)
4 changes: 4 additions & 0 deletions plutus-benchmark/plutus-benchmark.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,8 @@ benchmark validation
, optparse-applicative
, plutus-benchmark-common
, plutus-core ^>=1.3
, plutus-ledger-api ^>=1.3
, transformers

---------------- validation-decode ----------------

Expand All @@ -291,6 +293,7 @@ benchmark validation-decode
, plutus-benchmark-common
, plutus-core ^>=1.3
, plutus-ledger-api ^>=1.3
, transformers

---------------- validation-full ----------------

Expand All @@ -311,6 +314,7 @@ benchmark validation-full
, plutus-benchmark-common
, plutus-core ^>=1.3
, plutus-ledger-api:{plutus-ledger-api, plutus-ledger-api-testlib} ^>=1.3
, transformers

---------------- Cek cost model calibration ----------------

Expand Down
6 changes: 3 additions & 3 deletions plutus-benchmark/validation/BenchCek.hs
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ module Main where

import Common
import Control.DeepSeq (force)
import Control.Exception
import Criterion
import PlutusBenchmark.Common
import UntypedPlutusCore as UPLC
Expand All @@ -16,11 +17,10 @@ import UntypedPlutusCore as UPLC
`cabal bench -- plutus-benchmark:validation --benchmark-options crowdfunding`.
-}
main :: IO ()
main = benchWith mkCekBM
main = evaluate (force getEvalCtx) *> benchWith mkCekBM
where
mkCekBM file program =
-- don't count the undebruijn . unflat cost
-- `force` to try to ensure that deserialiation is not included in benchmarking time.
let !nterm = force (toNamedDeBruijnTerm $ UPLC._progTerm $ unsafeUnflat file program)
in whnf unsafeEvaluateCekNoEmit' nterm

in whnf evaluateCekLikeInProd nterm
45 changes: 36 additions & 9 deletions plutus-benchmark/validation/Common.hs
Original file line number Diff line number Diff line change
Expand Up @@ -3,19 +3,25 @@
module Common (
benchWith
, unsafeUnflat
, unsafeEvaluateCekNoEmit'
, getEvalCtx
, evaluateCekLikeInProd
, peelDataArguments
, Term
) where

import PlutusPrelude

import PlutusBenchmark.Common (getConfig, getDataDir)
import PlutusBenchmark.NaturalSort

import PlutusCore qualified as PLC
import PlutusCore.Builtin qualified as PLC
import PlutusCore.Data qualified as PLC
import PlutusCore.Evaluation.Machine.ExBudgetingDefaults qualified as PLC
import PlutusCore.Evaluation.Machine.Exception
import PlutusCore.Evaluation.Result
import PlutusLedgerApi.Common (LedgerPlutusVersion (PlutusV1), evaluateTerm)
import PlutusLedgerApi.Common.Versions (languageIntroducedIn)
import PlutusLedgerApi.V3 (EvaluationContext, ParamName, VerboseMode (..), mkEvaluationContext)
import UntypedPlutusCore qualified as UPLC
import UntypedPlutusCore.Evaluation.Machine.Cek qualified as UPLC

Expand All @@ -24,6 +30,8 @@ import Criterion.Main.Options (Mode, parseWith)
import Criterion.Types (Config (..))
import Options.Applicative

import Control.Monad.Trans.Except
import Control.Monad.Trans.Writer.Strict
import Data.ByteString qualified as BS
import Data.List (isPrefixOf)
import Flat
Expand Down Expand Up @@ -128,13 +136,32 @@ benchWith act = do
env (BS.readFile $ dir </> file) $ \scriptBS ->
bench (dropExtension file) $ act file scriptBS

unsafeEvaluateCekNoEmit' :: UPLC.Term PLC.NamedDeBruijn PLC.DefaultUni PLC.DefaultFun () -> PLC.EvaluationResult (UPLC.Term PLC.NamedDeBruijn PLC.DefaultUni PLC.DefaultFun ())
unsafeEvaluateCekNoEmit' =
(\(e, _, _) -> unsafeExtractEvaluationResult e) .
UPLC.runCekDeBruijn
PLC.defaultCekParameters
UPLC.restrictingEnormous
UPLC.noEmitter
getEvalCtx
:: Either
(UPLC.CekEvaluationException UPLC.NamedDeBruijn UPLC.DefaultUni UPLC.DefaultFun)
EvaluationContext
getEvalCtx = do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be legit to just do mkDynEvaluationContext <version> PLC.defaultCostModelParams. Since we're trying to not include this in the benchmark run.

This is also fine, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're trying to not include this in the benchmark run.

What "this"?

In any case, I wanted to stay as close to the production evaluator as possible and not do any "reasoning". getEvalCtx is a CAF that is forced before benchmarking starts and is used across all benchmarks, hence I don't see how it would distort the results (assuming I've fixed the forcing to go to NF rather than WHNF, but even the latter can't distort the results in a meaningful manner).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What "this"?

This function goes through the dance of computing the integers-only version of the cost model, and then feeding it back into the function that converts it into the original form. That's good, insofar as it mirrors the production version, except that this happens in the shared "compute the evaluation context" block, and so shouldn't be included in the benchmarks anyway. So it's fine to use the faster version that just creates the evaluation context without jumping through the hoops.

costParams <-
maybe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bezirg do we not have a function somewhere for going from CostModelParams to the list of integers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should even just so we can test that they round-trip.

(Left evaluationFailure)
(Right . take (length $ enumerate @ParamName) . toList)
PLC.defaultCostModelParams
either (const $ Left evaluationFailure) (Right . fst) . runExcept . runWriterT $
mkEvaluationContext costParams
{-# NOINLINE getEvalCtx #-}

-- | Evaluate a term as it would be evaluated using the on-chain evaluator.
evaluateCekLikeInProd
effectfully marked this conversation as resolved.
Show resolved Hide resolved
:: UPLC.Term PLC.NamedDeBruijn PLC.DefaultUni PLC.DefaultFun ()
-> Either
(UPLC.CekEvaluationException UPLC.NamedDeBruijn UPLC.DefaultUni UPLC.DefaultFun)
(UPLC.Term UPLC.NamedDeBruijn UPLC.DefaultUni UPLC.DefaultFun ())
evaluateCekLikeInProd term = do
evalCtx <- getEvalCtx
let (getRes, _, _) =
-- The validation benchmarks were all created from PlutusV1 scripts
evaluateTerm UPLC.restrictingEnormous (languageIntroducedIn PlutusV1) Quiet evalCtx term
getRes

type Term = UPLC.Term UPLC.DeBruijn UPLC.DefaultUni UPLC.DefaultFun ()

Expand Down
1 change: 1 addition & 0 deletions plutus-ledger-api/src/PlutusLedgerApi/Common.hs
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ module PlutusLedgerApi.Common
-- * Script evaluation
, evaluateScriptCounting
, evaluateScriptRestricting
, evaluateTerm
, VerboseMode (..)
, LogOutput
, EvaluationError (..)
Expand Down
39 changes: 25 additions & 14 deletions plutus-ledger-api/src/PlutusLedgerApi/Common/Eval.hs
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ module PlutusLedgerApi.Common.Eval
, VerboseMode (..)
, evaluateScriptRestricting
, evaluateScriptCounting
, evaluateTerm
, mkDynEvaluationContext
, toMachineParameters
, mkTermToEvaluate
Expand Down Expand Up @@ -132,6 +133,28 @@ mkDynEvaluationContext ver newCMP =
assertWellFormedCostModelParams :: MonadError CostModelApplyError m => Plutus.CostModelParams -> m ()
assertWellFormedCostModelParams = void . Plutus.applyCostModelParams Plutus.defaultCekCostModel

-- | Evaluate a fully-applied term using the CEK machine. Useful for mimicking the behaviour of the
-- on-chain evaluator.
evaluateTerm
effectfully marked this conversation as resolved.
Show resolved Hide resolved
:: UPLC.ExBudgetMode cost DefaultUni DefaultFun
-> ProtocolVersion
-> VerboseMode
-> EvaluationContext
-> UPLC.Term UPLC.NamedDeBruijn DefaultUni DefaultFun ()
-> ( Either
(UPLC.CekEvaluationException NamedDeBruijn DefaultUni DefaultFun)
(UPLC.Term UPLC.NamedDeBruijn DefaultUni DefaultFun ())
, cost
, [Text]
)
evaluateTerm budgetMode pv verbose ectx =
UPLC.runCekDeBruijn
(toMachineParameters pv ectx)
budgetMode
(if verbose == Verbose then UPLC.logEmitter else UPLC.noEmitter)
-- Just replicating the old behavior, probably doesn't matter.
{-# INLINE evaluateTerm #-}

{-| Evaluates a script, with a cost model and a budget that restricts how many
resources it can use according to the cost model. Also returns the budget that
was actually used.
Expand All @@ -152,14 +175,8 @@ evaluateScriptRestricting
-> (LogOutput, Either EvaluationError ExBudget)
evaluateScriptRestricting lv pv verbose ectx budget p args = swap $ runWriter @LogOutput $ runExceptT $ do
appliedTerm <- mkTermToEvaluate lv pv p args

let (res, UPLC.RestrictingSt (ExRestrictingBudget final), logs) =
UPLC.runCekDeBruijn
(toMachineParameters pv ectx)
(UPLC.restricting $ ExRestrictingBudget budget)
(if verbose == Verbose then UPLC.logEmitter else UPLC.noEmitter)
appliedTerm

evaluateTerm (UPLC.restricting $ ExRestrictingBudget budget) pv verbose ectx appliedTerm
tell logs
liftEither $ first CekError $ void res
pure (budget `minusExBudget` final)
Expand All @@ -181,14 +198,8 @@ evaluateScriptCounting
-> (LogOutput, Either EvaluationError ExBudget)
evaluateScriptCounting lv pv verbose ectx p args = swap $ runWriter @LogOutput $ runExceptT $ do
appliedTerm <- mkTermToEvaluate lv pv p args

let (res, UPLC.CountingSt final, logs) =
UPLC.runCekDeBruijn
(toMachineParameters pv ectx)
UPLC.counting
(if verbose == Verbose then UPLC.logEmitter else UPLC.noEmitter)
appliedTerm

evaluateTerm UPLC.counting pv verbose ectx appliedTerm
tell logs
liftEither $ first CekError $ void res
pure final