This note reports on some experiments comparing different methods of serialising `TxOut` objects in on-chain Plutus Core code.
Plutus Core is a version of the untyped lambda calculus extended with a number of so-called "built-in types" (like integers and bytestrings) and "built-in functions" (or just builtins) operating on those types (for example, functions for performing arithmetic on integers and concatenating bytestrings). Built-in functions are implemented as Haskell functions which can be called from Plutus Core.
Validation code for Cardano smart contracts can be generated by writing Haskell code which is converted to Plutus Core using a compiler plugin. This process converts algebraic Haskell types into Plutus Core using the Scott encoding, which represents values of these types as (possibly large) lambda terms. The process of converting Haskell values to Plutus Core values is referred to as lifting.
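To illustrate the idea (in plain Haskell, not the compiler's actual output), the Scott encoding turns a value of an algebraic type into a function taking one continuation per constructor; "pattern matching" is then just function application. A sketch for a `Maybe`-like type:

```haskell
{-# LANGUAGE RankNTypes #-}

-- Scott-encoded Maybe: a value is a function that is given one
-- continuation per constructor and calls the one matching its shape.
type SMaybe a = forall r. r -> (a -> r) -> r

sNothing :: SMaybe a
sNothing = \n _j -> n          -- "Nothing" selects the first continuation

sJust :: a -> SMaybe a
sJust x = \_n j -> j x         -- "Just x" applies the second to the payload

-- Case analysis is just application of the continuations.
sFromMaybe :: a -> SMaybe a -> a
sFromMaybe d m = m d id

main :: IO ()
main = do
  print (sFromMaybe 0 (sJust (42 :: Integer)))
  print (sFromMaybe 0 (sNothing :: SMaybe Integer))
```

This is why Scott-encoded values can be large: every constructor application becomes a lambda term carrying its continuations.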
In addition, Plutus Core contains a built-in type called `BuiltinData`, which is a newtype wrapper round the Haskell type `Data` (see plutus-core/plutus-core/src/PlutusCore/Data.hs in the plutus repository):

```haskell
data Data =
    Constr Integer [Data]
  | Map [(Data, Data)]
  | List [Data]
  | I Integer
  | B ByteString
```
This type can be used to encode algebraic types more efficiently than the Scott encoding and is used for a number of purposes on the Cardano blockchain, including passing redeemer objects and information about the current transaction into on-chain validators. Typeclass methods called `toBuiltinData` and `fromBuiltinData` are provided which can be used to convert Scott-encoded datatypes into `BuiltinData` and vice versa. These can be used on-chain, but are not built-in functions: they are standard Haskell code translated into Plutus Core (although they make use of low-level builtins for constructing and destructuring `BuiltinData` objects).
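To make the encoding concrete, here is a hand-rolled sketch of the kind of conversion `toBuiltinData` performs, using a local copy of the `Data` type above and a toy `Colour` type (this is an illustration of the scheme, not the real typeclass code): each constructor gets an integer tag, in declaration order, and its arguments are encoded recursively.

```haskell
import Data.ByteString (ByteString)

-- Local copy of the Data type from plutus-core, for illustration.
data Data
  = Constr Integer [Data]
  | Map [(Data, Data)]
  | List [Data]
  | I Integer
  | B ByteString
  deriving Show

-- A toy algebraic type standing in for a real on-chain datatype.
data Colour = Red | Green | Blue

-- Constructors are tagged in declaration order; arguments are
-- converted recursively.
colourToData :: Colour -> Data
colourToData Red   = Constr 0 []
colourToData Green = Constr 1 []
colourToData Blue  = Constr 2 []

pairToData :: (Integer, Colour) -> Data
pairToData (n, c) = Constr 0 [I n, colourToData c]

main :: IO ()
main = print (pairToData (5, Green))  -- Constr 0 [I 5,Constr 1 []]
```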
Information about UTXOs can be passed to a validator using the `TxOut` type (see e.g. plutus-ledger-api/src/Plutus/V1/Ledger/Tx.hs in the plutus repository) encoded as `BuiltinData`. This will generally be translated into the Scott form using `fromBuiltinData` for processing within the validator.
The Hydra team needs to serialise `TxOut` objects into bytestrings on the chain, and to this end has implemented an on-chain CBOR encoder: see plutus-cbor in the hydra-poc repository. However, Plutus Core now has a built-in function called `serialiseData` which can be used to convert `BuiltinData` objects into bytestrings (see PR 4447). The purpose of this note is to compare this with the CBOR encoding library. The hydra-poc repository contains some benchmarks for the CBOR library; I backported `serialiseData` to work with the version of the Plutus interpreter in that repository (the hydra-poc code is based on a version of Plutus from September 2021, and it was too difficult to update it to work with the current version of Plutus) and extended the benchmarks to include it as well. I also had to re-run part of the cost model creation process in order to obtain a CPU time model for `serialiseData` on that version of the evaluator.
The main purpose of the existing benchmarks is to measure the cost of encoding `TxOut` objects on-chain, specifically to get some idea of how many objects can be serialised in a reasonable time. There is an executable called `encoding-cost` which constructs complete validators which serialise `TxOut` objects and executes them using the actual ledger code, running the Plutus Core evaluator to report the expected cost of running the code. There is also a set of Criterion benchmarks called `bench` which measures the running time of the `plutus-cbor` code vis-à-vis the execution time of the `cborg` library: note, however, that this involves the execution of Haskell code, not Plutus Core code.
The benchmarks work with two types of data:

- (A) Lists of `TxOut` objects of varying lengths, each object containing some amount of Ada.
- (B) Single `TxOut` objects containing multi-asset values with varying numbers of assets.
The Plutus infrastructure includes a cost model for Plutus Core which assigns CPU and memory costs to Plutus Core programs (also called scripts). The Plutus Core evaluator can be run in counting mode, where it adds up the costs of all of the operations it performs and returns the result to the user, and it can also be run in restricting mode, where a budget is supplied in advance and the evaluator adds up costs during execution, terminating the program if the given budget is exceeded. The idea is that programs (which are fully deterministic) are run in counting mode off-chain to obtain their cost, which can then be supplied as the budget when the script is run in restricting mode on the chain.
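The restricting-mode bookkeeping can be sketched as follows (toy types and names, not the real evaluator API): each step's cost is subtracted from the remaining budget, and evaluation aborts as soon as either component would go negative.

```haskell
-- A toy model of "restricting mode" cost accounting.
data Budget = Budget { cpu :: Integer, mem :: Integer } deriving Show

-- Charge one step's cost against the remaining budget, failing if it
-- runs out in either dimension.
spend :: Budget -> Budget -> Either String Budget
spend (Budget c m) (Budget dc dm)
  | c' < 0 || m' < 0 = Left "budget exceeded"
  | otherwise        = Right (Budget c' m')
  where c' = c - dc
        m' = m - dm

-- Run a list of per-step costs against an initial budget.
runRestricting :: Budget -> [Budget] -> Either String Budget
runRestricting b0 steps =
  foldl (\acc step -> acc >>= \b -> spend b step) (Right b0) steps

main :: IO ()
main = do
  print (runRestricting (Budget 100 10) [Budget 30 2, Budget 30 2])
  print (runRestricting (Budget 100 10) [Budget 60 5, Budget 60 5])
```

In counting mode the same per-step costs would simply be accumulated and reported rather than checked against a limit.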
The cost model is based on benchmarks for each builtin run on a specific reference machine, with one CPU cost unit representing 1 picosecond of execution time on that machine. The cost of a script as given by the cost model is not an exact measure of execution time, but should be roughly proportional to real execution time (with different constants of proportionality on different machines/architectures/operating systems). Tests with a large number of realistic validation scripts (which are not themselves used in the generation of the cost model) show that execution time as predicted by the cost model is generally accurate to within ±20%. However, the `serialiseData` function is somewhat difficult to cost (see PR 4480, especially the last comment), so there is some uncertainty about how accurate the cost model predictions will be for scripts involving `serialiseData`. The cost is based on worst-case behaviour, and may overestimate the cost of serialising `TxOut` objects: part of the purpose of these experiments was to check that.
Memory costs are given in terms of 64-bit words, and are a rough approximation to the total amount of memory which we expect to be consumed by objects created by builtins during the execution of a script. We don't attempt to account for garbage collection or temporary memory used during the execution of a builtin, so the figures shouldn't be regarded as being hugely accurate (although they do contribute to script execution fees): their main purpose is to disallow scripts which consume very large amounts of memory.
I added two main extensions to the `plutus-cbor` benchmarks:

- The `encoding-cost` executable reports the execution costs of validators performing simulated on-chain serialisation using the `plutus-cbor` library; I added some extra code to perform serialisation by converting data on-chain (not literally on the blockchain, but entirely in Plutus Core) to `BuiltinData` and then calling `serialiseData`. The costs reported by this are costs according to the Plutus Core cost model, which are supposed to be an approximation of the time and memory usage of the program.
- I also added some Criterion benchmarks (`bench-plc`) which create Plutus Core scripts (not full validators) which just serialise some `TxOut` objects: the benchmarks report actual execution times on our benchmarking machine, and thus provide a check on the accuracy of the cost model (recall the comments earlier about the difficulties of assigning costs to `serialiseData`). These benchmarks don't include all the extra validation overhead, but are still useful for comparing the different serialisation methods. Serialisation is performed in three different ways: the two mentioned above, together with one which just measures the time `serialiseData` takes to serialise `BuiltinData` objects which have been created by converting `TxOut` objects off-chain (i.e. in Haskell: the resulting data objects are embedded in the scripts). This gives us some idea of the cost incurred by calling `toBuiltinData` on the chain (there's more on this later).
Both of these use `TxOut` data of the types A and B mentioned earlier.
The results produced by the `encoding-cost` executable for budgets (not actual execution times) for full validators are shown below. The "mem" and "cpu" columns show the total memory usage and CPU time according to the cost model; the CPU times have been converted from picoseconds to microseconds by dividing by 10^6. The "%mem" and "%cpu" columns show these figures as percentages of the on-chain execution limits.
Lists of Ada-only `TxOut` objects (type A), `plutus-cbor`:
n | mem | cpu | %mem | %cpu |
---|---|---|---|---|
0 | 367164 | 135 μs | 0.16% | 0.07% |
10 | 2083761 | 919 μs | 10.56% | 5.28% |
20 | 3793414 | 1698 μs | 20.88% | 10.43% |
30 | 5517128 | 2485 μs | 31.34% | 15.66% |
40 | 7236890 | 3269 μs | 41.75% | 20.85% |
50 | 9014323 | 4084 μs | 52.77% | 26.38% |
60 | 10817658 | 4912 μs | 64.06% | 32.05% |
70 | 12569405 | 5712 μs | 74.79% | 37.41% |
80 | 14372964 | 6540 μs | 86.06% | 43.06% |
90 | 16159679 | 7358 μs | 97.15% | 48.60% |
100 | 18003565 | 8206 μs | 108.84% | 54.46% |
110 | 19796882 | 9027 μs | 119.99% | 60.02% |
120 | 21633695 | 9871 μs | 131.58% | 65.83% |
130 | 23481888 | 10720 μs | 143.30% | 71.69% |
140 | 25360284 | 11585 μs | 155.32% | 77.72% |
150 | 27213772 | 12437 μs | 167.08% | 83.59% |
Lists of Ada-only `TxOut` objects (type A), `serialiseData`:
n | mem | cpu | %mem | %cpu |
---|---|---|---|---|
0 | 359728 | 133 μs | 0.08% | 0.05% |
10 | 1501760 | 678 μs | 4.74% | 2.87% |
20 | 2643108 | 1219 μs | 9.38% | 5.64% |
30 | 3784456 | 1761 μs | 14.01% | 8.42% |
40 | 4925462 | 2301 μs | 18.64% | 11.17% |
50 | 6067836 | 2847 μs | 23.31% | 14.01% |
60 | 7210552 | 3394 μs | 27.99% | 16.88% |
70 | 8351558 | 3934 μs | 32.61% | 19.63% |
80 | 9493590 | 4479 μs | 37.27% | 22.45% |
90 | 10634938 | 5020 μs | 41.91% | 25.22% |
100 | 11777312 | 5566 μs | 46.58% | 28.06% |
110 | 12918318 | 6106 μs | 51.20% | 30.82% |
120 | 14060008 | 6649 μs | 55.85% | 33.61% |
130 | 15201698 | 7192 μs | 60.49% | 36.41% |
140 | 16343730 | 7737 μs | 65.15% | 39.23% |
150 | 17485078 | 8278 μs | 69.79% | 42.00% |
Single multi-asset `TxOut` objects (type B), `plutus-cbor`:
n | mem | cpu | %mem | %cpu |
---|---|---|---|---|
0 | 432372 | 165 μs | 0.41% | 0.18% |
10 | 1205297 | 546 μs | 6.39% | 3.43% |
20 | 1995028 | 937 μs | 12.54% | 6.77% |
30 | 2797416 | 1335 μs | 18.82% | 10.18% |
40 | 3593053 | 1728 μs | 25.03% | 13.54% |
50 | 4400901 | 2128 μs | 31.37% | 16.97% |
60 | 5209712 | 2527 μs | 37.71% | 20.39% |
70 | 6042110 | 2940 μs | 44.29% | 23.95% |
80 | 6872659 | 3351 μs | 50.85% | 27.49% |
90 | 7719599 | 3770 μs | 57.58% | 31.11% |
100 | 8562297 | 4187 μs | 64.26% | 34.71% |
110 | 9424324 | 4614 μs | 71.14% | 38.42% |
120 | 10273085 | 5034 μs | 77.88% | 42.05% |
130 | 11137359 | 5462 μs | 84.78% | 45.75% |
140 | 12020689 | 5899 μs | 91.87% | 49.56% |
150 | 12906430 | 6338 μs | 98.98% | 53.37% |
Single multi-asset `TxOut` objects (type B), `serialiseData`:
n | mem | cpu | %mem | %cpu |
---|---|---|---|---|
0 | 419492 | 164 μs | 0.28% | 0.18% |
10 | 760276 | 331 μs | 1.94% | 1.28% |
20 | 1101064 | 498 μs | 3.60% | 2.38% |
30 | 1441862 | 667 μs | 5.27% | 3.50% |
40 | 1782640 | 833 μs | 6.93% | 4.59% |
50 | 2123428 | 1000 μs | 8.59% | 5.69% |
60 | 2464208 | 1167 μs | 10.26% | 6.79% |
70 | 2805004 | 1335 μs | 11.92% | 7.90% |
80 | 3145788 | 1502 μs | 13.58% | 9.00% |
90 | 3486586 | 1670 μs | 15.25% | 10.11% |
100 | 3827370 | 1837 μs | 16.91% | 11.21% |
110 | 4168160 | 2005 μs | 18.58% | 12.32% |
120 | 4508932 | 2170 μs | 20.24% | 13.40% |
130 | 4849710 | 2336 μs | 21.90% | 14.49% |
140 | 5190504 | 2504 μs | 23.57% | 15.61% |
150 | 5531286 | 2670 μs | 25.23% | 16.70% |
The results suggest that using `serialiseData` costs about 66% as much as using `plutus-cbor` for lists of Ada-only `TxOut` objects (in terms of both CPU usage and memory), and that for single multi-asset `TxOut`s the cost is about 45%. They should also give some idea of how many `TxOut`s it will be feasible to serialise within the time limits enforced by the chain. It should also be borne in mind that the current version of the Plutus Core evaluator is about twice as fast as the one used in these experiments (and this is reflected in the cost model), so we can expect some improvement on the results above.
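The headline percentages can be recovered directly from the n = 150 CPU entries of the tables above:

```haskell
-- Ratio of serialiseData CPU cost to plutus-cbor CPU cost at n = 150,
-- using the microsecond figures from the tables, as a rounded percentage.
ratio :: Int -> Int -> Double
ratio a b = fromIntegral a / fromIntegral b

main :: IO ()
main = do
  print (round (100 * ratio 8278 12437) :: Int)  -- Ada-only lists
  print (round (100 * ratio 2670 6338) :: Int)   -- single multi-asset
```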
The results for the `bench-plc` benchmarks are shown below; note that all of the graphs have the same vertical scale. Execution times as predicted by the cost model (obtained from `encoding-cost`) are shown in blue and actual execution times of the benchmarks are shown in red. The first two graphs in each group show reasonable agreement between predictions and reality. These graphs represent the same serialisation methods as the `encoding-cost` results, which suggests that we can be reasonably confident about the accuracy of the latter. For emphasis, we repeat that `encoding-cost` uses full validators but the `bench-plc` scripts only perform serialisation: this is because it was too difficult to obtain real execution times for full validator scripts.
Note, though, that in the final graph in each group the prediction is a considerable overestimate of the actual execution time. Closer examination of the data suggests that we are typically overestimating by a factor of 10-15. The reason for this is probably that these benchmarks measure only the execution time of `serialiseData`, whereas the second graph in each group also includes the cost of `toBuiltinData`, which involves traversing Scott-encoded data structures to convert them into `BuiltinData`. As mentioned earlier, the cost model for `serialiseData` is conservative because its worst-case behaviour, which seems to occur for objects containing large numbers of integers, is much worse than its typical behaviour. The `TxOut` objects which we're dealing with here typically consist mostly of hashes (for which I think serialisation is effectively the identity function) and so are cheaper than the worst case. Having said that, I'm not entirely sure why this discrepancy doesn't throw the results for the second graphs out more; it is noticeable that those graphs both show overestimated times whereas the first graphs (which don't involve `serialiseData`) show slightly underestimated times. It may be that the discrepancy is just swallowed up by the larger overall times.
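The claim that serialising hashes is close to the identity function follows from the CBOR format itself: a bytestring is emitted as a short length header followed by the raw bytes. A minimal encoder for the short cases (standard CBOR major type 2 per RFC 8949; a simplified stand-in, not the `plutus-cbor` code):

```haskell
import qualified Data.ByteString as BS

-- Minimal CBOR encoder for bytestrings of length < 256 (major type 2):
-- lengths below 24 fit in the initial byte (0x40 + n); longer ones get
-- a 0x58 marker followed by a one-byte length, then the raw bytes.
encodeBytes :: BS.ByteString -> BS.ByteString
encodeBytes bs
  | n < 24    = BS.cons (0x40 + fromIntegral n) bs
  | n < 256   = BS.cons 0x58 (BS.cons (fromIntegral n) bs)
  | otherwise = error "example only covers lengths < 256"
  where n = BS.length bs

main :: IO ()
main = do
  -- a 28-byte dummy hash (the size of a Cardano payment key hash)
  let hash = BS.replicate 28 0xab
  print (BS.length (encodeBytes hash))  -- 30: only 2 bytes of overhead
```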
In order to call `serialiseData` it is necessary to convert Scott-encoded data structures into `BuiltinData` objects, and this requires on-chain traversal of potentially large objects: the results of the previous section suggest that this can introduce quite a lot of overhead. But in the case of Hydra, where do these objects come from? I don't know exactly, but they're presumably obtained from the context object or the redeemer of the current transaction in the form of `BuiltinData`, which has to undergo conversion to Scott-encoded data via the `fromBuiltinData` function, the inverse of `toBuiltinData`. If this is the case then it may be possible to get `BuiltinData` encodings of `TxOut` objects directly from the context/redeemer and serialise those without having to call `toBuiltinData` on-chain, which could lead to a considerable decrease in serialisation times.
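A toy model of why this could help (local stand-in for `Data`; `nodes` is a stand-in metric for the per-pass traversal work): the `fromBuiltinData`/`toBuiltinData` round trip adds two full traversals of the structure on top of the serialisation itself, whereas serialising the received `Data` directly needs only one pass.

```haskell
-- Local stand-in for a fragment of the Data type.
data Data = Constr Integer [Data] | List [Data] | I Integer

-- Number of nodes a single pass (conversion or serialisation) visits.
nodes :: Data -> Int
nodes (Constr _ ds) = 1 + sum (map nodes ds)
nodes (List ds)     = 1 + sum (map nodes ds)
nodes (I _)         = 1

-- fromBuiltinData, toBuiltinData and serialiseData each walk the whole
-- structure roughly once, so the round trip does about three passes
-- where serialising the received Data directly does one.
roundTripWork, directWork :: Data -> Int
roundTripWork d = 3 * nodes d
directWork    d = nodes d

main :: IO ()
main = do
  let d = Constr 0 [List [I 1, I 2, I 3], I 4]
  print (roundTripWork d, directWork d)  -- (21,7)
```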
The `encoding-cost` executable also reports the serialised sizes of the scripts in the previous section, and we record them here for completeness.
Lists of Ada-only `TxOut` objects (type A):

n | `plutus-cbor` | `serialiseData`, `toBuiltinData` on-chain | `serialiseData`, `toBuiltinData` off-chain |
---|---|---|---|
0 | 715 | 533 | 39 |
10 | 1455 | 1500 | 823 |
20 | 2226 | 2394 | 1535 |
30 | 3037 | 3288 | 2248 |
40 | 3770 | 4144 | 2926 |
50 | 4532 | 5146 | 3744 |
60 | 5243 | 6183 | 4597 |
70 | 6035 | 7041 | 5275 |
80 | 6787 | 8008 | 6058 |
90 | 7586 | 8901 | 6771 |
100 | 8335 | 9904 | 7589 |
110 | 9102 | 10761 | 8267 |
120 | 9793 | 11692 | 9015 |
130 | 10532 | 12621 | 9763 |
140 | 11316 | 13585 | 10546 |
150 | 12072 | 14479 | 11258 |
Single multi-asset `TxOut` objects (type B):

n | `plutus-cbor` | `serialiseData`, `toBuiltinData` on-chain | `serialiseData`, `toBuiltinData` off-chain |
---|---|---|---|
0 | 726 | 542 | 87 |
10 | 1692 | 1282 | 632 |
20 | 2586 | 2053 | 1212 |
30 | 3480 | 2864 | 1833 |
40 | 4336 | 3597 | 2369 |
50 | 5338 | 4359 | 2935 |
60 | 6375 | 5070 | 3451 |
70 | 7233 | 5862 | 4050 |
80 | 8200 | 6614 | 4606 |
90 | 9093 | 7413 | 5211 |
100 | 10096 | 8162 | 5763 |
110 | 10953 | 8929 | 6337 |
120 | 11884 | 9620 | 6832 |
130 | 12813 | 10359 | 7375 |
140 | 13777 | 11143 | 7964 |
150 | 14671 | 11899 | 8525 |
Somewhat surprisingly, for lists of Ada-only `TxOut`s the serialised scripts for `serialiseData . toBuiltinData` are generally larger (by up to 20%) than those for `plutus-cbor` (which contain a compiled version of the library), and the ones where `toBuiltinData` is performed "off-chain" are about 10% smaller than the `plutus-cbor` ones. For multi-asset scripts the situation is different: scripts using `serialiseData . toBuiltinData` are about 20% smaller than the `plutus-cbor` ones, and the ones where `toBuiltinData` is performed off-chain are typically about 60% of the size of the `plutus-cbor` ones. More investigation/thought will be required to explain the discrepancy between the Ada-only and the multi-asset scripts.