SCP-2417: Add builtin function: serialiseData #4447

bezirg · 2022-03-03T14:35:13Z

This adds the necessary boilerplate for a new builtin "serialiseData" as described in CIP-0036.

More specifically:

- Declares a new builtin, "serialiseData"
- Adds it to the next protocol version, v6
- Adds golden tests
- Adds stubs to be filled in after "costing" this builtin function
- Adds the builtin to the metatheory (done by @jmchapman)

Pre-submit checklist:

Branch
- Tests are provided (if possible)
- Commit sequence broadly makes sense
- Key commits have useful messages
- Relevant tickets are mentioned in commit messages
- Formatting, materialized Nix files, PNG optimization, etc. are updated
PR
- (For external contributions) Corresponding issue exists and is linked in the description
- Self-reviewed the diff
- Useful pull request description
- Reviewer requested

jmchapman · 2022-03-04T14:25:29Z

I have added Agda support for serialiseData. It was nice to see the NEAT tests picking up discrepancies between the different evaluators as I added the semantics.

However I haven't added the Haskell semantics of serialiseData and the tests still pass (I think, waiting for CI).

jmchapman · 2022-03-04T19:00:31Z

I tried cranking the NEAT tests up a notch to generate more terms but the term level tests hadn't terminated after 20 minutes :(

bezirg · 2022-03-08T10:29:41Z

> However I haven't added the Haskell semantics of serialiseData and the tests still pass (I think, waiting for CI).

Do these semantics have to be added? Aren't all these builtin functions kind of "abstract" from Agda's point of view?

effectfully · 2022-03-09T15:14:44Z

plutus-core/plutus-core/src/PlutusCore/Default/Builtins.hs

@@ -969,6 +972,10 @@ instance uni ~ DefaultUni => ToBuiltinMeaning uni DefaultFun where
        makeBuiltinMeaning
            ((==) @Data)
            (runCostingFunTwoArguments . paramEqualsData)
+    toBuiltinMeaning SerialiseData =
+        makeBuiltinMeaning
+            (BS.toStrict . serialise @Data)


@michaelpj we could have a KnownTypeIn instance for lazy bytestrings. Probably not worth it though.

Indeed, it is not worth it, because the most likely thing to do after calling serialiseData is to hash the result, e.g. sha3_256(serialiseData(I(3))). All our hash builtins (coming from cardano-crypto-class) go through strict bytestrings since, as @kwxm pointed out, they are foreign-imported from C.

Yeah, this seems fine.
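
To illustrate the point of this thread, here is a minimal sketch of converting a lazily built encoding to a strict bytestring before handing it to a strict-only consumer. Everything named `toy...` is hypothetical: `ToyData` is a two-constructor stand-in for `Data`, `toyEncode` is a made-up encoding (not CBOR and not the real `serialise`), and `toyDigest` merely stands in for a hash builtin that accepts strict input only.

```haskell
module Main where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as Builder
import qualified Data.ByteString.Lazy as BSL

-- Hypothetical two-constructor stand-in for PlutusCore's Data.
data ToyData = I Integer | B BS.ByteString

-- Hypothetical encoder producing a LAZY bytestring (one tag byte, then payload).
-- This is an illustration only; it is not CBOR.
toyEncode :: ToyData -> BSL.ByteString
toyEncode (I n)  = Builder.toLazyByteString (Builder.word8 0 <> Builder.string7 (show n))
toyEncode (B bs) = Builder.toLazyByteString (Builder.word8 1 <> Builder.byteString bs)

-- Same shape as the builtin's denotation in the diff: encode lazily, then make strict.
serialiseToy :: ToyData -> BS.ByteString
serialiseToy = BSL.toStrict . toyEncode

-- Hypothetical stand-in for a hash that only accepts strict bytestrings.
toyDigest :: BS.ByteString -> Int
toyDigest = BS.foldl' (\acc w -> (acc * 31 + fromIntegral w) `mod` 65521) 7

main :: IO ()
main = print (toyDigest (serialiseToy (I 3)))
```

Since the consumer needs the whole input as one strict chunk anyway, doing the conversion inside the builtin costs nothing extra in this usage pattern.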

jmchapman · 2022-03-10T09:27:34Z

> However I haven't added the Haskell semantics of serialiseData and the tests still pass (I think, waiting for CI).

> Do these semantics have to be added? Aren't all these builtin functions kind of "abstract" from Agda's point of view?

Yes, that's right. In Agda we just postulate the existence of them.

But when the Agda model is compiled to Haskell, the postulated builtins are compiled to the real Haskell implementations. The fact that the tests work without supplying the implementation means the tests are not actually exercising this code. I think the builtins may be getting applied, but as the results are not used, the GHC runtime never actually computes them.
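
The laziness effect described above can be reproduced in isolation. This is only a sketch: `missingBuiltin` is a hypothetical stand-in for a builtin with no implementation supplied, and the two checks are made-up examples rather than the NEAT tests.

```haskell
module Main where

import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical stand-in for a postulated builtin whose implementation is missing.
missingBuiltin :: Integer -> Integer
missingBuiltin _ = error "serialiseData: no implementation supplied"

-- Applies the builtin but never inspects the result, so the thunk is never forced.
appliesButIgnores :: Integer -> Bool
appliesButIgnores n = let r = missingBuiltin n in length [r] == 1

-- Forces the result to weak head normal form, which exposes the missing implementation.
forcesResult :: Integer -> IO (Either ErrorCall Integer)
forcesResult n = try (evaluate (missingBuiltin n))

main :: IO ()
main = do
  print (appliesButIgnores 3)
  outcome <- forcesResult 3
  putStrLn (either (const "forced: implementation missing") show outcome)
```

The first check succeeds even though the builtin cannot be evaluated; only the second, which forces the value, notices the problem.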

kwxm · 2022-03-13T12:19:32Z

For the record, here's what the benchmark results for SerialiseData look like. We need a better generator for Data (there's an issue for this somewhere), but the results are still informative.

Firstly, here's the full set of data, with some very large objects.

The vertical scale here is in seconds, so some of those are taking quite a long time. The maximum size here is about 880,000.

If we zoom in on things of size less than 2,000, we get this:

The vertical scale is in microseconds, so we're getting up to about 1ms here.

It would be good if we had some idea of how big the things people will be serialising in real life are likely to be so we can see how usable this is likely to be. Will anyone want to serialise something containing information about hundreds of inputs, for example?

[We're getting this weird fan shape because of this issue of having a single measure of size for heterogeneous objects: two things of the same size can have very different structures and hence different serialisation costs. I'm planning to implement a new costing function inference procedure for Data which should give us a costing function that goes along the top edge of the fan: that will be safe but will overprice the serialisation operations that lie in the lower part of the plot.]
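
As a rough illustration of a costing function that runs along the top edge of the fan, the sketch below picks the smallest slope for which a line through the origin is never below any sample. This is not the inference procedure mentioned above: the through-the-origin model, the function names, and the sample numbers are all assumptions made for the example.

```haskell
module Main where

-- A (size, measured time) pair; the numbers used below are made up.
type Sample = (Double, Double)

-- Smallest slope a such that a * size >= time for every sample with positive size.
-- Returns 0 when there are no such samples.
envelopeSlope :: [Sample] -> Double
envelopeSlope samples = maximum (0 : [t / s | (s, t) <- samples, s > 0])

-- Cost charged for an object of the given size under the envelope model.
cost :: Double -> Double -> Double
cost slope size = slope * size

-- How much a sample is overpriced relative to its measured time.
overcharge :: Double -> Sample -> Double
overcharge slope (s, t) = cost slope s - t

-- Two "rays" of the fan: a cheap structure and an expensive one at the same sizes.
fan :: [Sample]
fan = [(128, 2), (128, 6), (256, 4), (256, 12)]

main :: IO ()
main = do
  let a = envelopeSlope fan
  print a
  print (map (overcharge a) fan)
```

Samples on the expensive ray are priced exactly, while those on the cheap ray are overcharged, which is the safety versus accuracy trade-off described in the comment.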

michaelpj · 2022-03-15T14:11:19Z

plutus-ledger-api/test/Spec/Builtins.hs

 import Data.Foldable (fold, for_)
 import Data.Map qualified as Map
 import Data.Set qualified as Set
 import Plutus.ApiCommon
 import Test.Tasty
 import Test.Tasty.HUnit

+serialiseDataEx :: CompiledCode Builtins.BuiltinByteString


I don't know what's going on with the nix stuff, but this package tries to avoid relying on the plugin. We do that for the other test cases by manually constructing PLC programs, look in Examples.hs. We could do that here, and then at least we don't have to figure out why things are broken...

I did what you said, let's hope that this fixes nix 🤞

btw, I am going to do a separate PR for the plutus-spec v3 to add serialiseData

michaelpj

LGTM, modulo the costing stuff.

plutus-core/cost-model/create-cost-model/CostModelCreation.hs

michaelpj · 2022-03-16T11:42:40Z

plutus-core/plutus-core/src/PlutusCore/Default/Builtins.hs

@@ -969,6 +972,10 @@ instance uni ~ DefaultUni => ToBuiltinMeaning uni DefaultFun where
        makeBuiltinMeaning
            ((==) @Data)
            (runCostingFunTwoArguments . paramEqualsData)
+    toBuiltinMeaning SerialiseData =
+        makeBuiltinMeaning
+            (BS.toStrict . serialise @Data)


Yeah, this seems fine.

ch1bo · 2022-03-21T12:09:46Z

> It would be good if we had some idea of how big the things people will be serialising in real life are likely to be so we can see how usable this is likely to be. Will anyone want to serialise something containing information about hundreds of inputs, for example?

@kwxm Hydra could be a real-world example for this and our use case is serializing one or more TxOut, which in itself may vary in complexity based on the included Value. In our case, the cost for serializing outputs directly limits how big the UTxO in the Head can become. Right now this is limited to roughly 50 outputs or one output with 50 assets. It's hard to say how much we will need, as it greatly depends on the use case, but if we could get a couple hundred outputs serialized, that'd be great :)

In the data you show, the cost of all the constructed values looks linear, but with different linear factors? Do you have an idea what the upper values look like, i.e. what makes them more expensive / the costed size less accurate?

kwxm · 2022-03-22T10:18:45Z

> In the data you show, the cost of all the constructed values looks linear, but with different linear factors? Do you have an idea what the upper values look like, i.e. what makes them more expensive / the costed size less accurate?

@ch1bo I'm just looking into that. See also #3619. I suspect that the reason that we see a number of separate straight lines is that our generator for Data isn't particularly good (there's an issue to fix that) and if we were creating samples with a greater variety of structures we'd see the fan shape being filled in more solidly. There are at least three factors we need to consider: the number of nodes in the object, the number of bytestrings (and how big they are), and the number of integers (and how big they are). I'm guessing that bytestrings are cheaper to serialise than integers, for example.
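
The three factors listed above can be measured separately instead of being folded into a single size. The sketch below is only an illustration: the `Data` type is a local copy of the constructor shape, `Stats` and `bitLength` are hypothetical helpers, and counting integer size in bits (and ignoring constructor tags) is an assumption rather than how the cost model measures size.

```haskell
module Main where

import qualified Data.ByteString as BS

-- Local mirror of the constructor shape of PlutusCore's Data type.
data Data
  = Constr Integer [Data]
  | Map [(Data, Data)]
  | List [Data]
  | I Integer
  | B BS.ByteString

-- Hypothetical per-factor measurements of a Data object.
data Stats = Stats
  { nodes           :: Int -- total number of nodes
  , byteStrings     :: Int -- number of B leaves
  , byteStringBytes :: Int -- total bytes held in B leaves
  , integers        :: Int -- number of I leaves
  , integerBits     :: Int -- total bits needed for the magnitudes of I leaves
  } deriving (Eq, Show)

instance Semigroup Stats where
  Stats a b c d e <> Stats a' b' c' d' e' = Stats (a + a') (b + b') (c + c') (d + d') (e + e')

instance Monoid Stats where
  mempty = Stats 0 0 0 0 0

-- Number of bits in the magnitude of an integer (0 for zero).
bitLength :: Integer -> Int
bitLength n = length (takeWhile (> 0) (iterate (`div` 2) (abs n)))

-- Walk the object once, counting each factor. Constructor tags are ignored.
stats :: Data -> Stats
stats d = node <> rest
  where
    node = Stats 1 0 0 0 0
    rest = case d of
      Constr _ ds -> foldMap stats ds
      Map kvs     -> foldMap (\(k, v) -> stats k <> stats v) kvs
      List ds     -> foldMap stats ds
      I n         -> Stats 0 0 0 1 (bitLength n)
      B bs        -> Stats 0 1 (BS.length bs) 0 0

main :: IO ()
main = print (stats (Constr 0 [I 3, B (BS.pack [1, 2, 3, 4]), List [I 1000, I 0]]))
```

Two objects with the same node count can differ widely in the other fields, which is one way to see why a single size measure produces a fan rather than a line.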

kwxm · 2022-03-26T17:20:47Z

> In the data you show, the cost of all the constructed values looks linear, but with different linear factors? Do you have an idea what the upper values look like, i.e. what makes them more expensive / the costed size less accurate?

@ch1bo Take a look at the comments on #4480. I'm off next week, but I'll try to look at the cost of serialising TxOuts when I get back.

bezirg self-assigned this Mar 3, 2022

bezirg added the Don't look here yet label Mar 3, 2022

bezirg force-pushed the bezirg/serialiseData branch 5 times, most recently from 6f9a294 to 798ab1b Compare March 4, 2022 09:24

bezirg changed the title ~~SCP-2417: Add builtin serialiseData~~ SCP-2417: Add builtin function: serialiseData Mar 4, 2022

bezirg marked this pull request as ready for review March 4, 2022 09:33

bezirg requested review from kwxm and michaelpj March 4, 2022 09:33

bezirg removed the Don't look here yet label Mar 4, 2022

bezirg force-pushed the bezirg/serialiseData branch from da98594 to 3a2fe92 Compare March 8, 2022 12:18

effectfully reviewed Mar 9, 2022

bezirg force-pushed the bezirg/serialiseData branch from 3a2fe92 to 2220b1e Compare March 10, 2022 09:19

bezirg force-pushed the bezirg/serialiseData branch 3 times, most recently from dd8e893 to e69d00b Compare March 11, 2022 09:36

michaelpj reviewed Mar 15, 2022

bezirg force-pushed the bezirg/serialiseData branch from e69d00b to 586b880 Compare March 15, 2022 15:38

michaelpj approved these changes Mar 16, 2022

SCP-2417: Add builtin function: serialiseData

98215e8

bezirg force-pushed the bezirg/serialiseData branch from 586b880 to 98215e8 Compare March 16, 2022 18:18

michaelpj merged commit 0ed5700 into master Mar 17, 2022

bezirg deleted the bezirg/serialiseData branch March 18, 2022 10:53
