SCP-2417: Add builtin function: serialiseData #4447

Merged
merged 1 commit into from
Mar 17, 2022

bezirg
Contributor

@bezirg bezirg commented Mar 3, 2022

This adds the necessary boilerplate for a new builtin "serialiseData" as described in CIP-0036.

More specifically:

  • Declares a new builtin "serialiseData"
  • Adds it to the next protocol version (v6)
  • Adds golden tests
  • Adds stubs to be filled in after "costing" this builtin function
  • Adds the builtin to the metatheory (done by @jmchapman)

Pre-submit checklist:

  • Branch
    • Tests are provided (if possible)
    • Commit sequence broadly makes sense
    • Key commits have useful messages
    • Relevant tickets are mentioned in commit messages
    • Formatting, materialized Nix files, PNG optimization, etc. are updated
  • PR
    • (For external contributions) Corresponding issue exists and is linked in the description
    • Self-reviewed the diff
    • Useful pull request description
    • Reviewer requested

@bezirg bezirg self-assigned this Mar 3, 2022
@bezirg bezirg force-pushed the bezirg/serialiseData branch 5 times, most recently from 6f9a294 to 798ab1b Compare March 4, 2022 09:24
@bezirg bezirg changed the title SCP-2417: Add builtin serialiseData SCP-2417: Add builtin function: serialiseData Mar 4, 2022
@bezirg bezirg marked this pull request as ready for review March 4, 2022 09:33
@bezirg bezirg requested review from kwxm and michaelpj March 4, 2022 09:33
@jmchapman
Contributor

I have added Agda support for serialiseData. It was nice to see the NEAT tests picking up discrepancies between the different evaluators as I added the semantics.

However I haven't added the Haskell semantics of serialiseData and the tests still pass (I think, waiting for CI).

@jmchapman
Contributor

jmchapman commented Mar 4, 2022

I tried cranking the NEAT tests up a notch to generate more terms but the term level tests hadn't terminated after 20 minutes :(

@bezirg
Contributor Author

bezirg commented Mar 8, 2022

> However I haven't added the Haskell semantics of serialiseData and the tests still pass (I think, waiting for CI).

Do these semantics have to be added? Aren't all these builtin functions kind of "abstract" from Agda's point of view?

@@ -969,6 +972,10 @@ instance uni ~ DefaultUni => ToBuiltinMeaning uni DefaultFun where
        makeBuiltinMeaning
            ((==) @Data)
            (runCostingFunTwoArguments . paramEqualsData)
    toBuiltinMeaning SerialiseData =
        makeBuiltinMeaning
            (BS.toStrict . serialise @Data)
Contributor

@michaelpj we could have a KnownTypeIn instance for lazy bytestrings. Probably not worth it though.

Contributor Author

@bezirg bezirg Mar 10, 2022

Indeed it is not worth it, because the most likely thing to do after calling serialiseData is to hash the result, e.g. sha3_256(serialiseData(I(3))). All our hash builtins (coming from cardano-crypto-class) go through strict bytestrings, since as @kwxm pointed out, they are foreign-imported from C.

Contributor

Yeah, this seems fine.
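
The lazy-versus-strict point in this thread can be illustrated with a small sketch. It assumes a stand-in `ToyData` type and a toy encoder `encodeToy`; both are hypothetical, and neither is the Plutus `Data` type nor the CBOR encoding that `serialise` produces. The sketch only shows the shape of the conversion: the encoder yields a lazy bytestring, and a single `toStrict` call turns it into the strict form that a C-backed hash function would expect.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as Builder
import qualified Data.ByteString.Lazy as BL

-- Stand-in for the Plutus Data type (hypothetical and simplified).
data ToyData
  = I Integer
  | B BS.ByteString
  | List [ToyData]

-- Toy encoder: NOT CBOR, just a tag byte followed by a payload.
-- Like 'serialise', it produces a lazy bytestring.
encodeToy :: ToyData -> BL.ByteString
encodeToy = Builder.toLazyByteString . go
  where
    go (I n)     = Builder.word8 0 <> Builder.integerDec n
    go (B bs)    = Builder.word8 1 <> Builder.byteString bs
    go (List xs) = Builder.word8 2 <> foldMap go xs <> Builder.word8 255

-- Same shape as the denotation in the diff: encode, then convert to strict.
serialiseToy :: ToyData -> BS.ByteString
serialiseToy = BL.toStrict . encodeToy

main :: IO ()
main = print (BS.unpack (serialiseToy (List [I 3, B (BS.pack [1, 2])])))
```

Converting once at the builtin boundary keeps every consumer on strict bytestrings, which matches the reasoning above for not adding a lazy-bytestring instance.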

@jmchapman
Contributor

jmchapman commented Mar 10, 2022

> > However I haven't added the Haskell semantics of serialiseData and the tests still pass (I think, waiting for CI).
>
> Do these semantics have to be added? Aren't all these builtin functions kind of "abstract" from Agda's point of view?

Yes, that's right. In Agda we just postulate the existence of them.

But when the Agda model is compiled to Haskell, the postulated builtins are compiled to the real Haskell implementations. The fact that the tests work without supplying the implementation means the tests are not actually exercising this code. I think the builtins may be getting applied, but as the results are not used, the GHC runtime never actually computes them.
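
The suspected effect can be reproduced in plain Haskell. This is a minimal sketch, not the Agda-compiled code: `serialiseStub` is a hypothetical stand-in for a postulated builtin with no implementation. A test that applies it but never inspects the result passes, while a test that forces the result observes the missing implementation.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical stand-in for a postulated builtin with no implementation.
serialiseStub :: Integer -> String
serialiseStub _ = error "serialiseData: not implemented"

-- A "test" that applies the builtin but never inspects the result.
-- Because of laziness the stub is never evaluated, so this is True.
testIgnoringResult :: Bool
testIgnoringResult =
  let _unused = serialiseStub 3
  in True

-- A test that forces the result to weak head normal form.
-- The missing implementation now surfaces as an exception,
-- which is reported here as False.
testForcingResult :: IO Bool
testForcingResult = do
  r <- try (evaluate (serialiseStub 3)) :: IO (Either ErrorCall String)
  pure (either (const False) (const True) r)

main :: IO ()
main = do
  print testIgnoringResult
  testForcingResult >>= print
```

If this is what happens in the test suite, comparing or otherwise forcing the builtin's output would make the tests exercise the implementation.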

@bezirg bezirg force-pushed the bezirg/serialiseData branch 3 times, most recently from dd8e893 to e69d00b Compare March 11, 2022 09:36
@kwxm
Contributor

kwxm commented Mar 13, 2022

For the record, here's what the benchmark results for SerialiseData look like. We need a better generator for Data (there's an issue for this somewhere), but the results are still informative.

Firstly, here's the full set of data, with some very large objects.
[Figure: Deserialisation1]
The vertical scale here is in seconds, so some of those are taking quite a long time. The maximum size here is about 880,000.

If we zoom in on things of size less than 2,000, we get this:
[Figure: Deserialisation2]

The vertical scale is in microseconds, so we're getting up to about 1ms here.

It would be good if we had some idea of how big the things people will be serialising in real life are likely to be so we can see how usable this is likely to be. Will anyone want to serialise something containing information about hundreds of inputs, for example?

[We're getting this weird fan shape because of this issue of having a single measure of size for heterogeneous objects: two things of the same size can have very different structures and hence different serialisation costs. I'm planning to implement a new costing function inference procedure for Data which should give us a costing function that goes along the top edge of the fan: that will be safe but will overprice the serialisation operations that lie in the lower part of the plot.]
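
One simple way to obtain a line "along the top edge of the fan" is to take the steepest line through the origin that touches any sample. The sketch below assumes that approach; it is not the inference procedure planned above, and the names `upperSlope` and `upperCost` as well as the sample numbers are made up for illustration.

```haskell
-- (size, time) benchmark samples; the numbers used in 'main' are invented.
type Sample = (Double, Double)

-- Slope of the steepest line through the origin that touches a sample.
-- Every sample with positive size then satisfies time <= slope * size.
upperSlope :: [Sample] -> Double
upperSlope samples = maximum (0 : [t / s | (s, t) <- samples, s > 0])

-- Cost predicted by the conservative model for an object of a given size.
upperCost :: [Sample] -> Double -> Double
upperCost samples size = upperSlope samples * size

main :: IO ()
main = do
  let samples = [(100, 20), (100, 50), (1000, 210), (2000, 900)]
  print (upperSlope samples)
  -- Check that no sample is underpriced by the model.
  mapM_ (\(s, t) -> print (t <= upperCost samples s)) samples
```

Such a model never underprices an operation, at the expense of overpricing objects whose samples lie in the lower part of the fan.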

import Data.Foldable (fold, for_)
import Data.Map qualified as Map
import Data.Set qualified as Set
import Plutus.ApiCommon
import Test.Tasty
import Test.Tasty.HUnit

serialiseDataEx :: CompiledCode Builtins.BuiltinByteString
Contributor

I don't know what's going on with the nix stuff, but this package tries to avoid relying on the plugin. We do that for the other test cases by manually constructing PLC programs, look in Examples.hs. We could do that here, and then at least we don't have to figure out why things are broken...

Contributor Author

I did what you said, let's hope that this fixes nix 🤞

Contributor Author

yay! LGTM?

Contributor Author

btw, I am going to do a separate PR for the plutus-spec v3 to add serialiseData

Contributor

@michaelpj michaelpj left a comment

LGTM, modulo the costing stuff.

plutus-core/plutus-core/src/PlutusCore/Default/Builtins.hs (outdated, resolved)
@michaelpj michaelpj merged commit 0ed5700 into master Mar 17, 2022
@bezirg bezirg deleted the bezirg/serialiseData branch March 18, 2022 10:53
@ch1bo

ch1bo commented Mar 21, 2022

> It would be good if we had some idea of how big the things people will be serialising in real life are likely to be so we can see how usable this is likely to be. Will anyone want to serialise something containing information about hundreds of inputs, for example?

@kwxm Hydra could be a real-world example for this and our use case is serializing one or more TxOut, which in itself may vary in complexity based on the included Value. In our case, the cost for serializing outputs directly limits how big the UTxO in the Head can become. Right now this is limited to roughly 50 outputs or one output with 50 assets. It's hard to say how much we will need, as it greatly depends on the use case, but if we could get a couple hundred outputs serialized, that'd be great :)

On the data you show, the cost of all the constructed value looks linear, but with different linear factors? Do you have an idea how the upper values look like, i.e. what makes them more expensive / the costed size less accurate?

@kwxm
Contributor

kwxm commented Mar 22, 2022

> On the data you show, the cost of all the constructed value looks linear, but with different linear factors? Do you have an idea how the upper values look like, i.e. what makes them more expensive / the costed size less accurate?

@ch1bo I'm just looking into that. See also #3619. I suspect that the reason that we see a number of separate straight lines is that our generator for Data isn't particularly good (there's an issue to fix that) and if we were creating samples with a greater variety of structures we'd see the fan shape being filled in more solidly. There are at least three factors we need to consider: the number of nodes in the object, the number of bytestrings (and how big they are), and the number of integers (and how big they are). I'm guessing that bytestrings are cheaper to serialise than integers, for example.
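
The three factors listed above can be computed in a single traversal. The following is a minimal sketch under stated assumptions: `ToyData` is a stand-in that mirrors the shape of the Plutus `Data` type, and `Measure` and `measure` are hypothetical names. The magnitude of integers is left out for brevity, although it is one of the factors mentioned.

```haskell
import qualified Data.ByteString as BS

-- Stand-in mirroring the shape of the Plutus Data type (an assumption).
data ToyData
  = Constr Integer [ToyData]
  | Map [(ToyData, ToyData)]
  | List [ToyData]
  | I Integer
  | B BS.ByteString

-- Separate counts instead of a single size number.
data Measure = Measure
  { nodes           :: Int  -- every constructor counts as one node
  , integers        :: Int  -- number of I leaves
  , bytestrings     :: Int  -- number of B leaves
  , bytestringBytes :: Int  -- total length of all B leaves
  } deriving (Eq, Show)

instance Semigroup Measure where
  Measure a b c d <> Measure a' b' c' d' =
    Measure (a + a') (b + b') (c + c') (d + d')

instance Monoid Measure where
  mempty = Measure 0 0 0 0

-- Count the current node, then add the contribution of its contents.
measure :: ToyData -> Measure
measure d = Measure 1 0 0 0 <> case d of
  Constr _ xs -> foldMap measure xs
  Map kvs     -> foldMap (\(k, v) -> measure k <> measure v) kvs
  List xs     -> foldMap measure xs
  I _         -> Measure 0 1 0 0
  B bs        -> Measure 0 0 1 (BS.length bs)

main :: IO ()
main = print (measure (Constr 0 [I 3, B (BS.pack [1, 2, 3]), List [I 1, I 2]]))
```

Two objects with the same total size but different `Measure` values would be candidates for the differing slopes seen in the plots.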

@kwxm
Contributor

kwxm commented Mar 26, 2022

> On the data you show, the cost of all the constructed value looks linear, but with different linear factors? Do you have an idea how the upper values look like, i.e. what makes them more expensive / the costed size less accurate?

@ch1bo Take a look at the comments on #4480. I'm off next week, but I'll try to look at the cost of serialising TxOuts when I get back.

6 participants