Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Marlowe chain agnostic #182

Closed
wants to merge 33 commits into from

Conversation

edsko
Copy link
Contributor

@edsko edsko commented Jul 8, 2022

In the literature, Marlowe is described as being agnostic to a particular choice of blockchain, be it UTxO based, account based, or otherwise. This is mostly true for the implementation, but not quite. There are a few places where the Marlowe semantics refer to Cardano specific concepts; specifically, the Cardano-specific notion of a Token, and the Cardano-specific notion of identity (PubKeyHash). I am in the process of modifying the implementation so it abstracts over these concepts; of course, the validation function and off-chain could would still be specific to Cardano, but the core types and the core semantics would not be. This PR is work in progress; I'm not sure if this generalization would be welcomed by the Marlowe team. If not, of course feel free to say so and I will close this PR again. But if it is, then I will update this PR when I'm finished and make it ready for review.

edsko added 19 commits July 7, 2022 14:43
This completes parameterization of the core types module.
This completes parameterization of the types in the semantics module.
This required a few type annotations in the test suite to avoid
ambiguity.
This paves the way for the next step, which is to many the Marlowe
implementation independent from a larger-specific concept of "money". It
wasn't clear to me if `AccountsDiff` should be defined in terms of the
_ledger_ concept of money or the (to be introduced) _Marlowe_ concept,
but then it turned out this code is not used at all? Can of course
re-introduce if it's needed after all, but then we need to decide which
concept of money is appropriate here.
@edsko edsko force-pushed the edsko/parameterize-contract branch 2 times, most recently from 6d1c930 to dabf7e6 Compare July 11, 2022 10:59
@edsko
Copy link
Contributor Author

edsko commented Jul 11, 2022

Ok, this PR is now in principle complete, although I need to do a second generalization too (identity). I've updated all dependencies of marlowe also, although I've not attempted to generalize most of them (i.e., most of them refer to Token specifically). We may wish to introduce some type synonyms for Contract Token and co (or indeed rename Contract and then let type Contract = Contract_ Token or whatever). Wasn't sure what the most convenient would be, so haven't done that yet. Happy to address that if there a specific request from you guys.

(I am aware that this generalization might not be what you want at all... don't mean to be presumptuous, just keeping this PR up to date.)

@edsko
Copy link
Contributor Author

edsko commented Jul 11, 2022

Ok, introduced type synonym for Value (and Case). The old type is now Value_, and the new Value is defined as

type Value t = Value_ (Observation t) t

This avoids the awkward duplication that was happening everywhere, which I'd argue is useful even without the gneralization of this PR. It doesn't solve the type synonym problem for the non-generalized version, of course.

edsko added 7 commits July 13, 2022 14:12
The Marlowe semantics was using `Value` from `Ledger`, but this fixes
the token type (it hardcodes a currency symbol/string combination). In
this commit we introduce a ledger-independent `Money` type (essentially
a `Map Token Integer`) and use this instead, translating to `Value` only
when interacting with the ledger at the periphery.
The change to the test suite is just a few type annotations to choose
which `Token` type we want.
(with the exception of the definition of `ada`)
This isn't particularly useful within `Scripts` (code explicitly
designed to be run on the Cardano chain will of course remain to use
`Token`), but this type is also used in some dependencies; specifically,
it's useful in the (generalization of) the merkleization support in the
CLI.
edsko added 7 commits July 13, 2022 14:24
I did not attempt to make the `Request` type in symbolic
`Token`-agnostic; perhaps it can be, I'm not sure.
The repeated use of

```haskell
Value (Observation Token) Token
```

(as opposed to simply `Value Token`) is a bit frustrating, especially
since the only reason for Value `being` polymorphic is a limitation of
the Plutus compiler. I've not tried to address this for now, because we
probably want to think about some type synonyms anyway that hide the
`Token` parameter (and others).
I've not attempted to make this `Token` agnostic, with the exception of
the Merkleization support.

The CLI defines a few orphan instances for `ToJSON`/`FromJSON`. I've
added an instance here for `Money`; we could instead add the instance
where `Money` is defined, but by defining it specifically for `Money
Token` (rather than `Money t`), we can define it in terms of
`moneyToValue`/`moneyFromValue`, which means this doesn't change the
JSON format. (It would be possible of course to give an instance for
`Money t` _and_ for `Money Token` in `marlowe` using overlapping
instances, but overlapping instances is not a great thing to depend on.)
The old value is now `Value_`, and the new `Value` is defined as

```haskell
type Value t = Value_ (Observation t) t
```

This avoids the awkward duplication that was happening everywhere.
Specifically, do not reuse `Party` (and the `AccountId` synonym) and
`ChoiceId`. This paves the way for the next step, which is generalizing
core Marlowe over identity.

Extended Marlowe is the language most users deal with, so by not
touching that we minimize the impact that the generalization has
downstream.

This also introduces

```haskell
-- | Injection for types where this is total
class ToCore' a b where
  toCore' :: a -> b
```

to go alongside `ToCore`, to avoid having to deal with
actually-impossible errors. `Party` and `ChoiceId` have instances of
`ToCore'`, as well as the existing duplicated type `Payee`.
Since this follows much the same path as the generalization for the
token parameter took, have not split this into separate commits.
@edsko edsko force-pushed the edsko/parameterize-contract branch from 0e33bf6 to 31abf2f Compare July 13, 2022 12:37
@edsko
Copy link
Contributor Author

edsko commented Jul 13, 2022

Ok, I have now also generalized away from identity (and fixed the test failure that CI was reporting, which was due to Plutus Maps not really being maps at all..).

Note that I have not generalized the "Extended Marlowe" language, which I think is what most users actually see? So my hope is that the impact of this generalization downstream is relatively small.

@edsko
Copy link
Contributor Author

edsko commented Jul 14, 2022

I'm seeing the that the tests are failing due to the validator being too large. Not sure what's causing this, I didn't change any code, merely signatures. Conjecture: some dictionaries will need to be passed in, could that push it over the edge? Anyway, will leave this until I know that this is something the Marlowe team even thinks is a good idea or not.

@bwbush bwbush requested review from palas and hrajchert July 14, 2022 16:47
@bwbush
Copy link
Collaborator

bwbush commented Jul 14, 2022

I'm seeing the that the tests are failing due to the validator being too large.

@edsko, changing the signatures likely caused the validator size to increase by about 3kB. We'll need to look into mitigating this. BTW, the Marlowe private testnet has more generous protocol parameters, so we could do on-chain testing there.

@palas, @hrajchert, @yveshauser, the change proposed in this PR is reminiscent of my Lean4 implementation, which aggressively parameterized the Marlowe types. I'm wondering if in the long term the Isabelle and Haskell in https://github.com/input-output-hk/marlowe should take this approach, but perhaps (due to validator size etc.) the Haskell in https://github.com/input-output-hk/marlowe-cardano might need to be bound to the concrete types for Cardano?

@edsko
Copy link
Contributor Author

edsko commented Jul 15, 2022

changing the signatures likely caused the validator size to increase by about 3kB. We'll need to look into mitigating this. BTW, the Marlowe private testnet has more generous protocol parameters, so we could do on-chain testing there.

@bwbush This is rather surprising though, no? After all, Plutus core is untyped, making more types polymorphic should not make a significant difference. My only guess so far is that there are some dictionaries being passed around that are resulting in more size? (Total guess, no evidence for it.) Or perhaps the Plutus compiler just does something slightly different when code is polymorphic? Do you have an intuition as to why this is making such a big difference?

I can take a closer look myself if you guys think that the generalization in this PR is useful.

(I actually want to generalize the on-chain client too, so that it can have different kinds of rules for different kinds of tokens; and I'd like to make the core contract language extensible. But those generalizations are a bit more localized, and I figured the one in this PR at least matches the spirit of Marlowe, as it makes the language chain independent.)

@edsko
Copy link
Contributor Author

edsko commented Jul 15, 2022

I did some initial investigation, measuring the size of the validator, wrapper, and params separately. Out of curiosity, I also measured the size of the Plutus code after running it through the Plutonomy optimizer for Plutus. The results for main are

## Typed

scriptSize: 14677

validator: 10876 (optimized  8832)
wrapper  :  4373 (optimized  3319)
params   :    50 (optimized    39)
total    : 15299 (optimized 12190, 0.80%)

## Untyped

scriptSize: 12582

# edsko/parameterize-contract

## Typed

scriptSize: 17748 (increase of 3071)

validator: 13473 (optimized 11086)
wrapper  : 4789  (optimized  3703)
params   : 50    (optimized    39)
total    : 18312 (optimized 14828, 0.81%)

## Untyped

scriptSize: 15204 (increase of 2622)

(Here scriptSize is the measurement as it is done in the existing size tests; this doesn't quite line up with measuring the components, I guess some optimization is happening when applying them.)

Two observations at this point:

  • The size of the "untyped" validator is 15204 after this PR, which is only just over the limit set by the test (15200). The limit is exceeded by quite a bit more for the typed validator, but I think that's mostly because the limit for the untyped validator was less strict. Initially I thought that the difference in how much the limit is exceeded was giving a clue about the cause of the problem, but I now think this is a red herring.
  • Plutonomy can reduce the Marlowe code by about 20% (in its default configuration), which isn't immediately relevant for this PR but worth keeping in mind.

I don't yet know exactly where the growth is coming from; I will try and take a look at that next.

@hrajchert
Copy link
Collaborator

I think the repository that needs to be blockchain agnostic is https://github.com/input-output-hk/marlowe. We are in the process of updating the Marlowe specification and proofs, as they are a little bit outdated (doesn't have Extended Marlowe, still uses Slot instead of time and doesn't have merkelization). As part of the update we are planning to do a test oracle that test that a particular implementation (for example marlowe-cardano) is compliant with the blockchain agnostic semantics.

@edsko
Copy link
Contributor Author

edsko commented Jul 15, 2022

@hrajchert My understanding though is that marlowe (the repo) does not have a Plutus implementation? Ideally we'd like a single definition that runs on Cardano and elsewhere.

@bwbush
Copy link
Collaborator

bwbush commented Jul 15, 2022

@edsko, there are artifacts of typing that persist even into untyped plutus core (UPLC), so that may account for the growth in validator size. My understanding is that the residue of type erasure [1] [2] is the main source of artifacts, but there could also be artifacts from the TH, GHC, PIR, PLC, and UPLC layers of compilation.

Eventually, Marlowe may take advantage of reference scripts, in which case validator size will be less of an issue. That would let most of the transaction bytes be available for the Marlowe contract and state (stored in Datum) instead of using that for the Marlowe validator script. Validator size would still be an issue for execution memory, and that could become a problem in transactions where ScriptContext is very large.

Regarding extensibility, I personally often think of Core Marlowe as an "assembly language", with either code generation (currently via Haskell or JavaScript, but really via any language that can write JSON) or a DSL (currently, Extended Marlowe or Faustus) instantiating the Core Marlowe contract. The current protocol limits for execution memory are more stressful to Marlowe than the limits for execution steps, so it is advantageous to keep the language small, even at the expense of lengthier execution. (In most cases, Merkleization keeps the on-chain contract fairly small.)

@edsko
Copy link
Contributor Author

edsko commented Jul 22, 2022

I'm taking a closer look at where some of this extra size comes from; indeed, delay/force arguments arising from type arguments is one source, as you already point out, but I've identified at least one other (IntersectMBO/plutus#4782). I will continue looking at this a bit longer.

@edsko
Copy link
Contributor Author

edsko commented Aug 4, 2022

Ok, I looked more at this, and wrote a blog post with some of my findings; the final section is of particular relevance I guess: https://well-typed.com/blog/2022/08/plutus-cores/#the-cost-of-polymorphism .

FWIW, I tried moving the calls to compile into a separate module from the definition of rolePayoutScript, smallTypedValidator, smallUntypedValidator; this will give the ghc optimizer a chance to look at the code before the Plutus compiler plugin does. However, as I already anticipated in the blog post, this actually made things worse; 18318/16286 (two separate scripts) instead of 17748/15204, so roughly increase of 3%/7% respectively. Also tried adding -fspecialize-aggressively, which seemed to have zero effect; also tried adding -O1 in addition, which made it worse still 18675/16660. This is somewhat disappointing, because I thought part of the problem of my generalization of Marlowe was type class dictionaries being passed around, and in small scale tests -fspecialize-aggressively does manage to get rid of that. But I guess in the context of the larger script, it also has other consequences elsewhere.

I will come back to this again, but might be a little while before I do.

@edsko
Copy link
Contributor Author

edsko commented Sep 9, 2022

Ok, opened a PR against marlowe at input-output-hk/marlowe#135 that attempts the same generalization there. We are still experimenting, but it's probably much easier there, and if it works for us, then I will close this PR for now. (I was halfway rebasing this PR, when I realized that due to some changes the marlowe repo might now work for us).

@edsko
Copy link
Contributor Author

edsko commented Sep 13, 2022

Ok, closing this in favour of input-output-hk/marlowe#135 .

@edsko edsko closed this Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants