Candid: Make services self-describing #1510

Closed
wants to merge 1 commit into from

Conversation

nomeata
Collaborator

@nomeata nomeata commented May 15, 2020

We always had the vision that you can just take a canister ID and find
out what its interface is. This is crucial for use-cases like

  • importing external canisters in your code (where dfx would fetch
    the .did file and provide it to moc)
  • @chenyan-dfinity’s Candid interface should work for all canisters
  • command line tools can provide an interface for all canisters
    with auto-complete etc.

Why haven’t we had this for months already? The main reason is that for
a long while I expected this to fall under “static front-end assets”,
which I imagined would be uploaded by dfx, separately from the
.wasm, and served by the system, independent of WasmCode, in a special
“static content serving mode”, and the IDL would just be one of these
files (with a well-known name). One reason for this design was that we
thought this was the only way to make access to these assets
trustworthy.

The feature of front-end assets has been contentious, prone to feature
creep, and got entangled with other issues (like, how to return data
from queries in trustworthy ways). So nothing moved.

But recently things started to move there again. We have a vision for
how general queries can actually be trustworthy, and are leaning towards
moving the frontend handling into the canister, and allowing canisters to
somehow fuel HTTP endpoints.

And this led me to the conclusion that the candid definition is not a
front-end asset, and we can and should just design the “canister can
indicate its own interface” separately.

This proposal is a very simple idea: A canister, by convention, reports
its IDL via

candid_interface : () -> (Text) query

For Motoko, this would happen automatically.
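
To make the convention concrete, here is a minimal sketch as a plain Python simulation, not real IC code. The `Canister` class, its `query` dispatcher and the counter example are hypothetical stand-ins for a canister and a raw query call; only the method name `candid_interface` and its "no arguments, returns text" shape come from the proposal.

```python
class Canister:
    """Toy stand-in for a canister: a table of named query methods."""

    def __init__(self):
        self._queries = {}

    def export_query(self, name, fn):
        self._queries[name] = fn

    def query(self, name, *args):
        # Stand-in for a raw query call addressed by method name.
        if name not in self._queries:
            raise LookupError(f"canister has no query method {name!r}")
        return self._queries[name](*args)


def make_counter():
    """Build a counter canister that also reports its own interface."""
    state = {"count": 0}
    idl = "service : { get : () -> (nat) query }"
    c = Canister()
    c.export_query("get", lambda: state["count"])
    # By convention, the interface description is just another query.
    c.export_query("candid_interface", lambda: idl)
    return c


counter = make_counter()
print(counter.query("candid_interface"))
```

A tool that only holds a reference to `counter` can learn the interface with one extra query, without any side channel for a `.did` file.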

Benefits of this design:

  • We can implement it right away.

  • The interface is bundled with the .wasm. No more complex steps to
    be taken by dfx to create the .did file, keep it in sync, and
    somehow bundle it with the .wasm upon uploading.

    (We still want moc --idl for local development though, lest we
    write a tool that simulates the IC System API and locally calls
    candid_interface().)

  • It can be dynamic, if a canister changes its interface without
    redeployment.

    • Maybe some methods only become available after some flag got set?
    • Or we have a completely dynamic canister – I recently played around
      with a Python canister, and there I would expect to
      upload some code using a regular call, and the interface should
      match the installed code.
    • Or maybe we have a proxy canister that fetches its candid from the
      canister it proxies.
  • As @rossberg points out: Candid is not actually tied to the IC
    system, and could be used in other “service RPC” contexts as well.

    The presented design will neatly apply to other environments as well.

  • Treating this differently from front-end assets is sensible:

    • Front-end assets need to be reachable via HTTP, it seems. But the
      Candid interface does not: Any tool that cares about Candid is able to
      speak Candid and IC.

    • Front-end assets may be developed and deployed independently from
      the backend. The Candid interface is tied to the canister.

  • Candid messages are self-describing, so Candid services ought to be too.
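
The "it can be dynamic" point can be sketched roughly as follows. This is a hypothetical Python simulation; `FeatureFlagCanister`, `enable_beta` and the `reset` method are invented for illustration. It only shows an interface description that is computed from the current state rather than fixed at deployment.

```python
class FeatureFlagCanister:
    """Toy canister whose reported interface depends on its current state."""

    def __init__(self):
        self.beta_enabled = False

    def enable_beta(self):
        # Stand-in for an update call that flips a flag.
        self.beta_enabled = True

    def candid_interface(self):
        methods = ["get : () -> (nat) query"]
        if self.beta_enabled:
            # This method only becomes visible once the flag is set.
            methods.append("reset : () -> ()")
        return "service : { " + "; ".join(methods) + " }"


c = FeatureFlagCanister()
before = c.candid_interface()
c.enable_beta()
after = c.candid_interface()
```

A statically stored description could not follow this change without some extra update mechanism.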

@nomeata nomeata added the P2 medium priority, resolve within a couple of milestones label May 15, 2020
@dfinity-ci

This PR does not affect the produced WebAssembly code.

@rossberg
Contributor

I'm a bit puzzled. The underlying idea is to stuff assets and everything into a single Wasm module? But then you can only get them out by expensive calls into Wasm! And more expensive copying at the boundaries. And how will this support multiple modules? Embedding them as binary blobs into another module? Sorry, that all sounds super-backwards to me.

@rossberg
Contributor

Also, I'm not fond of conflating domain and meta-domain, by having reflection as a regular method. PL has long moved away from that, in favour of mirror-based approaches, because they are a more capability-conform setup.

@nomeata
Collaborator Author

nomeata commented May 15, 2020

Maybe I shouldn’t have spent so many words on frontend assets, but let’s keep the discussion on the IDL description … should we pull you into the ongoing meetings about assets and certified variables?

Also, I'm not fond of conflating domain and meta-domain, by having reflection as a regular method. PL has long moved away from that, in favour of mirror-based approaches, because they are a more capability-conform setup.

What does mirror-based mean, concretely?

And how would you support a canister with a dynamic interface?

What if the candid_interface method were an IC method, but not a Candid method (e.g. it would not encode the result using Candid)? Then we would not be conflating domain and meta-domain; they would be separate, just using the same raw primitive notion of “IC byte-shoveling method”. But that doesn't seem to gain much.

Oh, and: Candid messages are self-describing, so why not Candid services?

@rossberg
Contributor

rossberg commented May 15, 2020

should we pull you into the ongoing meetings about assets and certified variables?

Not that I'm keen on it, but if this is where it moves then I may want to have a word.

What does mirror-based mean, concretely?

See https://en.wikipedia.org/wiki/Mirror_(programming)

Another way of describing it is that a mirror makes reflection a separate capability from regular access to an object.

What if the candid_interface method would be an IC method, but not a candid method

That would be somewhat better conceptually, but still mixes up capabilities.

Oh, and: Candid messages are self-describing, so why not Candid services?

Well, plain data does not encapsulate anything, especially not behaviour. Also, subtyping is message coercive on data, so you don't get to downcast back to rediscover elided information, while actor reflection allows you to do that.

Edit: And I should have added: the types of message data are a low-level encoding detail, they are not accessible from the higher-level typed programming model, unlike the proposed method.

@nomeata
Collaborator Author

nomeata commented May 15, 2020

That would be somewhat better conceptually, but still mixes up capabilities.

But how would it be different from what we thought we’d do earlier: In both cases it would be a read query, just to a static part of the canister module.

while actor reflection allows you to do that.

…as you would in any other model we talked about (e.g. IDL as part of system-provided static assets).

What do you have in mind instead?

@rossberg
Contributor

Good question. If interface queries are a system call, then at least there are possible ways in which the system could limit access to them (the system would be the mirror).

One possibility would be that the system distinguishes public, reflective actor ids from private, non-reflective ones. At least that path would remain open as a future feature.

@nomeata
Collaborator Author

nomeata commented May 15, 2020

One possibility would be that the system distinguishes public, reflective actor ids from private, non-reflective ones. At least that path would remain open as a future feature.

The proposed interface is a “should”. If you create internal, private actors that should not be inspectable, then just don’t implement candid_interface. Or if you want to slap access control on there, just do that in candid_interface.

Both these points speak in favor of a flexible, programmatic interface, in contrast to a “the system always stores an IDL file in a way that everyone can access it”.
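
A rough sketch of the two options mentioned here, as a hypothetical Python simulation. The classes, the `caller` argument and the `try_describe` helper are invented for illustration. Note that the guarded variant is an access-control-list style check, not a capability.

```python
class PrivateCanister:
    """Does not implement candid_interface at all."""

    def query(self, name, caller):
        raise LookupError(f"no query method {name!r}")


class GuardedCanister:
    """Implements candid_interface but only answers selected callers."""

    def __init__(self, allowed):
        self.allowed = set(allowed)

    def query(self, name, caller):
        if name != "candid_interface":
            raise LookupError(f"no query method {name!r}")
        if caller not in self.allowed:
            raise PermissionError("caller may not inspect this canister")
        return "service : { ping : () -> () }"


def try_describe(canister, caller):
    """Return the interface text, or None if the canister declines."""
    try:
        return canister.query("candid_interface", caller)
    except (LookupError, PermissionError):
        return None
```

For example, `try_describe(GuardedCanister(["owner"]), "alice")` yields `None`, while the same call with `"owner"` returns the description.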

@rossberg
Contributor

If you create internal, private actors that should not be inspectable, then just don’t implement candid_interface.

Then it is impossible to separate the ability to use an actor from the ability to reflect on it. You may want to provide one but not the other.

Or if you want to slap access control on there, just do that in candid_interface.

That would be ACLs instead of capabilities. I thought you were in the other camp? :)

in contrast to a “the system always stores an IDL file in a way that everyone can access it”.

Why would everyone have access? That depends on how that access is handled in the system. For a private actor, it may only be the owner.

@nomeata
Collaborator Author

nomeata commented May 15, 2020

That would be ACLs instead of capabilities. I thought you were in the other camp? :)

Once we have capabilities, and they allow you to control access to certain methods, then this design will neatly map to capabilities, without special-casing “get interface” compared to any other “do X”. So this, too, seems to be in favor of a uniform way of accessing canister features (including getting the interface).

Why would everyone have access? That depends on how that access is handled in the system. For a private actor, it may only be the owner.

Yes, but again that is a problem that we have to solve for any other query as well. It is not specific to getting the IDL, or getting the index.html, or getting the current counter. I find your arguments very supportive of my proposal :-)

@rossberg
Contributor

I suggest reading Gilad's paper (linked from the Wikipedia article). He analyses multiple case studies, some of which are relevant to our scenario. :)

@nomeata
Collaborator Author

nomeata commented May 15, 2020

Gave it a glance. Not sure how the listed “The advantages of mirrors include:” apply here (but I don't understand all of them).

Anyways, that would suggest we have a well-known “candid interface store”, a canister that maybe has this interface

type CandidInterface = Text;
actor CandidMirror { 
  getInterfaceFor : (principal) -> CandidInterface;
}

and somehow (how?) knows the interface for all canisters?

Is that what you are proposing? If not, then what are you proposing?
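
To make the open question tangible, a toy Python version of such a store might look like this. Everything here is hypothetical, in particular the `register` step, since how the store would learn a canister's interface is exactly what is unclear.

```python
class CandidMirror:
    """Toy central store mapping a principal to an interface description."""

    def __init__(self):
        self._interfaces = {}

    def register(self, principal, interface):
        # Hypothetical: it is an open question who would call this, and when.
        self._interfaces[principal] = interface

    def get_interface_for(self, principal):
        # Returns None for canisters the store has never heard about.
        return self._interfaces.get(principal)


mirror = CandidMirror()
mirror.register("canister-a", "service : { get : () -> (nat) query }")
```

Any canister that nobody registered, such as a hypothetical `"canister-b"` here, simply has no description in the store.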

@crusso
Contributor

crusso commented May 15, 2020

Could we not just store the interface in an optional custom section? We could support a system method that allows a canister to query its own custom section, if it wants to navel gaze or pass it on to some client. And the system could return custom sections without entering wasm.
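
For reference, reading a custom section out of a Wasm binary without executing any code can be sketched like this in Python. The section name `"candid"` is an assumption made up for this example; the binary layout (8-byte header, then sections as id byte, LEB128 size and contents, with custom sections using id 0 and a length-prefixed name) follows the WebAssembly binary format.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def write_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def custom_section(name, payload):
    """Build a Wasm custom section (section id 0)."""
    name_bytes = name.encode("utf-8")
    body = write_uleb128(len(name_bytes)) + name_bytes + payload
    return b"\x00" + write_uleb128(len(body)) + body


def find_custom_section(module, wanted):
    """Return the payload of the first custom section named `wanted`, or None."""
    if module[:8] != b"\x00asm\x01\x00\x00\x00":
        raise ValueError("not a Wasm 1.0 binary module")
    pos = 8
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_uleb128(module, pos + 1)
        end = pos + size
        if section_id == 0:
            name_len, name_start = read_uleb128(module, pos)
            name = module[name_start:name_start + name_len].decode("utf-8")
            if name == wanted:
                return module[name_start + name_len:end]
        pos = end  # skip to the next section
    return None


idl = b"service : { get : () -> (nat) query }"
# "candid" is a hypothetical section name chosen for this sketch.
module = b"\x00asm\x01\x00\x00\x00" + custom_section("candid", idl)
```

Here `find_custom_section(module, "candid")` returns the stored bytes, which illustrates the "without entering wasm" part; it says nothing about how a running canister could change that data.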

@nomeata
Collaborator Author

nomeata commented May 15, 2020

Could we not just store the interface in an optional custom section? We could support a system method that allows a canister to query its own custom section, if it wants to navel gaze or pass it on to some client.

Hmm, seems very similar to just using a data section? (Which, especially with the bulk memory proposal, no longer wastes precious heap space at rest.)

And the system could return custom sections without entering wasm.

And expose it as a separate read request on the HTTP interface to external users? Or also to other canisters, via a query call to the management canister maybe? Would this allow reading any custom section?

What does that buy us over the presented proposal?

And how would this support changing the interface without re-uploading the canister (e.g. a proxy or dynamic canister)?

@crusso
Contributor

crusso commented May 15, 2020

Could we not just store the interface in an optional custom section? We could support a system method that allows a canister to query its own custom section, if it wants to navel gaze or pass it on to some client.

Hmm, seems very similar to just using a data section? (Which, especially with the bulk memory proposal, no longer wastes precious heap space at rest.)

I guess, but is that yet another proposal we have to wait to arrive?

And the system could return custom sections without entering wasm.

And expose it as a separate read request on the HTTP interface to external users? Or also to other canisters, via a query call to the management canister maybe? Would this allow reading any custom section?

What does that buy us over the presented proposal?

I guess it uses stuff that's readily available and doesn't so readily conflate static with dynamic, which is what I think Andreas was objecting to.

And how would this support changing the interface without re-uploading the canister (e.g. a proxy or dynamic canister)?

It doesn't, but perhaps that could be added later. A (universal) proxy would need a more generic entry point anyway, right?

But I was just trying to mediate....

@nomeata
Collaborator Author

nomeata commented May 15, 2020

But I was just trying to mediate....

Yes, sorry, I see that intention and I appreciate that.

I guess, but is that yet another proposal we have to wait to arrive?

No need to wait for it, it just means that we know that one problem (wasting heap space) will go away in the future.

I guess it uses stuff that's readily available

Quite the contrary: The present proposal does not require any new features from any existing component; no new request types, no new canister state, no new system API.

and doesn't so readily conflate static with dynamic, which is what I think Andreas was objecting to.

That’s maybe the main point of contention: I think this mechanism must be dynamic to be general and flexible enough (see the examples above). If one considers that to be an anti-feature, then I understand that this design is unappealing.

But a canister can upgrade itself. So if you need a dynamic interface description, you can have it anyway; you just have to jump through some ugly hoops.

Re Gilad's paper: I think my proposal actually satisfies Stratification, because the candid_interface function is not (expected to be) part of the published Candid interface, so await C.candid_interface() (where C is an IDL-derived reference) would not work. Instead one has to use a separate mechanism to invoke that function (sharing a good chunk of the lower-level method invocation mechanism, of course). Seems that might qualify as a mirror…
It seems that it also satisfies Gilad’s request for “encapsulation”, simply because, by virtue of the actor model, the user of candid_interface is encapsulated from its implementation. I don’t know how to map the third requirement, “structural correspondence”, onto our situation.
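
The stratification argument could be illustrated roughly as below. This is a hypothetical Python simulation; `RawCanister`, `TypedRef` and `call_raw` are invented names. A reference derived from the published interface does not offer the interface query (named `candid_interface` in the proposal), while the lower-level raw call still reaches it.

```python
class RawCanister:
    """Toy canister addressed by raw method name."""

    def __init__(self, methods):
        self._methods = methods

    def call_raw(self, name):
        return self._methods[name]()


class TypedRef:
    """Reference exposing only the methods named in the published interface."""

    def __init__(self, canister, published):
        self._canister = canister
        self._published = frozenset(published)

    def __getattr__(self, name):
        # Only reached for names that are not regular attributes.
        if name.startswith("_") or name not in self._published:
            raise AttributeError(name)
        return lambda: self._canister.call_raw(name)


raw = RawCanister({
    "get": lambda: 7,
    "candid_interface": lambda: "service : { get : () -> (nat) query }",
})
# The published interface lists only `get`, not the reflection query.
C = TypedRef(raw, published=["get"])
```

With this setup `C.get()` works, `C.candid_interface` raises `AttributeError`, and `raw.call_raw("candid_interface")` returns the description.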

@matthewhammer
Contributor

I think my proposal actually satisfies Stratification, because the candid_interface function is not (expected to be) part of the published Candid interface.

I agree with this sentiment: It seems like this proposal is doing stratification.

But perhaps we should be more explicit about the "other levels" we are adding to each canister? This feature is adding to some "Candid interface level" of a canister that has yet to be named, right?

@matthewhammer
Contributor

matthewhammer commented May 15, 2020

adding to some "Candid interface level" of a canister that has yet to be named, right?

Forgive me if this level has been named, but I don't know it.

Provisionally, I see the following two levels, with these "names":

  • Level 1: The service type provided by the canister written in Motoko, directly controlled by the Motoko developer, expressed as a Candid type.
  • Level 0: The additional methods of the canister provided by the Motoko compiler, not controlled or authored by the Motoko developer directly.

(or maybe "level 0" is really "level 2"? In any case, there is stratification in my view)

@matthewhammer
Contributor

The proposed interface is a “should”.

Good point. I think the idea here is to support an "opt-in" form of dynamic reflection, and I favor that kind of design. As mentioned above, I think there is adequate stratification. I can't see a good way to add more without losing the point here.

As to the question of whether this reflection should itself be a canister service (one of Joachim's possible interpretations of the "mirror" suggestion from @rossberg): I would hope that the source of this service API information about a canister is that canister itself, perhaps from another "level" of its canister API (to satisfy the need for stratification in the design), much like in this proposal.

The external service that caches this information, cross-references it, indexes it for search, etc can and should be another service. That "mirror" is way more complex and feature-ful. I think that the OOPSLA paper about Mirrors is discussing OO runtimes that do much of this kind of stuff, and perhaps that's all good to keep in mind for those designs as well. Just a thought.


BTW -- The 55 Foundry folks have been floating "canister (dev) store" as an app that they'd like to create for other canister developers. Their customers would be other developers that shop around for functionality to build upon. The app store acts as a way to communicate trust and stability, do KYC for business relationships, etc. One can imagine that the "source of truth" for this app store (or any other like it) for any given canister on the IC is the entry point proposed here. On top of that feature, app store developers can build other features, cache the information, etc.

@hansl

hansl commented May 15, 2020

I still personally think it should be a separate call from canister_query and canister_update. It's just a separate concern entirely.

But if the powers that be have made that decision, I suggest using a UTF-8 character that is quasi-impossible to type, like an emoji even, and that is an invalid Motoko function name, so people cannot define it and break moc.

@matthewhammer
Contributor

matthewhammer commented May 16, 2020

I still personally think it should be a separate call...

To what does "it" refer? (candid_interface : () -> (Text) query)?

@hansl

hansl commented May 16, 2020

Sorry. My opinion on the matter is that getting the interface of a canister shouldn't be part of the API of the canister (for reasons I think were explained above), and should be something that's guaranteed by the system. There are big advantages to making the API of canisters a first-class citizen of the platform they're running on.

One such advantage that I described to Joachim is that we could enforce that canister upgrades cannot break existing code, a guarantee that can be provided because Candid knows about covariance of types. Joachim replied that there is nothing preventing developers from changing the API semantically (e.g. returning an empty array all the time). He also believes that verifying covariance of an API should be left to tooling, not platform.

My position on the subject is:

a) you will never be able to prevent people from semantically doing the wrong thing,
b) you CAN however prevent people from breaking code that depends on them,
c) if you don't prevent them from breaking that code, people can do so, maliciously or not. See the Leftpad fiasco on NPM, where unpublishing a package caused thousands of packages used by millions of users to stop working instantly. Allowing an API to break can lead to the same consequences.
d) Murphy's law basically tells us that if someone can do a booboo, it will happen at some point, so it's not a question of if, but when.

We have the power to make those decisions now. We should embrace it to make the platform better.
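
As a rough illustration of the kind of upgrade check meant here, consider the following Python sketch. It is a deliberate simplification: the `breaking_changes` helper and the dict representation of an interface are invented for this example, and it demands identical signatures, whereas real Candid subtyping would accept some changes that this check rejects.

```python
def breaking_changes(old, new):
    """List reasons why `new` would break callers written against `old`.

    Interfaces are dicts mapping method name -> (argument types, result types).
    Simplification: signatures must match exactly.
    """
    problems = []
    for name, signature in old.items():
        if name not in new:
            problems.append(f"method {name!r} was removed")
        elif new[name] != signature:
            problems.append(f"method {name!r} changed its signature")
    return problems


v1 = {"get": ((), ("nat",)), "inc": ((), ())}
v2 = {"get": ((), ("nat",)), "inc": ((), ()), "reset": ((), ())}  # only adds a method
v3 = {"get": ((), ("text",))}  # changes `get`, drops `inc`
```

A platform (or a tool) could refuse an upgrade from `v1` to `v3` because `breaking_changes(v1, v3)` is non-empty, while the upgrade to `v2` passes. Such a check only covers types, which is point a) above.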

@chenyan-dfinity
Contributor

I think it's better to have a system API to fetch the Candid interface. We can store the .did file as static data in the Wasm, so you can update the data dynamically as well. candid_interface is a meta-level method, same as install or upgrade.

@chenyan-dfinity
Contributor

b) you CAN however prevent people from breaking code that depends on them,

You cannot prevent people from breaking code. Even if the subtyping rule doesn't allow your breaking change, you delete the code and simply trap. You can WARN people when they break the code.

c) if you don't prevent them from breaking that code, people can do so, maliciously or not. See the Leftpad fiasco on NPM, where unpublishing a package caused thousands of packages used by millions of users to stop working instantly. Allowing an API to break can lead to the same consequences.

True, but consider the other direction: someone writes an API to steal passwords, and all malicious apps depend on it. Should we allow the author to withdraw that API?

@hansl

hansl commented May 16, 2020

Even if the subtyping rule doesn't allow your breaking change, you delete the code and simply trap.

This is a semantic change though, which should be handled socially. I'm talking specifically about breaking the API, which would trap every time.

Should we allow the author to withdraw that API?

Using governance, that'd be fine.

Honestly, this is one of the first points I brought up and asked about when I was hired: how do people handle API changes in canisters? I'm quite worried that we're not spending nearly as much time on this as I think it deserves.

@nomeata
Collaborator Author

nomeata commented May 16, 2020

I think it's better to have a system API to fetch the Candid interface. We can store .did file as a static data in wasm, so you can update the data dynamically as well.

You mean (data) or custom section? How would you dynamically update it?

candid_interface is a meta-level method, same as install or upgrade.

Can you explain why that is desirable or preferable? What good is enabled or prevented by this? And, crucially, why does the system care about the idl any more than any other query result?

@nomeata
Collaborator Author

nomeata commented May 17, 2020

Here is another try at justifying this approach, explicitly listing the assumptions and goals, and drawing this conclusion – this may provide more tangible points of attack…

The main underlying assumptions are:

  1. Candid is part of our platform, but not the core system. Our system is more or less untyped actors sending opaque data to each other.
  2. Our system ought to support many programming languages, in particular, dynamic ones.
    Example: Canisters written in Python would all have the same code (.wasm); users wouldn’t compile them, but upload code through regular update methods.
  3. Our system ought to support ways of composing canisters that make the interface dynamic.
    Example: A proxy canister (load balancer, monitor, logging, A/B testing) that forwards calls to backend canisters. After a (proper) upgrade of the backend canister the proxy can dynamically be reconfigured to now point to the new canister, effectively changing its interface.
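
The proxy example from assumption 3 can be sketched as a toy Python simulation. `Backend`, `Proxy` and `set_backend` are hypothetical names; the point is only that the proxy's reported interface follows whatever backend it currently points to.

```python
class Backend:
    """Toy backend canister that reports a fixed interface."""

    def __init__(self, interface):
        self.interface = interface

    def query(self, name):
        if name == "candid_interface":
            return self.interface
        raise LookupError(name)


class Proxy:
    """Toy proxy that forwards queries to whichever backend is configured."""

    def __init__(self, backend):
        self.backend = backend

    def set_backend(self, backend):
        # Stand-in for an update call that repoints the proxy.
        self.backend = backend

    def query(self, name):
        # The interface query is forwarded like any other call.
        return self.backend.query(name)


v1 = Backend("service : { get : () -> (nat) query }")
v2 = Backend("service : { get : () -> (nat) query; reset : () -> () }")
proxy = Proxy(v1)
```

After `proxy.set_backend(v2)`, the same proxy reports the new interface, although the proxy's own code was never reinstalled.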

My requirements and goals are (roughly in descending order of importance):

  1. Given a reference to a canister that “speaks Candid”, I (e.g. dfx) should be able to obtain the interface description.
  2. I would like to avoid complexity on the side of dfx that is unnecessary, or that might cause the wrong (e.g. old) interface to be associated with a canister.
  3. I would like to avoid central services (canister registries).
  4. The actual Candid interface of a canister should not be affected by this mechanism.
  5. The IC-independence of Candid (i.e. it generally describes an RPC mechanism) should extend to the interface.

From these I arrive at the present design:

  • Because of goal 1 we have to do something.

  • Because of assumptions 2 + 3 it is clear that, in general, the Candid interface is a function of the current state.

  • We have multiple ways of expressing a “function of the system state” on the system level – query calls of course, but also special stuff like “pay for ingress”. Still, a query seems better than the alternatives:

    • Adding a special system method canister_interface (like canister_query) would go against assumption 1. Note that existing and future “special stuff” is special because it is needed for system-level concerns.
    • One could consider adding, on the system level, a general concept of “state-dependent metadata”. So far I don’t see a reason for that, and goal 5 (IC-independence of Candid) speaks against it; it would unnecessarily widen the interface between Candid and system.
    • Potentially other requirements (frontend assets) might provide an alternative general concept that we could use here. But even if that were the case, goal 2 advises against it: Bundling the Candid with the Wasm (custom section, as often suggested, or just opaquely in the code, as proposed here) will make it easier to keep them in sync and avoid unnecessary complexity.
    • I think nobody wants fetching the interface to be an update (allowing state changes, more expensive and complex to do).

That leaves, it seems to me, the query method.

  • Andreas points to “Gilad’s paper”. While I am not fully convinced that it is necessarily applicable here (after all it describes a source language API design, not a lower-level system interface), we actually satisfy some points: Because the interface is produced by Wasm code, instead of, say “static data in custom section X”, the implementation is encapsulated. And because the query method would (typically) not appear in the Candid interface itself, we’d have Stratification.
    A separate canister to query interfaces from goes against the decentralized spirit of the IC (goal 3).

  • Then there is the question of goal 4. This is mostly attained (the “get interface” query does not pollute the interface of all actors). Unfortunately, we’d cut a hole in the namespace for methods. This is unavoidable as the Candid method name space unfortunately(?) covers the full IC method name space. A wart, I agree, but one I can live with (until I have seen a better alternative).

    @hansl’s suggestion to at least avoid method names that are “easy” to export in Motoko or Rust is a good one. But I wouldn’t use Unicode: many languages allow Unicode in their symbol names. Instead, I’d propose candid interface (note the space):

    • Still easy to write in ASCII (e.g. .wat files, Candid implementations).
    • Invalid as an identifier in most languages (more so than Unicode).
    • An idiom that we already used, based on @rossberg’s suggestion in canister_query <name>, for the same reasons.
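
The identifier argument is easy to check for one concrete language. The sketch below uses Python's own identifier rules purely as an example; the helper name and the sample names are invented, and other languages will differ in detail.

```python
import keyword


def is_python_identifier(name):
    """True if `name` could be written as a plain Python identifier."""
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["candid_interface", "candid interface", "schnittstelle_ä"]
results = {name: is_python_identifier(name) for name in candidates}
```

Here the name with a space is rejected, while both the underscore variant and the non-ASCII name are accepted as identifiers.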

Finally, there is the nice-to-have advantage that we can do this right away.

So, if you are not convinced (and care), can you please point to the flaws in this reasoning or where I have the wrong assumptions?

@chenyan-dfinity
Contributor

Candid is part of our platform, but not the core system. Our system is more or less untyped actors sending opaque data to each other.

This is a big mistake. It means people can write canisters with different wire formats and they can never communicate with each other. This hurts the ecosystem. Plus, serializing everything twice is redundant.

I always thought this is a temporary assumption that we are going to fix at some point. Is this going to change in the future?

@rossberg
Contributor

rossberg commented May 18, 2020

  1. Candid is part of our platform, but not the core system. Our system is more or less untyped actors sending opaque data to each other.

While that is technically true, that's a low-level view and not how we should be thinking about it conceptually from a programming model perspective, at least not by default.

  2. Our system ought to support many programming languages, in particular, dynamic ones.
    Example: Canisters written in Python would all have the same code (.wasm); users wouldn’t compile them, but upload code through regular update methods.

I don't know what this assumption is based on, but I don't buy that at all. In fact, such an approach would seem like an irresponsible security risk, because it's a wide open attack vector -- one mistake in the ACL check of the update method and anybody can replace your code.

No, this is another perfect example where you absolutely want to separate interface and meta interface.

  3. Our system ought to support ways of composing canisters that make the interface dynamic.
    Example: A proxy canister (load balancer, monitor, logging, A/B testing) that forwards calls to backend canisters. After a (proper) upgrade of the backend canister the proxy can dynamically be reconfigured to now point to the new canister, effectively changing its interface.

I don't see why any of that would require interface reflection? You can forward and log messages just fine without being able to inquire the set of supported messages. Again, different levels.

My requirements and goals are [...]

I mostly agree with these. However, I have an additional requirement:

  • A canister must be able to consist of more than one module.

From that it immediately follows that a canister is not simply a Wasm module. Instead, it is a set of assets. And there needs to be a system API to retrieve assets. And once you arrive at that point, the natural solution is to simply store the IDL as an asset and rely on some convention for accessing it.

@nomeata
Collaborator Author

nomeata commented May 18, 2020

Candid is part of our platform, but not the core system. Our system is more or less untyped actors sending opaque data to each other.

This is a big mistake. It means people can write canisters with different wire formats and they can never communicate with each other. This hurts the ecosystem.

I agree it hurts the ecosystem. But there will always be requirements that Candid doesn’t cut (e.g. HTTP Canisters). I hope we can get people to hook into the ecosystem because it is good, not because it is the only choice.

Plus, serializing everything twice is redundant.

What do you mean? Where do we serialize stuff twice?

While that is technically true, that's a low-level view and not how we should be thinking about it conceptually from a programming model perspective, at least not by default.

Well, conceptually or in the programming model, the interface access is not part of the canister interface. This proposal is all about the low-level view.

[python]

That seems to be a hard-to-reconcile difference now …

Why do you think a “change code” update call is any more sensitive in general than a “change state” update call? Conceptually, both just change the state machine that is the canister. And pragmatically, most use cases that would want to use our platform for its tamper-proofness have the problem that unauthorized state changes are catastrophic.

[proxy example] I don't see why any of that would require interface reflection?

This assumption mainly justifies why the interface (of the proxy) must be dynamic (i.e. not hard-coded in the wasm module or uploaded alongside it). When the proxy admin reconfigures the proxy to forward to another canister (state change, not code change), the interface evolves.

But it also seems to suggest that it might be nice if the proxy can (if that is desired) fetch the evolved interface directly, instead of having the admin manually upload the right one.
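
To make the proxy argument concrete, here is a minimal sketch in Python. It is a toy model only: `Backend`, `Proxy`, `reconfigure` and `interface` are hypothetical names invented for illustration, not part of any IC API. It shows a proxy whose advertised interface changes through a state change alone, because the interface is fetched from the current backend rather than shipped with the proxy's code.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical backend canister: a name plus its Candid interface text."""
    name: str
    candid: str


@dataclass
class Proxy:
    """Toy proxy whose advertised interface follows its current target."""
    target: Backend

    def reconfigure(self, new_target: Backend) -> None:
        # A state change only: the proxy's own code stays the same.
        self.target = new_target

    def interface(self) -> str:
        # Fetched from the backend on demand instead of being hard-coded
        # or uploaded by the admin.
        return self.target.candid


v1 = Backend("backend_v1", "service : { greet : (text) -> (text) query }")
v2 = Backend(
    "backend_v2",
    "service : { greet : (text) -> (text) query; count : () -> (nat) query }",
)

proxy = Proxy(v1)
before = proxy.interface()
proxy.reconfigure(v2)  # the admin points the proxy at the upgraded backend
after = proxy.interface()
```

In this model an interface stored as a static asset next to the proxy's code would go stale at the `reconfigure` call, which is the situation the paragraph above describes.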

From that it immediately follows that a canister is not simply a Wasm module. Instead, it is a set of assets. And there needs to be a system API to retrieve assets. And once you arrive at that point, the natural solution is to simply store the IDL as an asset and rely on some convention for accessing it.

But that would only work if one dismisses the “interface can be state-dependent” requirement, right? Or if assets become dynamically modifiable (and essentially a file system or key-value store that can be written from the running Wasm canister).

And I think the “simply” before “store the IDL” is not true (complexity on the dfx side, worries about keeping them in sync). And neither is “some convention for accessing it”, because Wasm modules don’t have to be accessed from the outside (and probably shouldn't be by default), but the IDL has to be.

And this access would (likely) be available to other canisters somehow (maybe via the management canister, e.g. ic.interface(canister_id) : text). So you’d still have interface reflection.

Maybe it helps to address the two related, but still mostly orthogonal, questions, and recall the options we have:

  • How is the interface served by the system: regular query call to the canister, out-of-band call to the canister, call to the management canister, request only available to external users?
  • How is the interface stored in the canister and exposed to the system: opaquely in code (custom entry point), opaquely in code (normal query entry point), custom Wasm section, separate asset (static only), separate asset (dynamic access)?

It seems that we can serve any of the interfaces with any of the implementations (although some combinations are odd).

@chenyan-dfinity
Contributor

But there will always be requirements that Candid doesn’t cut (e.g. HTTP Canisters).

But we need a single wire format to transmit data. Why CBOR and not Candid? What feature is missing from Candid that prevents us from using it in the core system? We made the decision to use CBOR at that time because there was no Rust Candid library at all. It made sense to choose an existing wire format and make progress. Now that Candid is more mature, is it a good time to revisit these decisions? And what are HTTP canisters, by the way?

Where do we serialize stuff twice?

We serialize data in Candid and then in CBOR.

@nomeata
Collaborator Author

nomeata commented May 18, 2020

But there will always be requirements that Candid doesn’t cut (e.g. HTTP Canisters).

But we need a single wire format to transmit data. Why CBOR and not Candid? What feature is missing from Candid that prevents us from using it in the core system? We made the decision to use CBOR at that time because there was no Rust Candid library at all. It made sense to choose an existing wire format and make progress. Now that Candid is more mature, is it a good time to revisit these decisions?
We serialize data in Candid and then in CBOR.

Yes, we could replace CBOR with Candid to let the userlib talk to the HTTP handler in the replica. But that would still encapsulate the application-level data that goes from the application frontend to the canister. It’s like Ethernet and IP. Or HTTP and JSON. Or … anything really that has different layers.

And this only applies to ingress messages – inter-canister messages don’t need the CBOR/HTTP layer.
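
The layering described here can be sketched roughly as follows. This is an illustration under loud assumptions, not the replica's actual wire format: JSON stands in for CBOR (the Python standard library has no CBOR codec), the envelope field names are made up, and the payload is placeholder bytes rather than a real Candid encoding. The point is only that the outer layer transports the inner bytes without interpreting them.

```python
import base64
import json
from typing import Tuple


def wrap_ingress(canister_id: str, method_name: str, arg: bytes) -> bytes:
    """Outer layer: transport envelope (JSON used here as a stand-in for CBOR)."""
    envelope = {
        "canister_id": canister_id,  # hypothetical field names
        "method_name": method_name,
        # The application-level payload travels as opaque bytes.
        "arg": base64.b64encode(arg).decode("ascii"),
    }
    return json.dumps(envelope).encode("utf-8")


def unwrap_ingress(message: bytes) -> Tuple[str, str, bytes]:
    """What an HTTP handler would do: peel the envelope, never decode `arg`."""
    envelope = json.loads(message.decode("utf-8"))
    return (
        envelope["canister_id"],
        envelope["method_name"],
        base64.b64decode(envelope["arg"]),
    )


def deliver_inter_canister(arg: bytes) -> bytes:
    """Inter-canister calls skip the envelope: the payload is passed as-is."""
    return arg


# Placeholder bytes standing in for a Candid-encoded argument.
payload = b"\x00\x01 placeholder for a Candid-encoded argument"

message = wrap_ingress("hypothetical-canister-id", "greet", payload)
canister_id, method_name, arg = unwrap_ingress(message)
```

Swapping the outer codec (CBOR, Candid, or the JSON stand-in used here) would not change what the canister receives in `arg`, which is why the two encodings are layered rather than redundant.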

And what are HTTP canisters, by the way?

Canisters that you can talk to directly over HTTP, and that do the whole HTTP request decoding in canister (i.e. wasm) code. See https://github.com/dfinity-lab/notes/pull/3

@hansl

hansl commented May 18, 2020

But there will always be requirements that Candid doesn’t cut (e.g. HTTP Canisters).

That's just not true though. There is a need for an API for HTTP Canisters. That proposal could say that the endpoint must look like:

service Http : {
  "http": (Text, vec Word8) -> (Text, vec Word8);
}

We control the proposal, we can control the interface.

My view after reading these:

  • Defining an official API language has a lot of upsides (statically analyzing canisters, compatibility, upgrades, fuzz testing, ...),
  • NOT defining an official API language only has downsides (incompatible canisters, upgrading being broken by design).
  • Also, there hasn't been a case yet put forward where not declaring a first-class Schema language is a win. Please provide one.

For people who want schemaless bytes they can always declare their endpoint as vec Word8 and deal with it in the canister. But at least the system will be clear on the API for accepted inputs and expected outputs.

It's also a must-have for an ecosystem that will exist at some point (and not just be built up from scratch). A developer will be asking a lot of "what's the interface for canister ABC". If you have to search stackoverflow you've already lost. Also a good argument for adding support for comments in Candid files.

@nomeata
Collaborator Author

nomeata commented May 18, 2020

Please provide one.

Anything that doesn't need to talk to other Candid canisters. You can write .wat files that run on the IC. Only useful for testing, debugging and understanding, but testing, debugging and understanding are very useful. The universal canister used in ic-ref-test exists for similar reasons.

I fully appreciate the benefits of the Candid ecosystem! I just don't see any benefits of forcing it on people who want to work at a lower level - the fact that some canisters just shovel bytes doesn't prevent any of the fuzz testing etc of your Candid-using canisters!

I don't believe we can manage the complexity of the system without pulling in appropriate layers of abstraction. And I claim that a typing layer on top of an untyped actor model is a good separation of concerns - like, say, the various RPC systems on top of TCP out there.

@chenyan-dfinity
Contributor

And I claim that a typing layer on top of an untyped actor model is a good separation of concerns - like, say, the various RPC systems on top of TCP out there.

Agreed. But I'm more concerned about people inventing a new Candid-like format that is not compatible with our ecosystem. Then it becomes a war between iOS and Android again. We can keep the abstraction, but make both of them required as part of the core system, so that we won't have fragmentation.

@nomeata
Collaborator Author

nomeata commented May 19, 2020

Agreed. But I'm more concerned about people inventing a new Candid-like format that is not compatible with our ecosystem. Then it becomes a war between iOS and Android again. We can keep the abstraction, but make both of them required as part of the core system, so that we won't have fragmentation.

I’m not concerned by that. If people feel a need for an alternative, they will build it in any case – if we force them to use Candid they’ll just mark everything vec nat8 and wrap their stuff inside. Or they will leave the platform altogether, because they are faced with restraints that they find annoying and patronizing… No, our higher-level offerings (Motoko, CDKs, Candid) had better convince because they are attractive, not because they are required.

@chenyan-dfinity
Contributor

if we force them to use Candid they’ll just mark everything vec nat8 and wrap their stuff inside

This is exactly what I want. The benefit is that people using Candid still have a way to process their messages, even though they may need to write their own serializer. But consider the opposite (our current status): a Candid canister has no way of communicating with non-Candid canisters. People will be forced to leave the ecosystem if they want to work with non-Candid canisters. If you really want a way to be flexible, the Motoko compiler needs to provide a flag to opt out of Candid, but do we want to go there?

our higher-level offerings (Motoko, CDKs, Candid) had better convince because they are attractive, not because they are required.

Many times, people don't have a choice. Suppose a big company builds a non-Candid canister. To interact with their data and API, you have to use whatever protocol they come up with. By enforcing Candid in our core system, people will build tools/libraries to convert between these protocols. Without the enforcement, it's just isolated islands.

@nomeata
Collaborator Author

nomeata commented May 19, 2020

If you really want a way to be flexible, the Motoko compiler needs to provide a flag to opt out of Candid, but do we want to go there?

Eventually, I expect that we would expose “raw” IC calls, in Motoko, so yes. This feels like FFI to me - not urgently needed, wouldn't exist in an ideal world, something that advanced low-level users use (and often wrap in libraries), but that eventually becomes necessary. And it's not like it's hard - some way to mark a shared function type as “raw” (and then necessarily be typed to be Blob -> async Blob) and you have it, neatly part of the language.

If our IR was expressive enough to implement Candid decoding, we would have had that function type in the IR for a long time, and there is the unwritten law that every feature of the IR eventually makes it to the source language (at least that's my observation so far :-))

@chenyan-dfinity
Contributor

Eventually, I expect that we would expose “raw” IC calls, in Motoko

Okay, this connects the fragmented world then. With Motoko enforcing Candid and the replica not enforcing it, there is clearly a gap we cannot fill. Either both make it required, or both make it optional.

@nomeata
Collaborator Author

nomeata commented May 26, 2020

I guess we have sufficiently discussed why Candid is a layer on top of the core IC system. Maybe not everybody agrees, but for now that’s the status quo. If we change that (and maybe one day the system enforces the Candid types), then I agree that the interface description ought to be hosted in a bespoke way. Until then, I suggest we stick to the current layers of abstraction, and do not assume dedicated system features for Candid.

With that assumption, can the present proposal proceed?

Also: The candid interface function has a name that is not a valid identifier in most languages, and it is not itself included in the Candid interface. Importing a canister would therefore not give you programmatic access to this introspection method, and it is not, in any sense, part of the “Candid interface” of the canister. Given that, are the concerns about Stratification addressed?

If there is still opposition, may I ask for a concrete counter-proposal?

@rossberg
Contributor

are the concerns about Stratification addressed?

I'm afraid not. It cannot be a regular method. Whether you give it a funny name or whether it is omitted from an IDL description is immaterial to that -- given the right quoting mechanism and/or type annotation you can still use it as any old method.

An IDL description is meta data. That ought to be separate from code. The platform currently lacks a way to associate and access meta data with a canister. We should address that instead of hacking it into the wrong domain.

@nomeata
Collaborator Author

nomeata commented May 26, 2020

I'm afraid not. It cannot be a regular method. Whether you give it a funny name or whether it is omitted from an IDL description is immaterial to that

So it was a flaw of candid to leave no space in the namespace of “raw canister entry points”?

Should we say that a candid-exposed method foo is invoked via canister_update candid foo?

given the right quoting mechanism and/or type annotation you can still use it as any old method.

If I use unsafeCoerce (i.e. assert a type that is not there) I can also get all kinds of metadata, even in strongly typed languages. E.g. in Haskell I can do that to inspect the heap. Surely that isn’t an argument against info tables being stored in main memory.

An IDL description is meta data. That ought to be separate from code. The platform currently lacks a way to associate and access meta data with a canister. We should address that instead of hacking it into the wrong domain.

A canister is more than code, and surely can contain metadata. A raw method (not candid-exposed) is a good way for getting data out of a canister. So I don’t think it is a hack.

Also, what’s your counter-proposal?

@rossberg
Contributor

So it was a flaw of candid to leave no space in the namespace of “raw canister entry points”?

No, it doesn't belong there. The IDL describes a canister's interface. Reflection is meta information, accessing which does not belong there, nor should it necessarily use a dynamic messaging mechanism. We need to keep our domains straight.

If I use unsafeCoerce (i.e. assert a type that is not there) I can also get all kinds of metadata, even in strongly typed languages.

If it provides unsafeCoerce then it is not a strongly typed language. ;)

A raw method (not candid-exposed) is a good way for getting data out of a canister. So I don’t think it is a hack.

This is related to the asset discussion. :) I generally think that it's a hack to stuff everything into a dynamic code execution mechanism just because we can -- square peg round hole.

Also, what’s your counter-proposal?

That we fix the platform's overly naive model of canister = wasm module. Or short of that, we use an idea similar to the frontend container you suggested to provide meta data.

@nomeata
Collaborator Author

nomeata commented May 26, 2020

That we fix the platform's overly naive model of canister = wasm module.

A multi-file canister module (with multiple wasm modules, and separate .did files, and maybe other metadata and assets) would address the least interesting part of the question. The interesting part is: By which interface do you expose access to the interface to interested parties (users or – by the strong canister principle – other canisters)?

Or short of that, we use an idea similar to the frontend container you suggested to provide meta data.

The frontend container canister idea works for HTML frontends because the HTML frontend is not in one-to-one connection to canisters. But canister interfaces are, so I don’t see how that would work here.

@chenyan-dfinity
Contributor

chenyan-dfinity commented Jun 23, 2020

How about we store the .did file in the wasm module via dfx build and move this fetch method to ic:00 with the following signature: candid_interface : (id : Principal) -> (opt Text) query.

This separates the meta-level API from the real canister, and we can use capability/ACL from ic:00. The actual did file still comes with the canister itself.
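
The shape of this proposal can be modelled in a few lines of Python. This is a toy sketch under assumptions: `ManagementCanister` and `install` are hypothetical stand-ins for ic:00 and for whatever step would extract the embedded .did text, and only the `candid_interface` name and its optional-text result come from the signature above.

```python
from typing import Dict, Optional


class ManagementCanister:
    """Toy stand-in for ic:00, holding at most one .did text per canister."""

    def __init__(self) -> None:
        self._interfaces: Dict[str, str] = {}

    def install(self, canister_id: str, did_text: Optional[str]) -> None:
        # Hypothetical step: on install, the system records the .did text
        # that the build embedded in the module, if there is one.
        if did_text is None:
            self._interfaces.pop(canister_id, None)
        else:
            self._interfaces[canister_id] = did_text

    def candid_interface(self, canister_id: str) -> Optional[str]:
        # Mirrors `(id : Principal) -> (opt Text) query`: None plays the
        # role of an absent `opt` value for canisters without a .did file.
        return self._interfaces.get(canister_id)


ic00 = ManagementCanister()
ic00.install("canister-a", "service : { greet : (text) -> (text) query }")
ic00.install("canister-b", None)  # e.g. a hand-written .wat module

with_did = ic00.candid_interface("canister-a")
without_did = ic00.candid_interface("canister-b")
unknown = ic00.candid_interface("canister-c")
```

The optional result is what lets raw, non-Candid canisters coexist with this lookup: they simply report no interface.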

nomeata added a commit that referenced this pull request Sep 4, 2020
This implements the ideas in #1510, albeit with a method name that clearly marks this as scaffolding.
mergify bot pushed a commit that referenced this pull request Sep 9, 2020
This implements the ideas in #1510, albeit with a method name that clearly marks this as scaffolding.

TL;DR: You can use the query method `__get_candid_interface_tmp_hack : () -> (text)` to fetch the Candid interface.

In the discussion https://dfinity.atlassian.net/browse/NNS-3 the fact that we
still can’t get the interface of a running canister, and thus have no good way
of exploring the implications on developer workflow, no way to develop UI etc.,
is embarrassing for a feature that we clearly wanted about 18 months ago.

Why were we blocked? Initially because we expected some form of system-provided
assets (which we then did not get), and then because of disagreement about the
relation between Candid and the raw IC. Also @hansl writes in NNS-3 that the
system-integrated solution requires coordination between too many parties to be
done “soon”.

But this is a very dissatisfying state of affairs, because even if we can’t
nail down the interface right now, we should at least unblock all use cases
that block on this feature.

So in the interest of agility, I want to do, as a temporary work-around, the
implementation that we can have now, namely the Candid-internal solution that
does not require system features. Let’s start _using_ this, and then decide and
implement the “right” approach with less pressure.

I see this as consistent with Dom’s request for more agility; with other
scaffolding and experimentation going on right now (`ic1.call_simple`, work on
`exec` etc.), and in line with such esteemed and useful hacks as
regex-based asset injection, `ic-router` or encoding subnet ids in canister
ids.
@nomeata
Collaborator Author

nomeata commented Nov 23, 2020

Closing, as we have no active discussion here. Can be resurrected when needed.

@nomeata nomeata closed this Nov 23, 2020
@nomeata nomeata deleted the joachim/idl-self-describe branch April 30, 2021 16:48
Labels
P2 medium priority, resolve within a couple of milestones

7 participants