Questions on URLs in imports/exports #130

Closed
alexcrichton opened this issue Nov 21, 2022 · 4 comments

@alexcrichton
Collaborator

Implementing #109 in Wasmtime, I'm not 100% sure how the new URL fields should be exposed/accommodated through the embedding API, so I wanted to ask a few (possibly simple) questions here:

  • Should the URL be used when linking a module together on the host? For example right now in Rust you can insert named functions into a Linker type, optionally within a nested "instance" which is just a bag of named functions. Should the URL, however, factor into the host-side linking? Similarly is this expected to affect how exports are accessed?

  • Orthogonally, is there an intended use case for these URLs which requires them to be valid? Adding a URL parser to a low-level validator adds a somewhat weighty dependency to a low-level component, so if it's possible to defer URL validation until later that would be nice, though it's not strictly necessary.

  • IIRC one of the purposes of URLs was to merge worlds together, and if so this doesn't seem like a trivial operation to me so would it be possible to write up some pseudo-code and/or words about how this is expected to work? (e.g. what to do with different-name same-url imports or same-name different-url imports.)

  • To confirm, URLs don't have any effect on validation other than uniqueness and it's-a-url-ness, right? That is, they don't play into "type matching" when validating that a component can be used to satisfy a component import?

@lukewagner
Member

Great questions! Answering point-wise:

  • If we're doing host bindgen from a world, then the generated host code can use the input world's kebab-names for all identifiers exposed to the host embedding code. When loading a particular component, these host-facing kebab-names would be mapped to their corresponding URLs (as defined by the world) and it's these URLs that are then linked to the component (which can have its own different kebab-names). If we're in a world-agnostic context (e.g., REPL or reflective JS API), bindings should expose both the kebab-names and the URL names. If the client code knows the particular component they're loading (which is likely when the client is authoring and testing the component), they'll use the kebab-names (which they know from authoring the component). If the client code is implicitly implementing a world, they'll just need to be verbose and say the URLs they mean.
  • Mostly it's useful to validate to keep the ecosystem producing valid URLs (and, in particular, being explicit about the scheme) and not having random strings being distributed. It's also a case where one can always start conservative and relax later, but it's often impossible to go the other direction. Looking at the url crate, I had perhaps-naively thought that it looked like a lighter dependency, but is it actually worse than that?
  • The intended algorithm is: if either of those conflicts you mention occurs, Wit produces a syntax error and requires the author of the merged world to resolve the conflict by doing a local renaming in the merged world (e.g., using this include other-world with { conflicting-name as non-conflicting-name } hypothetical syntax we've discussed). For same-url-different-name, you'd rename one of them to match the other (so they unambiguously de-dupe and you've picked which name to keep). For same-name-different-url, you'd rename one to avoid the conflict. The key here is that hosts and guests can have different kebab-names for the same URL, because ultimately it's the URL that is used as the key by instantiation.
  • The one interesting interaction in type matching is that, when asking whether component type A is a subtype of component type B, the URL takes over as-if it was the only name. There's an example of this in the Import and Export Definitions section with wasi:filesystem. Thus, component subtyping is only ever comparing single strings (which default to the URL, and fall back to the kebab-name when there is no URL).
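
To make the linking, merging, and subtyping answers above concrete, here is a minimal sketch in Python, assuming a simplified model where a world is just a flat list of (kebab-name, optional URL) pairs. `Name`, `key`, `merge_worlds`, `rename`, and `MergeConflict` are hypothetical helpers, not part of Wasmtime, Wit, or the Component Model spec, and the URL `wasi:logging` is a placeholder chosen for illustration. `key()` models the single "URL, falling back to the kebab-name" string, and `merge_worlds` rejects both conflict kinds rather than guessing, leaving the fix to an explicit rename.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Name:
    """Hypothetical model of one import/export name in a world."""
    kebab: str                 # name used for generated bindings
    url: Optional[str] = None  # optional URL, e.g. "wasi:filesystem"

    def key(self) -> str:
        # The single string compared during instantiation and subtyping:
        # the URL when present, otherwise the kebab-name.
        return self.url if self.url is not None else self.kebab


class MergeConflict(Exception):
    pass


def merge_worlds(a: List[Name], b: List[Name]) -> List[Name]:
    """Merge world `b` into world `a`, raising on either conflict kind
    instead of picking a resolution automatically."""
    merged = list(a)
    for item in b:
        if item in merged:
            continue  # identical kebab-name and URL: de-dupe silently
        for existing in merged:
            if item.url is not None and existing.url == item.url:
                raise MergeConflict(
                    f"same URL {item.url!r} under names "
                    f"{existing.kebab!r} and {item.kebab!r}")
            if existing.kebab == item.kebab:
                raise MergeConflict(
                    f"same name {item.kebab!r} bound to "
                    f"{existing.url!r} and {item.url!r}")
        merged.append(item)
    return merged


def rename(world: List[Name], old: str, new: str) -> List[Name]:
    """Local renaming, in the spirit of the hypothetical
    `include other-world with { old as new }` syntax."""
    return [Name(new, n.url) if n.kebab == old else n for n in world]


host = [Name("console", "wasi:logging"), Name("fs", "wasi:filesystem")]
other = [Name("log", "wasi:logging")]

try:
    merge_worlds(host, other)  # same URL, different kebab-names
except MergeConflict as err:
    print("needs a rename:", err)

# Renaming "log" to "console" makes the entries identical, so they de-dupe.
merged = merge_worlds(host, rename(other, "log", "console"))
```

Because the guest's own kebab-names never enter `key()`, a component compiled against `other` would still link against the merged world through `wasi:logging`, which is the property the answer above relies on.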

@alexcrichton
Collaborator Author

Hm ok, given all that I fear that I'm losing sight of the motivation for URLs and how everything is supposed to stack up together. Everything seems to favor using the URL but falls back to the kebab-name, so why not require URLs? Additionally, why not go a step further and replace kebab-names with URLs, where a bindings-generated name could be something like the last element of the URL, or otherwise have the bindings-relevant pieces in other unrelated sections?

For example:

it's these URLs that are then linked to the component

This sounds like, from an engine-implementation perspective, the kebab-names effectively don't matter, and URLs-falling-back-to-kebab are used instead. That's at least not what I was expecting and would require some significant rework of the ergonomics and planned idioms for how to create and use an embedding API.

Mostly it's useful to validate to keep the ecosystem producing valid URLs ... I had perhaps-naively thought that it looked like a lighter dependency, but is it actually worse than that?

I just had to audit/vet 100kloc of Rust for the inclusion of the rust-url crate, so no, it's not a lightweight dependency. The pieces we're using are probably simple enough, but the crate is pretty expansive in the functionality it provides and is by no means trivial, I believe.

While I understand the desire for "let's not have random strings", I at least personally can't really fit this into my mental model of how everything is going to work. If everything validates URLs everywhere, I'm having a tough time reasoning out why it's worth it other than "well it's good to be well-formed, right?", in the sense that the well-formedness doesn't seem to be buying any concrete features at this time.

The intended algorithm is: if either of those conflicts you mention occur ...

Given what you're mentioning I'm not sure why this is any better than "what if we just had urls" or "what if we just had kebab names". I was under the impression that with URLs some of this would be more automatic (somehow, I never thought that hard), but given that collisions always require manual work I'm not sure why we'd have the possibility for collision in two places.

the URL takes over as-if it was the only name

This is at least not what I was personally expecting, and this provides more fuel to the fire of "why have kebab names in the first place?" in my opinion. If the URL is what matters why not have only the URL?


I'll admit I did not try to think through all these questions before URLs were landed, I'd sort of just assumed it was all already figured out. But looking at it now I'm questioning more why we have both kebab-names and URL names. They seem to both be pulling in different directions a bit. Personally my mental model of everything would make more sense with something along the lines of:

  • Import/export names are always URLs
  • Bindings-related information goes into a separate section, e.g. documentation, mapping URLs to kebab-names, language-specific metadata if necessary, etc.

or the alternative of no URLs and going back to just kebab-names everywhere. I don't have a strong grasp on the original motivation for URLs, however.

@lukewagner
Member

It's a valid question to ask why have both. To start with, I think there are a number of simultaneous things we want here for a good DX:

  • When importing or exporting a standardized interface, I want to state that unambiguously, but not everything I import/export must be tied to some external reference point.
  • I want to be able to refer to external standards, registries, remote-things-at-http-URLs, local things, for both interfaces and implementations.
  • I almost never want to see URLs in my source code, I want to see nice identifiers derived from kebab-names:
    • When writing guest code compiled against a world (that I didn't have to write)
    • When writing guest code that links to the imports/exports of another component provided as a .wasm
  • When I merge two worlds as a host, I want to be able to run components that were already independently-compiled against either of the two worlds, even if their kebab-names conflict.

Automatically deriving kebab-names from URLs by parsing out parts of the URL is an interesting option to consider but:

  • URLs need to cover a bunch of different use cases (standards, registries, various storage backends, versions, content hashes, ...) which may end up fixing the structure outside the control of the toolchain, so coming up with a particular text pattern that says exactly where to pluck the kebab-name from seems hard.
  • A single URL may end up being used in different worlds in which the kebab-name we pluck out of it conflicts with that of other URLs and it's not clear how we'd avoid this since we were relying on URLs for uniqueness, not fragments of them.
  • Different worlds (which can and should be written by a bunch of different folks, not all coordinating) are going to want to reuse common short-and-sweet kebab-names (like console) for various different interfaces, putting pressure on interfaces to claim these short-and-sweet names, making conflicts more likely.
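
The objections above can be illustrated with a deliberately naive derivation rule. This is a hypothetical heuristic, not something proposed in the thread: take the last `/`- or `:`-separated segment of the URL and drop any `@version` suffix. All URLs below are made-up examples covering a standard, a registry, and a local file; three distinct URLs collapse onto one kebab-name, and a content-hash URL yields a name nobody would want in source code.

```python
import re


def naive_kebab_from_url(url: str) -> str:
    """Hypothetical heuristic: last path-ish segment of a URL,
    with query/fragment and any trailing "@version" removed."""
    tail = re.split(r"[?#]", url, maxsplit=1)[0]  # strip query and fragment
    tail = tail.rstrip("/")
    segment = re.split(r"[/:]", tail)[-1]         # last segment after '/' or ':'
    return segment.split("@", 1)[0]               # drop a trailing "@version"


urls = [
    "wasi:console",
    "https://registry.example.com/acme/console@1.2.0",
    "file:///home/me/dev/console",
]
derived = {u: naive_kebab_from_url(u) for u in urls}
# Every URL above is distinct, yet all of them derive the same kebab-name.

hash_name = naive_kebab_from_url("https://cdn.example.com/blobs/sha256:9f86d081")
# The "name" plucked from a content-hash URL is just the hash fragment.
```

Any fixed plucking rule hits one of these two failure modes, which is why the comment above keeps kebab-names as an explicit, author-chosen part of the world.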

Putting kebab-names in separate custom sections is also an option to consider but since these kebab-names go into the actual source code of client code, and these bindings are sometimes generated live by the runtime, this seems to go against the idea that custom sections are semantics-free and can always be stripped: stripping a custom section shouldn't break running code. If we make special rules about these custom sections, it's not clear how this would be meaningfully different than what's proposed other than binary layout.

If the complexity of rust-url is the main problem, we can talk about relaxing the validation. I think the high-order bit is having an explicit scheme: that is "supposed to be" globally unique (coordinated through IANA registration), so one option could be to define the grammar for component "URLs" to require a URL-compatible scheme: prefix and leave the rest of the URL string un-parsed.
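
The relaxed rule suggested in the last paragraph, requiring only a URL-compatible `scheme:` prefix, needs no URL-parsing dependency at all. A minimal sketch, assuming the scheme grammar from RFC 3986 section 3.1 (`scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`); `split_scheme` is a hypothetical helper, and it deliberately checks nothing after the colon, so it is not a full URL validator.

```python
import re
from typing import Optional, Tuple

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def split_scheme(name: str) -> Optional[Tuple[str, str]]:
    """Return (scheme, rest) if `name` starts with a valid scheme and ':',
    else None. The rest of the string is left un-parsed on purpose."""
    m = _SCHEME.match(name)  # match() anchors at the start of the string
    if m is None:
        return None
    # Schemes are case-insensitive per RFC 3986; normalize to lowercase.
    return m.group(0)[:-1].lower(), name[m.end():]


print(split_scheme("wasi:filesystem"))
print(split_scheme("filesystem"))  # a bare kebab-name has no scheme
```

Since the check is a handful of character comparisons, a validator written in any language could enforce it without pulling in a general URL parser, and it could later be tightened if stricter parsing turns out to be worth the cost.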

@lukewagner
Member

Looks like this is all pretty much resolved by #198 and #205.
