Relaxing closed world validation and improving open world optimization #6965

Open
6 tasks
tlively opened this issue Sep 23, 2024 · 7 comments

Comments

@tlively
Member

tlively commented Sep 23, 2024

The --closed-world flag lets us assume that we can make arbitrary changes to types as long as those types are not part of the module's contract with the outside world. Since the type system is structural, there is no single, precise definition of what it means for a type to be "part of the module's contract," but we have chosen it to mean that we keep the types of exported or imported module elements the same, while all other types are fair game. In particular, we assume we are allowed to modify subtypes of public types that are not themselves public. Otherwise a single anyref in an exported function would prevent us from modifying any struct or array type, and a single funcref in an exported function would prevent us from modifying signatures of referenced functions.
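As a rough illustration of that definition, here is a toy Python model (not Binaryen code). The type names, the `TYPES` table, and the `public_types` helper are all hypothetical, and following declared supertypes as part of the contract is an assumption of this sketch. It shows a subtype of a public type staying private.

```python
# Hypothetical toy model: type name -> (declared supertype, types it mentions).
TYPES = {
    "$shape": (None, []),
    "$circle": ("$shape", []),    # subtype of a public type
    "$sig": (None, ["$shape"]),   # signature of an exported function
    "$internal": (None, []),
}

def public_types(boundary, types):
    """Collect the types the boundary mentions, following references and
    declared supertypes (assumption), but never walking down to subtypes."""
    seen = set()
    work = list(boundary)
    while work:
        name = work.pop()
        if name in seen:
            continue
        seen.add(name)
        supertype, mentioned = types[name]
        if supertype is not None:
            work.append(supertype)
        work.extend(mentioned)
    return seen

# Only $sig appears directly on an export in this toy module.
public = public_types(["$sig"], TYPES)
private = set(TYPES) - public
```

In this model `$circle` ends up private even though its supertype `$shape` is public, which is the property the paragraph above relies on.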

However, our current closed-world validation is much stricter than this. It additionally restricts what types are allowed to be public. It allows the types of exported and imported functions to be public, and therefore must also allow all types in the rec groups of those function types to be public, but it does not allow any other defined heap types to be public, even if they are part of the type of an imported or exported function.

I believe the original motivation for these additional restrictions was that we wanted to be able to optimize as many types as possible, so we didn't want to allow users to expose types in a way that would inhibit optimizations. But this is putting the cart before the horse. We should be able to optimize any module we are given according to the assumptions configured via command line options, and there is no user benefit if we simply reject modules that users want to optimize because we cannot optimize them as well as some different module they could have given us. Users (such as Kotlin) are running into these errors when they try to use smaller rec groups in their input.

Here is the state of the world I would like to move to:

  • All types are allowed to be used in a module's public interface in both open and closed world modes.
  • The only difference between open and closed world modes is whether subtypes of public types are considered public by default.
  • All type optimization passes work equally well in both modes, using the classified public and private types as their sources of truth for what types are allowed to be modified.
  • We have a (@private) type annotation that allows types that would otherwise be considered public to be considered private instead. It is an error for a (@private) type to be used in a module's public interface, so this is only useful for annotating subtypes of public types in open world mode.
  • We have a (@public) type annotation that allows types that would otherwise be considered private to be considered public instead.
  • Neither type annotation affects how public visibility propagates to subtypes.
  • It is an error if a single module annotates the same type as both (@private) and (@public) (even if the annotations are on different definitions of the same type).
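The rules in this list can be sketched as a small Python function. This is a toy model, not Binaryen's implementation: `classify` and its parameters are hypothetical names, and it adopts one reading of "neither annotation affects propagation", namely that open-world propagation starts only from types used in imports/exports and passes through annotated types unchanged.

```python
def classify(supertypes, interface, annotations, closed_world):
    """Return {type name: "public" | "private"} for a toy module.

    supertypes:  type name -> declared supertype name (or None)
    interface:   names of types used by imports/exports
    annotations: list of (type name, "public" | "private") pairs
    """
    requested = {}
    for name, wanted in annotations:
        # The same type annotated both ways is an error.
        if requested.setdefault(name, wanted) != wanted:
            raise ValueError(f"{name} is annotated both public and private")

    def descends_from_interface(name):
        parent = supertypes[name]
        while parent is not None:
            if parent in interface:
                return True
            parent = supertypes[parent]
        return False

    result = {}
    for name in supertypes:
        if name in interface:
            if requested.get(name) == "private":
                raise ValueError(f"(@private) type {name} is used in the interface")
            result[name] = "public"
        elif name in requested:
            result[name] = requested[name]
        elif not closed_world and descends_from_interface(name):
            # Open world: subtypes of public types are public by default.
            result[name] = "public"
        else:
            result[name] = "private"
    return result

supertypes = {"$base": None, "$mid": "$base", "$leaf": "$mid", "$other": None}
open_world = classify(supertypes, {"$base"}, [("$mid", "private")], closed_world=False)
closed_world = classify(supertypes, {"$base"}, [("$other", "public")], closed_world=True)
```

Under this reading, annotating `$mid` as private in open world does not stop `$leaf` from being public, and annotating `$other` as public in closed world affects only `$other`.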

Here are the steps necessary to get to that state of the world:

  • Add a temporary --relaxed-closed-world flag that behaves like --closed-world but allows any type to be public.
  • Get the fuzzer running cleanly with --relaxed-closed-world instead of --closed-world.
  • Remove --relaxed-closed-world and allow any type to be public with --closed-world.
  • Implement propagation of public visibility to subtypes in open world mode.
  • Design a custom section framework for arbitrary type annotations like we have for code annotations.
  • Implement (@private) and (@public) annotations.

@kripken, WDYT?

@kripken
Member

kripken commented Sep 23, 2024

Sounds good!

  1. Might be worth mentioning externref here. I assume an exported/imported externref is handled similarly to anyref?
  2. We want to still preserve the key property in closed world that one can send a reference out but the outside cannot inspect (for an array or struct) or call (for a function) that ref. That is, that the outside can cache the reference and send it back in, but not interact with it. Atm in closed world we achieve that by sending out anyref/externref, and not the specific GC type, but maybe there's a better way, e.g., sending out the specific GC type but annotating it as private. I don't feel strongly here.

@tlively
Member Author

tlively commented Sep 24, 2024

  • Might be worth mentioning externref here. I assume an exported/imported externref is handled similarly to anyref?

Yes, good point. Externrefs in the public interface should be treated as though they were also anyrefs and vice versa.

  • We want to still preserve the key property in closed world that one can send a reference out but the outside cannot inspect (for an array or struct) or call (for a function) that ref. That is, that the outside can cache the reference and send it back in, but not interact with it. Atm in closed world we achieve that by sending out anyref/externref, and not the specific GC type, but maybe there's a better way, e.g., sending out the specific GC type but annotating it as private. I don't feel strongly here.

I think that use case will have to continue using abstract heap types like anyref and externref on the boundary. If we allowed a defined type to be passed out directly, then even if we assume the environment will not access it directly, changing it would still change the type of the function that passes it out. That's fine in a JS embedder, but not in any statically typed embedder. If we want to allow this anyway, we could use the (@private) annotation and not make it an error to use (@private) types in the module interface.

kripken added a commit that referenced this issue Oct 18, 2024
These were added to avoid common problems with closed world mode, but
in practice they are causing more harm than good, forcing users to work
around them. In the meantime (until #6965), remove this validation to unblock
current toolchain makers.

Fix GlobalTypeOptimization and AbstractTypeRefining on issues that this
uncovers: without this validation, it is possible to run them on more wasm
files than before, hence these were not previously detected. They are
bundled in this PR because their tests cannot validate before this PR.
@kripken
Member

kripken commented Dec 4, 2024

Fuzzing some stuff, I realized I don't know how we intend it to work in this future plan. Consider this:

(module
 (import "fuzzing-support" "call-ref" (func $call-ref (param funcref) (result i32)))

 (export "main" (func $main))

 (func $func (param $0 i32)
  (drop
   (local.get $0)
  )
 )

 (func $main
  (drop
   (call $call-ref
    (ref.func $func)
   )
  )
 )
)

$call-ref is a function that gets a ref and calls it from the outside (JS, in the main fuzzer). Running this with --gufa --closed-world causes breakage, because GUFA then assumes no calls happen on the outside (given no calls, it then puts an unreachable in the body of $func). This is the reason that the current docs explain "closed world mode" as "the outside may receive objects, but does not inspect their internal details, call them, etc."

My specific question here is, how would a --closed-world mode be used here, when all closed-world means is the default privacy of types? Would the type of the function $func need to be declared explicitly as public?

It does seem like it is convenient to have a flag that says "the outside may receive objects, but does not inspect their internal details, call them, etc.", which allows GUFA to just not worry about calls from the outside, and that the user can specify when they have this property.
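To make the breakage concrete, here is a toy reachability model in Python. It is a deliberate simplification and not how GUFA works internally; `may_run` and its parameters are hypothetical. It only captures the one assumption at issue: whether a function whose reference escapes to the outside may be called from there.

```python
def may_run(exports, direct_calls, escaping_refs, outside_may_call):
    """Return the set of functions that might execute in this toy model."""
    roots = set(exports)
    if outside_may_call:
        # Without the closed-world assumption, any escaped reference may be called.
        roots |= set(escaping_refs)
    reached = set()
    work = list(roots)
    while work:
        func = work.pop()
        if func in reached:
            continue
        reached.add(func)
        work.extend(direct_calls.get(func, []))
    return reached

# Mirrors the module above: $main calls the import and passes out a ref to $func.
exports = ["$main"]
direct_calls = {"$main": ["$call-ref"]}
escaping_refs = ["$func"]

closed = may_run(exports, direct_calls, escaping_refs, outside_may_call=False)
opened = may_run(exports, direct_calls, escaping_refs, outside_may_call=True)
```

With `outside_may_call=False`, `$func` is never reached, which corresponds to the optimizer replacing its body with an unreachable; with `outside_may_call=True` it stays live.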

@tlively
Member Author

tlively commented Dec 4, 2024

Yeah, good point. ConstantFieldPropagation would have a similar problem where it would want to know whether a public type is going to actually be mutated or allocated by the outside world so it can decide whether to optimize uses of that type, even though it would never change the type itself.

Here's the best solution I've thought of:

  • Add a @protected (name subject to bikeshedding) annotation on public types and interpret it to mean that the type will not be accessed or instantiated from the outside, even though the type itself cannot be optimized because it is part of the public interface.
  • Interpret all public types as @protected in --closed-world mode unless they are explicitly marked @public.

What do you think?

@kripken
Member

kripken commented Dec 4, 2024

> the type will not be accessed or instantiated from the outside, even though the type itself cannot be optimized because it is part of the public interface

Isn't that the same as marking the type private? Or do you mean that this would allow a private type to be part of the public interface? (but I thought that was already proposed)

@tlively
Member Author

tlively commented Dec 4, 2024

> Isn't that the same as marking the type private?

No, because protected types would still be part of the public interface, so we wouldn't be able to modify them. This is unlike private types, which we can modify however we want.

> Or do you mean that this would allow a private type to be part of the public interface? (but I thought that was already proposed)

No, this part from the opening post doesn't change:

> It is an error for a (@private) type to be used in a module's public interface, so this is only useful for annotating subtypes of public types in open world mode.

Here's a table laying out the differences between the three visibility levels:

| visibility | assumptions | type optimizable? | values optimizable? |
|---|---|---|---|
| public | The type appears in the public interface and can be accessed and allocated by the environment. | no | no |
| protected | The type appears in the public interface, but will not be accessed or allocated by the environment. | no | yes |
| private | The type does not appear in the public interface and will not be accessed or allocated by the environment. | yes | yes |

In --closed-world mode, all types would be protected or private, unless explicitly annotated as public. In --open-world mode, all types would be public or private, unless explicitly annotated as protected.

In your --closed-world GUFA example the type of $func would have to be marked public because it is in fact called from the environment.
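The three levels and their per-mode defaults can be written down as a short Python sketch. The names `Visibility`, `type_optimizable`, `values_optimizable`, and `resolve` are hypothetical. The yes/no answers are inferred from the comments above (protected types cannot be modified but are not accessed from outside; private types can be modified freely), and treating an explicit annotation as always winning is an assumption.

```python
from enum import Enum

class Visibility(Enum):
    PUBLIC = "public"
    PROTECTED = "protected"
    PRIVATE = "private"

def type_optimizable(visibility):
    # Only types outside the public interface may be rewritten.
    return visibility is Visibility.PRIVATE

def values_optimizable(visibility):
    # Values can be reasoned about unless the environment may access or allocate them.
    return visibility is not Visibility.PUBLIC

def resolve(in_interface, closed_world, annotation=None):
    """Pick a visibility from the mode defaults, unless explicitly annotated."""
    if in_interface and annotation is Visibility.PRIVATE:
        raise ValueError("a (@private) type cannot be used in the public interface")
    if annotation is not None:
        return annotation  # assumption: an explicit annotation always wins
    if not in_interface:
        return Visibility.PRIVATE
    return Visibility.PROTECTED if closed_world else Visibility.PUBLIC

# The type of $func is not in the interface; it only escapes as a funcref.
default = resolve(in_interface=False, closed_world=True)
marked = resolve(in_interface=False, closed_world=True, annotation=Visibility.PUBLIC)
```

In this model, `default` permits the value-level assumption that broke the GUFA example, while `marked` rules it out, matching the statement that the type of $func would have to be marked public.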

@kripken
Member

kripken commented Dec 4, 2024

I see what you mean now, thanks. Yeah, the type/values optimizability distinction is important here.


2 participants