Skip to content

Commit

Permalink
Merge pull request #91 from WebAssembly/principal
Browse files Browse the repository at this point in the history
State principal types
  • Loading branch information
rossberg authored Feb 21, 2023
2 parents 3a882f2 + 138095c commit 314a335
Show file tree
Hide file tree
Showing 2 changed files with 124 additions and 3 deletions.
2 changes: 1 addition & 1 deletion document/core/appendix/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,9 @@ Appendix

embedding
implementation
properties
algorithm
custom
properties
changes

.. only:: singlehtml
Expand Down
125 changes: 123 additions & 2 deletions document/core/appendix/properties.rst
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
.. index:: ! soundness, type system
.. _soundness:

Soundness
---------
Type Soundness
--------------

The :ref:`type system <type-system>` of WebAssembly is *sound*, implying both *type safety* and *memory safety* with respect to the WebAssembly semantics. For example:

Expand Down Expand Up @@ -1006,3 +1006,124 @@ Consequently, given a :ref:`valid store <valid-store>`, no computation defined b
.. [#cite-fm2021]
Machine-verified formalizations and soundness proofs of the semantics from the official specification are described in the following article:
Conrad Watt, Xiaojia Rao, Jean Pichon-Pharabod, Martin Bodin, Philippa Gardner. |FM2021|_. Proceedings of the 24th International Symposium on Formal Methods (FM 2021). Springer 2021.
.. index:: type system

Type System Properties
----------------------

.. index:: ! principal types, type system, subtyping, polymorphism, instruction, syntax
.. _principality:

Principal Types
~~~~~~~~~~~~~~~

The :ref:`type system <type-system>` of WebAssembly features both :ref:`subtyping <match>` and simple forms of :ref:`polymorphism <polymorphism>` for :ref:`instruction types <syntax-instrtype>`.
That has the effect that every instruction or instruction sequence can be classified with multiple different instruction types.

However, the typing rules still allow deriving *principal types* for instruction sequences.
That is, every valid instruction sequence has one particular type scheme, possibly containing some unconstrained place holder *type variables*, that is a subtype of all its valid instruction types, after substituting its type variables with suitable specific types.

Moreover, when deriving an instruction type in a "forward" manner, i.e., the *input* of the instruction sequence is already fixed to specific types,
then it has a principal *output* type expressible without type variables, up to a possibly :ref:`polymorphic stack <polymorphism>` bottom representable with one single variable.
In other words, "forward" principal types are effectively *closed*.

.. note::
For example, in isolation, the instruction :math:`\REFASNONNULL` has the type :math:`[(\REF~\NULL~\X{ht})] \to [(\REF~\X{ht})]` for any choice of valid :ref:`heap type <syntax-type>` :math:`\X{ht}`.
Moreover, if the input type :math:`[(\REF~\NULL~\X{ht})]` is already determined, i.e., a specific :math:`\X{ht}` is given, then the output type :math:`[(\REF~\X{ht})]` is fully determined as well.

The implication of the latter property is that a validator for *complete* instruction sequences (as they occur in valid modules) can be implemented with a simple left-to-right :ref:`algorithm <algo-valid>` that does not require the introduction of type variables.

A typing algorithm capable of handling *partial* instruction sequences (as might be considered for program analysis or program manipulation)
needs to introduce type variables and perform substitutions,
but it does not need to perform backtracking or record any non-syntactic constraints on these type variables.

Technically, the :ref:`syntax <syntax-type>` of :ref:`heap <syntax-heaptype>`, :ref:`value <syntax-valtype>`, and :ref:`result <syntax-resulttype>` types can be enriched with type variables as follows:

.. math::
\begin{array}{llll}
\production{heap type} & \heaptype &::=&
\dots ~|~ \alpha_{\heaptype} \\
\production{value type} & \valtype &::=&
\dots ~|~ \alpha_{\valtype} ~|~ \alpha_{\X{numvectype}} \\
\production{result type} & \resulttype &::=&
[\alpha_{\valtype^\ast}^?~\valtype^\ast] \\
\end{array}
where each :math:`\alpha_{\X{xyz}}` ranges over a set of type variables for syntactic class :math:`\X{xyz}`, respectively.
The special class :math:`\X{numvectype}` is defined as :math:`\numtype ~|~ \vectype ~|~ \BOT`,
and is only needed to handle unannotated |SELECT| instructions.

A type is *closed* when it does not contain any type variables, and *open* otherwise.
A *type substitution* :math:`\sigma` is a finite mapping from type variables to closed types of the respective syntactic class.
When applied to an open type, it replaces the type variables :math:`\alpha` from its domain with the respective :math:`\sigma(\alpha)`.

**Theorem (Principal Types).**
If an instruction sequence :math:`\instr^\ast` is :ref:`valid <valid-config>` with some closed :ref:`instruction type <syntax-instrtype>` :math:`\instrtype` (i.e., :math:`C \vdashinstrseq \instr^\ast : \instrtype`),
then it is also valid with a possibly open instruction type :math:`\instrtype_{\min}` (i.e., :math:`C \vdashinstrseq \instr^\ast : \instrtype_{\min}`),
such that for *every* closed type :math:`\instrtype'` with which :math:`\instr^\ast` is valid (i.e., for all :math:`C \vdashinstrseq \instr^\ast : \instrtype'`),
there exists a substitution :math:`\sigma`,
such that :math:`\sigma(\instrtype_{\min})` is a subtype of :math:`\instrtype'` (i.e., :math:`C \vdashinstrtypematch \sigma(\instrtype_{\min}) \matchesinstrtype \instrtype'`).
Furthermore, :math:`\instrtype_{\min}` is unique up to the choice of type variables.

**Theorem (Closed Principal Forward Types).**
If closed input type :math:`[t_1^\ast]` is given and the instruction sequence :math:`\instr^\ast` is :ref:`valid <valid-config>` with :ref:`instruction type <syntax-instrtype>` :math:`[t_1^\ast] \to_{{\!x^\ast}} [t_2^\ast]` (i.e., :math:`C \vdashinstrseq \instr^\ast : [t_1^\ast] \to_{{\!x^\ast}} [t_2^\ast]`),
then it is also valid with instruction type :math:`[t_1^\ast] \to_{{\!x^\ast}} [\alpha_{\valtype^\ast}~t^\ast]` (i.e., :math:`C \vdashinstrseq \instr^\ast : [t_1^\ast] \to_{\!x^\ast} [\alpha_{\valtype^\ast}~t^\ast]`),
where all :math:`t^\ast` are closed,
such that for *every* closed result type :math:`[{t'_2}^\ast]` with which :math:`\instr^\ast` is valid (i.e., for all :math:`C \vdashinstrseq \instr^\ast : [t_1^\ast] \to_{\!x^\ast} [{t'_2}^\ast]`),
there exists a substitution :math:`\sigma`,
such that :math:`[{t'_2}^\ast] = [\sigma(\alpha_{\valtype^\ast})~t^\ast]`.


.. index:: ! type lattice, subtyping, least upper bound, greatest lower bound

Type Lattice
~~~~~~~~~~~~

The :ref:`Principal Types <principality>` property depends on the existence of a *greatest lower bound* for any pair of types.

**Theorem (Greatest Lower Bounds for Value Types).**
For any two value types :math:`t_1` and :math:`t_2` that are :ref:`valid <valid-valtype>`
(i.e., :math:`C \vdashvaltype t_1 \ok` and :math:`C \vdashvaltype t_2 \ok`),
there exists a valid value type :math:`t` that is a subtype of both :math:`t_1` and :math:`t_2`
(i.e., :math:`C \vdashvaltype t \ok` and :math:`C \vdashvaltypematch t \matchesvaltype t_1` and :math:`C \vdashvaltypematch t \matchesvaltype t_2`),
such that *every* valid value type :math:`t'` that also is a subtype of both :math:`t_1` and :math:`t_2`
(i.e., for all :math:`C \vdashvaltype t' \ok` and :math:`C \vdashvaltypematch t' \matchesvaltype t_1` and :math:`C \vdashvaltypematch t' \matchesvaltype t_2`),
is a subtype of :math:`t`
(i.e., :math:`C \vdashvaltypematch t' \matchesvaltype t`).

.. note::
The greatest lower bound of two types may be |BOT|.

**Theorem (Conditional Least Upper Bounds for Value Types).**
Any two value types :math:`t_1` and :math:`t_2` that are :ref:`valid <valid-valtype>`
(i.e., :math:`C \vdashvaltype t_1 \ok` and :math:`C \vdashvaltype t_2 \ok`)
either have no common supertype,
or there exists a valid value type :math:`t` that is a supertype of both :math:`t_1` and :math:`t_2`
(i.e., :math:`C \vdashvaltype t \ok` and :math:`C \vdashvaltypematch t_1 \matchesvaltype t` and :math:`C \vdashvaltypematch t_2 \matchesvaltype t`),
such that *every* valid value type :math:`t'` that also is a supertype of both :math:`t_1` and :math:`t_2`
(i.e., for all :math:`C \vdashvaltype t' \ok` and :math:`C \vdashvaltypematch t_1 \matchesvaltype t'` and :math:`C \vdashvaltypematch t_2 \matchesvaltype t'`),
is a supertype of :math:`t`
(i.e., :math:`C \vdashvaltypematch t \matchesvaltype t'`).

.. note::
If a top type was added to the type system,
a least upper bound would exist for any two types.

**Corollary (Type Lattice).**
Assuming the addition of a provisional top type,
:ref:`value types <syntax-valtype>` form a lattice with respect to their :ref:`subtype <match-valtype>` relation.

Finally, value types can be partitioned into multiple disjoint hierarchies that are not related by subtyping, except through |BOT|.

**Theorem (Disjoint Subtype Hierarchies).**
The greatest lower bound of two :ref:`value types <syntax-valtype>` is :math:`\BOT` or :math:`\REF~\BOT`
if and only if they do not have a least upper bound.

In other words, types that do not have common supertypes,
do not have common subtypes either (other than :math:`\BOT` or :math:`\REF~\BOT`), and vice versa.

.. note::
Types from disjoint hierarchies can safely be represented in mutually incompatible ways in an implementation,
because their values can never flow to the same place.

0 comments on commit 314a335

Please sign in to comment.