Outline of the model rewrite project

The model needs some fairly invasive changes to get it to a state where it can reliably handle complex libraries such as `Core`. This document outlines some of the changes that are required.

Changes to paths

No more GADTs

The current implementation of the path types uses GADTs with polymorphic variants for indices. This allows us to give nice types to some functions (e.g. 'a Reference.Resolved.t -> 'a Identifier), and avoids boxing when moving between different kinds of path. However, OCaml is still not great at working with these types making them very hard to use sometimes. It also obfuscates the underlying constraints on paths, which somewhat defeats the point of having such precise types.

This should be replaced by just using multiple types for each sort of path. For example, instead of representing resolved paths by a single type with indices like [`Module] and [`Module_type], we should have one type for resolved module paths and another type for resolved module type paths.

Different types for different names

Currently we just use strings for all sub-parts of paths and identifiers. This occasionally allows stupid bugs to get through. Instead we should use separate abstract types (e.g. ModuleName.t, TypeName.t).

More principled handling of functors

Currently, identifiers that point into the body of paths just apply ordinary module projections to the functor itself. Meanwhile identifiers in the parameters of functors use a special argument projection which takes the name and index of the parameter. For example, given:

module F (X : ...) (Y : sig type param end) : sig type res end

We refer to res using an identifier like Type(Module(root, "F"), "res") and refer to param using an identifier like Type(Argument(Module(root, "F"), "Y", 1), "param").

These identifiers turn out to be quite awkward to work with. Instead we should have an explicit Result projection that goes down the right of the functor arrow, and a simple Parameter projection that goes down the left of the functor arrow.

Separate module alias substitutions from functor argument substitutions

Currently, the SubstAlias constructor is used for both module alias substitutions and functor argument substitutions. However, these two kinds of substitutions should really be treated differently.

Firstly, a module alias substitution should be performed on the substituted module itself, whilst a functor argument substitution should only be performed on projection. For example, if we have SubstAlias(M, N) (i.e. replace M with N) it should refer to the identifier of N if its for a module alias, but still refer to M if it is functor argument. Whereas a path like Type(SubstAlias(M, N)), t) should refer to the identifier of N.t for both module aliases and functor arguments.

Secondly, we don’t want to perform a module alias substitution if the right-hand side of the substitution is hidden, whereas for a functor argument substitution we must perform this substitution regardless. Currently, this is handled by trying to only insert the substitution for module aliases if the right-hand side is not hidden, however hiddenness cannot be reliably determined until the “linking” phase, so this is fragile and unreliable.

Add an optional replacement string for hidden paths

Currently we do not try to eliminate hidden paths. Adding an optional replacement string will make it easy to eliminate them in many cases.

Use a reference for right-hand side of canonical

Currently the right-hand side of a `Canonical` constructor – which is the canonical path that should be used as long as it can be resolved – is a `Path.t`. Really it should be a `Reference.t` since it comes from a documentation comment.

Cross-package references

Currently we don’t provide support for cross-package references (other than to compilation dependencies). Such references should require explicitly naming the package that they refer to. This will require adding a reference syntax for explicitly naming the package.

Note that we’ve already extended the syntax of references to allow disambiguating each part of the path we’re referring too. The new syntax has been discussed in these two issues:

https://github.com/ocaml-doc/odoc/issues/61#issuecomment-319974223
https://github.com/ocaml-doc/odoc/issues/90

Hopefully we can keep a consistent syntax to disambiguate the “package” part.

Fresh implementation of components data structure

The current implementations of path resolution, reference resolution and expansion are fundamentally flawed. In particular, the components data structure which drives resolution is not fit for purpose and cannot even be used for expansion. Fixing the underlying issues with the components data structure should greatly simplify resolution and allow us to use it for expansion – deleting the current broken, unfinished and extremely inefficient implementation of expansion.

The best approach here is probably to start from scratch – the existing implementation is hard to follow and most of its complexity shouldn’t appear in the new version.

Problems with the current implementation

The current implementation of the components data structure is designed so that given:

module M = struct
  module type S = sig ... end
  module N : S
end
module O : M.S

the same value can be used unchanged for the components of M.S, M.N and O. In other words, we do not distinguish identifier projection from path projection and the renaming operation is a no-op.

This turns out to be a terrible design decision. It causes, at least, the following serious issues:

It dramatically complicates the implementation of functor parameter substitution.
It means that the components value cannot calculate the resolved path or reference to a definition. Instead we must build the resulting path or reference in parallel with calculating the components value.
It means that the components value cannot calculate the expansion of the represented module or module type. Instead we are forced to calculate expansions without using components at all.
It doesn’t allow us to correctly track properties like “hidden” or “canonical”. For example, if S is marked as hidden above then N and O will also be treated as hidden.

Other issues with the current implementation include:

It keeps separate tables in signatures for the different kinds of reference. This adds a lot of additional code for no real gain.

It implements module extension and destructive substitution too naively. It does not handle cases like:

module type A = sig
  module M : sig module type S end
  module N : M.S
end

module B : sig module type S = sig type t end end

module C : A with module M = B

type t = C.N.t (* This path is not currently resolved *)

New implementations of compiling and linking phases

Currently, the different stages of path resolution, reference resolution and expansion are run multiple times across both the compiling and linking phases. This should be tidied up so that the different stages happen at the appropriate time.

Problems with the current implementation

The current implementation doesn’t have a clear order in which different operations should happen. It mixes together operations which should happen at different phases, and can perform some operations too early leading to incorrect results.

There is only a single “path and reference resolution” operation, even though forward paths and references should not be resolved until the link phase.
Module aliases to hidden paths are replaced by their expansion by the expansion operation, even though this replacement must happen during the link phase.
Paths involving module aliases are represented either with or without SubstAlias, depending on whether the aliased path is hidden. This decision is made during path resolution, however the hiddenness of a module may not be correctly available until the link phase.

(Optional) More abstract external interface for paths

Currently consumers of the model have to use the representations of paths directly. There are helper functions (e.g. to convert paths to identifiers) but it still requires an understanding of the various book-keeping constructors in the path representations. Instead we would like a more abstract interface to hide these details.

Outline of the model rewrite project

Changes to paths

No more GADTs

Different types for different names

More principled handling of functors

Separate module alias substitutions from functor argument substitutions

Add an optional replacement string for hidden paths

Use a reference for right-hand side of canonical

Cross-package references

Fresh implementation of components data structure

Problems with the current implementation

New implementations of compiling and linking phases

Problems with the current implementation

(Optional) More abstract external interface for paths

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally