This RFC proposes to add minimal support in the OCaml implementation for uniform arrays, which always use the standard Array_tag
representation and never Double_array_tag
.
The compiler would offer specialized built-in and C primitives to work with uniform arrays, and a minimal CamlinternalUniform_array
module in the standard library exposing these primitives.
This would make it easier to write some unsafe idioms that enforce uniform arrays in more complex ways today, simplifying the code of Domain.DLS
, the unboxed implementation of Dynarray
, and the Uniform_array
module of Jane Street's Base library and making it easier to reason about their correctness.
The OCaml representations of arrays is currently as follows:
-
There are two forms of arrays supported by the runtime:
- blocks with tag
Array_tag
followed byvalue
arguments - blocks with tag
Double_array_tag
followed bydouble
arguments
- blocks with tag
-
Elements of type
floatarray
are always represented asDouble_array_tag
blocks. -
If "flat float arrays" is disabled by the
-no-flat-float-array
flag, elements of some typesfoo array
are always represented asArray_tag
blocks. -
If "flat float arrays" is enabled (this is the default setting), there is a dynamic check on array creation, and
Array_tag
orDouble_array_tag
is used depending on whether the initializing value is a double.Then all array operations perform a dynamic check on the tag to perform the right computation depending on the operation.
Note: the runtime supports storing float
values inside Array_tag
blocks, although there is no safe-OCaml way to do it. This is used implicitly by our current implementation of Domain.DLS
, which uses an Obj.t array
as an heterogeneous array, where some elements may have type float
. This is okay as long as the array was not initialized with a float
element.
We propose to add a new OCaml type of uniform arrays, that are always represented as blocks with the tag Array_tag
, never Double_array_tag
.
Runtime support functions (in runtime/array.c
) and compiler primitives are extended to offer specialized versions for uniform arrays.
Having uniform arrays available make it easier to reason about the safety of some unsafe use-cases, similar to the Obj.t array
of Domain.DLS
: using Obj.t CamlinternalUniform_array.t
makes it easier to reason about the code.
We know of the following potential users:
-
Jane Street's Base library exposes a Uniform_array module precisely for this reason -- they use it to implement "unboxed option arrays" for example. Upstream support would not remove the need for this module, but it would simplify the implementation and may provide some speed benefits in some case -- for example they could reuse the runtime-provided
fill
,blit
,sub
,append
instead of reimplementing them in OCaml. -
Domain.DLS
could be defined using anObj.t CamlinternalUniform_array.t
, making its code easier to reason about. In particular, #12889 would benefit as it would make it easier to write thread-safe code manipulating DLS arrays -- currently the code generated bya.(i)
contains an allocation in thefloat
code path, preventing the use of[@poll error]
on the relevant function. -
CamlinternalUniform_array.t
could also be used in the unboxed implementation ofDynarray
, #12885. For example, the implementation currently contains the following functions in aDummy.Array
module:let make n x ~dummy = if Obj.(tag (repr x) <> double_tag) then Array.make n (of_val x) else begin let arr = Array.make n (of_dummy dummy) in Array.fill arr 0 n (of_val x); arr end let copy a ~dummy = if Obj.(tag (repr a) <> double_array_tag) then Array.copy a else begin let n = Array.length a in let arr = Array.make n (of_dummy dummy) in Array.blit a 0 arr 0 n; arr end
None of this would be necessary if uniform arrays were available.
This would be implemented as an internal module of the standard library, CamlinternalUniform_array
. "Proper" support as a Uniform_array
in the standard library is not planned for now.
We propose that CamlinternalUniform_array
remain a minimal module that exposes only the operations that benefit from runtime support. This is not a user-facing module so it needs not offer all the niceties of Array
and Float.Array
.
We are considering three different approaches to define the uniform array type:
- abstract type:
type 'a t
- built-in type:
type 'a t = 'a uniform_array
- private type:
type 'a t = private 'a array
Having a built-in type requires more work and adds more baggage to the compiler, but it makes it easy for optimizers to reason specifically on the uniform_array
type.
Having a private type exposes to users the fact that the representation is compatible with 'a array
, which could be convenient for some use-cases -- Array
functions that consume arrays could be called on uniform arrays; but this does not work for functions which return arrays, such as Array.map
.
Having an abstract type leaves the door open to use the built-in or private approach later. This is the default proposal.
This would not be necessary if -no-flat-float-array
became the default, and over time the only supported mode. This would have various benefits (it simplifies the type theory) but also the usual, obvious downsides that it makes a lot of simple, natural code written using float array
much slower overnight.
There is no consensus towards this invasive change today, and good arguments against it.
Note that the current proposal would not be particularly costly if -no-flat-float-array
later became the default. At that point, "uniform arrays" and "arrays" would be the exact same type, and we end up with a trivial and useless camlinternalUniform_array
module. This is no big deal.
The rationale for OCaml uniform arrays is to make it easier to write certain kinds of unsafe code. The alternative is to write this code in C: the core of Domain.DLS
and Dynarray
could be implemented in C rather than OCaml.
Currently my impression is that writing FFI code from C is even more unsafe and difficult than writing unsafe code in OCaml, in particular due to the very weak typing of OCaml values on the C side that makes it very easy for simple mistakes to slip in. For both Domain.DLS
and Dynarray
I personally preferred "OCaml code with weird hacks and some informal static reasoning about array creation" over "C code".
(Note: C code arguably remains the better choice when reasoning about asynchronous callbacks is necessary, as the runtime functions currently provide much better control over asynchronous callbacks than plain OCaml code.)
It may be possible to trick the compiler into selecting the non-float representation by lying about the array type:
module Lying_array : sig
type 'a t
val get : 'a t -> int -> 'a
val set : 'a t -> int -> 'a -> unit
end = struct
type 'a lie = ('a * unit) array (* some type the compiler knows is non-float *)
external of_lie : 'a lie -> 'a = "%identity"
external to_lie : 'a -> 'a lie = "%identity"
type 'a t = 'a lie array
let get a i = of_lie a.(i) (* [@poll error] works here *)
let set a i v = a.(i) <- to_lie v
end
I see two downsides:
-
I do not know whether this choice would work well with optimizers making assumptions based on type information. (In general lying about types in this way is not a good idea because of Flambda and other potential optimizers.) On the other hand, once types are erased this approach only lies in that it uses the
Paddrarray
kind for array operations, and:- This is what the
-no-flat-float-array
mode uses for generic array primitives today, so it must be safe (assuming Flambda people also test that mode), and - This is also what our proposal would use for the
uniform_array
primitives, so we want optimizers to not assume to much fromPaddrarray
anyway -- it may in particular containint
or evendouble
values.
- This is what the
-
For array operations that are implemented in the runtime (
blit
,fill
, etc.), we still need to carefully think about whether to reimplement them in OCaml for that type, or somehow cast our lying arrays into normal arrays to call the runtime. The end result is more complexity rather than less complexity, something that we hope to avoid by adding first-class support for uniform arrays in the runtime.