Skip to content

Latest commit

 

History

History
143 lines (86 loc) · 8.89 KB

uniform-array.md

File metadata and controls

143 lines (86 loc) · 8.89 KB

(Camlinternal)Uniform_array: runtime- and compiler-supported uniform arrays

This RFC proposes to add minimal support in the OCaml implementation for uniform arrays, which always use the standard Array_tag representation and never Double_array_tag.

The compiler would offer specialized built-in and C primitives to work with uniform arrays, and a minimal CamlinternalUniform_array module in the standard library exposing these primitives.

This would make it easier to write some unsafe idioms that enforce uniform arrays in more complex ways today, simplifying the code of Domain.DLS, the unboxed implementation of Dynarray, and the Uniform_array module of Jane Street's Base library and making it easier to reason about their correctness.

Current status

The OCaml representations of arrays is currently as follows:

  1. There are two forms of arrays supported by the runtime:

    • blocks with tag Array_tag followed by value arguments
    • blocks with tag Double_array_tag followed by double arguments
  2. Elements of type floatarray are always represented as Double_array_tag blocks.

  3. If "flat float arrays" is disabled by the -no-flat-float-array flag, elements of some types foo array are always represented as Array_tag blocks.

  4. If "flat float arrays" is enabled (this is the default setting), there is a dynamic check on array creation, and Array_tag or Double_array_tag is used depending on whether the initializing value is a double.

    Then all array operations perform a dynamic check on the tag to perform the right computation depending on the operation.

Note: the runtime supports storing float values inside Array_tag blocks, although there is no safe-OCaml way to do it. This is used implicitly by our current implementation of Domain.DLS, which uses an Obj.t array as an heterogeneous array, where some elements may have type float. This is okay as long as the array was not initialized with a float element.

Proposal

We propose to add a new OCaml type of uniform arrays, that are always represented as blocks with the tag Array_tag, never Double_array_tag.

Runtime support functions (in runtime/array.c) and compiler primitives are extended to offer specialized versions for uniform arrays.

Rationale

Having uniform arrays available make it easier to reason about the safety of some unsafe use-cases, similar to the Obj.t array of Domain.DLS: using Obj.t CamlinternalUniform_array.t makes it easier to reason about the code.

We know of the following potential users:

  • Jane Street's Base library exposes a Uniform_array module precisely for this reason -- they use it to implement "unboxed option arrays" for example. Upstream support would not remove the need for this module, but it would simplify the implementation and may provide some speed benefits in some case -- for example they could reuse the runtime-provided fill, blit, sub, append instead of reimplementing them in OCaml.

  • Domain.DLS could be defined using an Obj.t CamlinternalUniform_array.t, making its code easier to reason about. In particular, #12889 would benefit as it would make it easier to write thread-safe code manipulating DLS arrays -- currently the code generated by a.(i) contains an allocation in the float code path, preventing the use of [@poll error] on the relevant function.

  • CamlinternalUniform_array.t could also be used in the unboxed implementation of Dynarray, #12885. For example, the implementation currently contains the following functions in a Dummy.Array module:

    let make n x ~dummy =
      if Obj.(tag (repr x) <> double_tag) then
        Array.make n (of_val x)
      else begin
        let arr = Array.make n (of_dummy dummy) in
        Array.fill arr 0 n (of_val x);
        arr
      end
    
    let copy a ~dummy =
      if Obj.(tag (repr a) <> double_array_tag) then
        Array.copy a
      else begin
        let n = Array.length a in
        let arr = Array.make n (of_dummy dummy) in
        Array.blit a 0 arr 0 n;
        arr
      end

    None of this would be necessary if uniform arrays were available.

Design questions/answers

Where to define this

This would be implemented as an internal module of the standard library, CamlinternalUniform_array. "Proper" support as a Uniform_array in the standard library is not planned for now.

Scope of the module

We propose that CamlinternalUniform_array remain a minimal module that exposes only the operations that benefit from runtime support. This is not a user-facing module so it needs not offer all the niceties of Array and Float.Array.

Type definition

We are considering three different approaches to define the uniform array type:

  • abstract type: type 'a t
  • built-in type: type 'a t = 'a uniform_array
  • private type: type 'a t = private 'a array

Having a built-in type requires more work and adds more baggage to the compiler, but it makes it easy for optimizers to reason specifically on the uniform_array type.

Having a private type exposes to users the fact that the representation is compatible with 'a array, which could be convenient for some use-cases -- Array functions that consume arrays could be called on uniform arrays; but this does not work for functions which return arrays, such as Array.map.

Having an abstract type leaves the door open to use the built-in or private approach later. This is the default proposal.

Alternative approaches

Enable -no-flat-float-array by default

This would not be necessary if -no-flat-float-array became the default, and over time the only supported mode. This would have various benefits (it simplifies the type theory) but also the usual, obvious downsides that it makes a lot of simple, natural code written using float array much slower overnight.

There is no consensus towards this invasive change today, and good arguments against it.

Note that the current proposal would not be particularly costly if -no-flat-float-array later became the default. At that point, "uniform arrays" and "arrays" would be the exact same type, and we end up with a trivial and useless camlinternalUniform_array module. This is no big deal.

Implement this sort of code in C

The rationale for OCaml uniform arrays is to make it easier to write certain kinds of unsafe code. The alternative is to write this code in C: the core of Domain.DLS and Dynarray could be implemented in C rather than OCaml.

Currently my impression is that writing FFI code from C is even more unsafe and difficult than writing unsafe code in OCaml, in particular due to the very weak typing of OCaml values on the C side that makes it very easy for simple mistakes to slip in. For both Domain.DLS and Dynarray I personally preferred "OCaml code with weird hacks and some informal static reasoning about array creation" over "C code".

(Note: C code arguably remains the better choice when reasoning about asynchronous callbacks is necessary, as the runtime functions currently provide much better control over asynchronous callbacks than plain OCaml code.)

Trick the OCaml compiler using array types

It may be possible to trick the compiler into selecting the non-float representation by lying about the array type:

module Lying_array : sig
  type 'a t
  val get : 'a t -> int -> 'a
  val set : 'a t -> int -> 'a -> unit
end = struct
  type 'a lie = ('a * unit) array (* some type the compiler knows is non-float *)
  external of_lie : 'a lie -> 'a = "%identity"
  external to_lie : 'a -> 'a lie = "%identity"
  
  type 'a t = 'a lie array
  let get a i = of_lie a.(i) (* [@poll error] works here *)
  let set a i v = a.(i) <- to_lie v
end

I see two downsides:

  • I do not know whether this choice would work well with optimizers making assumptions based on type information. (In general lying about types in this way is not a good idea because of Flambda and other potential optimizers.) On the other hand, once types are erased this approach only lies in that it uses the Paddrarray kind for array operations, and:

    1. This is what the -no-flat-float-array mode uses for generic array primitives today, so it must be safe (assuming Flambda people also test that mode), and
    2. This is also what our proposal would use for the uniform_array primitives, so we want optimizers to not assume to much from Paddrarray anyway -- it may in particular contain int or even double values.
  • For array operations that are implemented in the runtime (blit, fill, etc.), we still need to carefully think about whether to reimplement them in OCaml for that type, or somehow cast our lying arrays into normal arrays to call the runtime. The end result is more complexity rather than less complexity, something that we hope to avoid by adding first-class support for uniform arrays in the runtime.