Unboxed is not a good default unboxed vector  #250

Open
@Lysxia

Currently the documentation (or lack of it) suggests using Unboxed if you want unboxed, low-overhead vectors. In fact, the wiki linked in the description says:

End users should use Data.Vector.Unboxed for most cases

However, in my experience, it's really not the best default; Storable (or perhaps Primitive) is a better one. Here I propose to update the documentation to make Unboxed less prominent, and even to explicitly discourage its use (to counter the natural appeal of its name).

The Unbox class is hard to understand

It involves

  • Multi-param type classes, and the class synonym trick (Unbox)
  • Data type families (which I think are even more niche than plain type families)
  • The names MVector and Vector are used both for classes and data type families, adding to the confusion
  • PrimMonad (should monads even be a prerequisite to use efficient vectors for user-defined types?)

Hence, most new users would probably be unable to complete the sketched implementation of Unbox for Complex in the documentation of Data.Vector.Unboxed.

Furthermore, because of the methods parameterized by PrimMonad instances, it's not possible to use DerivingVia to abstract any of those details. So you either pay 30 lines of boilerplate, or use CPP/Template Haskell. Even assuming users are capable of adapting that boilerplate to their own types, that doesn't look "very easy" (quoted from the documentation).
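
To make the amount of boilerplate concrete, here is a minimal sketch of a hand-written Unbox instance. The `Point` type and the helper `sumX` are hypothetical names introduced only for illustration; the instance follows the usual pattern of wrapping an unboxed vector of tuples in a newtype, and the method set assumes a reasonably recent version of vector (one that has `basicInitialize`).

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}

import qualified Data.Vector.Generic as G
import qualified Data.Vector.Generic.Mutable as M
import qualified Data.Vector.Unboxed as U

-- Hypothetical user-defined type, used only for illustration.
data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double
  deriving (Eq, Show)

-- Data family instances: a vector of Point is represented as a vector of pairs.
newtype instance U.MVector s Point = MV_Point (U.MVector s (Double, Double))
newtype instance U.Vector    Point = V_Point  (U.Vector    (Double, Double))

-- Every method just unwraps the newtype and delegates to the tuple instance.
instance M.MVector U.MVector Point where
  basicLength (MV_Point v) = M.basicLength v
  basicUnsafeSlice i n (MV_Point v) = MV_Point (M.basicUnsafeSlice i n v)
  basicOverlaps (MV_Point a) (MV_Point b) = M.basicOverlaps a b
  basicUnsafeNew n = MV_Point <$> M.basicUnsafeNew n
  basicInitialize (MV_Point v) = M.basicInitialize v
  basicUnsafeRead (MV_Point v) i = uncurry Point <$> M.basicUnsafeRead v i
  basicUnsafeWrite (MV_Point v) i (Point x y) = M.basicUnsafeWrite v i (x, y)

instance G.Vector U.Vector Point where
  basicUnsafeFreeze (MV_Point v) = V_Point <$> G.basicUnsafeFreeze v
  basicUnsafeThaw (V_Point v) = MV_Point <$> G.basicUnsafeThaw v
  basicLength (V_Point v) = G.basicLength v
  basicUnsafeSlice i n (V_Point v) = V_Point (G.basicUnsafeSlice i n v)
  basicUnsafeIndexM (V_Point v) i = uncurry Point <$> G.basicUnsafeIndexM v i

-- The Unbox class itself has no methods; it only ties the two instances together.
instance U.Unbox Point

-- Hypothetical helper: sum the first coordinate of every point.
sumX :: U.Vector Point -> Double
sumX = U.foldl' (\acc (Point x _) -> acc + x) 0
```

None of this code says anything specific about `Point` beyond "it is a pair of Doubles", yet all of it has to be written out (or generated) for each new element type.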

Unboxed is not the most efficient

In addition, although Unboxed is touted as efficient because vectors of tuples are merely tuples of vectors, a flat Storable vector is actually more efficient in almost all cases: it has far fewer indirections, and keeping all fields of each entry in one contiguous piece of memory makes it more cache-friendly. The Storable class is also much easier to understand and use than Unbox. Access patterns that take advantage of Unboxed's layout are arguably rare.
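
For comparison, a Storable instance for a two-field record might look like the sketch below. `Point` and `sumX` are again hypothetical names chosen for illustration, and the layout (two adjacent Doubles with no padding) is an assumption of this example, not something the library prescribes.

```haskell
import qualified Data.Vector.Storable as S
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical user-defined type, used only for illustration.
data Point = Point !Double !Double
  deriving (Eq, Show)

-- Each Point occupies two adjacent Doubles in one flat buffer.
instance Storable Point where
  sizeOf _ = 2 * sizeOf (undefined :: Double)
  alignment _ = alignment (undefined :: Double)
  peek p = do
    let q = castPtr p :: Ptr Double
    x <- peekElemOff q 0
    y <- peekElemOff q 1
    pure (Point x y)
  poke p (Point x y) = do
    let q = castPtr p :: Ptr Double
    pokeElemOff q 0 x
    pokeElemOff q 1 y

-- Hypothetical helper: sum the first coordinate of every point.
sumX :: S.Vector Point -> Double
sumX = S.foldl' (\acc (Point x _) -> acc + x) 0
```

The instance needs only four methods in plain `IO`, with no type families or multi-parameter classes, and both fields of an entry sit next to each other in memory.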

That means this sentence at the top of Data.Vector.Primitive (which is similar to Storable in those respects) is incorrect:

Adaptive unboxed vectors defined in Data.Vector.Unboxed are significantly more flexible at no performance cost.

Storable vs Primitive

Either Storable or Primitive should be promoted instead of Unboxed.

I'm still uncertain about the trade-offs of who should manage a vector's memory.

  • Although Storable is originally an FFI thing, it still seems to do a fine job for general-purpose programming, and, unlike Primitive, it doesn't incur the 2x factor of GC-controlled memory.

  • MagicHash is a recurring point of confusion, and that also counts against Prim from a documentation/usability point of view.

I'm sure there are advantages to Primitive, but I'm not sure they are sufficiently strong to make it a better default than Storable.
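
Whichever of the two ends up being the recommended default, code written against Data.Vector.Generic works with both, and `convert` moves data between representations. The sketch below assumes element types that already have both Prim and Storable instances (here Double); `toStorable` and `mean` are hypothetical helpers introduced for illustration.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import qualified Data.Vector.Generic as G
import qualified Data.Vector.Primitive as P
import qualified Data.Vector.Storable as S

-- Copy a Primitive vector into a Storable one.
toStorable :: P.Vector Double -> S.Vector Double
toStorable = G.convert

-- Works unchanged for Storable, Primitive, and Unboxed vectors of Double.
-- Not meaningful for an empty vector (division by zero yields NaN).
mean :: G.Vector v Double => v Double -> Double
mean v = G.sum v / fromIntegral (G.length v)
```

For example, `mean (P.fromList [1, 2, 3])` and `mean (toStorable (P.fromList [1, 2, 3]))` go through the same generic code path and differ only in how the underlying memory is managed.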
