Unboxed is not a good default unboxed vector #250

Open
Lysxia opened this issue Jun 28, 2019 · 34 comments

@Lysxia

Lysxia commented Jun 28, 2019

Currently the documentation (or lack of it) suggests to use Unboxed if you want unboxed, low-overhead vectors. In fact the wiki linked in the description says:

End users should use Data.Vector.Unboxed for most cases

However, in my experience, it's really not the best default, as opposed to Storable (or perhaps Primitive). Here I propose updating the documentation to make Unboxed less prominent, and even to explicitly discourage its use (to counter the natural appeal of its name).

The Unbox class is hard to understand

It involves

  • Multi-param type classes, and the class synonym trick (Unbox)
  • Data type families (which I think are even more niche than plain type families)
  • The names MVector and Vector are used both for classes and data type families, adding to the confusion
  • PrimMonad (should monads even be a prerequisite to use efficient vectors for user-defined types?)

Hence, most new users would probably be unable to complete the sketched implementation of Unbox for Complex in the documentation of Data.Vector.Unboxed.

Furthermore, because of the methods parameterized by PrimMonad instances, it's not possible to use DerivingVia to abstract any of those details. So you either pay 30 lines of boilerplate, or use CPP/Template Haskell. Even assuming users are capable of adapting that boilerplate to their own types, that doesn't look "very easy" (quoted from the documentation).
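For scale, here is a sketch of roughly what that boilerplate looks like even in the easiest case, a newtype over a type that is already Unbox (the Age type is mine, for illustration; a multi-field type like Complex needs substantially more than this). This is the GeneralizedNewtypeDeriving pattern that later appears in vector's own documentation:

```haskell
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses #-}
{-# LANGUAGE GeneralizedNewtypeDeriving, StandaloneDeriving #-}
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Generic.Mutable as GM

newtype Age = Age Int deriving (Eq, Show)

-- Both data family instances are newtypes over the Int representation,
-- so GeneralizedNewtypeDeriving can coerce the class dictionaries.
newtype instance U.MVector s Age = MV_Age (U.MVector s Int)
newtype instance U.Vector    Age = V_Age  (U.Vector    Int)
deriving instance GM.MVector U.MVector Age
deriving instance G.Vector   U.Vector  Age
instance U.Unbox Age

main :: IO ()
main = do
  let v = U.fromList [Age 1, Age 2, Age 3]
  print (U.toList (U.map (\(Age n) -> Age (n * 10)) v) == [Age 10, Age 20, Age 30])  -- prints True
```

Even this minimal case already requires data family instances, standalone deriving, and multi-parameter classes; a genuine product type cannot use this trick at all.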

Unboxed is not the most efficient

In addition, although Unboxed is touted as efficient because vectors of tuples are merely tuples of vectors, a flat Storable vector is actually more efficient in almost all cases: it has far fewer indirections, and having every entry in one contiguous piece of memory makes it more cache-friendly. The Storable class is also much easier to understand and use than Unbox. Access patterns that take advantage of Unboxed's layout are arguably rare.

That means this sentence at the top of Data.Vector.Primitive (which is similar to Storable in those respects) is incorrect:

Adaptive unboxed vectors defined in Data.Vector.Unboxed are significantly more flexible at no performance cost.

Storable vs Primitive

Either Storable or Primitive should be promoted instead of Unboxed.

I'm still uncertain about the trade-offs of who should manage a vector's memory.

  • Although Storable is originally an FFI thing, it still seems to do a fine job for general-purpose programming, and it doesn't have the 2x factor of GC-controlled memory, as opposed to Primitive.

  • MagicHash is a recurrent point of confusion, and that also plays against Prim from a documentation/usability point of view.

I'm sure there are advantages to Primitive, but I'm not sure they are sufficiently strong to make it a better default than Storable.

@Shimuuar
Contributor

My 2c

Working with unboxed data in Haskell is somewhat awkward and requires some sort of boilerplate no matter what. With Storable one has to write a boilerplate instance as well; boilerplate is necessary with any approach, since one has to describe how the data is laid out in memory.

Also, Unboxed comes with different trade-offs compared to Storable: the former implements struct-of-arrays (by convention), the latter array-of-structs (by necessity). Having a side-by-side comparison of them would be nice.

P.S. Storable does not have instances for tuples, which has caused me some problems.
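To make the struct-of-arrays behaviour concrete, a small sketch (assuming the vector package): U.zip and U.unzip on unboxed vectors are O(1), because a vector of pairs is represented as a pair of vectors, so neither operation copies anything.

```haskell
import qualified Data.Vector.Unboxed as U

main :: IO ()
main = do
  let xs = U.fromList [1, 2, 3 :: Int]
      ys = U.fromList [10, 20, 30 :: Int]
      -- No copying happens here: the pair vector is just the two
      -- element vectors held side by side (struct of arrays).
      zipped = U.zip xs ys
  print (U.toList zipped)              -- [(1,10),(2,20),(3,30)]
  let (xs', ys') = U.unzip zipped      -- O(1) again
  print (U.toList xs' == U.toList xs)  -- True
```

A Storable vector of a hypothetical pair instance would instead interleave the elements in one buffer, so zip/unzip would be O(n).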

@cartazio cartazio changed the title Unboxed is not a good default unboxed vector defining new unboxed vector instances should be easier than it currently is. Jun 28, 2019
@Lysxia
Author

Lysxia commented Jun 28, 2019

I disagree with the title change.

I don't think the cost-to-benefit ratio of reworking Unbox is as good as simply pointing people towards Storable more. My mind could be changed if someone proposed a concrete and compelling way to improve Unbox that would make it on par with, or easier to work with than, Storable.

Storable boilerplate is significantly easier to manage than Unbox. It can be derived generically, for example using derive-storable. Although I don't really agree with this particular implementation, it is a concrete example of what's possible without extra-linguistic features such as CPP or Template Haskell.

@cartazio
Contributor

There's definitely room for improvements. And I do agree that SoA vs AoS (struct of arrays vs array of structs) is conflated with unpinned vs pinned byte arrays.

Currently we have SoA unpinned == Unboxed,
and AoS C-style struct pinned arrays == Storable.

It would be totally valid to have some flavors of AoS unpinned and SoA pinned.

Primitive is just a helper for bootstrapping.

DerivingVia doesn't work for the same reason that GeneralizedNewtypeDeriving doesn't: the pure and mutable vector APIs need Monad (resp. PrimMonad) constraints on the generic versions for evaluation ordering (resp. effects), plus role annotations currently. This was a HUGE pain point when roles were added to Monad and friends.

As for tuple Storable instances: there have been a number of discussions around this, and the problem with having a Storable a, Storable b => Storable (a, b) instance is that there's no portable, always-correct way to do the placement/alignment that maps to the associated C struct layout! So such an instance is fundamentally not going to work.

@Lysxia Tough :), I'm turning this into an action item to improve stuff. :)
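The alignment problem can be made concrete with plain Foreign.Storable arithmetic. A C-compatible layout must pad each field up to its alignment, so both the field offsets and the total size depend on the platform's alignments; there is no single (a, b) layout that is always "the C layout". A sketch (padTo and structLayout are my own illustrative names, not a real API):

```haskell
import Foreign.Storable (sizeOf, alignment)
import Data.Word (Word8, Word64)

-- Round an offset up to the next multiple of a field's alignment,
-- the way a C compiler places struct fields.
padTo :: Int -> Int -> Int
padTo off align = ((off + align - 1) `div` align) * align

-- Field offsets and total size for a struct with the given
-- (size, alignment) fields, C style.
structLayout :: [(Int, Int)] -> ([Int], Int)
structLayout fields = (reverse offs, padTo end maxAlign)
  where
    (offs, end) = foldl step ([], 0) fields
    step (acc, off) (sz, al) = let o = padTo off al in (o : acc, o + sz)
    maxAlign = maximum (1 : map snd fields)

main :: IO ()
main = do
  let w8  = (sizeOf (0 :: Word8),  alignment (0 :: Word8))
      w64 = (sizeOf (0 :: Word64), alignment (0 :: Word64))
  -- On a typical 64-bit platform, struct { uint8_t a; uint64_t b; }
  -- puts b at offset 8 and occupies 16 bytes, not 9.
  print (structLayout [w8, w64])
```

Because `alignment` is platform-specific, any tuple instance baked into the library could silently disagree with the C struct on some target.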

@cartazio
Contributor

  1. improving the UX for all flavors of vectors is valuable

  2. benchmarks for performance characteristics across various datatypes and applicable representations are valuable, so please share them!

@Lysxia
Author

Lysxia commented Jun 28, 2019

As for tuple Storable instances: there have been a number of discussions around this, and the problem with having a Storable a, Storable b => Storable (a, b) instance is that there's no portable, always-correct way to do the placement/alignment that maps to the associated C struct layout!

Fair point. I wonder whether it would be worth having a clone of Storable explicitly not meant to interface with C.

@cartazio
Contributor

As an example of how to write your own, here's one I wrote a while ago, though I don't think it's suitable for inclusion in vector:
https://github.com/wellposed/numerical/blob/master/src/Numerical/Data/Vector/HPair.hs

@cartazio
Contributor

@Lysxia absolutely! There are some design choices to be made there, and that's gonna take some time and experimentation.

@cartazio
Contributor

At one extreme there are even research projects around fancy layouts in DBs, like https://db.cs.cmu.edu/papers/2016/arulraj-sigmod2016.pdf (plus my own experiments in my numerical Haskell code base that predate the CMU paper).

A big challenge, of course, is engineering complexity vs performance vs user-facing API UX. And in some respects it's still, in general, a career-sized research problem to build tools that are jointly best in all three ways.

@choener

choener commented Aug 30, 2019

One performance-related comment:
I use the vector library in my PrimitiveArray and ADPfusion libraries for dynamic programming. When switching between Unboxed and Storable vectors, I see noticeably better performance with Unboxed.

My NeedlemanWunsch implementation clocks at 0.4 seconds for Unboxed, and 0.6 seconds for Storable with 'Int' as the underlying type for the elements of the vector.

I have not yet investigated why Storable is slower, and whether it is my fault, or a problem of vector or Storable.

@Shimuuar Shimuuar changed the title defining new unboxed vector instances should be easier than it currently is. Unboxed is not a good default unboxed vector Jun 11, 2020
@Shimuuar
Contributor

I've reverted to the original title. This issue is mostly about comparing the advantages and drawbacks of different vector types. Good deriving of Unbox instances is important and deserves its own issue (it's a big topic after all).

@Bodigrim
Contributor

My 2p :)

Are Unboxed vectors more flexible than Primitive / Storable? IMHO they are, because there are more Unboxed instances available out of the box than Primitive / Storable instances.

Are Unboxed vectors slower than Primitive / Storable, when both instances are available from vector? No, I believe they are not, by virtue of being backed by the same structures.

So at the moment I do not see a reason to change the vector documentation. Obviously, we may review it once more efficient Storable / Primitive instances for tuples become available.

@Shimuuar
Contributor

A more complicated question arises when we start considering records. Unboxed vectors allow for O(1) zip/unzip, which is very useful when you want to work on the columns of a table. Also, is GHC able to avoid useless reads when only part of a structure is demanded, as in map (\x -> fieldFoo x ^ 2)?

I also think it's worth investigating the performance of the different vector implementations. We could find quite a few surprises.

@cartazio
Contributor

cartazio commented Jun 11, 2020 via email

@cartazio
Contributor

cartazio commented Jun 11, 2020 via email

@lehins
Contributor

lehins commented Jun 12, 2020

Unbox has benefits and drawbacks.

On the downside:

  • Compile time and memory usage are bad (Inlining functions of Unboxed vector consumes a lot of memory #274), but I hope one day it will get fixed
  • Hard to write custom instances. I have a solution that has worked quite well for me: Deriving of unboxed vector for newtypes #315 (comment)
  • Sometimes it is not good for cache locality, so for some data types it can have lower performance than, say, a Storable instance, which keeps all the components of an individual element next to each other in memory. I haven't really seen this give me much trouble, so I don't particularly consider it a problem, but it had to be mentioned anyway.
  • Hopefully one day we will get around to adding SIMD instructions to vector; when we do, unboxed vectors will not be that useful, because efficient data parallelism requires the actual data to be next to each other.
  • Storable was designed for FFI, and a Primitive vector, contrary to common belief, can still be safely used with FFI in many situations, while Unboxed vectors are simply unsuitable for FFI.
  • I've seen a primitive array perform better on rare occasions when compared to unboxed with elements of the same type in some complicated setups. This is likely because the unboxing machinery makes it a bit harder for GHC to perform the same optimizations.

On the upside:

  • We can have instances for product types, which Prim cannot allow by itself and Storable does not provide by default. (I don't understand why base does not provide such tuple instances; to me it seems like an oversight. What Carter mentioned about alignment and the associated C struct is something I don't agree with, but that is not the point of this issue, so let's not worry about it.)
  • Unlike Storable, both Unboxed and Primitive use unpinned memory for small allocations under ~3.3K, which is great for avoiding memory fragmentation. But when large chunks of memory are allocated, we can avoid expensive copies during GC, because then they are allocated as pinned. In other words, we get the best of both worlds.
  • Theoretically you will index only one array behind a Vector (a, b) when you just need a or b, but not both, although I am not sure it actually works that way in practice.
  • Has slightly lower memory overhead than Storable, because ForeignPtr, besides the actual Addr# pointer, also carries around the MutableByteArray# in a sum type. This can only be noticeable for very small vectors, which Storable shouldn't be used for anyway.

The above points are differences in vector that I know about just off the top of my head. The conclusion from all this is that there is no single best default array type, so suggesting "use unboxed whenever you can" is not the right advice; they all have their uses. Which means we just need to document it better. My suggestion would be:

  • Use primitive vectors for data types that have a Prim instance.
  • Use storable if you need FFI.
  • Use unboxed or storable if you have more complicated types without a Prim instance but with an Unbox or Storable instance available, with a preference that depends on the use case.
  • Use boxed when none of the above are applicable.
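Side by side, the four flavors look like this sketch (assuming the vector package). All four expose essentially the same high-level API, so following the recommendations above is mostly a matter of changing an import:

```haskell
import qualified Data.Vector as B           -- boxed: any type, one pointer per element
import qualified Data.Vector.Primitive as P -- Prim instance: flat unpinned ByteArray#
import qualified Data.Vector.Storable as S  -- Storable instance: ForeignPtr, FFI-ready
import qualified Data.Vector.Unboxed as U   -- Unbox instance: adaptive representation

main :: IO ()
main = do
  -- Same data, same operations, four different memory layouts.
  print (B.sum (B.fromList [1, 2, 3 :: Int]))
  print (P.sum (P.fromList [1, 2, 3 :: Int]))
  print (S.sum (S.fromList [1, 2, 3 :: Int]))
  print (U.sum (U.fromList [1, 2, 3 :: Int]))
```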

@cartazio
Contributor

cartazio commented Jun 12, 2020 via email

@lehins
Contributor

lehins commented Jun 12, 2020

@cartazio Wouldn't such a PR have to go into primitive?

@cartazio
Contributor

cartazio commented Jun 12, 2020 via email

@lehins
Contributor

lehins commented Jun 12, 2020

@cartazio I see what you mean, and I am definitely sure that vector is not the place for this. What I would do is adjust the Prim class to support this sort of unboxing. As a matter of fact, I already have a perfectly working and tested implementation of this in a closed-source repository, which I plan to open up and release on Hackage, hopefully later this summer. Spoiler alert: unboxing Maybe and Either the way you described gives tremendous improvements in speed, which is definitely worth the extra space.

The problem with this sort of unboxing, though, is that the primops needed to achieve it are only available starting with GHC 8.6 (readWord8ArrayAsDouble# ...), which means that if we want it for unpinned arrays we have to lose support for all those older GHCs that people love so much.

The reason why I am saying that vector is not the right place for this functionality is that:

  • there are already too many Vector types in vector; adding a new AoS one is a no-go
  • we need to support old GHCs, which do not allow this sort of unboxing

What I can recommend, though, if you are keen on working on this, is creating a separate package that provides a new vector type, in the same way that we already have Boxed, Unboxed, Primitive, and Storable.

@cartazio
Contributor

cartazio commented Jun 12, 2020 via email

@lehins
Contributor

lehins commented Jun 12, 2020

I'm pretty sure we don't need that primop you refer to ... what makes you think we do?

If you want to read/write anything larger than one byte (e.g. Word64) from/into a ByteArray at an arbitrary location, which is what you would need to unbox Either Word64 Word32, you would have to reserve a byte for a tag (i.e. Left/Right) and 8 bytes for the content. The second element of the array then starts at byte offset 9, and its actual Word64 or Word32 payload is written at byte offset 10. Now tell me: which primop would you use to write a Word64 value into an unpinned ByteArray# at an offset of 10 bytes?
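The offsets in question, written out as plain arithmetic (this is a sketch of the hypothetical flat layout, not an existing API; elemSize, tagOffset and payloadOffset are my names): with a 1-byte tag and an 8-byte payload slot, each element occupies 9 bytes, so payload offsets drift through every possible alignment, which is why the unaligned readWord8ArrayAs*# / writeWord8ArrayAs*# primops (GHC >= 8.6) are required.

```haskell
-- Hypothetical flat layout for a vector of Either Word64 Word32:
-- 1 tag byte (0 = Left, 1 = Right) followed by an 8-byte payload slot.
elemSize :: Int
elemSize = 1 + 8

tagOffset, payloadOffset :: Int -> Int
tagOffset i     = i * elemSize       -- byte offset of element i's tag
payloadOffset i = i * elemSize + 1   -- byte offset of element i's payload

main :: IO ()
main = do
  print (payloadOffset 0)  -- 1: already misaligned for a Word64
  print (payloadOffset 1)  -- 10: the offset from the comment above
  -- 9i + 1 cycles through the residues mod 8, so payloads are generally
  -- misaligned and a plain aligned Word64 read/write can't be used.
  print (map (\i -> payloadOffset i `mod` 8) [0 .. 3])  -- [1,2,3,4]
```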

@cartazio
Contributor

cartazio commented Jun 12, 2020 via email

@Bodigrim
Contributor

For an MVP one can possibly spend a full word on a tag.

@cartazio
Contributor

cartazio commented Jun 14, 2020 via email

@nh2
Member

nh2 commented Mar 13, 2021

adding SIMD instructions to vector, when we do, unboxed vectors will not be that useful because for efficient usage of data parallelization actual data has to be next to each other.

@lehins This one surprises me a bit. Isn't it that for most SIMD, SoA is preferable?


@cartazio
Contributor

@nh2 is correct

@lehins
Contributor

lehins commented Mar 13, 2021

@nh2 is always correct 😄 but not this time 😉

Unboxed vectors in Haskell are useless with SIMD precisely because data is not next to each other. Let me explain.

Lets say we have a data type

data Point = Point { x :: Double, y :: Double }

A vector of points is unboxed as two vectors of doubles:

Vector Point => (Vector Double, Vector Double)

All is great, because normally that is exactly what we want with SIMD. Let's say we want to add two vectors of points:

(a  :: Vector Point) + (b :: Vector Point) = (c :: Vector Point)

What we need is the x component of a added to the x component of b and stored in the x of c, and the same for the y component. SoA allows us to treat x and y separately and thus makes parallelization possible. However, in Haskell, in order to perform an operation on a Point we have to construct a Point value first, which completely destroys this ability to parallelize, because we can no longer treat x and y independently. In other words, if you want to use SIMD with SoA, you'll have to use separate vectors of Double for the x and y components yourself: then you can treat them separately, and you don't have to construct a Point when indexing a vector.

In other words what we want is:

a[0].x + b[0].x
a[1].x + b[1].x
...
a[n].x + b[n].x
---
a[0].y + b[0].y
a[1].y + b[1].y
...
a[n].y + b[n].y

where + is parallelized with SIMD. But the unboxing technique in vector forces us to do this:

Point (a[0].x) (a[0].y) + Point (b[0].x) (b[0].y)
Point (a[1].x) (a[1].y) + Point (b[1].x) (b[1].y)
...
Point (a[n].x) (a[n].y) + Point (b[n].x) (b[n].y)

simply because we cannot operate on a Point without constructing it first.

Now, if the x and y components were stored one after another in memory, then we would be able to use SIMD efficiently in the above scenario.

@nh2 is always correct 😄

So, joking aside, you are totally right that SoA is much better for parallelization, and that is exactly the approach I would recommend too. But unfortunately the safety of the Haskell type system makes this storage layout useless for Unboxed vectors in the vector package.
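The workaround described above, keeping the components in separate vectors yourself instead of relying on Vector Point, looks like this sketch (assuming the vector package; the Points type and addPoints are mine, for illustration):

```haskell
import qualified Data.Vector.Unboxed as U

-- Manual struct-of-arrays: the x and y columns are independent vectors,
-- so each zipWith below streams over one flat Double array, with no
-- intermediate Point construction anywhere.
data Points = Points
  { xs :: U.Vector Double
  , ys :: U.Vector Double
  } deriving (Eq, Show)

addPoints :: Points -> Points -> Points
addPoints a b =
  Points (U.zipWith (+) (xs a) (xs b))
         (U.zipWith (+) (ys a) (ys b))

main :: IO ()
main = do
  let a = Points (U.fromList [1, 2]) (U.fromList [10, 20])
      b = Points (U.fromList [3, 4]) (U.fromList [30, 40])
  print (addPoints a b)
```

Each column-wise loop is exactly the shape a SIMD-capable backend could vectorize, which an unboxed Vector Point cannot offer because indexing it rebuilds a Point.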

@cartazio
Contributor

I think the intended meaning is "the layout lets you hand-write the correct instances" rather than "automatic SIMD", which we are very far from ever having.

@cartazio
Contributor

The point of vector, or perhaps (for reasons of stability) whatever comes next that's not vector, is to "drive" providing better fusion/optimization machinery in GHC. If that's no longer vector, that's fine too. That we can't do the right "through the constructor" homomorphism MEANS it's an opportunity to make GHC better for these sorts of workloads, not a reason to call it impractical or impossible.

@cartazio
Contributor

I'll try to share a grounded technical walkthrough of what I mean and how it would work sometime in the next few days. I'm a tad fried from a really busy work week, and if that came across coarsely, it's just me feeling, in my own head, that my reasoning is obvious.

I do hope folks can engage in poking at the ambiguous parts of said exposition when I share it. Again, because what's obvious is different for different folks.

@cartazio
Contributor

cartazio commented Mar 13, 2021

So, the short version is: what's missing to do the right "flat" struct-of-arrays SIMD in user space with vector is the following representation mapping being exposed (perhaps in a .Unsafe/.Internal interface), using the types in https://github.com/wellposed/numerical/blob/6b458232760b20674487bd9f8442b0991ce59423/src/Numerical/Data/Vector/HPair.hs#L61-L72 (see the link for the associated vector instances):

--- core defs from that file
--- we use HProd a, where a :: * -> *, because in the context I wrote it, I was mixing
--- leaves that were various different base vectors; there's another example in the same
--- directory that uses data families rather than GADTs, with surprising trade-offs
data HProd a where
    HPair :: HProd a -> HProd a -> HProd a
    HUnit :: a -> HProd a

data VHProd (prd :: HProd (* -> *)) val where
    VHLeaf :: !(v a) -> VHProd ('HUnit v) a
    VHNode :: !(VHProd pra a) -> !(VHProd prb b) -> VHProd ('HPair pra prb) (a, b)

data MVHProd (prd :: HProd (* -> * -> *)) (st :: *) val where
    MVHLeaf :: !(mv st a) -> MVHProd ('HUnit mv) st a
    MVHNode :: !(MVHProd pra st a) -> !(MVHProd prb st b) -> MVHProd ('HPair pra prb) st (a, b)

then you write something like this:

type family PrimProductRep a :: *
--- this would be like a GHC Generics Rep, but only for product types, and it should
--- match the tuple associativity of how Unboxed would pair things up in this rep
type family ProductOfPrimVect a :: HProd (* -> *)
unboxedExplode :: Unboxed.Vector a -> VHProd (ProductOfPrimVect a) (PrimProductRep a)

Then what a person using SIMD could do is "pick out" the right Prim vectors that are packed up inside the underlying unboxed vector and do the SoA operations on those. No constructor barrier.

I was actually working out a plan that would lay the groundwork for this in primitive, but, well, the past year made being motivated to do the supporting work a tad hard.

There are definitely some fuzzy bits in this sketch, but the point is: subject to a strong contract about which Prim instances are legal if you don't wrap them in some sort of newtype annotating "what layout you're choosing" for composite instances, you can actually make this a pretty safe-to-expose "unsafe" rep for building SoA operations.

Please ask which parts of this are vague (and do look at the HPair.hs and Pair.hs files in the associated repo I linked).

edit: to be clear, this explode would need to be part of writing instances for unboxed vectors, but there are ways to make that work without hurting/breaking users

edit2: or we just export, in a .Internal module, all the constructors for the Unboxed vector data family instances, and users can do this themselves (do we do that yet?)

@Boarders
Contributor

@lehins Sorry if I am being daft, but I don't see how the Point constructor enters into the construction of an unboxed vector at all. Why is it relevant here?

@lehins
Contributor

lehins commented Mar 14, 2021

@cartazio I have no idea what you are proposing here. Some day we'll get SIMD support in GHC; then you can submit a PR with the functionality you are suggesting.

Sorry @Boarders, I currently don't have much time to explain this in more detail. I guess the best suggestion I have that could clear it up is for you to try implementing the example I presented, using either the SIMD primops that go through LLVM, or intrinsics from gcc on the C side over FFI; then you will see the problem right away. Until I tried implementing it myself, it was hard for me to see the problem too.

One way or another, this is not the ticket that discusses SIMD; it is about calling Unboxed vectors the default. So if anyone would like to discuss it further, here is a gitter that can be used for such discussions: https://gitter.im/haskell/vector Once we do get proper support for SIMD in GHC, please open a new ticket about it.

@nh2
Member

nh2 commented Mar 28, 2021

Has slightly lower memory overhead than Storable, because ForeignPtr, besides the actual Addr# pointer, also carries around the MutableByteArray# in a sum type. This can only be noticeable for very small vectors, which Storable shouldn't be used for anyway.

For the curious, I just measured the memory overhead of Unboxed vs Storable, and found:

  • Data.Vector.Storable.Vector Int has 64 bytes of overhead
  • Data.Vector.Unboxed.Vector Int has 48 bytes of overhead
