-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unboxed is not a good default unboxed vector #250
Comments
My 2c Working with unboxed data in haskell is somewhat awkward and require some sort of boilerplate no matter what. With Storable one has write boilerplate instance as well. And boilerplate is necessary with any approach. One has to describe how data is laid out in memory. Also Unboxed comes with different tradeoffs compared to storable. Former implements structure of arrays (by convention), latter array of structures (necessary). Having side by side comparison of them would be nice P.S. Storable does not have instances for tuples which caused some problems to me. |
I disagree with the title change. I don't think the cost-to-benefit ratio of reworking
|
theres definitely room for improvements. And I do agree that the SOA vs AOS (struct of array vs array of structs) is conflated with unpinned vs pinned bytearrays. currently we have SOA unpinned == unboxed it would be totally valid to have some flavors of AOS unpinned and SOA pinned. Primitive is just a helper for boot strapping. theres definitely deriving via doesn't work for the same reason that Generalized New Type Deriving. The Pure and Mutable vector apis need the monad (resp primmonad) constraints on the generic versions for evaluation ordering (resp effects), and Role annotations currently. This was a HUGE pain point when Roles were added to Monad and friends.. as for Tuple storable instances, the theres been a number of discussions around this, and problem with having a @Lysxia tough, :) , i'm turning this into an action item to improve stuff. :) |
|
Fair point. I wonder whether it would be worth having a clone of |
as an example of how to write your own, heres one I wrote a while ago but not suitable for inclusion in vector i think |
@Lysxia absolutely! theres some design choices to be made there, and thats gonna take some time and experimentation |
At one extreme there’s even research projects around fancy layouts in dbs like https://db.cs.cmu.edu/papers/2016/arulraj-sigmod2016.pdf (plus my own experiments in my numerical Haskell code base that predates the cmu paper ). A big challenge is of course engineering complexity vs performance vs user facing api ux. And in some respects it’s still in general a career sized research problem to build tools that a jointly best in all three ways |
One performance-related comment: My I have not yet investigated why |
I've reverted to original title. This issue mostly about comparing advantages and drawbacks of different vector types. Good deriving of Unbox instances is important and reserve its own issues (it's big topic after all) |
My 2p :) Are Are So at the moment I do not see a reason to change |
More complicated question is when we start considering records. Unboxed vectors allows for O(1) zip/unzip which is very useful when you want to work on columns of table. Also is GHC able to avoid useless reads when only part of structure is demanded: I also think that it's worthy to investigate performance of different vector implementations. We could find quite a few surprises |
To elaborate on @bodigrims point , unboxed vectors are exactly the set of
Haskell products made out of the primitive supported types of the rts.
Which pritmive happens to have helpers for!
The law hiding in Primitive the package for what types should have a Prim
instance boils down, spiritually, can you directly write an efficient flat
memmove/memcopy / memset analogue without committing to a structure of
array SOA or array of struct AOS memory layout. (Hence why for primitive
I’ve had the stance that adding a “complex a” instance to Prim class
Storable vectors can’t have storable a , storable b => storable (a,b)
instance because you can’t really statically decide why that would map to
for a given c struct definition.
Unboxed vectors correspond with struct of array format over arbitrary
Haskell prim types of instances therof.
For Lehins, and certainly in general, it would absolutely make sense to add
an ARRAY OF struct memory layout that doesn’t require having a legal
corresponding c struct definition. And perhaps that could live in vector or
primitive, and probably would address the build time for code that he
dislikes about unboxed vectors.
Conceptually, the way I’ve thought about it is that currently we have SOA
unpinned arrays of arbitrary products of Haskell types ... we call this
Unbox.
We also have AOS of pinned arrays that we require only have instances that
at type checking time have a statically computable alignment etc, aka
storable.
We could totally also have AOS of arbitrary products of Haskell types in
unpinned arrays, which would be a useful new thing that would likely have
some nice tradeoffs. And unlike storable allow AOS a, AOS b => AOS (a,b)
Instances ! Which I think would address the build time and other concerns
folks mention and just be a useful different thing.
…On Thu, Jun 11, 2020 at 3:04 PM Bodigrim ***@***.***> wrote:
My 2p :)
Are Unboxed vectors more flexible than Primitive / Storable? IMHO they
are, because there are more Unboxed instances available out of the box
than Primitive / Storable instances.
Are Unboxed vectors slower than Primitive / Storable, when both instances
are available from vector? No, I believe they are not, by virtue of being
backed by the same structures.
So at the moment I do not see a reason to change vector documentation.
Obviously, we may review it, once more efficient Storable / Primitive
instances for tuples will be available.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#250 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQXZXBXZO4J3J3DCJV3RWETFPANCNFSM4H4F364A>
.
|
To further clarify: Prim instances should be those types that have zero
stride in their setting of sub fields in the respective mem* analogue
operations.
On Thu, Jun 11, 2020 at 6:32 PM Carter Schonwald <carter.schonwald@gmail.com>
wrote:
… To elaborate on @bodigrims point , unboxed vectors are exactly the set of
Haskell products made out of the primitive supported types of the rts.
Which pritmive happens to have helpers for!
The law hiding in Primitive the package for what types should have a Prim
instance boils down, spiritually, can you directly write an efficient flat
memmove/memcopy / memset analogue without committing to a structure of
array SOA or array of struct AOS memory layout. (Hence why for primitive
I’ve had the stance that adding a “complex a” instance to Prim class
Storable vectors can’t have storable a , storable b => storable (a,b)
instance because you can’t really statically decide why that would map to
for a given c struct definition.
Unboxed vectors correspond with struct of array format over arbitrary
Haskell prim types of instances therof.
For Lehins, and certainly in general, it would absolutely make sense to
add an ARRAY OF struct memory layout that doesn’t require having a legal
corresponding c struct definition. And perhaps that could live in vector or
primitive, and probably would address the build time for code that he
dislikes about unboxed vectors.
Conceptually, the way I’ve thought about it is that currently we have SOA
unpinned arrays of arbitrary products of Haskell types ... we call this
Unbox.
We also have AOS of pinned arrays that we require only have instances that
at type checking time have a statically computable alignment etc, aka
storable.
We could totally also have AOS of arbitrary products of Haskell types in
unpinned arrays, which would be a useful new thing that would likely have
some nice tradeoffs. And unlike storable allow AOS a, AOS b => AOS (a,b)
Instances ! Which I think would address the build time and other concerns
folks mention and just be a useful different thing.
On Thu, Jun 11, 2020 at 3:04 PM Bodigrim ***@***.***> wrote:
> My 2p :)
>
> Are Unboxed vectors more flexible than Primitive / Storable? IMHO they
> are, because there are more Unboxed instances available out of the box
> than Primitive / Storable instances.
>
> Are Unboxed vectors slower than Primitive / Storable, when both
> instances are available from vector? No, I believe they are not, by
> virtue of being backed by the same structures.
>
> So at the moment I do not see a reason to change vector documentation.
> Obviously, we may review it, once more efficient Storable / Primitive
> instances for tuples will be available.
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <#250 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAABBQXZXBXZO4J3J3DCJV3RWETFPANCNFSM4H4F364A>
> .
>
|
Unbox has benefits and drawbacks. On the downside:
On the upside:
Above points are things that I learned about differences about vector just of top of my head. The conclusion I can say about all this is that there is not one best default array type, so suggesting "use unboxed" whenever you can is not the right one, they all have their uses. Which means we just need to document it better. What my suggestion would be:
|
I can do a starter demonstrator PR for the AOS unpinned vector IF other
folks would be interested in helping get it to a mature point? I think
that would likely be a useful thing. Idk.
…On Fri, Jun 12, 2020 at 4:36 AM Alexey Kuleshevich ***@***.***> wrote:
Unbox has benefits and drawbacks.
On the downside:
- Compile time and memory usage are bad #274
<#274>, but I hope one day it
will get fixed
- Hard to write custom instances. I have a solution that worked for me
quite well: #315 (comment)
<#315 (comment)>
- Sometimes it is not good for cache locality, so for some data types
it can have lower performance then let's say for storable instance which
will have all of components of individual element next to each other in
memory. This one I haven't really seen to give me much trouble, so I don't
particularly consider it a problem, but had to mention it anyways.
- Hopefully one day we will get to adding SIMD instructions to vector,
when we do, unboxed vectors will not be that useful because for efficient
usage of data parallelization actual data has to be next to each other.
- Storable was designed for FFI and Primitive vector, contrary to the
common belief, can still be safely used with FFI in many situations, while
Unboxed vectors are simply unsuitable for FFI.
- I've seen primitive array perform better on rare occasions when
compared to unboxed with elements of the same type in some complicated
setups. This is likely due to the fact that unboxing machinery makes it a
bit harder for ghc to perform the same optimizations.
On the upside:
- We can have instances for product types, which Prim cannot allow by
itself and Storable does not provide by default. (I don't understand
why base does not provide such tuple instances, to me it seems like an
oversight. What Carter mentioned about alignment and associated C struct is
something I don't agree with, but that is not the point of this issue so
let's not worry about it)
- Unlike Storable, both Unboxed and Primitive use unpinned memory for
small allocations under ~3.3K which is great for avoiding memory
fragmentation. But when large chunks of memory are allocated we can
avoiding expensive copies during GC, because then they are allocated as
pinned. In other words we get the best of both worlds.
- Theoretically you will index only one array behind the Vector (a, b)
when you just need a or b, but not both, although I am not sure it
actually works that way in practice.
- Has slightly lower memory overhead than Storable, because ForeignPtr
besides the pointer actual Addr# also carries around the
MutableByteArray# in a sum type. This can only be noticeable for very
small vectors, which Storable shouldn't be used for anyways.
Above points are things that I learned about differences about vector just
of top of my head. The conclusion I can say about all this is that there is
not one best default array type, so suggesting "use unboxed" whenever you
can is not the right one, they all have their uses. Which means we just
need to document it better. What my suggestion would be:
- Use primitive vector for data types that have Prim instance.
- Use storable if you need FFI
- Use unboxed or storable if you have more complicated types that
don't have Prim instance, but you have instances available of either
Unbox or Storable, with preference that depends on the use cases
- Use boxed when none of above are applicable.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#250 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQRX6RPMYB2C4MUMKP3RWHSKVANCNFSM4H4F364A>
.
|
@cartazio Wouldn't such a PR would have to go into |
Perhaps! It might still be good to workshop it within vector first to make
sure it’s good for vector, then split it out into prs into both once we
have a good interface then everyone can help flesh it out once there’s a
nice demonstrator.
One possible refinement that maybe could be done later or perhaps best at
the same time would be that it would be possible to define safe instances
for sums as tagged unions! Aka
AOS a , AOS b => AOS Either a b.
Unlike in UNBOX, this wouldn’t have the undefinedness issue of one of the
vectors holding an ambiguous value unrelated to the current tag, just a
trailing suffix we could define as being zeroed
One issue of course would be that the sum eats up space equal to the max of
size of rep of a , size of rep of b. Plus the tag bits! So in some extreme
cases it would use way more space than vanilla boxed vectors. Which is
fine! Another angle is that nested sums won’t use tagged bits efficiently,
in general that would require instances all be sum of product
representations. So maybe the proof of concept would have a generic
deriving / default rep that requires types have a kosher sum of products
rep. Which seems like a reasonable requirement.
…On Fri, Jun 12, 2020 at 10:08 AM Alexey Kuleshevich < ***@***.***> wrote:
@cartazio <https://github.com/cartazio> Wouldn't such a PR would have to
go into primitive?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#250 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQTIPWAXPMX4PXFVQG3RWIZETANCNFSM4H4F364A>
.
|
@cartazio I see what you mean and I am definitely sure that vector is not the place for this. What I would do is adjust The problem with this sort of unboxing though is that the primops that are needed for achieving this are only available starting with GHC 8.6 ( The reason why I am saying that vector is not the right place for this functionality is because:
What I can recommend though, if you are keen on working on this creating a separate package that creates a new vector type in the same way that we already Boxed, Unboxed, Primitive, and Storable |
I’m pretty sure we don’t need that primop you refer to ... what makes you
think we do?
Yeah, the sop approach doesn’t make sense in terms of additional maintainer
complexity for vector. Just likely a valuable ecosystem bit.
…On Fri, Jun 12, 2020 at 10:43 AM Alexey Kuleshevich < ***@***.***> wrote:
@cartazio <https://github.com/cartazio> I see what you mean and I am
definitely sure that vector is not the place for this. What I would do is
adjust Prim class to support this sort of unboxing. In the matter of fact
I already have a a perfectly working and tested implementation for this
stuff in a closed source repository, which I plan to open up and release on
Hackage hopefully later on this summer. Spoiler alert: unboxing Maybe and
Either in a way you described gives tremendous improvements in speed, which
is definitely worth wasting this extra space.
The problem with this sort of unboxing though is that the primops that are
needed for achieving this are only available starting with GHC 8.6 (
readWord8ArrayAsDouble# ...), which means if we want it for unpinned
array we have to loose support for all those older ghc that people love so
much.
The reason why I am saying that vector is not the right place for this
functionality is because:
- there are already too many Vector types in vector, adding a new one
AOS is no go.
- We need to support old GHCs which will not allow us this sort of
unboxing
What I can recommend though, if you are keen on working on this creating a
separate package that creates a new vector type in the same way that we
already Boxed, Unboxed, Primitive, and Storable
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#250 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQTHVBUYNPW2UZKRBV3RWI5HRANCNFSM4H4F364A>
.
|
If want to read/write from/into a |
From an optimality of rep point of view you are absolutely correct. One
option I may choose is to just have the encoding be different / sloppier on
older ghcs. This is a great point and I’ll think about it as I’m
travelling with family this afternoon.
…On Fri, Jun 12, 2020 at 11:29 AM Alexey Kuleshevich < ***@***.***> wrote:
I’m pretty sure we don’t need that primop you refer to ... what makes you
think we do?
If want to read/write from/into a ByteArray anything that is larger than
one byte, eg. Word64 at any location, which is what you would need if you
wanted to unbox Either Word64 Word32, you would have to reserve a byte
for a flag (ie. Left/Right) and 8 bytes for content, then your second
element of the array will start at 9th byte and the actual Word64 or Word32
would be written at 10th byte. Now tell me, which primop would you use to
write into an unpinned ByteArray# a Word64 value at the offset of 10
bytes?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#250 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQXV2UFDUUHB5JWSJG3RWJCWHANCNFSM4H4F364A>
.
|
For an MVP one can possibly spend a full word on a tag. |
Absolutely :)
…On Sun, Jun 14, 2020 at 12:35 PM Bodigrim ***@***.***> wrote:
For an MVP one can possibly spend a full word on a tag.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#250 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQSY73TVF62PZKSG5RLRWT34NANCNFSM4H4F364A>
.
|
@lehins This one surprises me a bit. Isn't it that for most SIMD, SoA is preferrable? For example:
|
@nh2 is correct |
@nh2 is always correct 😄 but not this time 😉 Unboxed vectors in Haskell are useless with SIMD precisely because data is not next to each other. Let me explain. Lets say we have a data type data Point = Point { x :: Double, y :: Double} A vector of points is unboxed as two vectors of doubles:
All is great because normally that is exactlywhat we want with SIMD. Let's say we want to add two vectors of points: (a :: Vector Point) + (b :: Vector Point) = (c :: Vector Point) what we need is In other words what we want is:
where
simply because we can not operate on a Now, if
So, joking aside you are totally right that SoA is much better for parallelization and that is exactly the approach I would recommend too. But unfortunately the safety of Haskell type system makes this storage layout useless for Unboxed vectors in vector package. |
i think the intended meaning is "the layout lets you hand write the correct instances" rather than "automatic simd", which we are very far from ever having |
the point of vector, or perhaps (for reasons of stability whatever comes next thats not vector), is to "drive" providing better fusion/optimization machinery in ghc. if thats no longer vector, thats fine too. That we cant do the right "through the constructor" homomorphism MEANS its an opportunity to make ghc better for these sorts of work, not to say its not practical / impossible |
I’ll try to share a grounded technical walk through of what I mean and how it would work sometime on the next few days. I’m a tad fried from a really busy work week and if that came across coarsely it’s just me feeling, in my own head, that my reasoning is obvious . I do hope folks can engage in poking at ambiguous parts of said exposition when I share it. Again because What’s obvious is different to different folks |
so, the short version is: whats missing to do the right "flat" Struct of Array SIMD in user space with vector is the following representation mapping being exposed (perhaps in a .unsafe/.internal interface): using the types in (see link for the associated vector instances )https://github.com/wellposed/numerical/blob/6b458232760b20674487bd9f8442b0991ce59423/src/Numerical/Data/Vector/HPair.hs#L61-L72 --- core defs from that file
--- we use Hprod a , where a :: *->*
--- because in the context i wrote it, i was mixing having leaves that were various different base vectors, and theres another -- example in the same directory that users data families rather than gadts, they have surprising trade off
data HProd a where
HPair :: HProd a-> HProd a -> HProd a
HUnit :: a -> HProd a
data VHProd (prd:: HProd ( * -> * )) val where
VHLeaf :: !(v a) -> VHProd ('HUnit v) a
VHNode :: !(VHProd pra a) -> !(VHProd prb b ) ->VHProd ('HPair pra prb) (a,b)
data MVHProd (prd:: HProd (* -> * -> *) ) (st :: * ) val where
MVHLeaf :: !(mv st a) -> MVHProd ('HUnit mv) st a
MVHNode :: !(MVHProd pra st a) -> !(MVHProd prb st b ) -> MVHProd ('HPair pra prb) st (a,b) then you write something like these type family ProductOfPrimVect :: ( a : *)-> HProd Prim.Vector (PrimProductRep a )
type family PrimProductRep :: * -> * --- this would be like a ghc generic rep, but only for product types,
--- and should match the tuple associativity of how unboxed would be paired up in this rep
unboxedExplode :: Unboxed.Vector a -> VHProd (ProductOfPrimVect a) (ProductRep a) then what a person using simd could do is "pick out" the right Prim vectors that are packed up in the underlying unboxed vector and do the SOA operations thereof. no constructor barrier i was actually working out a plan that would lay the ground work for this in primitive, but well, the past year made being motivated to do the supportring work a tad hard motivation wise . theres def some fuzzy bits in this sketch, but the point being, subject to having a strong contract about what Prim instances are legal if you dont wrap them in some sort of new type that annotates "what layout youre choosing" for composite instances, you can actually make this a pretty save to expose "unsafe" rep to build SOA primops. please ask what parts of this are vague (and do look at the HPair.hs and Pair .hs files in the associated repo i linked) edit: to be clear: this edit2: or we just export in a .internal all the constructors for Unboxed vector data family instances and users can do this themselves (do we do that yet?) |
@lehins Sorry if I am being daft but I don't see how the point constructor enters into the construction of an unboxed vector whatsoever. Why is that relevant here? |
@cartazio I have no idea what you are proposing here. Some day we'll get SIMD support in GHC then you can submit a PR with this functionality that you are suggesting. Sorry @Boarders I currently don't have much time to explain this in more detail, I guess the best suggestion I have for you that could clear it up is for you to try implementing the example I presented using either the SIMD primops instructions that use LLVM or intrinsic from gcc on the C side over FFI then you will see the problem right away. Until I tried implementing it myself it was hard for me to see the problem too. One way or another this is not the ticket that discusses SIMD, it is about calling Unboxed vectors default. So if anyone would like to discuss it further here is a gitter that can be used for such discussions: https://gitter.im/haskell/vector Once we do get proper support for SIMD in GHC, please open a new ticket about it. |
For the curious, I just measured the memory overhead of
|
Currently the documentation (or lack of it) suggests to use
Unboxed
if you want unboxed, low-overhead vectors. In fact the wiki linked in the description says:However, in my experience, it's really not the better default, as opposed to
Storable
(or perhapsPrimitive
). Here I propose to update the documentation to makeUnboxed
less prominent, and even explicitly discourage its use (to counter the natural appeal of its name).The
Unbox
class is hard to understandIt involves
Unbox
)MVector
andVector
are used both for classes and data type families, adding to the confusionPrimMonad
(should monads even be a prerequisite to use efficient vectors for user-defined types?)Hence, most new users would probably be unable to complete the sketched implementation of
Unbox
forComplex
in the documentation ofData.Vector.Unboxed
.Furthermore, because of the methods parameterized by
PrimMonad
instances, it's not possible to useDerivingVia
to abstract any of those details. So you either pay 30 lines of boilerplate, or use CPP/Template Haskell. Even assuming users are capable of adapting that boilerplate to their own types, that doesn't look "very easy" (quoted from the documentation).Unboxed is not the most efficient
In addition, although
Unboxed
is touted as efficient because vectors of tuples are merely tuples of vectors, a flatStorable
vector is actually more efficient in almost all cases: it has much fewer indirections and having every entry in a contiguous piece of memory makes it more cache-friendly. TheStorable
class is also so much easier to understand and use thanUnbox
. Access patterns taking advantage ofUnboxed
's layout are arguably rare.That means this sentence at the top of
Data.Vector.Primitive
(which is similar toStorable
in those respects) is incorrect:Storable
vsPrimitive
Instead of
Unboxed
, eitherStorable
orPrimitive
should be promoted instead.I'm still uncertain about the trade-offs of who should manage a vector's memory.
Although
Storable
is originally an FFI thing, it still seems to do a fine job for general-purpose programming, and it doesn't have the 2x factor of GC-controlled memory, as opposed toPrimitive
.MagicHash
is a recurrent point of confusion, and that also plays againstPrim
from a documentation/usability point of view.I'm sure there are advantages to
Primitive
, but I'm not sure they are sufficiently strong to make it a better default thanStorable
.The text was updated successfully, but these errors were encountered: