Add `unsafeCast` to `Primitive` vectors #401

lehins · 2021-08-10T22:10:24Z

While reviewing #398 I noticed that there is no unsafeCast for primitive vector.

We can't provide a cast that changes primitive vectors with elements of different sizes in the way that Data.Vector.Storable.unsafeCast does it. For example Vector Word8 -> Vector Word16 cast cannot be implemented with current primitive. However, this doesn't mean a cast like Vector Word8 -> Word Int8 or a Vector Float -> Vector Word32 cannot be useful for some reason.

Shimuuar · 2021-08-11T07:38:21Z

Or unsafeCast over newtypes for that matter

# Changes in version 0.13.0.0 * `mkType` from `Data.Vector.Generic` is deprecated in favor of `Data.Data.mkNoRepType` * The role signatures on several `Vector` types were too permissive, so they have been tightened up: * The role signature for `Data.Vector.Mutable.MVector` is now `type role MVector nominal representational` (previously, both arguments were `phantom`). [#224](haskell/vector#224) * The role signature for `Data.Vector.Primitive.Vector` is now `type role Vector nominal` (previously, it was `phantom`). The role signature for `Data.Vector.Primitive.Mutable.MVector` is now `type role MVector nominal nominal` (previously, both arguments were `phantom`). [#316](haskell/vector#316) * The role signature for `Data.Vector.Storable.Vector` is now `type role Vector nominal` (previous, it was `phantom`), and the signature for `Data.Vector.Storable.Mutable.MVector` is now `type role MVector nominal nominal` (previous, both arguments were `phantom`). [#235](haskell/vector#235) We pick `nominal` for the role of the last argument instead of `representational` since the internal structure of a `Storable` vector is determined by the `Storable` instance of the element type, and it is not guaranteed that the `Storable` instances between two representationally equal types will preserve this internal structure. One consequence of this choice is that it is no longer possible to `coerce` between `Storable.Vector a` and `Storable.Vector b` if `a` and `b` are nominally distinct but representationally equal types. We now provide `unsafeCoerce{M}Vector` and `unsafeCast` functions to allow this (the onus is on the user to ensure that no `Storable` invariants are broken when using these functions). * Methods of type classes `Data.Vector.Generic.Mutable.MVector` and `Data.Vector.Generic.Vector` use concrete monads (`ST`, etc) istead of being polymorphic (`PrimMonad`, etc). [#335](haskell/vector#335). This makes it possible to derive `Unbox` with: * `GeneralizedNewtypeDeriving` * via `UnboxViaPrim` and `Prim` instance * via `As` and `IsoUnbox` instance: [#378](haskell/vector#378) * Add `MonadFix` instance for boxed vectors: [#312](haskell/vector#312) * Re-export `PrimMonad` and `RealWorld` from mutable vectors: [#320](haskell/vector#320) * Add `maximumOn` and `minimumOn`: [#356](haskell/vector#356) * The functions `scanl1`, `scanl1'`, `scanr1`, and `scanr1'` for immutable vectors are now defined when given empty vectors as arguments, in which case they return empty vectors. This new behavior is consistent with the one of the corresponding functions in `Data.List`. Prior to this change, applying an empty vector to any of those functions resulted in an error. This change was introduced in: [#382](haskell/vector#382) * Change allocation strategy for `unfoldrN`: [#387](haskell/vector#387) * Remove `CPP` driven error reporting in favor of `HasCallStack`: [#397](haskell/vector#397) * Remove redundant `Storable` constraints on to/from `ForeignPtr` conversions: [#394](haskell/vector#394) * Add `unsafeCast` to `Primitive` vectors: [#401](haskell/vector#401) * Make `(!?)` operator strict: [#402](haskell/vector#402) * Add `readMaybe`: [#425](haskell/vector#425) * Add `groupBy` and `group` for `Data.Vector.Generic` and the specialized version in `Data.Vector`, `Data.Vector.Unboxed`, `Data.Vector.Storable` and `Data.Vector.Primitive`. [#427](haskell/vector#427) * Add `toArraySlice` and `unsafeFromArraySlice` functions for conversion to and from the underlying boxed `Array`: [#434](haskell/vector#434)

qrpnxz · 2022-08-23T00:39:13Z

We can't provide a cast that changes primitive vectors with elements of different sizes in the way that Data.Vector.Storable.unsafeCast does it.

Why not?

lehins · 2022-08-23T01:09:18Z

@qrpnxz Because unlike Storable vector we cannot start at an arbitrary byte offset, we have to think in element offsets. Unfortunately it is a limitation of Prim type class, which could be fixed.

For example let's try to cast a primitive Vector Word16 -> Vector Word64 and this is the vector we want to cast:

w16vec :: Vector Word16
w16vec = drop 3 $ fromList [1,2,3,4,5,6,7,8]

when we drop 3 elements we do a O(1) slice, which makes

w16vec :: Vector Word16
w16vec = Vector 3 5 (ba :: ByteArray)
  where ba = fromList [1,2,3,4,5,6,7,8]

If we were to convert this vector using the same strategy as in Storable we'd expect a Vector Word64 with a single element, were each of its 16 bit chunks were: [4,5,6,7], but how would you specify the offset?

w32vec :: Vector Word64
w32vec = Vector ? 1 (ba :: ByteArray)

Problem is that the offset must be specified in number of elements, and it must be specified in number of elements because Prim class does not provide us with the ability to index in offsets of bytes, only in elements

qrpnxz · 2022-08-23T01:24:15Z

How unfortunate. Thanks for explaining.

qrpnxz · 2022-08-30T23:08:29Z

A pinned Primitive variant could do it by implementing *byteOff :: MutableByteArray# -> Int# -> State# s -> _ helper functions with byteArrayContents#, plusAddr#, and *OffAddr#. For example:

indexByteOffPrim :: Prim a => ByteArray# -> Int# -> a
indexByteOffPrim arr# idx# =
  case byteArrayContents# arr# of
    addr# -> indexOffAddr# (plusAddr# addr# idx#) 0

writeByteOffPrim :: Prim a => MutableByteArray# -> Int# -> a -> State# s -> State# s
writeByteOffPrim marr# idx# a s0# =
  case unsafeFreezeByteArray# marr# st0# of
    (# s1#, arr# #) -> case byteArrayContents arr# of
      addr# -> writeOffAddr# (plusAddr# arr# idx#) 0 a s1#

EDIT:

In the latest base there is now mutableByteArrayContents# :: MutableByteArray# d -> Addr#. I don't know what version exactly it was introduced.

writeByteOffPrim :: Prim a => MutableByteArray# -> Int# -> a -> State# s -> State# s
writeByteOffPrim marr# idx# a s0# =
  case mutableByteArrayContents marr# of
    addr# -> writeOffAddr# (plusAddr# arr# idx#) 0 a s0#

lehins · 2022-08-30T23:21:34Z

@qrpnxz Your pinned solution would not work for vector first of all. Secondly why would use Primitive for pinned memory, you can just use Storable vector.

I know for certain that this problem is solvable with primitive operations that work on a byte offset, such as indexWord8ArrayAsWord16# for example, however this is not a discussion for vector package, you should open an issue on the primitive repository if you really care about this.

qrpnxz · 2022-08-31T00:48:45Z

I don't see why it wouldn't work. Perhaps I should mention that the offset would now be in bytes. Offset recalculation would not even be needed at all in a cast; it would stay the same. As to why you would want a pinned version of the primitive vector, here is one use-case: you can zero-copy convert a pinned primitive vector into both Data.Text.Text and into Data.ByteString.ByteString. If you use a storable vector you have to copy to go into text, if you use an unpinned primitive vector you have to copy to go into bytestring.

Although the Prim class could be improved with the functions you mention, I'm not proposing to a change to primitive; I'm talking about what can be done in vector today with what we already have. If the only acceptable solution is to go and change primitive, then yes I can go over there and see what can be done, but already we can do something here.

lehins · 2022-08-31T01:16:06Z

The whole point of Data.Vector.Primitive.Vector is that it is unpinned memory. If you need pinned memory you should use Data.Vector.Storable.Vector vector.

Switching primitive vector to use byte offset is a serious breaking change, which could happen, but only if primitive supported it and there was enough motivation to get this done.
Switching primitive vector to use pinned memory in order to achieve zero copy to Text and ByteString does not outweigh the issue of memory fragmentation when a lot of small vectors are being used. This is not something we will do today or ever for that matter!

FYI:

It is possible to cast Primitive vector into Text without copy, since both use unpinned memory.
It is possible to cast Storable vector into ByteString without copy, since both use pinned memory.

If the only acceptable solution is to go and change primitive, then yes I can go over there and see what can be done

So yes, the only acceptable solution is to go and change primitive

qrpnxz · 2022-08-31T01:41:29Z

Goodness, I did not mean to imply changing the current Primitive to pinned! (😄) I was talking about having pinned variant; that is, a whole other type. Of course we still want the unpinned one.

It is possible to cast Primitive vector into Text without copy, since both use unpinned memory.
It is possible to cast Storable vector into ByteString without copy, since both use pinned memory.

I said so myself? The plus is being able to do both Text and ByteString. (Useful in parsing.)

lehins · 2022-08-31T08:58:09Z

I was talking about having pinned variant; that is, a whole other type.

Maintaining five different modules with almost the same API already takes a lot of work, I'd be dreading adding one more module to that list.

It feels like we keep talking past each other. I recommend you open a ticket explaining clearly:

the goals you are trying to achieve.
the proposed solution:
- list all the benefits we would gain from this solution. If performance is your goal, some basic benchmarks would be useful to support the claim.
- list all the drawbacks that you can think of
- estimate or summarize in words how much work it would take to implement the solution and maintain it. Evaluate if all the effort is worth the goal.
compare it to alternative solutions that you know of

If all maintainers will agree that it really is something worthwhile implementing an maintaining then we can come up with the plan on how to get it done. This would also allow other from the community to pitch in with ideas. Discussing this on a merged PR does not make much sense and it doesn't have the same visibility.

lehins requested review from Shimuuar and Bodigrim August 10, 2021 22:10

Bodigrim approved these changes Aug 10, 2021

View reviewed changes

Add unsafeCast to Primitive vectors

24d64b0

lehins force-pushed the add-cast-vector branch from c1e1950 to 24d64b0 Compare August 11, 2021 00:43

Shimuuar merged commit f77e707 into master Aug 11, 2021

lehins deleted the add-cast-vector branch August 11, 2021 22:06

lehins mentioned this pull request Aug 13, 2021

Improve documentation for mutable vectors #404

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `unsafeCast` to `Primitive` vectors #401

Add `unsafeCast` to `Primitive` vectors #401

lehins commented Aug 10, 2021 •

edited

Loading

Shimuuar commented Aug 11, 2021

qrpnxz commented Aug 23, 2022

lehins commented Aug 23, 2022

qrpnxz commented Aug 23, 2022

qrpnxz commented Aug 30, 2022 •

edited

Loading

lehins commented Aug 30, 2022

qrpnxz commented Aug 31, 2022

lehins commented Aug 31, 2022 •

edited

Loading

qrpnxz commented Aug 31, 2022 •

edited

Loading

lehins commented Aug 31, 2022

Add unsafeCast to Primitive vectors #401

Add unsafeCast to Primitive vectors #401

Conversation

lehins commented Aug 10, 2021 • edited Loading

Shimuuar commented Aug 11, 2021

qrpnxz commented Aug 23, 2022

lehins commented Aug 23, 2022

qrpnxz commented Aug 23, 2022

qrpnxz commented Aug 30, 2022 • edited Loading

lehins commented Aug 30, 2022

qrpnxz commented Aug 31, 2022

lehins commented Aug 31, 2022 • edited Loading

qrpnxz commented Aug 31, 2022 • edited Loading

lehins commented Aug 31, 2022

Add `unsafeCast` to `Primitive` vectors #401

Add `unsafeCast` to `Primitive` vectors #401

lehins commented Aug 10, 2021 •

edited

Loading

qrpnxz commented Aug 30, 2022 •

edited

Loading

lehins commented Aug 31, 2022 •

edited

Loading

qrpnxz commented Aug 31, 2022 •

edited

Loading