Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unsafeCast to Primitive vectors #401

Merged
merged 1 commit into from
Aug 11, 2021
Merged

Add unsafeCast to Primitive vectors #401

merged 1 commit into from
Aug 11, 2021

Conversation

lehins
Copy link
Contributor

@lehins lehins commented Aug 10, 2021

While reviewing #398 I noticed that there is no unsafeCast for primitive vector.

We can't provide a cast that changes primitive vectors with elements of different sizes in the way that Data.Vector.Storable.unsafeCast does it. For example Vector Word8 -> Vector Word16 cast cannot be implemented with current primitive. However, this doesn't mean a cast like Vector Word8 -> Word Int8 or a Vector Float -> Vector Word32 cannot be useful for some reason.

@lehins lehins requested review from Shimuuar and Bodigrim August 10, 2021 22:10
@Shimuuar
Copy link
Contributor

Or unsafeCast over newtypes for that matter

@Shimuuar Shimuuar merged commit f77e707 into master Aug 11, 2021
@lehins lehins deleted the add-cast-vector branch August 11, 2021 22:06
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Aug 19, 2022
# Changes in version 0.13.0.0

 * `mkType` from `Data.Vector.Generic` is deprecated in favor of
   `Data.Data.mkNoRepType`
 * The role signatures on several `Vector` types were too permissive, so they
   have been tightened up:
   * The role signature for `Data.Vector.Mutable.MVector` is now
     `type role MVector nominal representational` (previously, both arguments
     were `phantom`). [#224](haskell/vector#224)
   * The role signature for `Data.Vector.Primitive.Vector` is now
     `type role Vector nominal` (previously, it was `phantom`).
     The role signature for `Data.Vector.Primitive.Mutable.MVector` is now
     `type role MVector nominal nominal` (previously, both arguments were
     `phantom`). [#316](haskell/vector#316)
   * The role signature for `Data.Vector.Storable.Vector` is now
     `type role Vector nominal` (previous, it was `phantom`), and the signature
     for `Data.Vector.Storable.Mutable.MVector` is now
     `type role MVector nominal nominal` (previous, both arguments were
     `phantom`). [#235](haskell/vector#235)

     We pick `nominal` for the role of the last argument instead of
     `representational` since the internal structure of a `Storable` vector is
     determined by the `Storable` instance of the element type, and it is not
     guaranteed that the `Storable` instances between two representationally
     equal types will preserve this internal structure.  One consequence of this
     choice is that it is no longer possible to `coerce` between
     `Storable.Vector a` and `Storable.Vector b` if `a` and `b` are nominally
     distinct but representationally equal types. We now provide
     `unsafeCoerce{M}Vector` and `unsafeCast` functions to allow this (the onus
     is on the user to ensure that no `Storable` invariants are broken when
     using these functions).
 * Methods of type classes `Data.Vector.Generic.Mutable.MVector` and
   `Data.Vector.Generic.Vector` use concrete monads (`ST`, etc) istead of being
   polymorphic (`PrimMonad`, etc). [#335](haskell/vector#335).
   This makes it possible to derive `Unbox` with:
   * `GeneralizedNewtypeDeriving`
   * via `UnboxViaPrim` and `Prim` instance
   * via `As` and `IsoUnbox` instance: [#378](haskell/vector#378)
 * Add `MonadFix` instance for boxed vectors: [#312](haskell/vector#312)
 * Re-export `PrimMonad` and `RealWorld` from mutable vectors:
   [#320](haskell/vector#320)
 * Add `maximumOn` and `minimumOn`: [#356](haskell/vector#356)
 * The functions `scanl1`, `scanl1'`, `scanr1`, and `scanr1'` for immutable
   vectors are now defined when given empty vectors as arguments,
   in which case they return empty vectors. This new behavior is consistent
   with the one of the corresponding functions in `Data.List`.
   Prior to this change, applying an empty vector to any of those functions
   resulted in an error. This change was introduced in:
   [#382](haskell/vector#382)
 * Change allocation strategy for `unfoldrN`: [#387](haskell/vector#387)
 * Remove `CPP` driven error reporting in favor of `HasCallStack`:
   [#397](haskell/vector#397)
 * Remove redundant `Storable` constraints on to/from `ForeignPtr` conversions:
   [#394](haskell/vector#394)
 * Add `unsafeCast` to `Primitive` vectors: [#401](haskell/vector#401)
 * Make `(!?)` operator strict: [#402](haskell/vector#402)
 * Add `readMaybe`: [#425](haskell/vector#425)
 * Add `groupBy` and `group` for `Data.Vector.Generic` and the specialized
   version in `Data.Vector`, `Data.Vector.Unboxed`, `Data.Vector.Storable` and
   `Data.Vector.Primitive`. [#427](haskell/vector#427)
 * Add `toArraySlice` and `unsafeFromArraySlice` functions for conversion to and
   from the underlying boxed `Array`: [#434](haskell/vector#434)
@qrpnxz
Copy link

qrpnxz commented Aug 23, 2022

We can't provide a cast that changes primitive vectors with elements of different sizes in the way that Data.Vector.Storable.unsafeCast does it.

Why not?

@lehins
Copy link
Contributor Author

lehins commented Aug 23, 2022

@qrpnxz Because unlike Storable vector we cannot start at an arbitrary byte offset, we have to think in element offsets. Unfortunately it is a limitation of Prim type class, which could be fixed.

For example let's try to cast a primitive Vector Word16 -> Vector Word64 and this is the vector we want to cast:

w16vec :: Vector Word16
w16vec = drop 3 $ fromList [1,2,3,4,5,6,7,8]

when we drop 3 elements we do a O(1) slice, which makes

w16vec :: Vector Word16
w16vec = Vector 3 5 (ba :: ByteArray)
  where ba = fromList [1,2,3,4,5,6,7,8]

If we were to convert this vector using the same strategy as in Storable we'd expect a Vector Word64 with a single element, were each of its 16 bit chunks were: [4,5,6,7], but how would you specify the offset?

w32vec :: Vector Word64
w32vec = Vector ? 1 (ba :: ByteArray)

Problem is that the offset must be specified in number of elements, and it must be specified in number of elements because Prim class does not provide us with the ability to index in offsets of bytes, only in elements

@qrpnxz
Copy link

qrpnxz commented Aug 23, 2022

How unfortunate. Thanks for explaining.

@qrpnxz
Copy link

qrpnxz commented Aug 30, 2022

A pinned Primitive variant could do it by implementing *byteOff :: MutableByteArray# -> Int# -> State# s -> _ helper functions with byteArrayContents#, plusAddr#, and *OffAddr#. For example:

indexByteOffPrim :: Prim a => ByteArray# -> Int# -> a
indexByteOffPrim arr# idx# =
  case byteArrayContents# arr# of
    addr# -> indexOffAddr# (plusAddr# addr# idx#) 0
writeByteOffPrim :: Prim a => MutableByteArray# -> Int# -> a -> State# s -> State# s
writeByteOffPrim marr# idx# a s0# =
  case unsafeFreezeByteArray# marr# st0# of
    (# s1#, arr# #) -> case byteArrayContents arr# of
      addr# -> writeOffAddr# (plusAddr# arr# idx#) 0 a s1#

EDIT:

In the latest base there is now mutableByteArrayContents# :: MutableByteArray# d -> Addr#. I don't know what version exactly it was introduced.

writeByteOffPrim :: Prim a => MutableByteArray# -> Int# -> a -> State# s -> State# s
writeByteOffPrim marr# idx# a s0# =
  case mutableByteArrayContents marr# of
    addr# -> writeOffAddr# (plusAddr# arr# idx#) 0 a s0#

@lehins
Copy link
Contributor Author

lehins commented Aug 30, 2022

@qrpnxz Your pinned solution would not work for vector first of all. Secondly why would use Primitive for pinned memory, you can just use Storable vector.

I know for certain that this problem is solvable with primitive operations that work on a byte offset, such as indexWord8ArrayAsWord16# for example, however this is not a discussion for vector package, you should open an issue on the primitive repository if you really care about this.

@qrpnxz
Copy link

qrpnxz commented Aug 31, 2022

I don't see why it wouldn't work. Perhaps I should mention that the offset would now be in bytes. Offset recalculation would not even be needed at all in a cast; it would stay the same. As to why you would want a pinned version of the primitive vector, here is one use-case: you can zero-copy convert a pinned primitive vector into both Data.Text.Text and into Data.ByteString.ByteString. If you use a storable vector you have to copy to go into text, if you use an unpinned primitive vector you have to copy to go into bytestring.

Although the Prim class could be improved with the functions you mention, I'm not proposing to a change to primitive; I'm talking about what can be done in vector today with what we already have. If the only acceptable solution is to go and change primitive, then yes I can go over there and see what can be done, but already we can do something here.

@lehins
Copy link
Contributor Author

lehins commented Aug 31, 2022

The whole point of Data.Vector.Primitive.Vector is that it is unpinned memory. If you need pinned memory you should use Data.Vector.Storable.Vector vector.

Switching primitive vector to use byte offset is a serious breaking change, which could happen, but only if primitive supported it and there was enough motivation to get this done.
Switching primitive vector to use pinned memory in order to achieve zero copy to Text and ByteString does not outweigh the issue of memory fragmentation when a lot of small vectors are being used. This is not something we will do today or ever for that matter!

FYI:

  • It is possible to cast Primitive vector into Text without copy, since both use unpinned memory.
  • It is possible to cast Storable vector into ByteString without copy, since both use pinned memory.

If the only acceptable solution is to go and change primitive, then yes I can go over there and see what can be done

So yes, the only acceptable solution is to go and change primitive

@qrpnxz
Copy link

qrpnxz commented Aug 31, 2022

Goodness, I did not mean to imply changing the current Primitive to pinned! (😄) I was talking about having pinned variant; that is, a whole other type. Of course we still want the unpinned one.

It is possible to cast Primitive vector into Text without copy, since both use unpinned memory.
It is possible to cast Storable vector into ByteString without copy, since both use pinned memory.

I said so myself? The plus is being able to do both Text and ByteString. (Useful in parsing.)

@lehins
Copy link
Contributor Author

lehins commented Aug 31, 2022

I was talking about having pinned variant; that is, a whole other type.

Maintaining five different modules with almost the same API already takes a lot of work, I'd be dreading adding one more module to that list.

It feels like we keep talking past each other. I recommend you open a ticket explaining clearly:

  • the goals you are trying to achieve.
  • the proposed solution:
    • list all the benefits we would gain from this solution. If performance is your goal, some basic benchmarks would be useful to support the claim.
    • list all the drawbacks that you can think of
    • estimate or summarize in words how much work it would take to implement the solution and maintain it. Evaluate if all the effort is worth the goal.
  • compare it to alternative solutions that you know of

If all maintainers will agree that it really is something worthwhile implementing an maintaining then we can come up with the plan on how to get it done. This would also allow other from the community to pitch in with ideas. Discussing this on a merged PR does not make much sense and it doesn't have the same visibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants