[proposition] mask-based implementation for missing values #441
Comments
Some questions arise:
At the moment saddle provides two copies of many operations (e.g. `max`, `max2`): one omits the missingness check, the other respects missingness. All code in `linalg` is oblivious to missingness.
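To make the difference between the two flavours concrete, here is a minimal sketch for a reduction over a sentinel-encoded `Int` array, assuming `Int.MinValue` is the `NA` marker as described in the issue text. The names `MinVariants`, `minSkipNA` and `minUnchecked` are hypothetical and are not saddle's API. `min` is used instead of `max` because, with a minimum-value sentinel, the divergence shows up on ordinary inputs.

```scala
// Hypothetical helpers for illustration; not saddle's max/max2 implementations.
object MinVariants {
  // Assumption: Int.MinValue is the NA sentinel, as in saddle's current Int encoding.
  val NA: Int = Int.MinValue

  // Respects missingness: NA entries are skipped.
  // Returns None when there is no non-missing element.
  def minSkipNA(xs: Array[Int]): Option[Int] = {
    var found = false
    var best = 0
    var i = 0
    while (i < xs.length) {
      val v = xs(i)
      if (v != NA && (!found || v < best)) {
        best = v
        found = true
      }
      i += 1
    }
    if (found) Some(best) else None
  }

  // Oblivious to missingness: the sentinel is just another Int, and since it is
  // the smallest Int it wins every comparison.
  def minUnchecked(xs: Array[Int]): Int = {
    require(xs.nonEmpty, "empty array")
    var best = xs(0)
    var i = 1
    while (i < xs.length) {
      if (xs(i) < best) best = xs(i)
      i += 1
    }
    best
  }
}
```

With `Array(3, MinVariants.NA, 1)`, `minSkipNA` returns `Some(1)` while `minUnchecked` returns `Int.MinValue`, which a caller would then read back as `NA`.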
That's indeed an approach, and it would help make vectorization possible. It should be validated with benchmarking. Also, the vector could have a field …
There are two variants, if I understand correctly. The boxed one returning …
I'd start with dense for simplicity? Also along the …
These aspects of saddle are surprising, IMO. I assumed these two implementations were semantically equivalent.
I see. I think I concur with most of these. My only remaining concern is that there should be a way to extract a `Double` from a `Vec[Double]` without going through `Scalar`, unless we are sure that the VM eliminates the `Scalar` allocations on a branchless code path (e.g. …
Currently, saddle uses one value of each primitive type to represent `NA`. For floating-point numbers this is straightforward, as they already include such a value (`NaN`). For other types (`Boolean`, `Byte`, `Int`, etc.) it isn't, and an arbitrary value must be chosen. Currently the minimum value is used (`Byte.MinValue`, `Short.MinValue`, etc.).

I think this approach has important drawbacks:

- … `MinValue` would result in a missing value.
- `if (tag.isMissing(v1)) v1 else v1 + 2` might prevent the JVM's loop optimizations from kicking in.
- A `.raw(i)`-like API exposes unnecessary complexity to the user.

An alternative approach would be to use a mask-based implementation for the integer-based `Vec[T]`s. That is, the vector stores a companion `Array[Boolean]` indicating missing values. pandas uses this approach for its nullable integer dtypes.
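The drawbacks and the proposed alternative can be sketched side by side. The two classes below are hypothetical and are not saddle's `Vec`: `SentinelIntVec` uses `Int.MinValue` as `NA`, mirroring the current encoding, while `MaskedIntVec` stores the companion `Array[Boolean]` (`true` meaning missing) proposed here. Adding `Int.MinValue + 1` and `-1` yields a legitimate `Int.MinValue`, which the sentinel version misreads as missing and the masked version does not.

```scala
// Hypothetical types for illustration only; not saddle's Vec implementation.

// Sentinel encoding: Int.MinValue doubles as NA.
final class SentinelIntVec(val data: Array[Int]) {
  def isNA(i: Int): Boolean = data(i) == Int.MinValue

  // Element-wise addition; the missingness check adds a branch per element.
  def +(other: SentinelIntVec): SentinelIntVec = {
    require(other.data.length == data.length, "length mismatch")
    val out = new Array[Int](data.length)
    var i = 0
    while (i < data.length) {
      out(i) =
        if (isNA(i) || other.isNA(i)) Int.MinValue
        else data(i) + other.data(i)
      i += 1
    }
    new SentinelIntVec(out)
  }
}

// Mask encoding: a companion Array[Boolean] marks missing slots (true = missing).
final class MaskedIntVec(val data: Array[Int], val missing: Array[Boolean]) {
  require(data.length == missing.length, "data and mask must have equal length")
  def isNA(i: Int): Boolean = missing(i)

  // Values and mask are combined in two loops with no per-element missingness
  // branch; whatever sits in a masked slot of `data` is simply ignored by readers.
  def +(other: MaskedIntVec): MaskedIntVec = {
    val n = data.length
    require(other.data.length == n, "length mismatch")
    val out = new Array[Int](n)
    val outMask = new Array[Boolean](n)
    var i = 0
    while (i < n) { out(i) = data(i) + other.data(i); i += 1 }
    i = 0
    while (i < n) { outMask(i) = missing(i) | other.missing(i); i += 1 }
    new MaskedIntVec(out, outMask)
  }
}
```

Keeping the value loop free of missingness checks is the property the thread hopes will let the JIT vectorize it; whether it actually does still has to be confirmed by benchmarking. The cost is the extra mask array stored alongside every integer vector.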