-
Notifications
You must be signed in to change notification settings - Fork 88
Bounding const generic bitmasks #50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I didn't check the latest spec, but the earlier version I read allows you to have vector registers with much more than 1000 elements (something like 16k or 64k elements iirc), of course that depends on the hardware supporting something that big, I'd expect most implementations to support less than 64 elements per vector register for the near future. |
so, no, you can't cram a bitmask for that in a u128 :) |
Well, VLEN doesn't have a specified upper bound here, either, I just said 128 because it was the specced minimum size for an actual vector register, which clearly can plausibly go to at least 512 even on a simpler RISC-y architecture given that such has happened for ARM chips as well. |
In the spirit of "something that works, now" currently we're using a u64 since we don't have as much practical need for something more clever in 2021. As this is not necessarily the solution we will want in 2031, this is likely not to be what we do permanently, and we are considering carving out an alternative feature gate for mask types. |
Code for idea: pub const fn mask_bytes(n: usize) -> usize {
((n + 7) / 8).next_power_of_two()
}
macro_rules! make_masks {
($($n:literal,)*) => {
$(
unsafe impl MaskHelper for [(); $n] {
type Underlying = [u8; mask_bytes($n)];
}
)*
};
}
make_masks![
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135,
136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154,
155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173,
174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192,
193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211,
212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230,
231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249,
250, 251, 252, 253, 254, 255, 256,
];
pub unsafe trait MaskHelper {
/// for `Self = [(); N]`:
/// `Underlying` must be `[u8; mask_bytes(N)]`
type Underlying: Copy + 'static;
}
pub struct Mask<const BITS: usize>(<[(); BITS] as MaskHelper>::Underlying)
where
[(); BITS]: MaskHelper; |
—@calebzulawski in (https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/Mask.20API/near/219272371)
As a reminder, because frankly I have lost track of this in a conversation at least once, a "bitmask" is a mask type that uses 1 bit per vector element (like a bitset) so that a single tightly-packed number defines a mask for an entire vector. Bitmasks can be more efficient, but in order to make an efficient bitmask, you need to know the size of the bitmasks you're supporting ahead of time. This is something we would prefer to not say, given that current architectures that are actually in use on commodity hardware (AVX512) already support 64b bitmasks, and future implementations of RISCV-V also will use bitmasks with pretty interestingly widths†, as I currently understand it.
This problem infects our attempt to design an "opaque", arch-agnostic mask type because we have to impose the same bound at that site as well. That makes it quite unpretty and non-ergonomic, even though we have some clear bounds that you would think we could prove to the compiler pretty effectively.
"Full masks" are comparatively simple... they're the size of a vector's elements, and so, are the size of a vector register. Which is a bit convenient, really. Caleb mentioned It might be nice to say that a mask uses an entire vector register and so is Many bits wide regardless, in terms of layout, and hope the optimizer handles it intelligently.
† hypothetically, unless I misunderstand the spec, it requires a mask register to fit inside VLEN, the size of an individual vector register, but that means it could allow a VLEN-sized mask to mask byte-wide structures over a "vector register group" using that 1 vector register... or 128 elements! @programmerjake am I reading this correctly?
The text was updated successfully, but these errors were encountered: