Use BTreeMap with u128 values for sparse bit sets/"vectors" (in dataflow etc.). #47575
Comments
Is the idea that the key would be the element index divided by 128?
@nikomatsakis That's pretty much it, yes. Here's what I've been playing with so far:

// FIXME(eddyb) move to rustc_data_structures.
// Imports needed for this sketch (assuming `Idx` is the index trait from
// rustc_data_structures' indexed_vec module):
use std::collections::BTreeMap;
use std::marker::PhantomData;
use rustc_data_structures::indexed_vec::Idx;

#[derive(Clone)]
pub struct SparseBitSet<I: Idx> {
    map: BTreeMap<u32, u128>,
    _marker: PhantomData<I>,
}

// Splits an index into the B-tree key (index / 128) and the one-bit mask
// selecting that index's position within the 128-bit word stored under the key.
fn key_and_mask<I: Idx>(index: I) -> (u32, u128) {
    let index = index.index();
    let key = index / 128;
    let key_u32 = key as u32;
    assert_eq!(key_u32 as usize, key);
    (key_u32, 1 << (index % 128))
}

impl<I: Idx> SparseBitSet<I> {
    pub fn new() -> Self {
        SparseBitSet {
            map: BTreeMap::new(),
            _marker: PhantomData
        }
    }

    // Number of elements the currently allocated words can hold.
    pub fn capacity(&self) -> usize {
        self.map.len() * 128
    }

    pub fn contains(&self, index: I) -> bool {
        let (key, mask) = key_and_mask(index);
        self.map.get(&key).map_or(false, |bits| (bits & mask) != 0)
    }

    // Returns true if the bit was newly set.
    pub fn insert(&mut self, index: I) -> bool {
        let (key, mask) = key_and_mask(index);
        let bits = self.map.entry(key).or_insert(0);
        let old_bits = *bits;
        let new_bits = old_bits | mask;
        *bits = new_bits;
        new_bits != old_bits
    }

    // Returns true if the bit was previously set.
    pub fn remove(&mut self, index: I) -> bool {
        let (key, mask) = key_and_mask(index);
        if let Some(bits) = self.map.get_mut(&key) {
            let old_bits = *bits;
            let new_bits = old_bits & !mask;
            *bits = new_bits;
            // FIXME(eddyb) maybe remove entry if now `0`.
            new_bits != old_bits
        } else {
            false
        }
    }

    pub fn iter<'a>(&'a self) -> impl Iterator<Item = I> + 'a {
        self.map.iter().flat_map(|(&key, &bits)| {
            let base = key as usize * 128;
            (0..128).filter_map(move |i| {
                if (bits & (1 << i)) != 0 {
                    Some(I::new(base + i))
                } else {
                    None
                }
            })
        })
    }
}
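As a quick illustration (not from the original comment), here is how the set above might be used; the LocalId newtype and its Idx impl are hypothetical, introduced only for the example:

// Hypothetical usage sketch; LocalId and its Idx impl are illustrative only.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
struct LocalId(usize);

impl Idx for LocalId {
    fn new(i: usize) -> Self { LocalId(i) }
    fn index(self) -> usize { self.0 }
}

fn example() {
    let mut set: SparseBitSet<LocalId> = SparseBitSet::new();
    assert!(set.insert(LocalId(3)));        // newly set
    assert!(set.insert(LocalId(100_000)));  // far away: lands in its own 128-bit chunk
    assert!(!set.insert(LocalId(3)));       // already present
    assert!(set.contains(LocalId(3)));
    assert!(set.remove(LocalId(100_000)));
    assert_eq!(set.iter().collect::<Vec<_>>(), vec![LocalId(3)]);
}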
Makes sense. I wonder if it would be useful in NLL too.
One thing that might be worth considering is using binmaps.
We should also benchmark "sparse bit matrices" with this kind of representation.
EDIT: This makes iteration within a "row" harder, especially with the chunked API.
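As a purely illustrative sketch and an assumption on my part (not the representation discussed above), one possible shape for such a sparse bit matrix keys a BTreeMap by (row, chunk-of-columns), which makes within-row iteration a range query rather than a single lookup:

// Hypothetical sketch only; the exact keys under discussion may differ.
use std::collections::BTreeMap;

// A sparse bit matrix keyed by (row, chunk of 128 columns).
pub struct SparseBitMatrix {
    chunk_bits: BTreeMap<(u32, u32), u128>,
}

impl SparseBitMatrix {
    pub fn new() -> Self {
        SparseBitMatrix { chunk_bits: BTreeMap::new() }
    }

    // Sets bit (row, col); returns true if it was not already set.
    pub fn insert(&mut self, row: u32, col: u32) -> bool {
        let key = (row, col / 128);
        let mask = 1u128 << (col % 128);
        let bits = self.chunk_bits.entry(key).or_insert(0);
        let old = *bits;
        *bits |= mask;
        *bits != old
    }

    // Iterating one row needs a range query over all of that row's chunks,
    // which is the extra cost the EDIT above alludes to.
    pub fn row_cols<'a>(&'a self, row: u32) -> impl Iterator<Item = u32> + 'a {
        self.chunk_bits
            .range((row, 0)..=(row, u32::MAX))
            .flat_map(|(&(_, chunk), &bits)| {
                (0..128u32).filter_map(move |i| {
                    if bits & (1u128 << i) != 0 { Some(chunk * 128 + i) } else { None }
                })
            })
    }
}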
I've added a "chunked" API to my SparseBitSet:

// FIXME(eddyb) move to rustc_data_structures.
// Imports needed for this sketch (again assuming `Idx` comes from
// rustc_data_structures' indexed_vec module):
use std::collections::BTreeMap;
use std::collections::btree_map::Entry;
use std::marker::PhantomData;
use rustc_data_structures::indexed_vec::Idx;

#[derive(Clone)]
pub struct SparseBitSet<I: Idx> {
    chunk_bits: BTreeMap<u32, u128>,
    _marker: PhantomData<I>,
}

#[derive(Copy, Clone)]
pub struct SparseChunk<I> {
    key: u32,
    bits: u128,
    _marker: PhantomData<I>,
}

impl<I: Idx> SparseChunk<I> {
    // A chunk containing exactly one set bit: the one for `index`.
    pub fn one(index: I) -> Self {
        let index = index.index();
        let key_usize = index / 128;
        let key = key_usize as u32;
        assert_eq!(key as usize, key_usize);
        SparseChunk {
            key,
            bits: 1 << (index % 128),
            _marker: PhantomData
        }
    }

    pub fn any(&self) -> bool {
        self.bits != 0
    }

    // Iterates the indices of the set bits, shifting the word right so the
    // `take_while` can stop as soon as no bits remain.
    pub fn iter(&self) -> impl Iterator<Item = I> {
        let base = self.key as usize * 128;
        let mut bits = self.bits;
        (0..128).map(move |i| {
            let current_bits = bits;
            bits >>= 1;
            (i, current_bits)
        }).take_while(|&(_, bits)| bits != 0)
          .filter_map(move |(i, bits)| {
              if (bits & 1) != 0 {
                  Some(I::new(base + i))
              } else {
                  None
              }
          })
    }
}

impl<I: Idx> SparseBitSet<I> {
    pub fn new() -> Self {
        SparseBitSet {
            chunk_bits: BTreeMap::new(),
            _marker: PhantomData
        }
    }

    pub fn capacity(&self) -> usize {
        self.chunk_bits.len() * 128
    }

    // Returns the subset of `chunk`'s bits that are present in the set.
    pub fn contains_chunk(&self, chunk: SparseChunk<I>) -> SparseChunk<I> {
        SparseChunk {
            bits: self.chunk_bits.get(&chunk.key).map_or(0, |bits| bits & chunk.bits),
            ..chunk
        }
    }

    // Sets `chunk`'s bits; returns the subset that was newly set.
    pub fn insert_chunk(&mut self, chunk: SparseChunk<I>) -> SparseChunk<I> {
        if chunk.bits == 0 {
            return chunk;
        }
        let bits = self.chunk_bits.entry(chunk.key).or_insert(0);
        let old_bits = *bits;
        let new_bits = old_bits | chunk.bits;
        *bits = new_bits;
        let changed = new_bits ^ old_bits;
        SparseChunk {
            bits: changed,
            ..chunk
        }
    }

    // Clears `chunk`'s bits, dropping the map entry if its word reaches 0;
    // returns the subset that was actually cleared.
    pub fn remove_chunk(&mut self, chunk: SparseChunk<I>) -> SparseChunk<I> {
        if chunk.bits == 0 {
            return chunk;
        }
        let changed = match self.chunk_bits.entry(chunk.key) {
            Entry::Occupied(mut bits) => {
                let old_bits = *bits.get();
                let new_bits = old_bits & !chunk.bits;
                if new_bits == 0 {
                    bits.remove();
                } else {
                    bits.insert(new_bits);
                }
                new_bits ^ old_bits
            }
            Entry::Vacant(_) => 0
        };
        SparseChunk {
            bits: changed,
            ..chunk
        }
    }

    pub fn clear(&mut self) {
        self.chunk_bits.clear();
    }

    pub fn chunks<'a>(&'a self) -> impl Iterator<Item = SparseChunk<I>> + 'a {
        self.chunk_bits.iter().map(|(&key, &bits)| {
            SparseChunk {
                key,
                bits,
                _marker: PhantomData
            }
        })
    }

    pub fn contains(&self, index: I) -> bool {
        self.contains_chunk(SparseChunk::one(index)).any()
    }

    pub fn insert(&mut self, index: I) -> bool {
        self.insert_chunk(SparseChunk::one(index)).any()
    }

    pub fn remove(&mut self, index: I) -> bool {
        self.remove_chunk(SparseChunk::one(index)).any()
    }

    pub fn iter<'a>(&'a self) -> impl Iterator<Item = I> + 'a {
        self.chunks().flat_map(|chunk| chunk.iter())
    }
}

cc @spastorino
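As a hypothetical usage sketch (not from the original comment), the chunked API is convenient for dataflow-style joins, where whole 128-bit chunks are unioned at a time; this assumes the SparseBitSet/SparseChunk above and the illustrative LocalId index type from the earlier sketch:

// Hypothetical illustration: union `src` into `dest` one chunk at a time.
// `insert_chunk` returns the bits that were newly set, so `any()` reports
// whether the chunk changed `dest` at all.
fn union_into(dest: &mut SparseBitSet<LocalId>, src: &SparseBitSet<LocalId>) -> bool {
    let mut changed = false;
    for chunk in src.chunks() {
        changed |= dest.insert_chunk(chunk).any();
    }
    changed
}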
Done here: #48245
Should this be closed now?
Maybe? There are still some places where we could use sparse sets but aren't today, I suppose.
@rustbot release-assignment
I'm going to go ahead and close. There have been a bunch of iterations on the various sparse vs. dense tradeoffs throughout the compiler (particularly in the MIR-related code), and I don't think a generic tracking issue remains useful.
According to our current implementation of B-trees (rust/src/liballoc/btree/node.rs, lines 53 to 55 at 5965b79), it would appear that up to 11 key-value pairs can be stored in each node. For u128 values representing 128 set elements each, 1408 set elements can be stored in a single allocation, with an overhead of around 50% compared to Vec<u128> in the dense case.

Such sparse bitsets would be really useful for (e.g. dataflow) analysis algorithms, in situations where the bitset elements tend to be localized, with multi-"word" gaps in between local groups.
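For the arithmetic behind those figures: 11 entries per node × 128 bits per u128 gives the 1408 set elements; the roughly 50% overhead presumably reflects the 4-byte u32 key and a share of the node's metadata carried alongside each 16-byte u128 word, plus B-tree nodes generally not being full, versus Vec<u128>'s flat 16 bytes per 128 bits in the dense case.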
cc @nikomatsakis @pnkfelix