opt(search): mmap slab-fs to reduce memory consumption #19

ldm0 · 2025-11-27T19:10:19Z

No description provided.

Copilot

Pull request overview

This PR introduces a memory-mapped slab allocator implementation (slab-mmap) to reduce memory consumption in the search infrastructure. The new crate uses temporary files with memory mapping to allow the OS to page data in and out of memory, replacing the standard slab crate in search-cache.

Key Changes:

New slab-mmap crate with mmap-backed storage using memmap2 and tempfile
Full serde serialization/deserialization support for compatibility with the standard slab crate
Integration into search-cache replacing the standard slab dependency
Comprehensive test suite covering basic operations, serialization, concurrency, and boundary cases

Reviewed changes

Copilot reviewed 14 out of 17 changed files in this pull request and generated 18 comments.

Show a summary per file

File	Description
`slab-mmap/Cargo.toml`	Package definition with dependencies on memmap2, tempfile, and serde
`slab-mmap/src/lib.rs`	Core mmap-backed slab implementation with growth management
`slab-mmap/src/serde.rs`	Serde serialization bridge for compatibility with standard slab
`slab-mmap/src/builder.rs`	Helper for rebuilding slab from deserialized key-value pairs
`slab-mmap/tests/*.rs`	Test suite covering functionality, serialization, concurrency, and edge cases
`search-cache/src/slab.rs`	Updated to use `slab_mmap::Slab` instead of `slab::Slab`
`search-cache/Cargo.toml`	Dependency change from `slab` to `slab-mmap`
`Cargo.toml`	Added `slab-mmap` to workspace members
`Cargo.lock`	Updated with new dependencies (memmap2, serde-value, etc.)
`cardinal/src-tauri/Cargo.lock`	Updated lockfile reflecting dependency changes
`cardinal/package-lock.json`	Various peer dependency changes (appears unrelated)

Files not reviewed (1)

cardinal/package-lock.json: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

search-cache/src/slab.rs

Copilot · 2025-11-27T19:35:15Z

slab-mmap/src/lib.rs

+        }
+        assert!(
+            min_slots < MAX_SLOTS,
+            "slab index must be less than u32::MAX"


The assertion message is misleading. The check is min_slots < MAX_SLOTS but the message says "must be less than u32::MAX". Since MAX_SLOTS = u32::MAX as usize, the message should be "slab capacity must be less than u32::MAX" for clarity.

Suggested change

"slab index must be less than u32::MAX"

"slab capacity must be less than u32::MAX"

Copilot · 2025-11-27T19:35:16Z

slab-mmap/src/lib.rs

+
+    fn with_capacity(capacity: usize) -> Self {
+        let capacity = capacity.max(1);
+        let mut file = NamedTempFile::new().expect("Failed to create slab backing file");


Potential resource leak: The temporary file created by NamedTempFile::new() will be deleted when the Slab is dropped, but if the program crashes or is killed, the file may remain. Consider documenting this behavior or adding a comment about cleanup guarantees. Additionally, the expect message should be more descriptive about why this might fail (e.g., "Failed to create temporary mmap backing file - check disk space and /tmp permissions").

Copilot · 2025-11-27T19:35:16Z

slab-mmap/tests/serialization_tests.rs

+    // 序列化为JSON
+    let json = serde_json::to_string(slab).expect("Failed to serialize slab");
+
+    // 从JSON反序列化
+    let deserialized: Slab<T> = serde_json::from_str(&json).expect("Failed to deserialize slab");
+
+    deserialized
+}
+
+#[test]
+fn test_basic_serialization() {
+    // 测试基本类型的序列化
+    let mut slab = Slab::new();
+
+    // 插入一些数据
+    let idx1 = slab.insert(42);
+    let idx2 = slab.insert(100);
+    let idx3 = slab.insert(-1);
+
+    // 序列化并反序列化
+    let deserialized = roundtrip_serde(&slab);
+
+    // 验证数据一致性
+    assert_eq!(deserialized.len(), slab.len());
+    assert_eq!(deserialized.get(idx1), Some(&42));
+    assert_eq!(deserialized.get(idx2), Some(&100));
+    assert_eq!(deserialized.get(idx3), Some(&-1));
+}
+
+#[test]
+fn test_empty_slab_serialization() {
+    // 测试空slab的序列化
+    let slab: Slab<i32> = Slab::new();
+
+    let deserialized = roundtrip_serde(&slab);
+
+    assert!(deserialized.is_empty());
+    assert_eq!(deserialized.len(), 0);
+}
+
+#[test]
+fn test_with_holes_serialization() {
+    // 测试有空洞的slab序列化
+    let mut slab = Slab::new();
+
+    // 插入数据
+    let idx1 = slab.insert("a".to_string());
+    let idx2 = slab.insert("b".to_string());
+    let idx3 = slab.insert("c".to_string());
+    let idx4 = slab.insert("d".to_string());
+
+    // 删除一些元素，创建空洞
+    slab.try_remove(idx2);
+    slab.try_remove(idx4);
+
+    // 序列化并反序列化
+    let deserialized = roundtrip_serde(&slab);
+
+    // 验证数据一致性
+    assert_eq!(deserialized.len(), slab.len());
+    assert_eq!(deserialized.get(idx1), Some(&"a".to_string()));
+    assert!(deserialized.get(idx2).is_none());
+    assert_eq!(deserialized.get(idx3), Some(&"c".to_string()));
+    assert!(deserialized.get(idx4).is_none());
+
+    // 验证迭代器行为一致
+    let original_iter: Vec<_> = slab.iter().collect();
+    let deserialized_iter: Vec<_> = deserialized.iter().collect();
+    assert_eq!(original_iter, deserialized_iter);
+}
+
+#[test]
+fn test_complex_structure_serialization() {
+    // 测试复杂结构体的序列化
+    let mut slab = Slab::new();
+
+    // 创建并插入复杂数据
+    let data1 = ComplexData {
+        id: 1,
+        name: "test1".to_string(),
+        values: vec![1, 2, 3, 4, 5],
+        metadata: HashMap::from([
+            ("type".to_string(), "example".to_string()),
+            ("version".to_string(), "1.0".to_string()),
+        ]),
+    };
+
+    let data2 = ComplexData {
+        id: 2,
+        name: "test2".to_string(),
+        values: vec![],
+        metadata: HashMap::new(),
+    };
+
+    let idx1 = slab.insert(data1.clone());
+    let idx2 = slab.insert(data2.clone());
+
+    // 序列化并反序列化
+    let deserialized = roundtrip_serde(&slab);
+
+    // 验证数据一致性
+    assert_eq!(deserialized.len(), slab.len());
+    assert_eq!(deserialized.get(idx1), Some(&data1));
+    assert_eq!(deserialized.get(idx2), Some(&data2));
+}
+
+#[test]
+fn test_large_data_serialization() {
+    // 测试大数据集的序列化
+    let mut slab = Slab::new();
+
+    // 插入1000个元素
+    for i in 0..1000 {
+        slab.insert(format!("Item {i}"));
+    }
+
+    // 删除一些元素以创建空洞
+    for i in 0..1000 {
+        if i % 5 == 0 {
+            slab.try_remove(i);
+        }
+    }
+
+    // 序列化并反序列化
+    let deserialized = roundtrip_serde(&slab);
+
+    // 验证数据一致性
+    assert_eq!(deserialized.len(), slab.len());
+
+    // 随机验证一些元素
+    for i in 0..1000 {
+        if i % 5 != 0 {
+            // 没有被删除的元素
+            assert_eq!(deserialized.get(i), Some(&format!("Item {i}")));
+        } else {
+            // 被删除的元素
+            assert!(deserialized.get(i).is_none());
+        }
+    }
+}
+
+#[test]
+fn test_nested_slab_serialization() {
+    // 测试嵌套数据结构的序列化
+    #[derive(Serialize, Deserialize, PartialEq, Debug)]
+    struct NestedData {
+        name: String,
+        numbers: Vec<u32>,
+    }
+
+    let mut slab = Slab::new();
+
+    // 插入嵌套数据
+    for i in 0..100 {
+        let nested = NestedData {
+            name: format!("Nested {i}"),
+            numbers: (0..i).collect(),
+        };
+        slab.insert(nested);
+    }
+
+    // 序列化并反序列化
+    let deserialized = roundtrip_serde(&slab);
+
+    // 验证数据一致性
+    assert_eq!(deserialized.len(), slab.len());
+
+    // 验证每个元素
+    for i in 0..100 {
+        let original = slab.get(i).unwrap();
+        let deserialized_item = deserialized.get(i).unwrap();
+
+        assert_eq!(original.name, deserialized_item.name);
+        assert_eq!(original.numbers, deserialized_item.numbers);
+    }
+}
+
+#[test]
+fn test_mixed_types_serialization() {
+    // 使用枚举来测试混合类型
+    #[derive(Serialize, Deserialize, PartialEq, Debug)]
+    enum MixedType {
+        Int(i32),
+        String(String),
+        Bool(bool),
+        List(Vec<i32>),
+    }
+
+    let mut slab = Slab::new();
+
+    // 插入不同类型的数据
+    slab.insert(MixedType::Int(42));
+    slab.insert(MixedType::String("hello".to_string()));
+    slab.insert(MixedType::Bool(true));
+    slab.insert(MixedType::List(vec![1, 2, 3, 4, 5]));
+
+    // 序列化并反序列化
+    let deserialized = roundtrip_serde(&slab);
+
+    // 验证数据一致性
+    assert_eq!(deserialized.len(), 4);
+    assert_eq!(deserialized.get(0), Some(&MixedType::Int(42)));
+    assert_eq!(
+        deserialized.get(1),
+        Some(&MixedType::String("hello".to_string()))
+    );
+    assert_eq!(deserialized.get(2), Some(&MixedType::Bool(true)));
+    assert_eq!(
+        deserialized.get(3),
+        Some(&MixedType::List(vec![1, 2, 3, 4, 5]))
+    );
+}
+
+#[test]
+fn test_serialization_compatibility() {
+    // 测试与标准slab的序列化兼容性
+    use slab::Slab as StdSlab;
+
+    // 创建两个slab并插入相同的数据
+    let mut mmap_slab = Slab::new();
+    let mut std_slab = StdSlab::new();
+
+    for i in 0..100 {
+        mmap_slab.insert(i);
+        std_slab.insert(i);
+    }
+
+    // 从mmap_slab序列化的数据应该可以正确表示std_slab的数据
+    let mmap_json = serde_json::to_string(&mmap_slab).expect("Failed to serialize mmap slab");
+    let std_json = serde_json::to_string(&std_slab).expect("Failed to serialize std slab");
+
+    // 注意：由于实现细节不同，JSON表示可能不完全相同，但数据内容应该一致
+    // 我们可以通过反序列化来验证
+
+    let mmap_from_std: Slab<i32> =
+        serde_json::from_str(&std_json).expect("Failed to deserialize from std slab");
+    let std_from_mmap: StdSlab<i32> =
+        serde_json::from_str(&mmap_json).expect("Failed to deserialize from mmap slab");
+
+    // 验证数据一致性
+    assert_eq!(mmap_from_std.len(), 100);
+    assert_eq!(std_from_mmap.len(), 100);
+
+    for i in 0..100 {
+        let v = i as i32;
+        assert_eq!(mmap_from_std.get(i), Some(&v));
+        assert_eq!(std_from_mmap.get(i), Some(&v));
+    }
+}
+
+#[test]
+fn test_serialization_with_large_indexes() {
+    // 测试大索引情况下的序列化
+    let mut slab = Slab::new();
+
+    // 首先填充一些元素
+    for i in 0..100 {
+        slab.insert(i);
+    }
+
+    // 删除大部分元素，只保留最后几个
+    for i in 0..95 {
+        slab.try_remove(i);
+    }
+
+    // 此时slab中只有索引95-99的元素
+    assert_eq!(slab.len(), 5);
+
+    // 序列化并反序列化
+    let deserialized = roundtrip_serde(&slab);
+
+    // 验证数据一致性
+    assert_eq!(deserialized.len(), 5);
+    for i in 95..100 {
+        assert_eq!(deserialized.get(i), Some(&i));
+    }
+    for i in 0..95 {
+        assert!(deserialized.get(i).is_none());
+    }
+}


Chinese comments in test code should be in English for consistency. This appears throughout serialization_tests.rs (lines 19, 22, 30, 33, 38, 41, 50, 61, 64, 70, 74, 84, 92, 95, etc.).

Copilot · 2025-11-27T19:35:16Z

slab-mmap/src/builder.rs

@@ -0,0 +1,54 @@
+use super::Slab;
+
+/// Helper to rebuild a slab from arbitrary key/value pairs during deserialization.


[nitpick] Missing documentation for the Builder struct. While it's marked pub(crate), it's a key component of the deserialization process and would benefit from doc comments explaining its purpose, how it rebuilds the free list, and why it handles duplicate keys by dropping the old value.

Suggested change

/// Helper to rebuild a slab from arbitrary key/value pairs during deserialization.

/// A helper struct for reconstructing a `Slab` from arbitrary key/value pairs during deserialization.

///

/// The `Builder` is used to incrementally rebuild a slab by inserting key/value pairs,

/// typically as part of a deserialization process. It ensures that the slab's internal

/// free list is correctly reconstructed by scanning for unoccupied slots after all pairs

/// have been inserted. This guarantees that subsequent allocations from the slab will

/// behave as expected.

///

/// When inserting a key/value pair, if the key already exists and is occupied, the old

/// value is dropped and replaced with the new one. This matches the behavior of typical

/// deserialization, where the last occurrence of a key takes precedence. Duplicate keys

/// are handled by dropping the previous value to avoid memory leaks.

///

/// # Usage

/// - Use `Builder::with_capacity` to create a builder with a given capacity.

/// - Call `pair(key, value)` for each key/value to insert.

/// - Call `build()` to finalize and obtain the reconstructed `Slab`.

Copilot · 2025-11-27T19:35:18Z

slab-mmap/tests/boundary_tests.rs

+    // 测试空slab的基本属性
+    assert!(slab.is_empty());
+    assert_eq!(slab.len(), 0);
+
+    // 测试从空slab获取元素
+    assert!(slab.get(0).is_none());
+    assert!(slab.get_mut(0).is_none());
+
+    // 测试从空slab删除元素
+    assert!(slab.try_remove(0).is_none());
+
+    // 测试空slab的迭代器
+    let collected: Vec<_> = slab.iter().collect();
+    assert!(collected.is_empty());
+
+    // 插入第一个元素
+    let idx = slab.insert(42);
+    assert_eq!(idx, 0);
+    assert_eq!(slab.len(), 1);
+    assert_eq!(slab[idx], 42);
+}
+
+#[test]
+fn test_very_large_data() {
+    let mut slab = Slab::new();
+
+    // 创建一个很大的向量
+    let large_vector = vec![42; 1_000_000]; // 约4MB的数据
+
+    // 插入大块数据
+    let idx = slab.insert(large_vector.clone());
+
+    // 验证数据完整性
+    assert_eq!(slab.get(idx).unwrap(), &large_vector);
+
+    // 删除并验证
+    let removed = slab.try_remove(idx).unwrap();
+    assert_eq!(removed, large_vector);
+}
+
+#[test]
+fn test_zero_sized_types() {
+    // 测试零大小类型
+    let mut slab = Slab::new();
+
+    // 插入多个零大小类型
+    let indices = [slab.insert(()), slab.insert(()), slab.insert(())];
+
+    // 验证插入的索引
+    assert_eq!(indices[0], 0);
+    assert_eq!(indices[1], 1);
+    assert_eq!(indices[2], 2);
+
+    // 验证获取
+    assert!(slab.get(indices[0]).is_some());
+
+    // 测试删除和重用
+    slab.try_remove(indices[1]);
+    let reused_idx = slab.insert(());
+    assert_eq!(reused_idx, indices[1]);
+}
+
+#[test]
+fn test_edge_case_indices() {
+    let mut slab = Slab::new();
+
+    // 插入一些元素
+    for i in 0..100 {
+        slab.insert(i);
+    }
+
+    // 测试获取不存在的索引（超出范围）
+    assert!(slab.get(1000).is_none());
+    assert!(slab.get_mut(1000).is_none());
+    assert!(slab.try_remove(1000).is_none());
+
+    // 测试最大可能的有效索引（虽然在实际中我们不会达到）
+    // 由于MAX_SLOTS是u32::MAX，我们不应该直接测试，但可以测试接近增长边界
+
+    // 测试插入和删除交替进行，导致索引不连续
+    for i in 0..50 {
+        slab.try_remove(i);
+    }
+
+    // 验证空洞存在
+    for i in 0..50 {
+        assert!(slab.get(i).is_none());
+    }
+
+    // 再次插入，验证重用
+    for i in 0..50 {
+        let idx = slab.insert(i + 1000);
+        assert!(idx < 50); // 应该重用之前删除的索引
+    }
+}
+
+#[test]
+fn test_struct_with_drop() {
+    // 测试带有Drop实现的结构体，确保正确清理
+    use std::sync::Arc;
+
+    let counter = Arc::new(());
+
+    #[derive(Clone)]
+    struct DropTracked {
+        _inner: Arc<()>,
+    }
+
+    impl Drop for DropTracked {
+        fn drop(&mut self) {
+            // Drop操作会减少Arc的引用计数
+        }
+    }
+
+    let mut slab = Slab::new();
+
+    // 插入带有Arc的结构体
+    let indices = [
+        slab.insert(DropTracked {
+            _inner: counter.clone(),
+        }),
+        slab.insert(DropTracked {
+            _inner: counter.clone(),
+        }),
+        slab.insert(DropTracked {
+            _inner: counter.clone(),
+        }),
+    ];
+
+    // 引用计数应该是4（初始1 + 3次插入）
+    assert_eq!(Arc::strong_count(&counter), 4);
+
+    // 删除一个元素，引用计数应该减少1
+    slab.try_remove(indices[1]);
+    assert_eq!(Arc::strong_count(&counter), 3);
+
+    // 再次插入，引用计数增加1
+    slab.insert(DropTracked {
+        _inner: counter.clone(),
+    });
+    assert_eq!(Arc::strong_count(&counter), 4);
+
+    // 当slab被删除时，所有剩余的元素也应该被删除
+    drop(slab);
+    assert_eq!(Arc::strong_count(&counter), 1);
+}
+
+#[test]
+fn test_memory_mapping_edge_cases() {
+    // 测试内存映射相关的边界情况
+    // 注意：由于这涉及底层系统操作，我们只能测试高级行为
+
+    let mut slab = Slab::new();
+
+    // 测试从小容量开始，逐步增长
+    let current_capacity = 1024; // 初始容量
+
+    // 插入元素直到触发多次扩容
+    for i in 0..current_capacity * 4 {
+        slab.insert(i);
+    }
+
+    // 验证所有元素都能正确访问
+    for i in 0..current_capacity * 4 {
+        assert_eq!(slab.get(i).unwrap(), &i);
+    }
+}
+
+#[test]
+fn test_saturating_add_edge_case() {
+    // 测试next_slot的saturating_add行为
+    let mut slab = Slab::new();
+
+    // 这里我们不能真正测试u32::MAX的情况，但可以验证saturating_add的存在
+    // 从代码中我们可以看到next_slot使用了saturating_add来防止溢出
+
+    // 插入大量元素以验证扩容机制正常工作
+    for i in 0..10_000 {
+        slab.insert(i);
+    }
+
+    assert_eq!(slab.len(), 10_000);
+}
+
+#[test]
+fn test_mixed_data_types_performance() {
+    // 测试混合不同大小的数据类型
+    #[derive(Clone, PartialEq, Debug)]
+    enum Data {
+        Small(i32),
+        Medium(Vec<u8>),
+        Large(Vec<u8>),
+    }
+
+    let mut slab = Slab::new();
+
+    // 插入不同大小的数据
+    let small_idx = slab.insert(Data::Small(42));
+    let medium_idx = slab.insert(Data::Medium(vec![1u8; 100]));
+    let large_idx = slab.insert(Data::Large(vec![2u8; 10_000]));
+
+    // 验证所有数据都能正确访问
+    match &slab[small_idx] {
+        Data::Small(v) => assert_eq!(*v, 42),
+        _ => panic!("unexpected variant for small"),
+    }
+    match &slab[medium_idx] {
+        Data::Medium(v) => assert_eq!(v.len(), 100),
+        _ => panic!("unexpected variant for medium"),
+    }
+    match &slab[large_idx] {
+        Data::Large(v) => assert_eq!(v.len(), 10_000),
+        _ => panic!("unexpected variant for large"),
+    }
+
+    // 删除并重用
+    slab.try_remove(medium_idx);
+    let new_medium_idx = slab.insert(Data::Medium(vec![3u8; 50]));
+    assert_eq!(new_medium_idx, medium_idx);
+    match &slab[new_medium_idx] {
+        Data::Medium(v) => assert_eq!(v.len(), 50),
+        _ => panic!("unexpected variant for new medium"),
+    }
+}
+
+#[test]
+fn test_ensure_capacity_edge_cases() {
+    // 测试ensure_capacity的边界情况
+    // 注意：由于MAX_SLOTS是u32::MAX，我们不能直接测试，但可以测试接近边界的情况
+
+    let mut slab = Slab::new();
+
+    // 通过插入元素来触发扩容，而不是直接调用私有方法
+    let target_capacity = 2048;
+
+    for i in 0..target_capacity {
+        slab.insert(i);
+    }
+
+    for i in 0..target_capacity {
+        assert_eq!(slab.get(i), Some(&i));
+    }
+}
+
+#[test]
+fn test_read_after_delete_behavior() {
+    // 测试删除后的读取行为
+    let mut slab = Slab::new();
+
+    let idx = slab.insert("hello");
+    assert_eq!(slab.get(idx), Some(&"hello"));
+
+    // 删除元素
+    assert_eq!(slab.try_remove(idx), Some("hello"));
+
+    // 验证删除后无法读取
+    assert!(slab.get(idx).is_none());
+    assert!(slab.get_mut(idx).is_none());
+    assert!(slab.try_remove(idx).is_none());
+
+    // 重新插入到相同的索引
+    slab.insert("world");
+    assert_eq!(slab.get(idx), Some(&"world"));
+}
+
+#[test]
+fn test_insert_remove_alternating() {
+    // 测试插入删除交替进行的边界情况
+    let mut slab = Slab::new();
+    let mut keys = Vec::new();
+
+    // 执行一系列交替的插入和删除操作
+    for i in 0..1000 {
+        if i % 3 == 0 && !keys.is_empty() {
+            // 删除最老的键
+            let key = keys.remove(0);
+            slab.try_remove(key);
+        } else {
+            // 插入新元素
+            keys.push(slab.insert(i));
+        }
+    }
+
+    // 验证最终状态
+    assert_eq!(slab.len(), keys.len());
+
+    // 验证所有剩余的键都能正确访问
+    for &key in keys.iter() {
+        assert!(slab.get(key).is_some());
+    }
+}
+
+#[test]
+fn test_memory_usage_with_holes() {
+    // 测试有大量空洞的内存使用情况
+    let mut slab = Slab::new();
+
+    // 插入1000个元素
+    for i in 0..1000 {
+        slab.insert(i);
+    }
+
+    // 删除除了最后一个以外的所有元素
+    for i in 0..999 {
+        slab.try_remove(i);
+    }
+
+    // 验证只有最后一个元素保留
+    assert_eq!(slab.len(), 1);
+    assert_eq!(slab.get(999), Some(&999));
+
+    // 插入新元素应该重用之前删除的slot
+    for i in 1000..1500 {
+        let idx = slab.insert(i);
+        assert!(idx < 999); // 应该重用之前的slot
+    }
+
+    assert_eq!(slab.len(), 501);
+}
+
+#[test]
+fn test_sparse_deserialize_rebuilds_free_list() {
+    let json = json!({
+        "0": 10,
+        "1023": 20,
+        "2048": 30
+    });
+
+    let mut slab: Slab<i32> = serde_json::from_value(json).expect("deserialize sparse slab");
+    assert_eq!(slab.len(), 3);
+    assert_eq!(slab.get(0), Some(&10));
+    assert_eq!(slab.get(1023), Some(&20));
+    assert_eq!(slab.get(2048), Some(&30));
+    assert!(slab.get(1).is_none());
+
+    let reused_idx = slab.insert(99);
+    assert_eq!(reused_idx, 1);
+    assert_eq!(slab.get(reused_idx), Some(&99));
+    assert_eq!(slab.len(), 4);
+}


Chinese comments in test code should be in English for consistency. This appears throughout the boundary_tests.rs file (lines 8, 12, 16, 19, 23, 34, 37, 50, 53, etc.).

Copilot · 2025-11-27T19:35:18Z

slab-mmap/src/lib.rs

+    fn write_slot(&mut self, index: usize, value: T) {
+        let slot = unsafe { self.slots_mut().get_unchecked_mut(index) };
+        debug_assert!(!slot.occupied);
+        slot.value.write(value);
+        slot.occupied = true;
+    }


The write_slot method has a debug assertion that the slot is not occupied, but this is only checked in debug builds. If this invariant is violated in release builds, it could lead to memory leaks (the old value is overwritten without being dropped). Consider making this a regular assertion or returning a Result to make the contract explicit and prevent undefined behavior in release builds.

Copilot · 2025-11-27T19:35:19Z

slab-mmap/src/lib.rs

+            let index = self.next_slot;
+            self.next_slot = self.next_slot.saturating_add(1);
+            if self.next_slot > self.capacity {
+                self.grow();
+            }
+            index
+        };
+        self.ensure_capacity(index + 1);


The insert method calls ensure_capacity twice in the worst case (lines 145-146 and 150). When next_slot > capacity after incrementing, grow() calls ensure_capacity, then line 150 calls it again. While the second call returns early due to the check on line 81, this is still inefficient. Consider restructuring to only call ensure_capacity once:

if self.next_slot >= self.capacity { self.grow(); }

This also fixes the off-by-one issue where we grow when next_slot == capacity rather than when we're about to exceed it.

Suggested change

let index = self.next_slot;

self.next_slot = self.next_slot.saturating_add(1);

if self.next_slot > self.capacity {

self.grow();

}

index

};

self.ensure_capacity(index + 1);

if self.next_slot >= self.capacity {

self.grow();

}

let index = self.next_slot;

self.next_slot = self.next_slot.saturating_add(1);

index

};

Copilot · 2025-11-27T19:35:19Z

slab-mmap/tests/serialization_tests.rs

+    metadata: HashMap<String, String>,
+}
+
+// 序列化辅助函数


Chinese comments in test code should be in English for consistency with the rest of the codebase. This affects maintainability for international contributors.

Copilot · 2025-11-27T19:35:19Z

slab-mmap/src/lib.rs

+        for i in 0..self.next_slot {
+            if let Some(slot) = self.slot_mut(i) {
+                if slot.occupied {
+                    unsafe {
+                        slot.value.assume_init_drop();
+                    }
+                }
+            }
+        }


The Drop implementation iterates through all slots but uses slot_mut(i) which checks i < self.next_slot inside. However, the loop condition already guarantees this, making the Option check redundant. Consider using slots_mut() directly with an indexed loop for better performance and clarity:

for slot in &mut self.slots_mut()[..self.next_slot] { if slot.occupied { unsafe { slot.value.assume_init_drop(); } } }

To reduce memory consumption

ldm0 requested a review from Copilot November 27, 2025 19:28

Copilot started reviewing on behalf of ldm0 November 27, 2025 19:28 View session

Copilot finished reviewing on behalf of ldm0 November 27, 2025 19:31

Copilot AI reviewed Nov 27, 2025

View reviewed changes

feat(search): add make slab-fs mmaped

8f9c381

To reduce memory consumption

ldm0 force-pushed the mmap branch from 0845df7 to 8f9c381 Compare November 28, 2025 15:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

opt(search): mmap slab-fs to reduce memory consumption #19

opt(search): mmap slab-fs to reduce memory consumption #19

ldm0 commented Nov 27, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI Nov 27, 2025

Uh oh!

Copilot AI Nov 27, 2025

Uh oh!

Copilot AI Nov 27, 2025

Uh oh!

Copilot AI Nov 27, 2025

Uh oh!

Copilot AI Nov 27, 2025

Uh oh!

Copilot AI Nov 27, 2025

Uh oh!

Copilot AI Nov 27, 2025

Uh oh!

Copilot AI Nov 27, 2025

Uh oh!

Copilot AI Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	"slab index must be less than u32::MAX"
	"slab capacity must be less than u32::MAX"

		@@ -0,0 +1,54 @@
		use super::Slab;

		/// Helper to rebuild a slab from arbitrary key/value pairs during deserialization.

-/// Helper to rebuild a slab from arbitrary key/value pairs during deserialization.
+/// A helper struct for reconstructing a `Slab` from arbitrary key/value pairs during deserialization.
+///
+/// The `Builder` is used to incrementally rebuild a slab by inserting key/value pairs,
+/// typically as part of a deserialization process. It ensures that the slab's internal
+/// free list is correctly reconstructed by scanning for unoccupied slots after all pairs
+/// have been inserted. This guarantees that subsequent allocations from the slab will
+/// behave as expected.
+///
+/// When inserting a key/value pair, if the key already exists and is occupied, the old
+/// value is dropped and replaced with the new one. This matches the behavior of typical
+/// deserialization, where the last occurrence of a key takes precedence. Duplicate keys
+/// are handled by dropping the previous value to avoid memory leaks.
+///
+/// # Usage
+/// - Use `Builder::with_capacity` to create a builder with a given capacity.
+/// - Call `pair(key, value)` for each key/value to insert.
+/// - Call `build()` to finalize and obtain the reconstructed `Slab`.

opt(search): mmap slab-fs to reduce memory consumption #19

Are you sure you want to change the base?

opt(search): mmap slab-fs to reduce memory consumption #19

Conversation

ldm0 commented Nov 27, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants