Skip to content

Conversation

@ykwd
Copy link
Collaborator

@ykwd ykwd commented Aug 19, 2025

This feature will be used for persisting master KV metadata. The serialize and deserialize methods of OffsetAllocator are implemented as templates; otherwise, offset_allocator.h would need to directly or indirectly include type.h, which would result in circular dependencies and compilation failure.

@ykwd ykwd marked this pull request as ready for review August 20, 2025 08:12
@ykwd ykwd requested a review from xiaguan August 20, 2025 08:29
Copy link
Collaborator

@xiaguan xiaguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It introduces additional complexity.

From my perspective, I'd suggest:

std::vector<char> serialize();
tl::expected<T, ErrorCode> deserialize(char*, size_t); // vector works fine too

Since m_max_capacity is currently static, I believe the size of __Allocator should be static as well? The other members in OffsetAllocator also have static sizes.

My proposal is to modify its memory layout so it resides in a contiguous buffer with a static size.

For serialization/deserialization, we could simply allocate a buffer and perform a memcpy.

const uint64_t m_multiplier_bits;
const uint64_t m_capacity;
uint64_t m_multiplier_bits;
uint64_t m_capacity;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the allocator always allocates a max_capacity array, I believe m_current_capacity serves no purpose?

@xiaguan
Copy link
Collaborator

xiaguan commented Aug 20, 2025

for example

#include <array>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <new>

struct Record {
    int id;
    std::array<int, 5> values;
};

inline constexpr std::size_t kWireSize = sizeof(Record);

int main() {
    Record r{42, {1, 2, 3, 4, 5}};

    char* buffer = new char[kWireSize];
    memcpy(buffer, &r, kWireSize);

    Record* r1 = (Record*)buffer;
    std::cout << "id=" << r1->id << " values:";
    for (int v : r1->values) std::cout << ' ' << v;
    std::cout << '\n';

    return 0;
}

@ykwd
Copy link
Collaborator Author

ykwd commented Aug 21, 2025

We had an offline discussion about this issue, and here is a summary of the conclusion. The value of m_max_capacity is dynamically computed from size using a formula. During initialization, two arrays of size m_max_capacity are allocated, but only init_capacity elements are initialized. On Linux, this effectively reserves virtual memory for m_max_capacity, while only committing physical memory for init_capacity. Since the memory usage is capacity * 32 bytes, and it is difficult to know the exact required capacity at initialization time, this approach essentially achieves on-demand elastic growth. Therefore, when performing serialization and deserialization, we should only handle the current_capacity elements.

@ykwd ykwd merged commit aecbeaa into kvcache-ai:main Aug 21, 2025
11 checks passed
@ykwd ykwd mentioned this pull request Aug 21, 2025
29 tasks
@ykwd ykwd deleted the dev/allocator-serialization branch September 25, 2025 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants