
Lowering the memory requirements for benchmarks #7

Open
maawad opened this issue Apr 26, 2023 · 0 comments


maawad commented Apr 26, 2023

The benchmarking code currently requires 20 GiB of memory for a complete set of benchmarks. It would be nice to lower the memory requirements so benchmarks can run on workstations. When the code (Thrust) runs out of memory, it will throw an out-of-memory error or some other exception. To help with lowering the requirements, here is where the 20 GiB goes:

  1. The tree data structure:
    All memory allocations for the tree are satisfied by either the device_bump_allocator or the SlabAllocator. Both allocators allocate 8 GiB on construction by default. You can reduce this to 4 GiB by changing template parameters (SlabAllocator only supports power-of-two allocation sizes), but keep in mind that when inserting keys into the tree, I don't check for out-of-memory errors in device code (the code will either segfault or deadlock). Also keep in mind that device_bump_allocator does not support freeing memory, so benchmarks like the VoB-Tree will not scale.

  2. Input:
    2.1. Point-query benchmarks require only keys and values. For 50 million key-value pairs, the code needs ~0.2 GiB per array, for a total of $0.2 \times 4 = 0.8$ GiB (two input arrays, queries, and results).
    2.2. Range-query benchmarks require keys and values, range-query lower and upper bounds, and an output buffer. For an input size of 50 million pairs, the code needs $0.2 \times 4 = 0.8$ GiB for the queries and key-value pairs, plus $0.2 \times$ average_range_query_length GiB for the RQ output buffer. So for an average range-query length of 32 (Figure 3.b), the RQ output will be 6.4 GiB. Most nodes are ~2/3 full, so an RQ length of 32 covers $32 / (2/3 \times 14) \approx 3.5$ nodes. Another example is Figure 4, where we have 2.5 million RQs of up to 1000 results each, which makes the required RQ output size 9.3 GiB.

  3. Memory reclaimer: allocates ~0.3 GiB; this can be changed by setting this number.

The maximum is 8 (tree) + 0.8 (RQ input) + 9.3 (maximum RQ output) + 0.3 (reclaimer) = 18.4 GiB. Note that I never explicitly free GPU memory: all allocations are held in a shared-pointer wrapper (see example), so each allocation is deallocated when it goes out of scope.
