Use custom d-ary heap implementation #7017

Open
wants to merge 1 commit into
base: master

SiarheiFedartsou
Member

@SiarheiFedartsou SiarheiFedartsou commented Aug 19, 2024

Context

Benchmark highlights

CH Map Matching +16% ops/sec
MLD Map matching +30% ops/sec
CH Routing +3% ops/sec
MLD Routing +18% ops/sec
CH Table +7% ops/sec
MLD Table +16% ops/sec

osrm_contract run time on berlin-latest.osm.pbf -24%
osrm_customize run time on berlin-latest.osm.pbf -25%

Description

It turns out that boost::heap::d_ary_heap becomes extremely inefficient when mutability is enabled. I wrote the following benchmark, which compares boost::heap::mutable_<true> with boost::heap::mutable_<false>:

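#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>
#include <random>
#include <vector>

#include <boost/heap/d_ary_heap.hpp>

int main() {
    const int NUM_ELEMENTS = 10000000;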
    std::vector<uint16_t> data;
    data.reserve(NUM_ELEMENTS);

    std::mt19937 gen(42);
    std::uniform_int_distribution<> dis(1, 1000000);

    for (int i = 0; i < NUM_ELEMENTS; ++i) {
        // Note: values above 65535 wrap around when narrowed to uint16_t.
        data.push_back(static_cast<uint16_t>(dis(gen)));
    }

    std::vector<int> resultBoostDAryHeapMutable;
    resultBoostDAryHeapMutable.reserve(NUM_ELEMENTS);
    auto start = std::chrono::high_resolution_clock::now();
    boost::heap::d_ary_heap<uint16_t, boost::heap::arity<4>, boost::heap::mutable_<true>, boost::heap::compare<std::greater<uint16_t>>> boostDAryHeapMutable;
    for (const auto& val : data) {
        boostDAryHeapMutable.push(val);
    }
    while (!boostDAryHeapMutable.empty()) {
        resultBoostDAryHeapMutable.push_back(boostDAryHeapMutable.top());
        boostDAryHeapMutable.pop();
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << "Boost d_ary_heap mutable took " << duration << " ms" << std::endl;

    std::vector<int> resultBoostDAryHeap;
    resultBoostDAryHeap.reserve(NUM_ELEMENTS);
    start = std::chrono::high_resolution_clock::now();
    boost::heap::d_ary_heap<uint16_t, boost::heap::arity<4>, boost::heap::compare<std::greater<uint16_t>>> boostDAryHeap;
    for (const auto& val : data) {
        boostDAryHeap.push(val);
    }
    while (!boostDAryHeap.empty()) {
        resultBoostDAryHeap.push_back(boostDAryHeap.top());
        boostDAryHeap.pop();
    }
    end = std::chrono::high_resolution_clock::now();
    duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << "Boost d_ary_heap took " << duration << " ms" << std::endl;
}

The mutable implementation (i.e. the one where we can decrease priorities) turns out to be 10-12x(!) slower than the non-mutable one:

Boost d_ary_heap mutable took 11142 ms
Boost d_ary_heap took 901 ms

The implementation of boost::heap::d_ary_heap is quite difficult to read, but as far as I understand, it uses std::list under the hood to achieve mutability, which makes it quite slow (at least because of the additional allocations and poor cache friendliness).
So I decided to write our own 4-ary heap implementation that handles mutability differently: instead of relying on stable iterators, we use indexes into the heap's underlying vector, with a bit of additional logic to update the stored indexes whenever elements are reordered within the heap.
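
To illustrate the idea, here is a minimal sketch of an index-tracking d-ary heap. It is not the code from this PR: `IndexedDAryHeap`, `Entry`, `LighterFirst`, and `demo_pop_order` are hypothetical names, and the callback signature is an assumption. Every time an element lands in a new slot, the heap reports the element and its new index, so the caller can keep a side table of positions up to date and later use those positions for decrease-key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: a d-ary heap stored in a flat vector. Each move of an
// element is reported through on_move(element, new_index).
template <typename T, int Arity, typename Compare = std::less<T>> class IndexedDAryHeap
{
  public:
    bool empty() const { return heap_.empty(); }
    const T &top() const { return heap_.front(); }

    template <typename OnMove> void push(T value, OnMove &&on_move)
    {
        heap_.push_back(std::move(value));
        sift_up(heap_.size() - 1, on_move);
    }

    template <typename OnMove> void pop(OnMove &&on_move)
    {
        if (heap_.size() > 1)
            heap_.front() = std::move(heap_.back());
        heap_.pop_back();
        if (!heap_.empty())
            sift_down(0, on_move);
    }

    // Replace the element at `index` with a value that orders earlier.
    template <typename OnMove> void decrease(std::size_t index, T value, OnMove &&on_move)
    {
        heap_[index] = std::move(value);
        sift_up(index, on_move);
    }

  private:
    template <typename OnMove> void sift_up(std::size_t i, OnMove &on_move)
    {
        T moving = std::move(heap_[i]);
        while (i > 0)
        {
            const std::size_t parent = (i - 1) / Arity;
            if (!comp_(moving, heap_[parent]))
                break;
            heap_[i] = std::move(heap_[parent]);
            on_move(heap_[i], i);
            i = parent;
        }
        heap_[i] = std::move(moving);
        on_move(heap_[i], i);
    }

    template <typename OnMove> void sift_down(std::size_t i, OnMove &on_move)
    {
        T moving = std::move(heap_[i]);
        for (;;)
        {
            const std::size_t first = i * Arity + 1;
            if (first >= heap_.size())
                break;
            const std::size_t last = std::min(first + Arity, heap_.size());
            std::size_t best = first;
            for (std::size_t c = first + 1; c < last; ++c)
                if (comp_(heap_[c], heap_[best]))
                    best = c;
            if (!comp_(heap_[best], moving))
                break;
            heap_[i] = std::move(heap_[best]);
            on_move(heap_[i], i);
            i = best;
        }
        heap_[i] = std::move(moving);
        on_move(heap_[i], i);
    }

    std::vector<T> heap_;
    Compare comp_;
};

// Usage sketch: position[id] always holds the current heap slot of entry `id`.
struct Entry { int weight; int id; };
struct LighterFirst
{
    bool operator()(const Entry &a, const Entry &b) const { return a.weight < b.weight; }
};

std::vector<int> demo_pop_order()
{
    std::vector<std::size_t> position(3);
    auto track = [&position](const Entry &e, std::size_t index) { position[e.id] = index; };

    IndexedDAryHeap<Entry, 4, LighterFirst> heap;
    heap.push({50, 0}, track);
    heap.push({30, 1}, track);
    heap.push({40, 2}, track);
    heap.decrease(position[0], {10, 0}, track); // entry 0 becomes the lightest

    std::vector<int> order;
    while (!heap.empty())
    {
        order.push_back(heap.top().id);
        heap.pop(track);
    }
    return order;
}
```

Everything lives in one contiguous vector, so there are no per-element allocations. The price is that the caller must pass the tracking callback to every mutating operation.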

Benchmark Results

Benchmark | Base | PR (each benchmark name below is followed by its Base results, then its PR results)
alias aliased u32: 11076
plain u32: 10972.9
aliased double: 15129.3
plain double: 15113.2
aliased u32: 11009.8
plain u32: 10996.4
aliased double: 15039.1
plain double: 15025.3
e2e_match_ch Ops: 25.55 ± 0.01 ops/s. Best: 25.57 ops/s
Total: 5126.51ms ± 2.43ms. Best: 5122.78ms
Min time: 3.05ms ± 0.05ms
Mean time: 39.13ms ± 0.02ms
Median time: 27.37ms ± 0.08ms
95th percentile: 132.54ms ± 0.35ms
99th percentile: 158.85ms ± 0.46ms
Max time: 179.68ms ± 0.75ms
Ops: 30.69 ± 0.03 ops/s. Best: 30.75 ops/s
Total: 4268.10ms ± 5.06ms. Best: 4259.72ms
Min time: 2.93ms ± 0.05ms
Mean time: 32.58ms ± 0.04ms
Median time: 22.89ms ± 0.14ms
95th percentile: 106.88ms ± 0.12ms
99th percentile: 128.44ms ± 0.57ms
Max time: 141.70ms ± 0.89ms
e2e_match_mld Ops: 43.57 ± 0.02 ops/s. Best: 43.60 ops/s
Total: 3006.32ms ± 1.34ms. Best: 3004.43ms
Min time: 2.58ms ± 0.05ms
Mean time: 22.95ms ± 0.01ms
Median time: 12.06ms ± 0.12ms
95th percentile: 75.20ms ± 0.12ms
99th percentile: 86.91ms ± 0.14ms
Max time: 100.71ms ± 0.30ms
Ops: 59.95 ± 0.05 ops/s. Best: 60.03 ops/s
Total: 2184.99ms ± 1.71ms. Best: 2182.09ms
Min time: 2.24ms ± 0.02ms
Mean time: 16.68ms ± 0.01ms
Median time: 9.72ms ± 0.04ms
95th percentile: 52.05ms ± 0.10ms
99th percentile: 60.11ms ± 0.16ms
Max time: 69.53ms ± 0.39ms
e2e_nearest_ch Ops: 635.40 ± 2.55 ops/s. Best: 640.86 ops/s
Total: 1573.68ms ± 6.29ms. Best: 1560.39ms
Min time: 1.28ms ± 0.01ms
Mean time: 1.57ms ± 0.01ms
Median time: 1.54ms ± 0.00ms
95th percentile: 1.96ms ± 0.01ms
99th percentile: 2.05ms ± 0.01ms
Max time: 9.38ms ± 7.45ms
Ops: 636.59 ± 3.36 ops/s. Best: 642.66 ops/s
Total: 1570.80ms ± 8.30ms. Best: 1556.04ms
Min time: 1.28ms ± 0.01ms
Mean time: 1.57ms ± 0.01ms
Median time: 1.53ms ± 0.01ms
95th percentile: 1.96ms ± 0.01ms
99th percentile: 2.04ms ± 0.01ms
Max time: 9.32ms ± 7.40ms
e2e_nearest_mld Ops: 635.02 ± 4.27 ops/s. Best: 640.52 ops/s
Total: 1574.46ms ± 10.06ms. Best: 1561.23ms
Min time: 1.29ms ± 0.01ms
Mean time: 1.57ms ± 0.01ms
Median time: 1.54ms ± 0.01ms
95th percentile: 1.97ms ± 0.01ms
99th percentile: 2.05ms ± 0.01ms
Max time: 9.39ms ± 7.43ms
Ops: 636.33 ± 2.53 ops/s. Best: 639.88 ops/s
Total: 1571.45ms ± 6.62ms. Best: 1562.79ms
Min time: 1.28ms ± 0.01ms
Mean time: 1.57ms ± 0.01ms
Median time: 1.54ms ± 0.00ms
95th percentile: 1.96ms ± 0.00ms
99th percentile: 2.05ms ± 0.01ms
Max time: 9.28ms ± 7.37ms
e2e_route_ch Ops: 217.14 ± 0.62 ops/s. Best: 218.12 ops/s
Total: 4605.59ms ± 13.80ms. Best: 4584.62ms
Min time: 1.83ms ± 0.03ms
Mean time: 4.61ms ± 0.01ms
Median time: 4.70ms ± 0.01ms
95th percentile: 6.06ms ± 0.03ms
99th percentile: 6.60ms ± 0.04ms
Max time: 13.34ms ± 6.42ms
Ops: 218.93 ± 0.47 ops/s. Best: 219.97 ops/s
Total: 4567.70ms ± 10.21ms. Best: 4546.15ms
Min time: 1.84ms ± 0.05ms
Mean time: 4.57ms ± 0.01ms
Median time: 4.66ms ± 0.01ms
95th percentile: 6.01ms ± 0.03ms
99th percentile: 6.55ms ± 0.03ms
Max time: 13.24ms ± 6.37ms
e2e_route_mld Ops: 178.18 ± 0.27 ops/s. Best: 178.71 ops/s
Total: 5612.39ms ± 8.61ms. Best: 5595.71ms
Min time: 1.78ms ± 0.04ms
Mean time: 5.61ms ± 0.01ms
Median time: 5.73ms ± 0.02ms
95th percentile: 7.61ms ± 0.03ms
99th percentile: 8.15ms ± 0.04ms
Max time: 14.77ms ± 5.89ms
Ops: 186.43 ± 0.32 ops/s. Best: 187.08 ops/s
Total: 5363.86ms ± 9.38ms. Best: 5345.20ms
Min time: 1.78ms ± 0.03ms
Mean time: 5.36ms ± 0.01ms
Median time: 5.47ms ± 0.02ms
95th percentile: 7.25ms ± 0.01ms
99th percentile: 7.82ms ± 0.06ms
Max time: 14.42ms ± 5.98ms
e2e_table_ch Ops: 218.31 ± 0.48 ops/s. Best: 219.32 ops/s
Total: 4580.74ms ± 10.97ms. Best: 4559.55ms
Min time: 2.46ms ± 0.04ms
Mean time: 4.58ms ± 0.01ms
Median time: 4.58ms ± 0.01ms
95th percentile: 6.23ms ± 0.02ms
99th percentile: 6.58ms ± 0.04ms
Max time: 13.70ms ± 7.04ms
Ops: 226.94 ± 0.38 ops/s. Best: 227.56 ops/s
Total: 4406.36ms ± 7.58ms. Best: 4394.42ms
Min time: 2.40ms ± 0.03ms
Mean time: 4.41ms ± 0.01ms
Median time: 4.40ms ± 0.02ms
95th percentile: 5.96ms ± 0.02ms
99th percentile: 6.29ms ± 0.03ms
Max time: 13.45ms ± 7.07ms
e2e_table_mld Ops: 69.13 ± 0.03 ops/s. Best: 69.16 ops/s
Total: 14466.36ms ± 6.86ms. Best: 14460.18ms
Min time: 6.01ms ± 0.04ms
Mean time: 14.47ms ± 0.01ms
Median time: 14.39ms ± 0.01ms
95th percentile: 22.07ms ± 0.04ms
99th percentile: 23.22ms ± 0.08ms
Max time: 29.71ms ± 5.67ms
Ops: 78.73 ± 0.02 ops/s. Best: 78.76 ops/s
Total: 12701.45ms ± 4.01ms. Best: 12696.81ms
Min time: 5.38ms ± 0.02ms
Mean time: 12.70ms ± 0.00ms
Median time: 12.65ms ± 0.01ms
95th percentile: 19.31ms ± 0.06ms
99th percentile: 20.29ms ± 0.06ms
Max time: 26.86ms ± 5.89ms
e2e_trip_ch Ops: 62.95 ± 0.03 ops/s. Best: 63.00 ops/s
Total: 15885.05ms ± 8.02ms. Best: 15874.26ms
Min time: 2.43ms ± 0.22ms
Mean time: 15.88ms ± 0.01ms
Median time: 15.10ms ± 0.03ms
95th percentile: 28.09ms ± 0.03ms
99th percentile: 30.17ms ± 0.09ms
Max time: 32.49ms ± 1.45ms
Ops: 64.23 ± 0.04 ops/s. Best: 64.29 ops/s
Total: 15568.75ms ± 9.29ms. Best: 15553.72ms
Min time: 2.38ms ± 0.17ms
Mean time: 15.57ms ± 0.01ms
Median time: 14.78ms ± 0.03ms
95th percentile: 27.65ms ± 0.05ms
99th percentile: 29.60ms ± 0.09ms
Max time: 32.03ms ± 1.63ms
e2e_trip_mld Ops: 36.66 ± 0.01 ops/s. Best: 36.68 ops/s
Total: 27281.21ms ± 8.68ms. Best: 27263.72ms
Min time: 2.42ms ± 0.17ms
Mean time: 27.28ms ± 0.01ms
Median time: 26.38ms ± 0.05ms
95th percentile: 44.34ms ± 0.05ms
99th percentile: 47.07ms ± 0.12ms
Max time: 49.41ms ± 0.11ms
Ops: 40.05 ± 0.02 ops/s. Best: 40.08 ops/s
Total: 24966.90ms ± 14.12ms. Best: 24951.03ms
Min time: 2.45ms ± 0.19ms
Mean time: 24.97ms ± 0.01ms
Median time: 24.13ms ± 0.07ms
95th percentile: 41.08ms ± 0.04ms
99th percentile: 43.63ms ± 0.11ms
Max time: 45.98ms ± 0.42ms
json-render String: 8.89322ms
Stringstream: 14.6254ms
Vector: 9.48771ms
String: 9.15423ms
Stringstream: 14.6142ms
Vector: 9.53894ms
match_ch Default radius:
7.06186ms/req at 82 coordinate
0.0861202ms/coordinate
Radius 10m:
25.0142ms/req at 82 coordinate
0.305052ms/coordinate
Default radius:
6.28653ms/req at 82 coordinate
0.076665ms/coordinate
Radius 10m:
22.18ms/req at 82 coordinate
0.270488ms/coordinate
match_mld Default radius:
4.35379ms/req at 82 coordinate
0.053095ms/coordinate
Radius 10m:
16.2432ms/req at 82 coordinate
0.198088ms/coordinate
Default radius:
3.11549ms/req at 82 coordinate
0.0379938ms/coordinate
Radius 10m:
10.7051ms/req at 82 coordinate
0.13055ms/coordinate
| Benchmark | Base | PR |
| --- | --- | --- |
| node_match_ch | Ops: 159.3 ± 0.5 ops/s. Best: 160.0 ops/s | Ops: 186.1 ± 1.3 ops/s. Best: 187.3 ops/s |
| node_match_mld | Ops: 219.5 ± 2.2 ops/s. Best: 221.6 ops/s | Ops: 286.7 ± 1.7 ops/s. Best: 290.0 ops/s |
| node_nearest_ch | Ops: 9989.3 ± 821.8 ops/s. Best: 11065.2 ops/s | Ops: 9393.8 ± 443.1 ops/s. Best: 10209.7 ops/s |
| node_nearest_mld | Ops: 9286.5 ± 369.8 ops/s. Best: 9970.7 ops/s | Ops: 10275.8 ± 858.9 ops/s. Best: 12036.0 ops/s |
| node_route_ch | Ops: 1002.6 ± 25.0 ops/s. Best: 1039.0 ops/s | Ops: 1031.5 ± 23.5 ops/s. Best: 1057.3 ops/s |
| node_route_mld | Ops: 506.0 ± 6.3 ops/s. Best: 515.4 ops/s | Ops: 585.5 ± 3.6 ops/s. Best: 590.5 ops/s |
| node_table_ch | Ops: 186.4 ± 1.9 ops/s. Best: 188.9 ops/s | Ops: 197.6 ± 1.9 ops/s. Best: 200.9 ops/s |
| node_table_mld | Ops: 37.8 ± 0.1 ops/s. Best: 37.9 ops/s | Ops: 44.0 ± 0.1 ops/s. Best: 44.1 ops/s |
| node_trip_ch | Ops: 185.9 ± 1.0 ops/s. Best: 187.0 ops/s | Ops: 193.7 ± 0.4 ops/s. Best: 194.6 ops/s |
| node_trip_mld | Ops: 61.4 ± 0.2 ops/s. Best: 61.6 ops/s | Ops: 71.8 ± 0.3 ops/s. Best: 72.1 ops/s |
| osrm_contract | Time: 184.30s Peak RAM: 195.01MB | Time: 141.54s Peak RAM: 195.00MB |
| osrm_customize | Time: 2.53s Peak RAM: 112.45MB | Time: 1.92s Peak RAM: 112.45MB |
| osrm_extract | Time: 24.33s Peak RAM: 398.68MB | Time: 24.26s Peak RAM: 399.01MB |
| osrm_partition | Time: 5.98s Peak RAM: 119.62MB | Time: 5.95s Peak RAM: 119.43MB |
packedvector random write:
std::vector 184015 ms
util::packed_vector 373595 ms
slowdown: 2.03025
random read:
std::vector 100548 ms
util::packed_vector 190832 ms
slowdown: 1.89791
random write:
std::vector 184979 ms
util::packed_vector 373751 ms
slowdown: 2.02051
random read:
std::vector 100553 ms
util::packed_vector 190728 ms
slowdown: 1.89679
random_match_ch 500 matches, default radius
ops: 120.21 ± 0.19 ops/s. best: 120.49ops/s.
total: 474.18 ± 0.76ms. best: 473.05ms.
avg: 8.32 ± 0.01ms
min: 0.23 ± 0.01ms
max: 41.83 ± 0.17ms
p99: 41.83 ± 0.17ms

500 matches, radius=10
ops: 34.48 ± 0.05 ops/s. best: 34.53ops/s.
total: 1856.08 ± 2.44ms. best: 1853.21ms.
avg: 29.00 ± 0.04ms
min: 0.22 ± 0.00ms
max: 412.12 ± 1.39ms
p99: 412.12 ± 1.39ms

500 matches, radius=20
ops: 8.11 ± 0.01 ops/s. best: 8.12ops/s.
total: 8019.57 ± 8.18ms. best: 8009.41ms.
avg: 123.38 ± 0.13ms
min: 0.49 ± 0.00ms
max: 2128.87 ± 3.56ms
p99: 2128.87 ± 3.56ms

Peak RAM: 55.000MB
500 matches, default radius
ops: 146.70 ± 0.27 ops/s. best: 147.08ops/s.
total: 388.54 ± 0.73ms. best: 387.54ms.
avg: 6.82 ± 0.01ms
min: 0.22 ± 0.00ms
max: 34.01 ± 0.07ms
p99: 34.01 ± 0.07ms

500 matches, radius=10
ops: 42.64 ± 0.04 ops/s. best: 42.68ops/s.
total: 1500.93 ± 1.58ms. best: 1499.46ms.
avg: 23.45 ± 0.02ms
min: 0.22 ± 0.00ms
max: 332.39 ± 0.72ms
p99: 332.39 ± 0.72ms

500 matches, radius=20
ops: 10.10 ± 0.01 ops/s. best: 10.12ops/s.
total: 6436.36 ± 8.71ms. best: 6423.04ms.
avg: 99.02 ± 0.13ms
min: 0.43 ± 0.00ms
max: 1691.74 ± 4.69ms
p99: 1691.74 ± 4.69ms

Peak RAM: 55.000MB
random_match_mld 500 matches, default radius
ops: 205.04 ± 0.49 ops/s. best: 205.56ops/s.
total: 278.00 ± 0.67ms. best: 277.29ms.
avg: 4.88 ± 0.01ms
min: 0.21 ± 0.00ms
max: 27.09 ± 0.05ms
p99: 27.09 ± 0.05ms

500 matches, radius=10
ops: 73.10 ± 0.06 ops/s. best: 73.17ops/s.
total: 875.56 ± 0.73ms. best: 874.64ms.
avg: 13.68 ± 0.01ms
min: 0.21 ± 0.00ms
max: 161.38 ± 0.32ms
p99: 161.38 ± 0.32ms

500 matches, radius=20
ops: 15.37 ± 0.01 ops/s. best: 15.38ops/s.
total: 4229.16 ± 2.25ms. best: 4225.76ms.
avg: 65.06 ± 0.03ms
min: 0.29 ± 0.00ms
max: 843.11 ± 1.06ms
p99: 843.11 ± 1.06ms

Peak RAM: 51.500MB
500 matches, default radius
ops: 305.47 ± 1.12 ops/s. best: 306.62ops/s.
total: 186.60 ± 0.69ms. best: 185.90ms.
avg: 3.27 ± 0.01ms
min: 0.18 ± 0.00ms
max: 18.14 ± 0.05ms
p99: 18.14 ± 0.05ms

500 matches, radius=10
ops: 112.95 ± 0.08 ops/s. best: 113.05ops/s.
total: 566.63 ± 0.38ms. best: 566.14ms.
avg: 8.85 ± 0.01ms
min: 0.20 ± 0.00ms
max: 100.66 ± 0.10ms
p99: 100.66 ± 0.10ms

500 matches, radius=20
ops: 23.63 ± 0.02 ops/s. best: 23.66ops/s.
total: 2750.81 ± 1.83ms. best: 2747.78ms.
avg: 42.32 ± 0.03ms
min: 0.26 ± 0.00ms
max: 536.31 ± 0.44ms
p99: 536.31 ± 0.44ms

Peak RAM: 51.500MB
random_nearest_ch 10000 nearest, number_of_results=1
ops: 21344.30 ± 41.26 ops/s. best: 21377.87ops/s.
total: 468.51 ± 0.91ms. best: 467.77ms.
avg: 0.05 ± 0.00ms
min: 0.02 ± 0.00ms
max: 0.16 ± 0.03ms
p99: 0.10 ± 0.00ms

10000 nearest, number_of_results=5
ops: 15599.62 ± 3.63 ops/s. best: 15604.45ops/s.
total: 641.04 ± 0.15ms. best: 640.84ms.
avg: 0.06 ± 0.00ms
min: 0.03 ± 0.00ms
max: 0.15 ± 0.00ms
p99: 0.12 ± 0.00ms

10000 nearest, number_of_results=10
ops: 11959.99 ± 3.89 ops/s. best: 11965.45ops/s.
total: 836.12 ± 0.27ms. best: 835.74ms.
avg: 0.08 ± 0.00ms
min: 0.04 ± 0.00ms
max: 0.19 ± 0.00ms
p99: 0.15 ± 0.00ms

Peak RAM: 34.500MB
10000 nearest, number_of_results=1
ops: 21446.43 ± 39.13 ops/s. best: 21477.18ops/s.
total: 466.28 ± 0.85ms. best: 465.61ms.
avg: 0.05 ± 0.00ms
min: 0.02 ± 0.00ms
max: 0.15 ± 0.02ms
p99: 0.10 ± 0.00ms

10000 nearest, number_of_results=5
ops: 15616.52 ± 12.28 ops/s. best: 15628.15ops/s.
total: 640.35 ± 0.50ms. best: 639.87ms.
avg: 0.06 ± 0.00ms
min: 0.03 ± 0.00ms
max: 0.16 ± 0.02ms
p99: 0.13 ± 0.00ms

10000 nearest, number_of_results=10
ops: 11960.74 ± 4.46 ops/s. best: 11965.85ops/s.
total: 836.07 ± 0.31ms. best: 835.71ms.
avg: 0.08 ± 0.00ms
min: 0.04 ± 0.00ms
max: 0.19 ± 0.00ms
p99: 0.15 ± 0.00ms

Peak RAM: 34.500MB
random_nearest_mld 10000 nearest, number_of_results=1
ops: 21347.69 ± 37.14 ops/s. best: 21374.15ops/s.
total: 468.44 ± 0.82ms. best: 467.85ms.
avg: 0.05 ± 0.00ms
min: 0.02 ± 0.00ms
max: 0.15 ± 0.02ms
p99: 0.10 ± 0.00ms

10000 nearest, number_of_results=5
ops: 15600.41 ± 6.40 ops/s. best: 15606.31ops/s.
total: 641.01 ± 0.26ms. best: 640.77ms.
avg: 0.06 ± 0.00ms
min: 0.03 ± 0.00ms
max: 0.15 ± 0.00ms
p99: 0.13 ± 0.00ms

10000 nearest, number_of_results=10
ops: 11957.44 ± 6.00 ops/s. best: 11963.98ops/s.
total: 836.30 ± 0.43ms. best: 835.84ms.
avg: 0.08 ± 0.00ms
min: 0.04 ± 0.00ms
max: 0.19 ± 0.00ms
p99: 0.15 ± 0.00ms

Peak RAM: 34.500MB
10000 nearest, number_of_results=1
ops: 21445.62 ± 32.89 ops/s. best: 21472.30ops/s.
total: 466.30 ± 0.72ms. best: 465.72ms.
avg: 0.05 ± 0.00ms
min: 0.02 ± 0.00ms
max: 0.15 ± 0.02ms
p99: 0.10 ± 0.00ms

10000 nearest, number_of_results=5
ops: 15618.99 ± 6.24 ops/s. best: 15631.75ops/s.
total: 640.25 ± 0.26ms. best: 639.72ms.
avg: 0.06 ± 0.00ms
min: 0.03 ± 0.00ms
max: 0.15 ± 0.00ms
p99: 0.12 ± 0.00ms

10000 nearest, number_of_results=10
ops: 11947.05 ± 5.39 ops/s. best: 11952.48ops/s.
total: 837.03 ± 0.39ms. best: 836.65ms.
avg: 0.08 ± 0.00ms
min: 0.04 ± 0.00ms
max: 0.19 ± 0.00ms
p99: 0.15 ± 0.00ms

Peak RAM: 34.500MB
random_route_ch 1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
ops: 288.66 ± 0.15 ops/s. best: 288.80ops/s.
total: 3408.84 ± 1.90ms. best: 3407.15ms.
avg: 3.46 ± 0.00ms
min: 0.45 ± 0.01ms
max: 5.94 ± 0.01ms
p99: 5.27 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
ops: 324.33 ± 0.05 ops/s. best: 324.42ops/s.
total: 3083.29 ± 0.44ms. best: 3082.42ms.
avg: 3.08 ± 0.00ms
min: 0.12 ± 0.00ms
max: 8.35 ± 0.02ms
p99: 7.15 ± 0.01ms

1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
ops: 597.96 ± 0.06 ops/s. best: 598.05ops/s.
total: 1645.59 ± 0.16ms. best: 1645.35ms.
avg: 1.67 ± 0.00ms
min: 0.35 ± 0.00ms
max: 2.68 ± 0.01ms
p99: 2.44 ± 0.00ms

1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
ops: 618.00 ± 0.15 ops/s. best: 618.24ops/s.
total: 1618.13 ± 0.40ms. best: 1617.49ms.
avg: 1.62 ± 0.00ms
min: 0.09 ± 0.00ms
max: 5.51 ± 0.01ms
p99: 4.01 ± 0.02ms

Peak RAM: 83.500MB
1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
ops: 294.71 ± 0.09 ops/s. best: 294.81ops/s.
total: 3338.92 ± 1.04ms. best: 3337.71ms.
avg: 3.39 ± 0.00ms
min: 0.43 ± 0.00ms
max: 5.80 ± 0.01ms
p99: 5.13 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
ops: 331.62 ± 0.12 ops/s. best: 331.78ops/s.
total: 3015.49 ± 1.06ms. best: 3014.04ms.
avg: 3.02 ± 0.00ms
min: 0.12 ± 0.00ms
max: 8.11 ± 0.01ms
p99: 6.97 ± 0.02ms

1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
ops: 631.06 ± 0.12 ops/s. best: 631.29ops/s.
total: 1559.28 ± 0.31ms. best: 1558.70ms.
avg: 1.58 ± 0.00ms
min: 0.33 ± 0.00ms
max: 2.48 ± 0.01ms
p99: 2.35 ± 0.00ms

1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
ops: 649.07 ± 0.08 ops/s. best: 649.22ops/s.
total: 1540.68 ± 0.19ms. best: 1540.32ms.
avg: 1.54 ± 0.00ms
min: 0.09 ± 0.00ms
max: 5.25 ± 0.00ms
p99: 3.78 ± 0.00ms

Peak RAM: 83.500MB
random_route_mld 1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
ops: 146.74 ± 0.02 ops/s. best: 146.76ops/s.
total: 6705.87 ± 1.05ms. best: 6704.68ms.
avg: 6.81 ± 0.00ms
min: 0.42 ± 0.00ms
max: 16.37 ± 0.02ms
p99: 11.08 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
ops: 139.17 ± 0.02 ops/s. best: 139.20ops/s.
total: 7185.68 ± 0.78ms. best: 7184.05ms.
avg: 7.19 ± 0.00ms
min: 0.11 ± 0.00ms
max: 16.54 ± 0.08ms
p99: 15.57 ± 0.02ms

1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
ops: 204.62 ± 0.03 ops/s. best: 204.67ops/s.
total: 4809.01 ± 0.81ms. best: 4807.68ms.
avg: 4.89 ± 0.00ms
min: 0.36 ± 0.00ms
max: 13.58 ± 0.03ms
p99: 8.43 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
ops: 175.00 ± 0.04 ops/s. best: 175.06ops/s.
total: 5714.13 ± 1.20ms. best: 5712.32ms.
avg: 5.71 ± 0.00ms
min: 0.08 ± 0.00ms
max: 12.83 ± 0.02ms
p99: 11.99 ± 0.02ms

Peak RAM: 73.531MB
1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
ops: 168.21 ± 0.04 ops/s. best: 168.28ops/s.
total: 5850.00 ± 1.26ms. best: 5847.43ms.
avg: 5.95 ± 0.00ms
min: 0.42 ± 0.00ms
max: 13.64 ± 0.03ms
p99: 9.53 ± 0.02ms

1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
ops: 161.19 ± 0.01 ops/s. best: 161.20ops/s.
total: 6204.02 ± 0.48ms. best: 6203.31ms.
avg: 6.20 ± 0.00ms
min: 0.11 ± 0.00ms
max: 14.53 ± 0.06ms
p99: 13.57 ± 0.01ms

1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
ops: 248.02 ± 0.03 ops/s. best: 248.06ops/s.
total: 3967.49 ± 0.55ms. best: 3966.74ms.
avg: 4.03 ± 0.00ms
min: 0.35 ± 0.00ms
max: 10.85 ± 0.01ms
p99: 6.92 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
ops: 210.60 ± 0.03 ops/s. best: 210.65ops/s.
total: 4748.30 ± 0.64ms. best: 4747.30ms.
avg: 4.75 ± 0.00ms
min: 0.08 ± 0.00ms
max: 11.03 ± 0.02ms
p99: 10.06 ± 0.02ms

Peak RAM: 73.531MB
random_table_ch 250 tables, 3 coordinates
ops: 1158.95 ± 5.04 ops/s. best: 1162.61ops/s.
total: 215.72 ± 0.94ms. best: 215.03ms.
avg: 0.86 ± 0.00ms
min: 0.60 ± 0.00ms
max: 1.26 ± 0.14ms
p99: 1.11 ± 0.02ms

250 tables, 25 coordinates
ops: 128.52 ± 0.02 ops/s. best: 128.56ops/s.
total: 1945.21 ± 0.29ms. best: 1944.57ms.
avg: 7.78 ± 0.00ms
min: 7.09 ± 0.01ms
max: 8.53 ± 0.01ms
p99: 8.44 ± 0.01ms

250 tables, 50 coordinates
ops: 62.40 ± 0.01 ops/s. best: 62.41ops/s.
total: 4006.70 ± 0.74ms. best: 4005.48ms.
avg: 16.03 ± 0.00ms
min: 14.88 ± 0.02ms
max: 17.14 ± 0.01ms
p99: 16.99 ± 0.02ms

Peak RAM: 62.500MB
250 tables, 3 coordinates
ops: 1248.62 ± 5.26 ops/s. best: 1253.29ops/s.
total: 200.23 ± 0.85ms. best: 199.48ms.
avg: 0.80 ± 0.00ms
min: 0.55 ± 0.00ms
max: 1.16 ± 0.12ms
p99: 1.04 ± 0.02ms

250 tables, 25 coordinates
ops: 137.83 ± 0.01 ops/s. best: 137.85ops/s.
total: 1813.80 ± 0.18ms. best: 1813.51ms.
avg: 7.26 ± 0.00ms
min: 6.58 ± 0.01ms
max: 7.99 ± 0.01ms
p99: 7.86 ± 0.01ms

250 tables, 50 coordinates
ops: 66.83 ± 0.00 ops/s. best: 66.84ops/s.
total: 3740.57 ± 0.22ms. best: 3740.23ms.
avg: 14.96 ± 0.00ms
min: 13.86 ± 0.01ms
max: 15.99 ± 0.01ms
p99: 15.84 ± 0.01ms

Peak RAM: 63.000MB
random_table_mld 250 tables, 3 coordinates
ops: 227.05 ± 0.28 ops/s. best: 227.45ops/s.
total: 1101.09 ± 1.35ms. best: 1099.14ms.
avg: 4.40 ± 0.01ms
min: 3.46 ± 0.00ms
max: 5.67 ± 0.01ms
p99: 5.47 ± 0.05ms

250 tables, 25 coordinates
ops: 23.83 ± 0.00 ops/s. best: 23.84ops/s.
total: 10490.59 ± 2.20ms. best: 10487.77ms.
avg: 41.96 ± 0.01ms
min: 38.70 ± 0.04ms
max: 45.83 ± 0.11ms
p99: 45.19 ± 0.05ms

250 tables, 50 coordinates
ops: 11.02 ± 0.00 ops/s. best: 11.02ops/s.
total: 22690.88 ± 7.06ms. best: 22675.92ms.
avg: 90.76 ± 0.03ms
min: 86.23 ± 0.06ms
max: 96.50 ± 0.10ms
p99: 94.81 ± 0.07ms

Peak RAM: 62.734MB
250 tables, 3 coordinates
ops: 264.79 ± 0.35 ops/s. best: 265.04ops/s.
total: 944.16 ± 1.25ms. best: 943.26ms.
avg: 3.78 ± 0.00ms
min: 2.94 ± 0.00ms
max: 4.86 ± 0.02ms
p99: 4.70 ± 0.04ms

250 tables, 25 coordinates
ops: 27.37 ± 0.00 ops/s. best: 27.38ops/s.
total: 9132.52 ± 0.65ms. best: 9131.27ms.
avg: 36.53 ± 0.00ms
min: 33.65 ± 0.02ms
max: 39.92 ± 0.02ms
p99: 39.34 ± 0.03ms

250 tables, 50 coordinates
ops: 12.58 ± 0.00 ops/s. best: 12.58ops/s.
total: 19880.20 ± 2.43ms. best: 19876.75ms.
avg: 79.52 ± 0.01ms
min: 75.39 ± 0.04ms
max: 84.58 ± 0.02ms
p99: 83.10 ± 0.08ms

Peak RAM: 62.703MB
random_trip_ch 250 trips, 3 coordinates
ops: 329.19 ± 0.48 ops/s. best: 329.57ops/s.
total: 759.44 ± 1.11ms. best: 758.56ms.
avg: 3.04 ± 0.00ms
min: 1.59 ± 0.01ms
max: 4.46 ± 0.00ms
p99: 4.23 ± 0.01ms

250 trips, 5 coordinates
ops: 214.38 ± 0.05 ops/s. best: 214.46ops/s.
total: 1166.18 ± 0.26ms. best: 1165.72ms.
avg: 4.66 ± 0.00ms
min: 2.93 ± 0.01ms
max: 5.99 ± 0.01ms
p99: 5.86 ± 0.01ms

Peak RAM: 73.000MB
250 trips, 3 coordinates
ops: 345.60 ± 0.46 ops/s. best: 345.93ops/s.
total: 723.37 ± 0.96ms. best: 722.69ms.
avg: 2.89 ± 0.00ms
min: 1.50 ± 0.01ms
max: 4.28 ± 0.01ms
p99: 4.05 ± 0.01ms

250 trips, 5 coordinates
ops: 223.73 ± 0.07 ops/s. best: 223.79ops/s.
total: 1117.44 ± 0.35ms. best: 1117.11ms.
avg: 4.47 ± 0.00ms
min: 2.76 ± 0.00ms
max: 5.81 ± 0.01ms
p99: 5.62 ± 0.01ms

Peak RAM: 73.000MB
random_trip_mld 250 trips, 3 coordinates
ops: 106.72 ± 0.05 ops/s. best: 106.77ops/s.
total: 2342.57 ± 1.14ms. best: 2341.47ms.
avg: 9.37 ± 0.00ms
min: 5.53 ± 0.01ms
max: 12.12 ± 0.01ms
p99: 12.05 ± 0.01ms

250 trips, 5 coordinates
ops: 68.72 ± 0.01 ops/s. best: 68.74ops/s.
total: 3637.85 ± 0.46ms. best: 3637.14ms.
avg: 14.55 ± 0.00ms
min: 10.08 ± 0.01ms
max: 17.91 ± 0.03ms
p99: 17.46 ± 0.02ms

Peak RAM: 68.500MB
250 trips, 3 coordinates
ops: 124.13 ± 0.07 ops/s. best: 124.18ops/s.
total: 2014.07 ± 1.09ms. best: 2013.25ms.
avg: 8.06 ± 0.00ms
min: 4.77 ± 0.01ms
max: 10.46 ± 0.02ms
p99: 10.41 ± 0.01ms

250 trips, 5 coordinates
ops: 79.66 ± 0.02 ops/s. best: 79.68ops/s.
total: 3138.40 ± 0.65ms. best: 3137.42ms.
avg: 12.55 ± 0.00ms
min: 8.65 ± 0.01ms
max: 15.50 ± 0.04ms
p99: 15.10 ± 0.02ms

Peak RAM: 68.500MB
route_ch 1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
635.942ms
0.635942ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
778.264ms
0.778264ms/req
1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
245.431ms
0.245431ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
214.527ms
0.214527ms/req
1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
627.748ms
0.627748ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
773.678ms
0.773678ms/req
1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
238.52ms
0.23852ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
211.042ms
0.211042ms/req
route_mld 1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
806.896ms
0.806896ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
1038.03ms
1.03803ms/req
1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
413.451ms
0.413451ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
460.871ms
0.460871ms/req
1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
749.022ms
0.749022ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
974.73ms
0.97473ms/req
1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
355.375ms
0.355375ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
397.201ms
0.397201ms/req
rtree 1 result:
226.994ms -> 0.0226994 ms/query
10 results:
257.736ms -> 0.0257736 ms/query
1 result:
227.271ms -> 0.0227271 ms/query
10 results:
258.023ms -> 0.0258023 ms/query

@SiarheiFedartsou SiarheiFedartsou changed the title Prototype using custom d-ary heap implementation Use custom d-ary heap implementation Aug 20, 2024
@SiarheiFedartsou SiarheiFedartsou marked this pull request as ready for review August 22, 2024 15:08
Collaborator

@DennisOSRM DennisOSRM left a comment

Great find! I'd love to see a simpler implementation that keeps heap logic simple and contained within the implementation. Left some specific comments at the code site.

unit_tests/CMakeLists.txt (outdated, resolved)

namespace osrm::util
{
template <typename HeapData, int Arity, typename Comparator = std::less<HeapData>> class DAryHeap
Collaborator

Having the reorder handler as a template parameter seems somewhat confusing.

Do we have a need to specify multiple ones at all? If this is needed, should they be implemented inside the heap?

Can we possibly aim for a simpler implementation of this?

Member Author

Having the reorder handler as a template parameter seems somewhat confusing.

Well, the intention here was to be able to pass a lambda and avoid using std::function, which has a certain performance impact. Template parameters like this are a frequent pattern for such things: if you pass a lambda, the compiler infers its type for you in the most efficient way. At the same time, I agree that since we only ever pass a single type of function there, it is probably worth trying to make this more obvious (not sure it will be simple though, but I'll try 😄)
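
A small sketch of the two options being compared (hypothetical names, not code from this PR): `notify_all_template` takes the callable as a template parameter, while `notify_all_function` takes a `std::function`. Both behave the same; the difference is only in how the call is dispatched.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Variant A: the callable's concrete type is a template parameter, so the
// lambda is passed as-is and the compiler is free to inline the call.
template <typename Handler>
void notify_all_template(const std::vector<int> &items, Handler &&handler)
{
    for (std::size_t i = 0; i < items.size(); ++i)
        handler(items[i], i);
}

// Variant B: std::function type-erases the callable, so each call goes
// through an indirect dispatch.
void notify_all_function(const std::vector<int> &items,
                         const std::function<void(int, std::size_t)> &handler)
{
    for (std::size_t i = 0; i < items.size(); ++i)
        handler(items[i], i);
}

// Runs the same lambda through either variant and returns sum(value * index).
long weighted_sum(bool use_template)
{
    const std::vector<int> items{5, 7, 9};
    long total = 0;
    auto handler = [&total](int value, std::size_t index) {
        total += value * static_cast<long>(index);
    };
    if (use_template)
        notify_all_template(items, handler);
    else
        notify_all_function(items, handler);
    return total;
}
```

Whether the difference is measurable depends on the compiler and on how hot the call site is; for a callback invoked on every heap move it is plausible that it matters.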

If this is needed, should they be implemented inside the heap?

What do you mean by "inside the heap"? In my initial implementation DAryHeap wasn't a separate class and all the code from here "lived" directly in QueryHeap, but I found that a bit messy and decided to decouple things, keeping the code structure similar to what we have now (QueryHeap plus a separate boost::heap::d_ary_heap before; QueryHeap plus a separate DAryHeap after). I could try going back to the initial implementation...

WDYT?

Member Author

Or do you mean that the implementation of reorderHandler should be part of DAryHeap? I am not sure that is doable either without merging DAryHeap into QueryHeap: this reorderHandler is a kind of "glue" between DAryHeap and QueryHeap...

Member Author

Another option I see is to make it super straightforward. DAryHeap is currently a kind of generic d-ary heap implementation, but we could "couple" it to our needs, i.e. just take a reference to inserted_nodes from QueryHeap and do everything we currently do in reorderHandler directly on that vector. That should work without issues as well.
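
Roughly, the coupled variant could look like the sketch below. This is only an illustration under assumptions: `InsertedNode` is a hypothetical stand-in for whatever QueryHeap keeps in inserted_nodes, and `CoupledQuaternaryHeap` and `demo_coupled_order` are made-up names. The heap stores indexes into the node vector and writes each node's heap slot back into the node record itself, so no callback is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one entry of QueryHeap's inserted_nodes.
struct InsertedNode
{
    int node;
    int weight;
    std::size_t heap_index; // slot of this node inside the heap array
};

// 4-ary min-heap of indexes into `nodes`; it maintains heap_index itself.
class CoupledQuaternaryHeap
{
  public:
    explicit CoupledQuaternaryHeap(std::vector<InsertedNode> &nodes) : nodes_(nodes) {}

    bool empty() const { return slots_.empty(); }
    std::size_t top() const { return slots_.front(); } // index into nodes

    void push(std::size_t node_index)
    {
        slots_.push_back(node_index);
        sift_up(slots_.size() - 1);
    }

    void pop()
    {
        if (slots_.size() > 1)
            slots_.front() = slots_.back();
        slots_.pop_back();
        if (!slots_.empty())
            sift_down(0);
    }

    void decrease_weight(std::size_t node_index, int new_weight)
    {
        nodes_[node_index].weight = new_weight;
        sift_up(nodes_[node_index].heap_index);
    }

  private:
    int weight_at(std::size_t slot) const { return nodes_[slots_[slot]].weight; }

    // Put a node into a slot and record that slot on the node.
    void place(std::size_t slot, std::size_t node_index)
    {
        slots_[slot] = node_index;
        nodes_[node_index].heap_index = slot;
    }

    void sift_up(std::size_t slot)
    {
        const std::size_t moving = slots_[slot];
        while (slot > 0)
        {
            const std::size_t parent = (slot - 1) / 4;
            if (nodes_[moving].weight >= weight_at(parent))
                break;
            place(slot, slots_[parent]);
            slot = parent;
        }
        place(slot, moving);
    }

    void sift_down(std::size_t slot)
    {
        const std::size_t moving = slots_[slot];
        for (;;)
        {
            const std::size_t first = 4 * slot + 1;
            if (first >= slots_.size())
                break;
            std::size_t best = first;
            for (std::size_t c = first + 1; c < first + 4 && c < slots_.size(); ++c)
                if (weight_at(c) < weight_at(best))
                    best = c;
            if (weight_at(best) >= nodes_[moving].weight)
                break;
            place(slot, slots_[best]);
            slot = best;
        }
        place(slot, moving);
    }

    std::vector<InsertedNode> &nodes_;
    std::vector<std::size_t> slots_;
};

// Usage sketch: returns node ids in the order they leave the heap.
std::vector<int> demo_coupled_order()
{
    std::vector<InsertedNode> nodes{{100, 50, 0}, {101, 30, 0}, {102, 40, 0}};
    CoupledQuaternaryHeap heap(nodes);
    for (std::size_t i = 0; i < nodes.size(); ++i)
        heap.push(i);
    heap.decrease_weight(0, 10); // node 100 becomes the cheapest

    std::vector<int> order;
    while (!heap.empty())
    {
        order.push_back(nodes[heap.top()].node);
        heap.pop();
    }
    return order;
}
```

The trade-off is the one described above: no template parameter and no callback, but the heap is no longer generic because it knows the layout of the node records.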

Anyway, @DennisOSRM, I would love to hear your opinion on this 😄

Member Author

And one more blog post with a concrete example of why a template can be preferable here: https://wolchok.org/posts/cxx-trap-2-std-function/

Member Author

@DennisOSRM WDYT? Tbh, I am not sure I have any good ideas for how to remove the template parameter here: all the options have their own drawbacks. Maybe you have some? Or maybe it is worth introducing a bit of complexity in favor of performance? 😀

2 participants