Use pool allocator in QueryHeap #6943

Open
SiarheiFedartsou wants to merge 49 commits into master

SiarheiFedartsou (Member) commented Jun 11, 2024

I spent some hours in a profiler and noticed that we waste a lot of time on allocations in QueryHeap:

  • in std::unordered_map in UnorderedMapStorage: it obviously has to reallocate pairs quite often, simply because of how it is implemented
  • in boost::heap::d_ary_heap: initially this looked very strange to me, because I saw that it is based on std::vector under the hood (i.e. it should just re-use already allocated memory), but after some debugging I realised that there is also a std::list, which, as far as I understand, is used when the heap needs to be mutable (in other words, when we want to be able to decrease priorities, which we obviously need).
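
To illustrate why node-based containers show up in a profiler, here is a minimal sketch that counts allocator calls for a std::vector, a std::list and a std::unordered_map. It uses only standard containers, not boost::heap::d_ary_heap or OSRM's UnorderedMapStorage, and CountingAllocator, AllocationCount and the helper functions are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Global counter of allocate() calls (illustration only, not thread-safe).
inline std::size_t &AllocationCount()
{
    static std::size_t count = 0;
    return count;
}

// Hypothetical allocator that forwards to std::allocator and counts calls.
template <typename T> struct CountingAllocator
{
    using value_type = T;
    CountingAllocator() = default;
    template <typename U> CountingAllocator(const CountingAllocator<U> &) noexcept {}

    T *allocate(std::size_t n)
    {
        assert(n > 0);
        ++AllocationCount();
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T *p, std::size_t n) noexcept { std::allocator<T>().deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T> &, const CountingAllocator<U> &) noexcept
{
    return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T> &, const CountingAllocator<U> &) noexcept
{
    return false;
}

// Runs `fill` and returns how many allocations it triggered.
template <typename Fill> std::size_t CountAllocations(Fill fill)
{
    AllocationCount() = 0;
    fill();
    return AllocationCount();
}

// Contiguous storage: one allocation once capacity is reserved.
inline std::size_t VectorAllocations(int n)
{
    return CountAllocations([n] {
        std::vector<int, CountingAllocator<int>> values;
        values.reserve(static_cast<std::size_t>(n));
        for (int i = 0; i < n; ++i)
            values.push_back(i);
    });
}

// Node-based storage: at least one allocation per element.
inline std::size_t ListAllocations(int n)
{
    return CountAllocations([n] {
        std::list<int, CountingAllocator<int>> values;
        for (int i = 0; i < n; ++i)
            values.push_back(i);
    });
}

// Hash map: one node per element plus bucket arrays on rehash.
inline std::size_t MapAllocations(int n)
{
    using Pair = std::pair<const int, int>;
    return CountAllocations([n] {
        std::unordered_map<int, int, std::hash<int>, std::equal_to<int>, CountingAllocator<Pair>>
            values;
        for (int i = 0; i < n; ++i)
            values.emplace(i, i);
    });
}
```

With common standard library implementations the vector needs a single allocation after reserve, while the list and the map need at least one per inserted element. That is the pattern a pool allocator is meant to make cheap.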

After some experiments I ended up with a custom pool allocator which uses thread-local free lists to re-use already allocated memory. The idea is simple:

  • we allocate a chunk of memory using malloc, and when we need to allocate object(s) we carve a block out of this chunk whose size is the smallest 2^N that fits
  • when memory is deallocated we put it on one of the free lists; the index of the free list is determined by N (if the size is 2^N)
  • the next time we need to allocate something, we look into the corresponding free list first and allocate a new block only if it is needed
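
The three steps above can be sketched roughly as follows. This is a simplified illustration of the idea and not the code from this PR: the class name MemoryPool, the function ThreadLocalPool, the 16-byte minimum block and the 64 KiB chunk size are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical pool with one free list per power-of-two size class.
class MemoryPool
{
  public:
    static constexpr std::size_t kMinBlockLog2 = 4;    // smallest block is 16 bytes (assumption)
    static constexpr std::size_t kChunkSize = 1 << 16; // 64 KiB per malloc'ed chunk (assumption)
    static constexpr std::size_t kNumLists = 32;

    MemoryPool() = default;
    MemoryPool(const MemoryPool &) = delete;
    MemoryPool &operator=(const MemoryPool &) = delete;
    ~MemoryPool()
    {
        for (void *chunk : chunks_)
            std::free(chunk);
    }

    // Smallest N >= kMinBlockLog2 such that 2^N >= bytes.
    static std::size_t SizeClass(std::size_t bytes)
    {
        std::size_t n = kMinBlockLog2;
        while ((std::size_t{1} << n) < bytes)
            ++n;
        return n;
    }

    void *Allocate(std::size_t bytes)
    {
        const std::size_t n = SizeClass(bytes);
        assert(n < kNumLists);
        // Step 3: look into the matching free list first.
        if (FreeBlock *block = free_lists_[n])
        {
            free_lists_[n] = block->next;
            return block;
        }
        // Step 1: otherwise carve a 2^N block out of the current chunk.
        const std::size_t block_size = std::size_t{1} << n;
        if (block_size > remaining_)
        {
            // Start a new chunk; the unused tail of the old one is simply abandoned.
            std::size_t chunk_size = kChunkSize;
            if (block_size > chunk_size)
                chunk_size = block_size;
            chunks_.reserve(chunks_.size() + 1);
            void *chunk = std::malloc(chunk_size);
            if (chunk == nullptr)
                throw std::bad_alloc();
            chunks_.push_back(chunk);
            cursor_ = static_cast<char *>(chunk);
            remaining_ = chunk_size;
        }
        void *result = cursor_;
        cursor_ += block_size;
        remaining_ -= block_size;
        return result;
    }

    // Step 2: a freed block goes onto the free list selected by N.
    void Deallocate(void *ptr, std::size_t bytes)
    {
        const std::size_t n = SizeClass(bytes);
        auto *block = static_cast<FreeBlock *>(ptr);
        block->next = free_lists_[n];
        free_lists_[n] = block;
    }

  private:
    struct FreeBlock
    {
        FreeBlock *next;
    };
    std::array<FreeBlock *, kNumLists> free_lists_{};
    std::vector<void *> chunks_;
    char *cursor_ = nullptr;
    std::size_t remaining_ = 0;
};

// One pool per thread, so no locking is needed.
inline MemoryPool &ThreadLocalPool()
{
    thread_local MemoryPool pool;
    return pool;
}
```

In this sketch a request for 24 bytes and a later request for 30 bytes both map to the 32-byte size class, so the second request can reuse the block freed by the first. Memory is only returned to the system when the owning thread's pool is destroyed.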

Alternatives I considered but rejected

  • boost::fast_pool_allocator: it worked well on benchmarks (about the same as the current solution), but it has a huge problem which made me reject it completely. It uses a singleton pool under the hood (similar to this implementation) and has no thread-local implementation, i.e. either we use a single pool for all threads and protect it with a mutex (which eats all the performance improvements from using a memory pool) or we don't use threads at all (which is impossible for OSRM)
  • https://en.cppreference.com/w/cpp/memory/unsynchronized_pool_resource: initially looked interesting, but unfortunately boost::heap::d_ary_heap does not seem to support it
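
For context, this is roughly how std::pmr::unsynchronized_pool_resource is used with a standard container that does accept a polymorphic allocator (C++17). It is only a sketch of the rejected alternative: CountingResource, RefillResult and FillTwice are hypothetical names, and nothing here involves boost::heap::d_ary_heap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <unordered_map>

// Hypothetical upstream resource that counts how often the pool asks for memory.
class CountingResource : public std::pmr::memory_resource
{
  public:
    std::size_t calls = 0;

  private:
    void *do_allocate(std::size_t bytes, std::size_t alignment) override
    {
        ++calls;
        return std::pmr::new_delete_resource()->allocate(bytes, alignment);
    }
    void do_deallocate(void *p, std::size_t bytes, std::size_t alignment) override
    {
        std::pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource &other) const noexcept override
    {
        return this == &other;
    }
};

struct RefillResult
{
    std::size_t first_round;  // upstream allocations while filling the first time
    std::size_t second_round; // upstream allocations while refilling after clear()
    std::size_t final_size;
};

// Fills a pmr map, clears it and fills it again, all backed by one pool.
inline RefillResult FillTwice(int n)
{
    assert(n >= 0);
    CountingResource upstream;
    std::pmr::unsynchronized_pool_resource pool(&upstream);
    RefillResult result{};
    {
        std::pmr::unordered_map<int, int> map(&pool);
        for (int i = 0; i < n; ++i)
            map.emplace(i, i);
        result.first_round = upstream.calls;

        map.clear(); // nodes go back to the pool, not to the upstream resource
        for (int i = 0; i < n; ++i)
            map.emplace(i, i);
        result.second_round = upstream.calls - result.first_round;
        result.final_size = map.size();
    }
    return result;
}
```

The pool resource is not thread-safe by design, so each thread would need its own instance, which matches the thread-local approach described above.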

Result highlights

Measured on my Raspberry Pi:
map matching CH (default radius) ops/sec: +22%
map matching MLD (default radius) ops/sec: +35%(!)
table CH (25 coordinates) ops/sec: +8%
table MLD (25 coordinates) ops/sec: +8%
route CH (overview=full, steps=true) ops/sec: +1%
route MLD (overview=full, steps=true) ops/sec: +10%
osrm-contract run time: -20%
osrm-customize run time: -23%

It is worth saying that there are some improvements in almost all benchmarks...

Benchmark Results

Benchmark | Base | PR
(For each benchmark below, the Base results are listed first and the PR results second.)
alias aliased u32: 11079.8
plain u32: 10965.2
aliased double: 15131.9
plain double: 15110.1
aliased u32: 11068.9
plain u32: 10916.2
aliased double: 15050.4
plain double: 15040.3
e2e_match_ch Ops: 26.19 ± 0.02 ops/s. Best: 26.21 ops/s
Total: 5002.36ms ± 3.05ms. Best: 4998.42ms
Min time: 3.25ms ± 0.05ms
Mean time: 38.19ms ± 0.02ms
Median time: 26.22ms ± 0.10ms
95th percentile: 129.92ms ± 0.19ms
99th percentile: 157.86ms ± 0.33ms
Max time: 164.05ms ± 0.59ms
Ops: 32.85 ± 0.01 ops/s. Best: 32.88 ops/s
Total: 3987.24ms ± 1.76ms. Best: 3983.71ms
Min time: 3.09ms ± 0.05ms
Mean time: 30.44ms ± 0.01ms
Median time: 21.32ms ± 0.03ms
95th percentile: 102.18ms ± 0.11ms
99th percentile: 117.82ms ± 0.13ms
Max time: 125.92ms ± 0.27ms
e2e_match_mld Ops: 43.46 ± 0.04 ops/s. Best: 43.51 ops/s
Total: 3014.38ms ± 2.83ms. Best: 3010.92ms
Min time: 2.59ms ± 0.04ms
Mean time: 23.01ms ± 0.02ms
Median time: 12.05ms ± 0.08ms
95th percentile: 75.53ms ± 0.16ms
99th percentile: 87.68ms ± 0.16ms
Max time: 100.83ms ± 0.28ms
Ops: 59.84 ± 0.03 ops/s. Best: 59.90 ops/s
Total: 2189.24ms ± 1.18ms. Best: 2187.15ms
Min time: 2.32ms ± 0.05ms
Mean time: 16.71ms ± 0.01ms
Median time: 9.38ms ± 0.08ms
95th percentile: 52.02ms ± 0.09ms
99th percentile: 59.06ms ± 0.16ms
Max time: 66.86ms ± 0.20ms
e2e_nearest_ch Ops: 635.84 ± 5.72 ops/s. Best: 644.05 ops/s
Total: 1572.58ms ± 15.03ms. Best: 1552.66ms
Min time: 1.28ms ± 0.00ms
Mean time: 1.57ms ± 0.02ms
Median time: 1.54ms ± 0.01ms
95th percentile: 1.96ms ± 0.01ms
99th percentile: 2.05ms ± 0.01ms
Max time: 9.32ms ± 7.40ms
Ops: 636.74 ± 3.56 ops/s. Best: 641.59 ops/s
Total: 1570.34ms ± 9.08ms. Best: 1558.63ms
Min time: 1.29ms ± 0.01ms
Mean time: 1.57ms ± 0.01ms
Median time: 1.53ms ± 0.01ms
95th percentile: 1.95ms ± 0.01ms
99th percentile: 2.05ms ± 0.01ms
Max time: 9.37ms ± 7.46ms
e2e_nearest_mld Ops: 637.34 ± 2.76 ops/s. Best: 641.70 ops/s
Total: 1568.82ms ± 6.45ms. Best: 1558.36ms
Min time: 1.28ms ± 0.01ms
Mean time: 1.57ms ± 0.01ms
Median time: 1.53ms ± 0.01ms
95th percentile: 1.95ms ± 0.01ms
99th percentile: 2.05ms ± 0.01ms
Max time: 9.35ms ± 7.42ms
Ops: 637.77 ± 4.62 ops/s. Best: 647.24 ops/s
Total: 1568.37ms ± 12.03ms. Best: 1545.03ms
Min time: 1.28ms ± 0.01ms
Mean time: 1.57ms ± 0.01ms
Median time: 1.53ms ± 0.01ms
95th percentile: 1.96ms ± 0.01ms
99th percentile: 2.04ms ± 0.01ms
Max time: 9.32ms ± 7.44ms
e2e_route_ch Ops: 216.80 ± 0.49 ops/s. Best: 217.46 ops/s
Total: 4612.55ms ± 10.63ms. Best: 4598.56ms
Min time: 1.86ms ± 0.03ms
Mean time: 4.61ms ± 0.01ms
Median time: 4.71ms ± 0.02ms
95th percentile: 6.07ms ± 0.02ms
99th percentile: 6.62ms ± 0.05ms
Max time: 13.32ms ± 6.36ms
Ops: 216.55 ± 0.61 ops/s. Best: 217.41 ops/s
Total: 4617.82ms ± 13.12ms. Best: 4599.68ms
Min time: 1.88ms ± 0.04ms
Mean time: 4.62ms ± 0.01ms
Median time: 4.71ms ± 0.02ms
95th percentile: 6.08ms ± 0.01ms
99th percentile: 6.59ms ± 0.06ms
Max time: 13.33ms ± 6.40ms
e2e_route_mld Ops: 178.79 ± 0.31 ops/s. Best: 179.34 ops/s
Total: 5593.32ms ± 10.29ms. Best: 5576.05ms
Min time: 1.87ms ± 0.07ms
Mean time: 5.59ms ± 0.01ms
Median time: 5.72ms ± 0.02ms
95th percentile: 7.62ms ± 0.01ms
99th percentile: 8.14ms ± 0.04ms
Max time: 14.68ms ± 5.90ms
Ops: 182.95 ± 0.28 ops/s. Best: 183.43 ops/s
Total: 5465.83ms ± 8.40ms. Best: 5451.76ms
Min time: 1.88ms ± 0.07ms
Mean time: 5.47ms ± 0.01ms
Median time: 5.59ms ± 0.01ms
95th percentile: 7.40ms ± 0.02ms
99th percentile: 7.97ms ± 0.08ms
Max time: 14.47ms ± 5.97ms
e2e_table_ch Ops: 210.45 ± 0.29 ops/s. Best: 210.91 ops/s
Total: 4751.61ms ± 6.35ms. Best: 4741.25ms
Min time: 2.52ms ± 0.05ms
Mean time: 4.75ms ± 0.01ms
Median time: 4.77ms ± 0.02ms
95th percentile: 6.48ms ± 0.01ms
99th percentile: 6.80ms ± 0.03ms
Max time: 14.02ms ± 7.01ms
Ops: 219.96 ± 0.51 ops/s. Best: 220.69 ops/s
Total: 4546.05ms ± 10.42ms. Best: 4531.26ms
Min time: 2.45ms ± 0.04ms
Mean time: 4.55ms ± 0.01ms
Median time: 4.55ms ± 0.00ms
95th percentile: 6.14ms ± 0.01ms
99th percentile: 6.50ms ± 0.05ms
Max time: 13.62ms ± 7.10ms
e2e_table_mld Ops: 68.31 ± 0.06 ops/s. Best: 68.38 ops/s
Total: 14639.82ms ± 12.41ms. Best: 14623.94ms
Min time: 6.12ms ± 0.08ms
Mean time: 14.64ms ± 0.01ms
Median time: 14.61ms ± 0.03ms
95th percentile: 22.23ms ± 0.04ms
99th percentile: 23.41ms ± 0.03ms
Max time: 29.78ms ± 5.58ms
Ops: 75.99 ± 0.05 ops/s. Best: 76.07 ops/s
Total: 13160.35ms ± 9.12ms. Best: 13145.61ms
Min time: 5.60ms ± 0.05ms
Mean time: 13.16ms ± 0.01ms
Median time: 13.12ms ± 0.01ms
95th percentile: 19.91ms ± 0.07ms
99th percentile: 21.02ms ± 0.06ms
Max time: 27.55ms ± 5.86ms
e2e_trip_ch Ops: 62.20 ± 0.03 ops/s. Best: 62.26 ops/s
Total: 16077.97ms ± 8.95ms. Best: 16061.39ms
Min time: 2.40ms ± 0.19ms
Mean time: 16.08ms ± 0.01ms
Median time: 15.31ms ± 0.01ms
95th percentile: 28.32ms ± 0.02ms
99th percentile: 30.51ms ± 0.07ms
Max time: 33.02ms ± 1.33ms
Ops: 63.20 ± 0.03 ops/s. Best: 63.25 ops/s
Total: 15821.43ms ± 8.07ms. Best: 15809.58ms
Min time: 2.44ms ± 0.15ms
Mean time: 15.82ms ± 0.01ms
Median time: 15.05ms ± 0.02ms
95th percentile: 27.88ms ± 0.02ms
99th percentile: 30.04ms ± 0.09ms
Max time: 32.59ms ± 1.39ms
e2e_trip_mld Ops: 36.86 ± 0.01 ops/s. Best: 36.88 ops/s
Total: 27129.49ms ± 7.23ms. Best: 27113.10ms
Min time: 2.49ms ± 0.16ms
Mean time: 27.13ms ± 0.01ms
Median time: 26.30ms ± 0.08ms
95th percentile: 44.24ms ± 0.05ms
99th percentile: 46.80ms ± 0.10ms
Max time: 49.16ms ± 0.15ms
Ops: 39.29 ± 0.02 ops/s. Best: 39.32 ops/s
Total: 25449.85ms ± 10.27ms. Best: 25434.69ms
Min time: 2.54ms ± 0.20ms
Mean time: 25.45ms ± 0.01ms
Median time: 24.62ms ± 0.06ms
95th percentile: 41.70ms ± 0.06ms
99th percentile: 44.15ms ± 0.11ms
Max time: 46.44ms ± 0.32ms
json-render String: 8.93734ms
Stringstream: 14.444ms
Vector: 9.49343ms
String: 8.85913ms
Stringstream: 14.6759ms
Vector: 9.445ms
match_ch Default radius:
7.05707ms/req at 82 coordinate
0.0860618ms/coordinate
Radius 10m:
24.9563ms/req at 82 coordinate
0.304345ms/coordinate
Default radius:
6.11063ms/req at 82 coordinate
0.0745199ms/coordinate
Radius 10m:
21.4601ms/req at 82 coordinate
0.261708ms/coordinate
match_mld Default radius:
4.3271ms/req at 82 coordinate
0.0527696ms/coordinate
Radius 10m:
16.1675ms/req at 82 coordinate
0.197164ms/coordinate
Default radius:
3.05044ms/req at 82 coordinate
0.0372005ms/coordinate
Radius 10m:
10.4792ms/req at 82 coordinate
0.127795ms/coordinate
node_match_ch Base: Ops: 165.1 ± 0.6 ops/s. Best: 165.8 ops/s | PR: Ops: 210.4 ± 1.7 ops/s. Best: 213.0 ops/s
node_match_mld Base: Ops: 219.0 ± 1.2 ops/s. Best: 220.9 ops/s | PR: Ops: 291.0 ± 1.5 ops/s. Best: 293.4 ops/s
node_nearest_ch Base: Ops: 9980.5 ± 789.2 ops/s. Best: 11216.5 ops/s | PR: Ops: 9520.9 ± 627.2 ops/s. Best: 10995.9 ops/s
node_nearest_mld Base: Ops: 10029.6 ± 604.1 ops/s. Best: 10895.1 ops/s | PR: Ops: 10197.0 ± 908.9 ops/s. Best: 11478.6 ops/s
node_route_ch Base: Ops: 994.5 ± 13.7 ops/s. Best: 1006.8 ops/s | PR: Ops: 1017.1 ± 21.8 ops/s. Best: 1049.0 ops/s
node_route_mld Base: Ops: 510.4 ± 4.8 ops/s. Best: 518.1 ops/s | PR: Ops: 567.6 ± 7.7 ops/s. Best: 577.0 ops/s
node_table_ch Base: Ops: 174.2 ± 1.1 ops/s. Best: 175.9 ops/s | PR: Ops: 193.2 ± 1.5 ops/s. Best: 195.3 ops/s
node_table_mld Base: Ops: 37.8 ± 0.0 ops/s. Best: 37.9 ops/s | PR: Ops: 41.6 ± 0.1 ops/s. Best: 41.7 ops/s
node_trip_ch Base: Ops: 178.9 ± 0.9 ops/s. Best: 180.2 ops/s | PR: Ops: 189.5 ± 1.6 ops/s. Best: 191.6 ops/s
node_trip_mld Base: Ops: 61.0 ± 0.1 ops/s. Best: 61.2 ops/s | PR: Ops: 69.2 ± 0.2 ops/s. Best: 69.5 ops/s
osrm_contract Base: Time: 183.47s Peak RAM: 183.70MB | PR: Time: 154.99s Peak RAM: 183.67MB
osrm_customize Base: Time: 2.54s Peak RAM: 111.67MB | PR: Time: 2.06s Peak RAM: 112.03MB
osrm_extract Base: Time: 24.33s Peak RAM: 394.85MB | PR: Time: 24.27s Peak RAM: 394.28MB
osrm_partition Base: Time: 5.84s Peak RAM: 122.03MB | PR: Time: 5.86s Peak RAM: 120.85MB
packedvector random write:
std::vector 184766 ms
util::packed_vector 377132 ms
slowdown: 2.04113
random read:
std::vector 100564 ms
util::packed_vector 192157 ms
slowdown: 1.91078
random write:
std::vector 183179 ms
util::packed_vector 372499 ms
slowdown: 2.03353
random read:
std::vector 100185 ms
util::packed_vector 190285 ms
slowdown: 1.89934
random_match_ch 500 matches, default radius
ops: 127.11 ± 0.10 ops/s. best: 127.23ops/s.
total: 448.43 ± 0.35ms. best: 447.99ms.
avg: 7.87 ± 0.01ms
min: 0.22 ± 0.01ms
max: 42.75 ± 0.09ms
p99: 42.75 ± 0.09ms

500 matches, radius=10
ops: 36.70 ± 0.03 ops/s. best: 36.76ops/s.
total: 1744.07 ± 1.38ms. best: 1740.86ms.
avg: 27.25 ± 0.02ms
min: 0.24 ± 0.00ms
max: 411.01 ± 1.21ms
p99: 411.01 ± 1.21ms

500 matches, radius=20
ops: 8.67 ± 0.01 ops/s. best: 8.70ops/s.
total: 7494.72 ± 8.34ms. best: 7472.85ms.
avg: 115.30 ± 0.13ms
min: 0.48 ± 0.00ms
max: 2182.14 ± 4.63ms
p99: 2182.14 ± 4.63ms

Peak RAM: 54.500MB
500 matches, default radius
ops: 163.46 ± 0.21 ops/s. best: 163.69ops/s.
total: 348.70 ± 0.44ms. best: 348.22ms.
avg: 6.12 ± 0.01ms
min: 0.22 ± 0.00ms
max: 33.45 ± 0.03ms
p99: 33.45 ± 0.03ms

500 matches, radius=10
ops: 47.44 ± 0.01 ops/s. best: 47.45ops/s.
total: 1349.17 ± 0.33ms. best: 1348.81ms.
avg: 21.08 ± 0.01ms
min: 0.24 ± 0.00ms
max: 325.07 ± 0.29ms
p99: 325.07 ± 0.29ms

500 matches, radius=20
ops: 11.22 ± 0.00 ops/s. best: 11.23ops/s.
total: 5792.58 ± 1.80ms. best: 5790.03ms.
avg: 89.12 ± 0.03ms
min: 0.43 ± 0.00ms
max: 1724.44 ± 1.42ms
p99: 1724.44 ± 1.42ms

Peak RAM: 54.500MB
random_match_mld 500 matches, default radius
ops: 203.96 ± 0.56 ops/s. best: 204.39ops/s.
total: 279.47 ± 0.77ms. best: 278.87ms.
avg: 4.90 ± 0.01ms
min: 0.20 ± 0.00ms
max: 27.06 ± 0.06ms
p99: 27.06 ± 0.06ms

500 matches, radius=10
ops: 72.37 ± 0.07 ops/s. best: 72.53ops/s.
total: 884.37 ± 0.88ms. best: 882.41ms.
avg: 13.82 ± 0.01ms
min: 0.23 ± 0.00ms
max: 162.33 ± 0.47ms
p99: 162.33 ± 0.47ms

500 matches, radius=20
ops: 15.10 ± 0.02 ops/s. best: 15.12ops/s.
total: 4304.21 ± 4.80ms. best: 4299.39ms.
avg: 66.22 ± 0.07ms
min: 0.29 ± 0.00ms
max: 845.55 ± 2.03ms
p99: 845.55 ± 2.03ms

Peak RAM: 51.000MB
500 matches, default radius
ops: 312.11 ± 1.00 ops/s. best: 313.05ops/s.
total: 182.63 ± 0.59ms. best: 182.08ms.
avg: 3.20 ± 0.01ms
min: 0.18 ± 0.00ms
max: 17.79 ± 0.04ms
p99: 17.79 ± 0.04ms

500 matches, radius=10
ops: 115.74 ± 0.02 ops/s. best: 115.77ops/s.
total: 552.94 ± 0.11ms. best: 552.80ms.
avg: 8.64 ± 0.00ms
min: 0.23 ± 0.00ms
max: 98.31 ± 0.09ms
p99: 98.31 ± 0.09ms

500 matches, radius=20
ops: 23.21 ± 0.00 ops/s. best: 23.21ops/s.
total: 2800.95 ± 0.49ms. best: 2800.09ms.
avg: 43.09 ± 0.01ms
min: 0.26 ± 0.00ms
max: 521.82 ± 0.30ms
p99: 521.82 ± 0.30ms

Peak RAM: 50.500MB
random_nearest_ch 10000 nearest, number_of_results=1
ops: 22051.57 ± 38.00 ops/s. best: 22081.55ops/s.
total: 453.48 ± 0.78ms. best: 452.87ms.
avg: 0.05 ± 0.00ms
min: 0.01 ± 0.00ms
max: 0.15 ± 0.01ms
p99: 0.12 ± 0.00ms

10000 nearest, number_of_results=5
ops: 15946.80 ± 10.37 ops/s. best: 15956.58ops/s.
total: 627.09 ± 0.41ms. best: 626.70ms.
avg: 0.06 ± 0.00ms
min: 0.03 ± 0.00ms
max: 0.15 ± 0.00ms
p99: 0.13 ± 0.00ms

10000 nearest, number_of_results=10
ops: 12126.05 ± 4.04 ops/s. best: 12131.66ops/s.
total: 824.67 ± 0.27ms. best: 824.29ms.
avg: 0.08 ± 0.00ms
min: 0.04 ± 0.00ms
max: 0.17 ± 0.00ms
p99: 0.15 ± 0.00ms

Peak RAM: 34.500MB
10000 nearest, number_of_results=1
ops: 22161.52 ± 43.78 ops/s. best: 22199.56ops/s.
total: 451.24 ± 0.89ms. best: 450.46ms.
avg: 0.05 ± 0.00ms
min: 0.01 ± 0.00ms
max: 0.15 ± 0.01ms
p99: 0.12 ± 0.00ms

10000 nearest, number_of_results=5
ops: 15896.29 ± 8.55 ops/s. best: 15906.51ops/s.
total: 629.08 ± 0.35ms. best: 628.67ms.
avg: 0.06 ± 0.00ms
min: 0.03 ± 0.00ms
max: 0.18 ± 0.04ms
p99: 0.13 ± 0.00ms

10000 nearest, number_of_results=10
ops: 12034.01 ± 3.47 ops/s. best: 12039.43ops/s.
total: 830.98 ± 0.24ms. best: 830.60ms.
avg: 0.08 ± 0.00ms
min: 0.04 ± 0.00ms
max: 0.17 ± 0.00ms
p99: 0.15 ± 0.00ms

Peak RAM: 34.500MB
random_nearest_mld 10000 nearest, number_of_results=1
ops: 22039.86 ± 35.28 ops/s. best: 22066.99ops/s.
total: 453.73 ± 0.73ms. best: 453.17ms.
avg: 0.05 ± 0.00ms
min: 0.01 ± 0.00ms
max: 0.14 ± 0.01ms
p99: 0.12 ± 0.00ms

10000 nearest, number_of_results=5
ops: 15941.51 ± 12.32 ops/s. best: 15951.15ops/s.
total: 627.29 ± 0.49ms. best: 626.91ms.
avg: 0.06 ± 0.00ms
min: 0.03 ± 0.00ms
max: 0.15 ± 0.00ms
p99: 0.13 ± 0.00ms

10000 nearest, number_of_results=10
ops: 12120.91 ± 6.18 ops/s. best: 12130.19ops/s.
total: 825.02 ± 0.41ms. best: 824.39ms.
avg: 0.08 ± 0.00ms
min: 0.04 ± 0.00ms
max: 0.17 ± 0.00ms
p99: 0.15 ± 0.00ms

Peak RAM: 34.500MB
10000 nearest, number_of_results=1
ops: 22144.07 ± 35.19 ops/s. best: 22177.93ops/s.
total: 451.59 ± 0.72ms. best: 450.90ms.
avg: 0.05 ± 0.00ms
min: 0.01 ± 0.00ms
max: 0.14 ± 0.01ms
p99: 0.12 ± 0.00ms

10000 nearest, number_of_results=5
ops: 15895.86 ± 5.87 ops/s. best: 15902.36ops/s.
total: 629.09 ± 0.23ms. best: 628.84ms.
avg: 0.06 ± 0.00ms
min: 0.03 ± 0.00ms
max: 0.15 ± 0.00ms
p99: 0.13 ± 0.00ms

10000 nearest, number_of_results=10
ops: 12029.13 ± 4.09 ops/s. best: 12034.53ops/s.
total: 831.32 ± 0.28ms. best: 830.94ms.
avg: 0.08 ± 0.00ms
min: 0.04 ± 0.00ms
max: 0.17 ± 0.00ms
p99: 0.15 ± 0.00ms

Peak RAM: 34.500MB
random_route_ch 1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
ops: 281.70 ± 0.17 ops/s. best: 281.87ops/s.
total: 3493.08 ± 2.08ms. best: 3491.00ms.
avg: 3.55 ± 0.00ms
min: 0.39 ± 0.01ms
max: 6.01 ± 0.04ms
p99: 5.27 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
ops: 310.41 ± 0.04 ops/s. best: 310.46ops/s.
total: 3221.54 ± 0.40ms. best: 3221.05ms.
avg: 3.22 ± 0.00ms
min: 0.08 ± 0.00ms
max: 7.22 ± 0.01ms
p99: 6.85 ± 0.01ms

1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
ops: 575.77 ± 0.07 ops/s. best: 575.87ops/s.
total: 1709.03 ± 0.22ms. best: 1708.73ms.
avg: 1.74 ± 0.00ms
min: 0.31 ± 0.01ms
max: 2.88 ± 0.01ms
p99: 2.46 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
ops: 578.14 ± 0.07 ops/s. best: 578.26ops/s.
total: 1729.69 ± 0.22ms. best: 1729.32ms.
avg: 1.73 ± 0.00ms
min: 0.06 ± 0.00ms
max: 5.17 ± 0.01ms
p99: 4.04 ± 0.01ms

Peak RAM: 84.000MB
1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
ops: 288.90 ± 0.13 ops/s. best: 289.02ops/s.
total: 3406.00 ± 1.55ms. best: 3404.62ms.
avg: 3.46 ± 0.00ms
min: 0.40 ± 0.01ms
max: 5.94 ± 0.04ms
p99: 5.16 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
ops: 313.83 ± 0.10 ops/s. best: 313.96ops/s.
total: 3186.40 ± 1.01ms. best: 3185.10ms.
avg: 3.19 ± 0.00ms
min: 0.08 ± 0.00ms
max: 7.17 ± 0.01ms
p99: 6.79 ± 0.01ms

1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
ops: 607.56 ± 0.16 ops/s. best: 607.81ops/s.
total: 1619.59 ± 0.43ms. best: 1618.93ms.
avg: 1.65 ± 0.00ms
min: 0.30 ± 0.00ms
max: 2.69 ± 0.00ms
p99: 2.35 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
ops: 595.37 ± 0.14 ops/s. best: 595.55ops/s.
total: 1679.63 ± 0.39ms. best: 1679.12ms.
avg: 1.68 ± 0.00ms
min: 0.06 ± 0.00ms
max: 4.95 ± 0.01ms
p99: 3.91 ± 0.01ms

Peak RAM: 84.000MB
random_route_mld 1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
ops: 147.72 ± 0.04 ops/s. best: 147.75ops/s.
total: 6661.26 ± 1.88ms. best: 6659.70ms.
avg: 6.77 ± 0.00ms
min: 0.37 ± 0.00ms
max: 16.23 ± 0.02ms
p99: 11.01 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
ops: 142.44 ± 0.02 ops/s. best: 142.47ops/s.
total: 7020.38 ± 0.90ms. best: 7019.19ms.
avg: 7.02 ± 0.00ms
min: 0.07 ± 0.00ms
max: 15.92 ± 0.04ms
p99: 15.01 ± 0.02ms

1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
ops: 205.67 ± 0.01 ops/s. best: 205.69ops/s.
total: 4784.34 ± 0.16ms. best: 4784.00ms.
avg: 4.86 ± 0.00ms
min: 0.32 ± 0.00ms
max: 13.49 ± 0.01ms
p99: 8.40 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
ops: 178.17 ± 0.05 ops/s. best: 178.25ops/s.
total: 5612.66 ± 1.56ms. best: 5609.95ms.
avg: 5.61 ± 0.00ms
min: 0.05 ± 0.00ms
max: 12.15 ± 0.02ms
p99: 11.56 ± 0.01ms

Peak RAM: 73.297MB
1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
ops: 163.86 ± 0.01 ops/s. best: 163.87ops/s.
total: 6005.13 ± 0.33ms. best: 6004.61ms.
avg: 6.10 ± 0.00ms
min: 0.38 ± 0.00ms
max: 14.27 ± 0.03ms
p99: 9.89 ± 0.01ms

1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
ops: 162.05 ± 0.03 ops/s. best: 162.10ops/s.
total: 6171.04 ± 1.09ms. best: 6169.02ms.
avg: 6.17 ± 0.00ms
min: 0.07 ± 0.00ms
max: 14.29 ± 0.09ms
p99: 13.33 ± 0.03ms

1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
ops: 239.43 ± 0.02 ops/s. best: 239.45ops/s.
total: 4109.78 ± 0.34ms. best: 4109.47ms.
avg: 4.18 ± 0.00ms
min: 0.31 ± 0.00ms
max: 11.47 ± 0.01ms
p99: 7.15 ± 0.02ms

1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
ops: 211.07 ± 0.17 ops/s. best: 211.36ops/s.
total: 4737.75 ± 3.72ms. best: 4731.19ms.
avg: 4.74 ± 0.00ms
min: 0.05 ± 0.00ms
max: 10.62 ± 0.10ms
p99: 9.92 ± 0.01ms

Peak RAM: 73.422MB
random_table_ch 250 tables, 3 coordinates
ops: 1067.80 ± 4.22 ops/s. best: 1071.17ops/s.
total: 234.13 ± 0.93ms. best: 233.39ms.
avg: 0.94 ± 0.00ms
min: 0.71 ± 0.00ms
max: 1.24 ± 0.15ms
p99: 1.11 ± 0.00ms

250 tables, 25 coordinates
ops: 119.99 ± 0.02 ops/s. best: 120.03ops/s.
total: 2083.47 ± 0.38ms. best: 2082.81ms.
avg: 8.33 ± 0.00ms
min: 7.74 ± 0.01ms
max: 9.01 ± 0.01ms
p99: 8.82 ± 0.01ms

250 tables, 50 coordinates
ops: 58.45 ± 0.00 ops/s. best: 58.46ops/s.
total: 4277.29 ± 0.36ms. best: 4276.78ms.
avg: 17.11 ± 0.00ms
min: 16.27 ± 0.01ms
max: 18.00 ± 0.01ms
p99: 17.94 ± 0.01ms

Peak RAM: 63.000MB
250 tables, 3 coordinates
ops: 1176.70 ± 5.86 ops/s. best: 1181.10ops/s.
total: 212.47 ± 1.07ms. best: 211.67ms.
avg: 0.85 ± 0.00ms
min: 0.63 ± 0.00ms
max: 1.15 ± 0.16ms
p99: 1.02 ± 0.02ms

250 tables, 25 coordinates
ops: 131.51 ± 0.03 ops/s. best: 131.54ops/s.
total: 1901.03 ± 0.40ms. best: 1900.61ms.
avg: 7.60 ± 0.00ms
min: 7.03 ± 0.01ms
max: 8.23 ± 0.01ms
p99: 8.07 ± 0.01ms

250 tables, 50 coordinates
ops: 63.71 ± 0.01 ops/s. best: 63.72ops/s.
total: 3924.29 ± 0.43ms. best: 3923.46ms.
avg: 15.70 ± 0.00ms
min: 14.96 ± 0.01ms
max: 16.57 ± 0.01ms
p99: 16.48 ± 0.01ms

Peak RAM: 62.500MB
random_table_mld 250 tables, 3 coordinates
ops: 226.17 ± 0.34 ops/s. best: 226.41ops/s.
total: 1105.39 ± 1.66ms. best: 1104.18ms.
avg: 4.42 ± 0.01ms
min: 3.48 ± 0.01ms
max: 5.71 ± 0.02ms
p99: 5.55 ± 0.01ms

250 tables, 25 coordinates
ops: 23.83 ± 0.01 ops/s. best: 23.85ops/s.
total: 10492.60 ± 4.27ms. best: 10482.86ms.
avg: 41.97 ± 0.02ms
min: 38.73 ± 0.05ms
max: 45.77 ± 0.10ms
p99: 45.06 ± 0.03ms

250 tables, 50 coordinates
ops: 11.04 ± 0.00 ops/s. best: 11.05ops/s.
total: 22636.41 ± 9.41ms. best: 22615.16ms.
avg: 90.55 ± 0.04ms
min: 86.47 ± 0.16ms
max: 96.15 ± 0.26ms
p99: 94.53 ± 0.19ms

Peak RAM: 63.172MB
250 tables, 3 coordinates
ops: 252.14 ± 0.33 ops/s. best: 252.37ops/s.
total: 991.53 ± 1.30ms. best: 990.60ms.
avg: 3.97 ± 0.01ms
min: 3.11 ± 0.01ms
max: 5.16 ± 0.03ms
p99: 5.03 ± 0.01ms

250 tables, 25 coordinates
ops: 26.19 ± 0.00 ops/s. best: 26.20ops/s.
total: 9543.95 ± 0.45ms. best: 9543.50ms.
avg: 38.18 ± 0.00ms
min: 35.18 ± 0.03ms
max: 41.61 ± 0.01ms
p99: 41.06 ± 0.03ms

250 tables, 50 coordinates
ops: 12.02 ± 0.00 ops/s. best: 12.02ops/s.
total: 20797.16 ± 0.91ms. best: 20795.55ms.
avg: 83.19 ± 0.00ms
min: 79.31 ± 0.05ms
max: 88.61 ± 0.14ms
p99: 87.25 ± 0.02ms

Peak RAM: 62.750MB
random_trip_ch 250 trips, 3 coordinates
ops: 316.37 ± 0.45 ops/s. best: 316.78ops/s.
total: 790.21 ± 1.13ms. best: 789.20ms.
avg: 3.16 ± 0.00ms
min: 1.72 ± 0.01ms
max: 4.40 ± 0.01ms
p99: 4.19 ± 0.01ms

250 trips, 5 coordinates
ops: 204.38 ± 0.05 ops/s. best: 204.46ops/s.
total: 1223.20 ± 0.30ms. best: 1222.76ms.
avg: 4.89 ± 0.00ms
min: 3.27 ± 0.00ms
max: 6.15 ± 0.01ms
p99: 6.04 ± 0.02ms

Peak RAM: 73.500MB
250 trips, 3 coordinates
ops: 333.92 ± 0.49 ops/s. best: 334.31ops/s.
total: 748.68 ± 1.11ms. best: 747.81ms.
avg: 2.99 ± 0.00ms
min: 1.58 ± 0.01ms
max: 4.23 ± 0.01ms
p99: 4.01 ± 0.03ms

250 trips, 5 coordinates
ops: 216.05 ± 0.05 ops/s. best: 216.12ops/s.
total: 1157.12 ± 0.28ms. best: 1156.74ms.
avg: 4.63 ± 0.00ms
min: 3.03 ± 0.00ms
max: 5.86 ± 0.01ms
p99: 5.75 ± 0.01ms

Peak RAM: 73.500MB
random_trip_mld 250 trips, 3 coordinates
ops: 107.17 ± 0.06 ops/s. best: 107.29ops/s.
total: 2332.80 ± 1.30ms. best: 2330.10ms.
avg: 9.33 ± 0.01ms
min: 5.48 ± 0.01ms
max: 12.12 ± 0.06ms
p99: 12.03 ± 0.03ms

250 trips, 5 coordinates
ops: 68.78 ± 0.13 ops/s. best: 68.99ops/s.
total: 3634.56 ± 6.63ms. best: 3623.76ms.
avg: 14.54 ± 0.03ms
min: 10.06 ± 0.04ms
max: 17.85 ± 0.08ms
p99: 17.29 ± 0.01ms

Peak RAM: 69.000MB
250 trips, 3 coordinates
ops: 119.16 ± 0.06 ops/s. best: 119.22ops/s.
total: 2098.08 ± 1.10ms. best: 2096.95ms.
avg: 8.39 ± 0.00ms
min: 4.95 ± 0.01ms
max: 11.02 ± 0.01ms
p99: 10.85 ± 0.02ms

250 trips, 5 coordinates
ops: 76.55 ± 0.02 ops/s. best: 76.59ops/s.
total: 3265.84 ± 0.78ms. best: 3264.01ms.
avg: 13.06 ± 0.00ms
min: 9.04 ± 0.01ms
max: 16.06 ± 0.02ms
p99: 15.76 ± 0.02ms

Peak RAM: 69.000MB
route_ch 1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
638.512ms
0.638512ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
782.633ms
0.782633ms/req
1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
245.606ms
0.245606ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
216.836ms
0.216836ms/req
1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
623.853ms
0.623853ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
770.272ms
0.770272ms/req
1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
235.338ms
0.235338ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
211.587ms
0.211587ms/req
route_mld 1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
809.321ms
0.809321ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
1040.37ms
1.04037ms/req
1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
414.188ms
0.414188ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
461.797ms
0.461797ms/req
1000 routes, 3 coordinates, no alternatives, overview=full, steps=true
743.188ms
0.743188ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=full, steps=true
955.058ms
0.955058ms/req
1000 routes, 3 coordinates, no alternatives, overview=false, steps=false
349.408ms
0.349408ms/req
1000 routes, 2 coordinates, 3 alternatives, overview=false, steps=false
378.145ms
0.378145ms/req
rtree 1 result:
227.362ms -> 0.0227362 ms/query
10 results:
258.204ms -> 0.0258204 ms/query
1 result:
227.214ms -> 0.0227214 ms/query
10 results:
257.99ms -> 0.025799 ms/query
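
To read the table as relative changes, a small helper like the one below can be applied to any Base/PR pair. RelativeChangePercent is a hypothetical name for this sketch. Note that the table holds the automated benchmark run, so the percentages derived from it need not match the Raspberry Pi figures in the highlights.

```cpp
#include <cassert>

// Relative change of `pr` versus `base`, in percent.
// Positive means the PR value is larger than the base value.
inline double RelativeChangePercent(double base, double pr)
{
    assert(base != 0.0);
    return (pr - base) / base * 100.0;
}
```

For example, e2e_match_ch goes from 26.19 to 32.85 ops/s, which this helper reports as an increase of roughly 25%, and osrm_contract goes from 183.47s to 154.99s, a decrease of roughly 15.5% in run time.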

SiarheiFedartsou changed the title from "Try to use boost::fast_pool_allocator in QueryHeap" to "Try to use custom allocator in QueryHeap" on Jul 9, 2024
SiarheiFedartsou force-pushed the sf-pool-alloc branch 3 times, most recently from e53df3b to 1436e96 on July 12, 2024 16:25
SiarheiFedartsou changed the title from "Try to use custom allocator in QueryHeap" to "Use pool allocator in QueryHeap" on Jul 12, 2024
@@ -116,8 +116,8 @@ class CellCustomizer
const std::vector<bool> &allowed_nodes,
CellMetric &metric) const
{
Heap heap_exemplar(graph.GetNumberOfNodes());

SiarheiFedartsou (Member, Author) commented:

It used to work in such a way that TBB copied heap_exemplar to each thread. QueryHeap now uses a thread-local allocator, so we had a situation where QueryHeap has a PoolAllocator somewhere in its internals, and this PoolAllocator has a pointer to its thread-local memory pool. When we then copy this QueryHeap to another thread, the new copy of QueryHeap has another PoolAllocator with a pointer to exactly the same memory pool, but now it is used on another thread!
With this change we guarantee that each thread creates its own heap.
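
The hazard described above can be modelled with a small sketch. It uses std::thread instead of TBB, and PooledHeap is a hypothetical stand-in for QueryHeap that only remembers which thread's pool it was bound to when it was constructed. All function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a heap whose allocator points at a thread-local pool.
struct PooledHeap
{
    explicit PooledHeap(std::size_t nodes)
        : number_of_nodes(nodes), pool_owner(std::this_thread::get_id())
    {
    }
    // The implicit copy constructor copies pool_owner as well, just like copying
    // an allocator that holds a pointer to another thread's pool.
    bool UsesOwnThreadsPool() const { return pool_owner == std::this_thread::get_id(); }

    std::size_t number_of_nodes;
    std::thread::id pool_owner;
};

// Runs `create_heap` on each worker and counts heaps bound to the worker's own pool.
template <typename CreateHeap>
std::size_t CountHeapsOnOwnPool(std::size_t workers, CreateHeap create_heap)
{
    assert(workers > 0);
    std::vector<char> uses_own_pool(workers, 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < workers; ++i)
    {
        threads.emplace_back([&uses_own_pool, &create_heap, i] {
            const PooledHeap heap = create_heap(); // executed on the worker thread
            uses_own_pool[i] = heap.UsesOwnThreadsPool() ? 1 : 0;
        });
    }
    for (auto &thread : threads)
        thread.join();
    std::size_t count = 0;
    for (char flag : uses_own_pool)
        count += static_cast<std::size_t>(flag);
    return count;
}

// Old pattern: build one exemplar on the calling thread and copy it to each worker.
inline std::size_t CopyExemplarToWorkers(std::size_t workers, std::size_t nodes)
{
    const PooledHeap exemplar(nodes);
    return CountHeapsOnOwnPool(workers, [&exemplar] { return PooledHeap(exemplar); });
}

// New pattern: every worker constructs its own heap.
inline std::size_t ConstructHeapPerWorker(std::size_t workers, std::size_t nodes)
{
    return CountHeapsOnOwnPool(workers, [nodes] { return PooledHeap(nodes); });
}
```

In this model every copied heap still refers to the pool of the thread that built the exemplar, while heaps constructed inside the worker are bound to the worker's own pool.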

@@ -653,7 +653,7 @@ jobs:
benchmarks:
if: github.event_name == 'pull_request'
needs: [format-taginfo-docs]
runs-on: ubuntu-24.04

SiarheiFedartsou (Member, Author) commented:

copy-paste from #6975

This is just to make the benchmark results more stable; I'll remove it before merging.

SiarheiFedartsou marked this pull request as ready for review July 13, 2024 09:30

SiarheiFedartsou (Member, Author) commented:

@DennisOSRM would love to have your opinion here :)
