osrm-partition assert (crash) with planet-data on bicycle profile #6122

Closed
RZR-UA opened this issue Sep 11, 2021 · 17 comments

Comments

@RZR-UA

RZR-UA commented Sep 11, 2021

Hi!

I maintain a weekly-updated Planet OSRM routing server.
It is used for online routing in several open-source projects.
It has been running for more than a year without any problems.

A cron script runs every week:

  • download a fresh planet OSM file via BitTorrent
  • run osrm-extract, osrm-partition, osrm-customize
  • on success, move the fresh data into place and restart osrm-routed
  • the script runs the OSRM tools for the 3 stock profiles: car, bicycle, foot
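
The weekly job described above could be sketched roughly as follows. This is not the actual script: the directory variables, the `DRY_RUN` switch, and the `run`, `build_profile` and `publish_profile` helpers are assumptions made for illustration. The planet download and the osrm-routed restart are left as comments because the thread does not show how they are done. Only the three OSRM commands, the per-profile input naming, and the stock profile path come from this thread.

```shell
#!/bin/sh
# Sketch of a weekly rebuild for the three stock profiles.
# Assumed names (not from the thread): WORK_DIR, LIVE_DIR, DRY_RUN.
set -eu

PROFILE_DIR="${PROFILE_DIR:-/usr/share/osrm/profiles}"
WORK_DIR="${WORK_DIR:-./build}"
LIVE_DIR="${LIVE_DIR:-./live}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Preprocess one profile. The && chain stops at the first failing step.
# Assumes $WORK_DIR/<profile>.osm.pbf already exists (a copy of, or link
# to, the freshly downloaded planet file).
build_profile() {
    profile="$1"
    run osrm-extract -p "$PROFILE_DIR/$profile.lua" "$WORK_DIR/$profile.osm.pbf" &&
    run osrm-partition "$WORK_DIR/$profile.osrm" &&
    run osrm-customize "$WORK_DIR/$profile.osrm"
}

# Move the fresh data files into the directory served by osrm-routed.
publish_profile() {
    profile="$1"
    run mv "$WORK_DIR/$profile".osrm* "$LIVE_DIR/"
    # Restarting osrm-routed would go here; how that is done is not shown
    # in the thread, so it is left out.
}

for profile in car bicycle foot; do
    # Publish only when every preprocessing step succeeded.
    build_profile "$profile" && publish_profile "$profile"
done
```

With the default `DRY_RUN=1` the script only prints the commands it would run; setting `DRY_RUN=0` executes them.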

Hardware: 20-core Xeon, 512 GB RAM (with 32 GB swap) and 1.5 TB SSD

The osrm-backend package is built from source:
https://aur.archlinux.org/packages/osrm-backend/

Versions where the problem reproduces 100% of the time: 5.23.0 (my previous) and 5.25.0 (fresh)

Case with Problem:

CRASH - osrm-partition on Planet data with the bicycle (and possibly foot) profile

Cases without Problem:

OK - profile car (smaller)
OK - older Planet files (before ~Aug 20th 2021) succeeded with all profiles (incl. bicycle and foot)

How to reproduce (100% of the time):

  1. download planet data into bicycle.osm.pbf (for example planet-210830.osm.pbf ~60Gb)
  2. run osrm-extract -p /usr/share/osrm/profiles/bicycle.lua bicycle.osm.pbf (success)
  3. run osrm-partition bicycle.osrm (~87Gb)

After 5-6 hours it crashed.

Log output:

[info] Computing recursive bisection
[info] Loaded compressed node based graph: 741513480 edges, 1811915677 nodes
[info] running partition: 128 1.2 0.25 10 1000 # max_cell_size balance boundary cuts small_component_size
[info] Found 1530830824 SCC (827 large, 1530829997 small)
[info] SCC run took: 239.517s
[info] Full bisection done in 8751.55s
[info] Loaded node based graph to edge based graph mapping
[info] Loaded edge based graph for mapping partition ids: 3001997784 edges, 739184293 nodes
[info] Fixed 124433827 unconnected nodes
[info] Edge-based-graph annotation:
[info] level 1 #cells 2471974 bit size 22
[info] level 2 #cells 259942 bit size 18
[info] level 3 #cells 16401 bit size 15
[info] level 4 #cells 530 bit size 10
[info] Renumbered data in 3711.91 seconds
terminate called after throwing an instance of 'osrm::util::exception'
what(): Can't pack the partition information at level 4 into a 64bit integer. Would require 65 bits.
Aborted (core dumped)

GDB coredump stack trace:

(gdb) bt
#0 0x00007f0ec53bed22 in raise () from /usr/lib/libc.so.6
#1 0x00007f0ec53a8862 in abort () from /usr/lib/libc.so.6
#2 0x00007f0ec5602802 in __gnu_cxx::__verbose_terminate_handler () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007f0ec560ec8a in __cxxabiv1::__terminate (handler=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#4 0x00007f0ec560ecf7 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#5 0x00007f0ec560ef8e in __cxxabiv1::__cxa_throw (obj=, tinfo=0x7f0ec58f82e0 , dest=0x55b31866c400 osrm::util::exception::~exception())
at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#6 0x00007f0ec5860cb4 in ?? () from /usr/lib/libosrm_partition.so
#7 0x00007f0ec588df2c in osrm::partitioner::Partitioner::Run(osrm::partitioner::PartitionerConfig const&) () from /usr/lib/libosrm_partition.so
#8 0x000055b31866a294 in main ()

@mjjbell
Member

mjjbell commented Sep 11, 2021

Take a look at #6121, we're currently debugging the same issue.

@danpat
Member

danpat commented Sep 11, 2021

As a temporary workaround, I would suggest switching to the CH algorithm for profiles you don't need to customize - bike/walk are good candidates.

This means you would use osrm-contract instead of osrm-partition+osrm-customize.
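
The difference between the two toolchains can be sketched as below. The helper name `preprocess_commands` is hypothetical; the tool names come from this thread, and the `--algorithm` option is the one `osrm-routed` uses to select the query algorithm that matches the preprocessed data. Both toolchains start from the output of osrm-extract.

```shell
#!/bin/sh
set -eu

# Print the preprocessing commands that follow osrm-extract for a given
# algorithm, plus the matching osrm-routed invocation.
# preprocess_commands is a hypothetical helper, not part of OSRM.
preprocess_commands() {
    algorithm="$1"
    data="$2"
    case "$algorithm" in
        mld)
            echo "osrm-partition $data"
            echo "osrm-customize $data"
            ;;
        ch)
            echo "osrm-contract $data"
            ;;
        *)
            echo "unknown algorithm: $algorithm" >&2
            return 1
            ;;
    esac
    echo "osrm-routed --algorithm $algorithm $data"
}

# Print the CH variant for the data file used in this thread.
preprocess_commands ch bike.osrm
```

The commands are only printed here, so the sketch can be read or run without the OSRM tools installed.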

@RZR-UA
Author

RZR-UA commented Sep 12, 2021

No success with osrm-contract:

$ time osrm-contract bike.osrm
[info] Input file: bike.osrm
[info] Threads: 40
[info] Reading node weights.
[info] Done reading node weights.
[info] Loading edge-expanded graph representation
[info] merged 3742136 edges out of 3005739920
[info] initializing node priorities... ok.
[info] preprocessing 739184293 (100%) nodes...
[info] . 10% . 20% . 30% . 40% . 50% Segmentation fault (core dumped)

real 47m10.948s

(gdb) bt
#0 0x00007f78f5ec72d7 in osrm::contractor::search(osrm::util::QueryHeap<unsigned int, unsigned int, int, osrm::contractor::ContractorHeapData, osrm::util::XORFastHashStorage<unsigned int, unsigned int, 65536ul> >&, osrm::util::DynamicGraphosrm::contractor::ContractorEdgeData const&, unsigned int, int, int, unsigned int) () from /usr/lib/libosrm_contract.so
#1 0x00007f78f5ecd9f9 in ?? () from /usr/lib/libosrm_contract.so
#2 0x00007f78f5eceb0c in ?? () from /usr/lib/libosrm_contract.so
#3 0x00007f78f58a2105 in ?? () from /usr/lib/libtbb.so.2
#4 0x00007f78f58a243c in ?? () from /usr/lib/libtbb.so.2
#5 0x00007f78f589bd97 in ?? () from /usr/lib/libtbb.so.2
#6 0x00007f78f589a3e1 in ?? () from /usr/lib/libtbb.so.2
#7 0x00007f78f589681c in ?? () from /usr/lib/libtbb.so.2
#8 0x00007f78f5896a8a in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f78f58c5259 in start_thread () from /usr/lib/libpthread.so.0
#10 0x00007f78f59db5e3 in clone () from /usr/lib/libc.so.6

@danpat
Member

danpat commented Sep 12, 2021

Hmm, that is interesting. Are you able to run a debug build to get better symbols (and possibly trigger an assertion?) It might take a while....

@RZR-UA
Author

RZR-UA commented Sep 13, 2021

I will try to build a debug version of the package soon.

In the meantime, I have had some success with --max-cell-sizes (a workaround from another bug).

I'm not sure what this option does, or whether the result contains complete routing data.

Successful run:

$ osrm-partition --max-cell-sizes=1024,16384,262144,4194304 bike.osrm

[info] Computing recursive bisection
[info] Loaded compressed node based graph: 741513480 edges, 1811915677 nodes
[info] running partition: 1024 1.2 0.25 10 1000 # max_cell_size balance boundary cuts small_component_size
[info] Found 1530830824 SCC (827 large, 1530829997 small)
[info] SCC run took: 126.636s
[info] Full bisection done in 8699.01s
[info] Loaded node based graph to edge based graph mapping
[info] Loaded edge based graph for mapping partition ids: 3001997784 edges, 739184293 nodes
[info] Fixed 170434511 unconnected nodes
[info] Edge-based-graph annotation:
[info] level 1 #cells 389777 bit size 19
[info] level 2 #cells 65506 bit size 16
[info] level 3 #cells 4145 bit size 13
[info] level 4 #cells 274 bit size 9
[info] Renumbered data in 4185.79 seconds
[info] MultiLevelPartition constructed in 135.113 seconds
[info] CellStorage constructed in 332.083 seconds
[info] MLD data writing took 123.576 seconds
[info] Cells statistics per level
[info] Level 1 #cells 389777 #boundary nodes 487556962, sources: avg. 712, destinations: avg. 1014, entries: 133887566085226 (1071100528681808 bytes)
[info] Level 2 #cells 65506 #boundary nodes 459313891, sources: avg. 4046, destinations: avg. 5748, entries: 135126970295854 (1081015762366832 bytes)
[info] Level 3 #cells 4145 #boundary nodes 419664351, sources: avg. 59586, destinations: avg. 84092, entries: 154510341438426 (1236082731507408 bytes)
[info] Level 4 #cells 274 #boundary nodes 382683842, sources: avg. 836812, destinations: avg. 1172722, entries: 415441504194742 (3323532033557936 bytes)
[info] Bisection took 20925.2 seconds.
[info] RAM: peak bytes used: 252468625408

@RZR-UA
Author

RZR-UA commented Sep 13, 2021

osrm-customize crashed after the previous osrm-partition run with --max-cell-sizes:

$ osrm-partition --max-cell-sizes=1024,16384,262144,4194304 bike.osrm
success

$ time osrm-customize bike.osrm
[info] Loaded edge based graph: 3001997784 edges, 739184293 nodes
[info] Loading partition data took 812.704 seconds
Segmentation fault (core dumped)

real 23m14.582s

(gdb)
#0 0x00007f6610067cdf in ?? ()
...
#18 0x0000000000000000 in ?? ()

No useful info in the gdb backtrace.

Building a debug package now...

@RZR-UA
Author

RZR-UA commented Sep 13, 2021

-DCMAKE_BUILD_TYPE=Debug failed to compile (errors such as scripting_environment_lua.cpp:450:9: error: ‘this’ pointer is null [-Werror=nonnull])

-DCMAKE_BUILD_TYPE=RelWithDebInfo compiled successfully

Ran osrm-customize after the "successful" osrm-partition run with --max-cell-sizes:

$ time osrm-customize bike.osrm

[info] Loaded edge based graph: 3001997784 edges, 739184293 nodes
[info] Loading partition data took 7495.16 seconds
[assert][139948046890560] /tmp/osrm-backend/src/osrm-backend-5.25.0/include/customizer/cell_customizer.hpp:94
in: void osrm::customizer::CellCustomizer::Customize(const GraphT&, osrm::customizer::CellCustomizer::Heap&, const CellStorage&, const std::vector&, osrm::customizer::CellMetric&, LevelID, CellID) const [with GraphT = osrm::partitioner::MultiLevelGraph<osrm::partitioner::EdgeBasedGraphEdgeData, osrm::storage::Ownership::Container>; osrm::customizer::CellCustomizer::Heap = osrm::util::QueryHeap<unsigned int, unsigned int, int, osrm::customizer::CellCustomizer::HeapData, osrm::util::ArrayStorage<unsigned int, int> >; osrm::partitioner::CellStorage = osrm::partitioner::detail::CellStorageImplosrm::storage::Ownership::Container; osrm::customizer::CellMetric = osrm::customizer::detail::CellMetricImplosrm::storage::Ownership::Container; LevelID = unsigned char; CellID = unsigned int]: !weights.empty()
terminate called without an active exception

Aborted (core dumped)

real 135m4.916s

Unfortunately, there was no free space in the current directory to write a coredump, but you can see the assertion before the crash.

Strangely, this time osrm-customize ran for 2+ hours, compared to 23 minutes for the previous run.

Next I'll try running osrm-contract again ...

@RZR-UA
Author

RZR-UA commented Sep 13, 2021

Strange things.

The version built with -DCMAKE_BUILD_TYPE=RelWithDebInfo failed even in osrm-extract.

Please advise what to do next.

Maybe try with the patch(es) from #6121?

$ time osrm-extract -p /usr/share/osrm/profiles/bicycle.lua bike.osm.pbf

[info] Parsed 0 location-dependent features with 0 GeoJSON polygons
[info] Using script /usr/share/osrm/profiles/bicycle.lua
[info] Input file: bike.osm.pbf
[info] Profile: bicycle.lua
[info] Threads: 40
[info] Parsing in progress..
[info] input file generated by planet-dump-ng 1.2.0
[info] timestamp: 2021-08-30T00:00:01Z
[info] Using profile api version 4
[info] Parse relations ...
[info] Parse ways and nodes ...
[info] Using profile api version 4
...
[info] Sorting all nodes ... ok, after 16.5169s
[info] Building node id map ... ok, after 165.25s
[info] Confirming/Writing used nodes ... ok, after 337.98s
[info] Writing barrier nodes ... ok, after 0s
[info] Writing traffic light nodes ... ok, after 0s
[info] Processed 1811915677 nodes
[info] Sorting edges by start ... ok, after 131.535s
[info] Setting start coords ... ok, after 850.106s
[info] Sorting edges by target ... ok, after 110.224s
[info] Computing edge weights ... ok, after 1233.76s
[info] Sorting edges by renumbered start ... ok, after 99.5453s
[info] Writing used edges ... ok, after 173.559s -- Processed 1953419277 edges
[info] Writing way meta-data ... ok, after 1.36812s -- Metadata contains << 190794335 entries.
[info] Collecting node information on 64 maneuver overrides...ok, after 0.001548s
[info] Collecting node information on 0 restrictions...ok, after 0s
[info] writing street name index ... ok, after 17.7488s
[info] extraction finished after 6711.13s
[info] Generating edge-expanded graph representation
[info] [assert][140410027075392] /tmp/osrm-backend/src/osrm-backend-5.25.0/src/extractor/restriction_compressor.cpp:228
in: osrm::extractor::RestrictionCompressor::Compress(NodeID, NodeID, NodeID)::<lambda(auto:20)> [with auto:20 = osrm::extractor::NodeBasedTurn*]: ptr->via == to
terminate called without an active exception
Aborted (core dumped)

(gdb) bt
#0 0x00007fb3c25d8d22 in raise () from /usr/lib/libc.so.6
#1 0x00007fb3c25c2862 in abort () from /usr/lib/libc.so.6
#2 0x00007fb3c281c802 in __gnu_cxx::__verbose_terminate_handler () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007fb3c2828c8a in __cxxabiv1::__terminate (handler=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#4 0x00007fb3c2828cf7 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#5 0x00007fb3c37075cd in ?? () from /usr/lib/libosrm_extract.so
#6 0x00007fb3c37075e9 in boost::assertion_failed(char const*, char const*, char const*, long) () from /usr/lib/libosrm_extract.so
#7 0x00007fb3c3529490 in ?? () from /usr/lib/libosrm_extract.so
#8 0x00007fb3c352ae6d in ?? () from /usr/lib/libosrm_extract.so
#9 0x00007fb3c352ef47 in osrm::extractor::RestrictionCompressor::Compress(unsigned int, unsigned int, unsigned int) () from /usr/lib/libosrm_extract.so
#10 0x00007fb3c34d8f14 in osrm::extractor::GraphCompressor::Compress(std::unordered_set<unsigned int, std::hash, std::equal_to, std::allocator > const&, std::unordered_set<unsigned int, std::hash, std::equal_to, std::allocator > const&, osrm::extractor::ScriptingEnvironment&, std::vector<osrm::extractor::TurnRestriction, std::allocatorosrm::extractor::TurnRestriction >&, std::vector<osrm::extractor::UnresolvedManeuverOverride, std::allocatorosrm::extractor::UnresolvedManeuverOverride >&, osrm::util::DynamicGraphosrm::util::NodeBasedEdgeData&, std::vector<osrm::extractor::NodeBasedEdgeAnnotation, std::allocatorosrm::extractor::NodeBasedEdgeAnnotation > const&, osrm::extractor::CompressedEdgeContainer&) () from /usr/lib/libosrm_extract.so
#11 0x00007fb3c35166c9 in osrm::extractor::NodeBasedGraphFactory::Compress(osrm::extractor::ScriptingEnvironment&, std::vector<osrm::extractor::TurnRestriction, std::allocatorosrm::extractor::TurnRestriction >&, std::vector<osrm::extractor::UnresolvedManeuverOverride, std::allocatorosrm::extractor::UnresolvedManeuverOverride >&) () from /usr/lib/libosrm_extract.so
#12 0x00007fb3c3520fc4 in osrm::extractor::NodeBasedGraphFactory::NodeBasedGraphFactory(boost::filesystem::path const&, osrm::extractor::ScriptingEnvironment&, std::vector<osrm::extractor::TurnRestriction, std::allocatorosrm::extractor::TurnRestriction >&, std::vector<osrm::extractor::UnresolvedManeuverOverride, std::allocatorosrm::extractor::UnresolvedManeuverOverride >&) ()
from /usr/lib/libosrm_extract.so
#13 0x00007fb3c34ca1d3 in osrm::extractor::Extractor::run(osrm::extractor::ScriptingEnvironment&) () from /usr/lib/libosrm_extract.so
#14 0x00007fb3c340d14e in osrm::extract(osrm::extractor::ExtractorConfig const&) () from /usr/lib/libosrm_extract.so
#15 0x0000558480de0c28 in main ()

@mjjbell
Member

mjjbell commented Sep 13, 2021

Looks like the osrm-extract RelWithDebInfo build is hitting an unrelated assertion due to invalid or incorrectly parsed maneuver overrides.

With regard to osrm-partition + osrm-customize, patching #6124 and #6123 is required for planet scale data.
Applying 9884684 will help make things more efficient, but processing should still work without it.

@RZR-UA
Author

RZR-UA commented Sep 14, 2021

  • osrm-extract succeeded with the Release build

  • osrm-contract crashed again with the RelWithDebInfo build

Details:

$ time osrm-contract bike.osrm
[info] Input file: bike.osrm
[info] Threads: 40
[info] Reading node weights.
[info] Done reading node weights.
[info] Loading edge-expanded graph representation
[info] merged 3742134 edges out of 3005739876
[info] initializing node priorities... ok.
[info] preprocessing 739184275 (100%) nodes...
[info] . 10% . 20% . 30% . 40% . 50% [assert][139845057885760] /tmp/osrm-backend/src/osrm-backend-5.25.0/src/contractor/contractor_search.cpp:28
in: void osrm::contractor::{anonymous}::relaxNode(osrm::contractor::ContractorHeap&, const ContractorGraph&, NodeID, EdgeWeight, NodeID): to != SPECIAL_NODEID
terminate called without an active exception
Aborted (core dumped)

real 181m33.688s

(gdb) bt
#0 0x00007f31375a1d22 in raise () from /usr/lib/libc.so.6
#1 0x00007f313758b862 in abort () from /usr/lib/libc.so.6
#2 0x00007f31377e7802 in __gnu_cxx::__verbose_terminate_handler () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007f31377f3c8a in __cxxabiv1::__terminate (handler=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#4 0x00007f31377f3cf7 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#5 0x00007f31382b13fd in ?? () from /usr/lib/libosrm_contract.so
#6 0x00007f31382b1419 in boost::assertion_failed(char const*, char const*, char const*, long) () from /usr/lib/libosrm_contract.so
#7 0x00007f31382a03a4 in ?? () from /usr/lib/libosrm_contract.so
#8 0x00007f31382a0441 in osrm::contractor::search(osrm::util::QueryHeap<unsigned int, unsigned int, int, osrm::contractor::ContractorHeapData, osrm::util::XORFastHashStorage<unsigned int, unsigned int, 65536ul> >&, osrm::util::DynamicGraphosrm::contractor::ContractorEdgeData const&, unsigned int, int, int, unsigned int) () from /usr/lib/libosrm_contract.so
#9 0x00007f31382acea2 in ?? () from /usr/lib/libosrm_contract.so
#10 0x00007f31382acfde in ?? () from /usr/lib/libosrm_contract.so
#11 0x00007f31382ad386 in ?? () from /usr/lib/libosrm_contract.so
#12 0x00007f31382ad449 in ?? () from /usr/lib/libosrm_contract.so
#13 0x00007f31382afc9e in ?? () from /usr/lib/libosrm_contract.so
#14 0x00007f31382afe5b in ?? () from /usr/lib/libosrm_contract.so
#15 0x00007f313752a105 in ?? () from /usr/lib/libtbb.so.2
#16 0x00007f313752a43c in ?? () from /usr/lib/libtbb.so.2
#17 0x00007f3137523d97 in ?? () from /usr/lib/libtbb.so.2
#18 0x00007f31375223e1 in ?? () from /usr/lib/libtbb.so.2
#19 0x00007f313751e81c in ?? () from /usr/lib/libtbb.so.2
#20 0x00007f313751ea8a in ?? () from /usr/lib/libtbb.so.2
#21 0x00007f313754d259 in start_thread () from /usr/lib/libpthread.so.0
#22 0x00007f31376635e3 in clone () from /usr/lib/libc.so.6

@RZR-UA
Author

RZR-UA commented Sep 15, 2021

Applied patches #6123 and #6124 and still get the same error and crash:

$ time osrm-partition bike.osrm
[info] Computing recursive bisection
[info] Loaded compressed node based graph: 741513462 edges, 1811915677 nodes
[info] running partition: 128 1.2 0.25 10 1000 # max_cell_size balance boundary cuts small_component_size
[info] Found 1530830833 SCC (827 large, 1530830006 small)
[info] SCC run took: 676.208s
[info] Full bisection done in 18063.9s
[info] Loaded node based graph to edge based graph mapping
[info] Loaded edge based graph for mapping partition ids: 3001997742 edges, 739184275 nodes
[info] Fixed 42634 unconnected nodes
[info] Edge-based-graph annotation:
[info] level 1 #cells 2465902 bit size 22
[info] level 2 #cells 259539 bit size 18
[info] level 3 #cells 16419 bit size 15
[info] level 4 #cells 523 bit size 10

[info] Renumbered data in 6971.75 seconds
terminate called after throwing an instance of 'osrm::util::exception'
what(): Can't pack the partition information at level 4 into a 64bit integer. Would require 65 bits.

Aborted (core dumped)

real 855m38.852s

(gdb) bt
#0 0x00007efd5c96ed22 in raise () from /usr/lib/libc.so.6
#1 0x00007efd5c958862 in abort () from /usr/lib/libc.so.6
#2 0x00007efd5cbb2802 in __gnu_cxx::__verbose_terminate_handler () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x00007efd5cbbec8a in __cxxabiv1::__terminate (handler=) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#4 0x00007efd5cbbecf7 in std::terminate () at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#5 0x00007efd5cbbef8e in __cxxabiv1::__cxa_throw (obj=, tinfo=0x7efd5d2ed2b0 , dest=0x5588ff37ca90 osrm::util::exception::~exception())
at /build/gcc/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#6 0x00007efd5d170272 in ?? () from /usr/lib/libosrm_partition.so
#7 0x00007efd5d178bb3 in ?? () from /usr/lib/libosrm_partition.so
#8 0x00007efd5d18e6c7 in osrm::partitioner::detail::MultiLevelPartitionImpl<(osrm::storage::Ownership)0>::MultiLevelPartitionImpl<std::enable_if<true, void> >(std::vector<std::vector<unsigned int, std::allocator >, std::allocator<std::vector<unsigned int, std::allocator > > > const&, std::vector<unsigned int, std::allocator > const&) ()
from /usr/lib/libosrm_partition.so
#9 0x00007efd5d194620 in osrm::partitioner::Partitioner::Run(osrm::partitioner::PartitionerConfig const&) () from /usr/lib/libosrm_partition.so
#10 0x00005588ff37c166 in main ()

@mjjbell
Member

mjjbell commented Sep 15, 2021

This is where the --max-cell-sizes argument is needed.

The default max cell sizes are 128, 4096, 65536, 2097152 at each level of the partition. Planet scale data is generating too many cells to pack all the numbers into a uint64. You'll need to increase these values until the cell counts can be packed into 64 bits. You don't want to increase the sizes too much though as that will negatively impact query performance.

This will need to be addressed in OSRM at some point given OSM data is only getting bigger, but for now this workaround is required if you see that exception message.
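
A quick way to check a candidate set of cell sizes is to add up the per-level bit sizes that osrm-partition prints. The sketch below assumes that the bit size of a level is the bit length of its cell count. That assumption is not taken from the OSRM source, but it reproduces every "bit size" value logged in this thread. The function names `bit_length` and `total_bits` are made up for illustration.

```shell
#!/bin/sh
set -eu

# Number of bits needed to write n in binary (assumed to match the
# "bit size" column in the osrm-partition log).
bit_length() {
    n="$1"
    bits=0
    while [ "$n" -gt 0 ]; do
        n=$(( n >> 1 ))
        bits=$(( bits + 1 ))
    done
    echo "$bits"
}

# Sum the bit lengths of all cell counts given as arguments.
total_bits() {
    total=0
    for cells in "$@"; do
        total=$(( total + $(bit_length "$cells") ))
    done
    echo "$total"
}

# Default cell sizes (failing run above): 22 + 18 + 15 + 10 bits.
total_bits 2465902 259539 16419 523      # prints 65
# --max-cell-sizes=1024,16384,262144,4194304: 19 + 16 + 13 + 9 bits.
total_bits 389777 65506 4145 274         # prints 57
```

A total above 64 corresponds to the exception in the log; the cell counts from the run with larger cell sizes sum to 57 bits and fit.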

@RZR-UA
Author

RZR-UA commented Sep 17, 2021

$ osrm-partition --max-cell-sizes=1024,16384,262144,4194304 bike.osrm

completed successfully without a crash, but:

$ time osrm-customize bike.osrm

[info] Loaded edge based graph: 3001997742 edges, 739184275 nodes
[info] Loading partition data took 7411.1 seconds
[info] RAM: peak bytes used: 203168149504
[error] [exception] std::bad_alloc
[error] Please provide more memory or consider using a larger swapfile

real 124m23.947s

I have 512GB RAM (~450GB free) plus 116GB SWAP

This was the RelWithDebInfo build; next I'll try the Release build.

@RZR-UA
Author

RZR-UA commented Sep 17, 2021

Same problem, but instead of 124 minutes, the Release build ended after just 15 minutes:

$ time osrm-customize bike.osrm
[info] Loaded edge based graph: 3001997742 edges, 739184275 nodes
[info] Loading partition data took 914.953 seconds
[info] RAM: peak bytes used: 203156135936
[error] [exception] std::bad_alloc
[error] Please provide more memory or consider using a larger swapfile

real 15m31.712s

@mjjbell
Member

mjjbell commented Sep 17, 2021

Can you list the data directory?
ls -lh

@RZR-UA
Author

RZR-UA commented Sep 17, 2021

I have added 90 GB more swap and am trying osrm-customize again.
If it fails, I'll clean up the data and restart from scratch with the patched Release build.

Update: it failed even with the extra 90 GB of swap.

$ free -h
total used free shared buff/cache available
Mem: 503Gi 57Gi 416Gi 2.8Gi 29Gi 439Gi
Swap: 204Gi 0B 204Gi

$ ls -lh
-rw-rw-r-- 1 osrm osrm 60G Sep 9 15:09 bike.osm.pbf
-rw-rw-r-- 1 osrm osrm 87G Sep 13 19:29 bike.osrm
-rw-rw-r-- 1 osrm osrm 8.5G Sep 16 04:07 bike.osrm.cells
-rw-rw-r-- 1 osrm osrm 5.6G Sep 13 21:47 bike.osrm.cnbg
-rw-rw-r-- 1 osrm osrm 5.6G Sep 16 03:10 bike.osrm.cnbg_to_ebg
-rw-rw-r-- 1 osrm osrm 68K Sep 17 11:14 bike.osrm.datasource_names
-rw-rw-r-- 1 osrm osrm 34G Sep 16 04:08 bike.osrm.ebg
-rw-rw-r-- 1 osrm osrm 8.4G Sep 16 03:17 bike.osrm.ebg_nodes
-rw-rw-r-- 1 osrm osrm 9.8G Sep 13 22:33 bike.osrm.edges
-rw-rw-r-- 1 osrm osrm 8.3G Sep 16 03:34 bike.osrm.enw
-rwx------ 1 osrm osrm 36G Sep 16 03:11 bike.osrm.fileIndex
-rw-rw-r-- 1 osrm osrm 38G Sep 13 22:35 bike.osrm.geometry
-rw-rw-r-- 1 osrm osrm 6.8G Sep 13 22:33 bike.osrm.icd
-rw-rw-r-- 1 osrm osrm 6.5K Sep 16 03:34 bike.osrm.maneuver_overrides
-rw-rw-r-- 1 osrm osrm 204M Sep 13 19:29 bike.osrm.names
-rw-rw-r-- 1 osrm osrm 21G Sep 13 21:46 bike.osrm.nbg_nodes
-rw-rw-r-- 1 osrm osrm 5.6G Sep 16 04:07 bike.osrm.partition
-rw-rw-r-- 1 osrm osrm 6.0K Sep 13 19:29 bike.osrm.properties
-rw-rw-r-- 1 osrm osrm 145M Sep 13 22:47 bike.osrm.ramIndex
-rw-rw-r-- 1 osrm osrm 4.0K Sep 13 22:14 bike.osrm.restrictions
-rw-rw-r-- 1 osrm osrm 3.5K Sep 13 18:41 bike.osrm.timestamp
-rw-rw-r-- 1 osrm osrm 3.5K Sep 13 22:33 bike.osrm.tld
-rw-rw-r-- 1 osrm osrm 5.5K Sep 13 22:33 bike.osrm.tls
-rw-rw-r-- 1 osrm osrm 2.8G Sep 13 22:14 bike.osrm.turn_duration_penalties
-rw-rw-r-- 1 osrm osrm 17G Sep 13 22:15 bike.osrm.turn_penalties_index
-rw-rw-r-- 1 osrm osrm 2.8G Sep 13 22:14 bike.osrm.turn_weight_penalties

@RZR-UA
Author

RZR-UA commented Sep 20, 2021

Success with the patched version.

Required:

  1. Patches: #6123 (Fix MLD level mask generation to support 64-bit masks) and #6124 (Fix metric offset overflow for large MLD partitions)

  2. A -DCMAKE_BUILD_TYPE=Release build only (other build types failed with allocation-related errors)

  3. Successful commands:

osrm-extract -p /usr/share/osrm/profiles/bicycle.lua bike.osm.pbf # 255m20.950s
osrm-partition --max-cell-sizes=1024,16384,262144,4194304 bike.osrm # 336m24.008s
osrm-customize bike.osrm # 21m19.010s

Checked planet routing (bicycle profile) across a few countries.

Everything seems OK.
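
A spot check like this can be scripted against a running osrm-routed instance. The sketch below is an assumption about how such a check might look, not what was actually run: the host, port, and coordinate pair are placeholders, and the helper names are made up. It relies on the OSRM HTTP route service answering a successful request with `"code":"Ok"` in a compact JSON response.

```shell
#!/bin/sh
set -eu

OSRM_URL="${OSRM_URL:-http://localhost:5000}"   # placeholder host and port

# Build a route request URL for two "lon,lat" pairs.
route_url() {
    echo "$OSRM_URL/route/v1/bicycle/$1;$2?overview=false"
}

# Succeed only if the server answers the request with code "Ok".
check_route() {
    curl -fsS "$(route_url "$1" "$2")" | grep -q '"code":"Ok"'
}

# Example with placeholder coordinates (needs a running server):
#   check_route 13.388860,52.517037 13.397634,52.529407 && echo "route ok"
```

Running `check_route` for one coordinate pair per country gives a rough pass/fail signal; it does not verify route quality.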
