Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose wayID in osrm route response by post processing #257

Closed
wangyoucao577 opened this issue Mar 30, 2020 · 4 comments · Fixed by #296
Closed

Expose wayID in osrm route response by post processing #257

wangyoucao577 opened this issue Mar 30, 2020 · 4 comments · Fixed by #296
Assignees
Labels
NewFeature New feature or feature improvement Prototype Proof of concept Research Read paper/issue/code for better understanding

Comments

@wangyoucao577
Copy link

wangyoucao577 commented Mar 30, 2020

This issue is subtask of #200.

Currently we'll go with proposal-1, but implement it in osrm-ranking first for simple instead of set up a new microservice.
We'll consider to split it to a separate service once it's stable and good enough as a long-term solution.

@wangyoucao577 wangyoucao577 self-assigned this Mar 30, 2020
@wangyoucao577 wangyoucao577 added the Prototype Proof of concept label Mar 30, 2020
@wangyoucao577 wangyoucao577 added this to the Sprint 61(Dev:04/21) milestone Mar 30, 2020
@wangyoucao577 wangyoucao577 changed the title osrm-annotate to expose wayID in osrm route response by post processing Expose wayID in osrm route response by post processing Mar 30, 2020
@wangyoucao577 wangyoucao577 added the InProgress Currently working on it label Apr 1, 2020
@wangyoucao577
Copy link
Author

After a quick try to build a nodeID->wayID,wayID,... mapping in memory, it needs about 130GB RAM for North America(totally about 5 billions nodeIDs), which is not acceptable.
Next we'll have a try to query wayIDs for nodeIDs from UniDB directly, even it may be slow.
In the near future, a better way maybe split the data set by spatial, then only load necessary data when query, which balances performance and RAM cosumption.

@wangyoucao577
Copy link
Author

Research

After look through some embedded key-value DBs, the blotdb is excellent for this task: B+tree, simple, fast, pure-go, uses memory-mapped file, etc. Some other choices are badger and lmdb-go, which are both fast key-value stores with a little different features.

In blotdb's comparison with other databases, it mentioned that "If your application is read-heavy or does a lot of range scans then Bolt could be a good choice.", seems very good to us(even we don't need to scan the db, only random read).
One possible Cons of blotdb is that its memory usage can be deceptive due to the memory-mmapped db file.
https://github.com/boltdb/bolt#comparison-with-other-databases, https://tech.townsourced.com/post/boltdb-vs-badger and https://dgraph.io/blog/post/badger-lmdb-boltdb provides more insights of these choices.

Conclusion

Let's go with blotdb first since it's well-documented and easy to use. If its performance is not good enough, then turn to other alternatives.
The author has archived the blotdb repo and recommend to use CoreOS fork bblot, so we''ll end up trying bblot at the moment.

References:

@wangyoucao577 wangyoucao577 added the Research Read paper/issue/code for better understanding label Apr 7, 2020
@wangyoucao577
Copy link
Author

wangyoucao577 commented Apr 10, 2020

The querying performance is excellent in my testing on bblot. I believe that we're going with the right direction.

Here's some details:

  • Generate a bblot db file from wayid2nodeids.csv
    • The original wayid2nodeids.csv is about 9GB, which includes about 60 millions ways and 5 billiions nodes. Its snappy compressed file is about 3GB.
    • Stores fromNodeID,toNodeID->wayID mapping in db, each key costs 16bytes, each value costs 8 bytes.
    • The generated db file is about 28 GB(about 9.2GB after snappy compressed), looks acceptable.
    • The generating process needs about 10 hours for now, which is too slow and looks have a big space to improve. Will try to improve it later: Improve Generate nodes2way db performance #272.
  • use a cmd tool to query ways from nodes
    • query tens of nodes only takes 1ms!
time nodes2way-cli -alsologtostderr -db nodes2way_2.db -nodes 237197260002101,237197260001101,49873205102,237197410001101,49873130102,237197300001101,237197300002101,237197300003101,237197300004101,237197300005101,49873161102,975235466102,49873159102,1165068100102,886941850102,49873157102,1165068016102,970282321102,49873169102,237301590001101,237301590002101,49873171102,237301600001101,49939882102,49939887102,1008867724102,1008867725102,955126824102
I0410 07:42:56.753693   96136 main.go:55] Querying takes 0.001044 seconds
[-23719726 -23719726 -23719741 -23719741 23719730 23719730 23719730 23719730 23719730 23719730 -955012508 -955012507 -1231425143 -1231425142 -833969493 1231425025 1231425026 949074635 23730159 23730159 23730159 23730160 23730160 23706222 999840485 999840487 999840488]

real    0m0.012s
user    0m0.003s
sys     0m0.013s

# stats of the db: totally 504954866 keys
$ nodes2way-cli -db nodes2way_2.db -dbstat
{"dbstat":"{FreePageN:0 PendingPageN:0 FreeAlloc:0 FreelistInuse:0 TxN:0 OpenTxN:0 TxStats:{PageCount:0 PageAlloc:0 CursorCount:0 NodeCount:0 NodeDeref:0 Rebalance:0 RebalanceTime:0s Split:0 Spill:0 SpillTime:0s Write:0 WriteTime:0s}}","bucketstat":"{BranchPageN:83113 BranchOverflowN:0 LeafPageN:7186582 LeafOverflowN:0 KeyN:504954866 Depth:5 BranchAlloc:340430848 BranchInuse:233960016 LeafAlloc:29436239872 LeafInuse:20313179952 BucketN:1 InlineBucketN:0 InlineBucketInuse:0}"}

One more bug #276 found in this testing and need to investigate further. Without correct output of OSRM's nodes, we can not apply historical speeds correctly.

@wangyoucao577
Copy link
Author

Some profile data after integrated the nodes->ways retrieving in osrm-ranking:

  • short distance(about 20km)
I0422 06:38:16.957646   58630 route_by_osrm.go:22] osrm request to backend: http://127.0.0.1:5001/route/v1/driving/-122.006349,37.364336;-121.875654,37.313767?alternatives=3&annotations=true
I0422 06:38:17.014933   58630 route_by_osrm.go:39] osrm response received from backend, http status 200, response code Ok, message , data_version , takes 0.057340 seconds.
I0422 06:38:17.027278   58630 retrieve_ways.go:31] Retrieved 1595 wayIDs from 1598 nodeIDs for 3 legs(3 routes), takes 0.012301 seconds.
  • long distance(about 4700km)
I0422 06:37:23.803783   58630 route_by_osrm.go:22] osrm request to backend: http://127.0.0.1:5001/route/v1/driving/-122.006349,37.364336;-73.854703,40.686063?alternatives=3&annotations=true
I0422 06:37:27.359852   58630 route_by_osrm.go:39] osrm response received from backend, http status 200, response code Ok, message , data_version , takes 3.556213 seconds.
I0422 06:37:28.759916   58630 retrieve_ways.go:31] Retrieved 226390 wayIDs from 226394 nodeIDs for 4 legs(4 routes), takes 1.399967 seconds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NewFeature New feature or feature improvement Prototype Proof of concept Research Read paper/issue/code for better understanding
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant