Recost shortcuts in bidastar: second approach #2711
e5ef728
to
c2798a3
Compare
// Special case code if the last edge of the forward path is the destination edge
// which means we need to worry about partial distance on the edge
if (edgelabels_reverse_[idx2].predecessor() == kInvalidLabel) {
i know it's a bit heavy to use recosting to fix this problem, but it feels really good to be able to remove this anyway 😄
f975ae9
to
956d008
Compare
956d008
to
a17ab92
Compare
src/sif/recost.cc
Outdated
// grab the edge
edge = reader.directededge(edge_id, tile);
if (!edge) {
  throw std::runtime_error("Edge cannot be found");
}

// re-derive uturns, would have been nice to return this but we dont know the next edge yet
if (label.opp_local_idx() == edge->localedgeidx())
why add a branch to this code path? can't we simply store the value of this comparison?
because by default we use the deadend flag from the edge: https://github.com/valhalla/valhalla/blob/master/valhalla/sif/edgelabel.h#L67
hm. on the other hand, if it's a deadend, this condition will always be true
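To make the two options in this thread concrete, here is a minimal sketch. The function names and parameters are hypothetical stand-ins, not valhalla's actual label or edge API; it only assumes that the flag defaults to the edge's own deadend value and that a u-turn is detected by comparing local edge indices.

```cpp
#include <cstdint>

// Branching form: start from the edge's own flag and override it only when
// the comparison detects a u-turn.
inline bool deadend_with_branch(bool edge_deadend, uint32_t opp_local_idx,
                                uint32_t local_edge_idx) {
  bool deadend = edge_deadend;
  if (opp_local_idx == local_edge_idx) {
    deadend = true;
  }
  return deadend;
}

// Single-expression form: store the comparison result combined with the
// edge's flag, so no separate if statement is needed in the source.
inline bool deadend_without_branch(bool edge_deadend, uint32_t opp_local_idx,
                                   uint32_t local_edge_idx) {
  return edge_deadend || (opp_local_idx == local_edge_idx);
}
```

Both forms return the same value for every input. If the last remark holds (a deadend edge always implies the comparison is true), the edge flag could be dropped from the expression entirely.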
src/thor/alternates.cc
Outdated
@@ -58,7 +58,8 @@ void filter_alternates_by_stretch(std::vector<CandidateConnection>& connections)

 // Limited Sharing. Compare duration of edge segments shared between optimal path and
 // candidate path. If they share more than kAtMostShared throw out this alternate.
-bool validate_alternate_by_sharing(GraphReader& graphreader,
+// Note that you should recover all shortcuts before call this function.
+bool validate_alternate_by_sharing(GraphReader& /*graphreader*/,
can you actually just remove the graphreader argument completely? now that we don't have to recover shortcuts it's not used.
two things:
- remove the branch in recosting
- remove the graphreader arg in path sharing
well crap... I performance tested this and saw that it added approximately a 25% slowdown. assuming it must be the shortcut recovery, i did this little bit of work: #2714. after testing with that enabled on this branch it was just as slow 😄 so i suspect some things..
i'll look a bit closer to figure out where the performance drop lies
Ok did a bit more digging. First I enabled the shortcut cache. Then I took the variable
alias stats='R -q -e "x <- read.csv(\"stdin\", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])"'
It seemed to me that the worst offenders in terms of routes were driving up my benchmarking scores to the point that i couldn't really make sense of it. Non-insane routes were almost unaffected by this change. Anyway, to prove out this thought I ran up my server with master and did a run of RAD, but with the json output format:
./run_with_server.py --test-file auto.txt --url http://localhost:8002/route --concurrency 24 --format json
then i did the same but ran this branch with my small changes added. then i compared the percentiles between the two.
master:
grep -F response_time 20201207_235004_auto/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100%
0.002395153 0.036434412 0.122441101 0.709217098 1.643116236
[1] 0.1298356
this branch with my slight modifications:
grep -F response_time 20201207_235232_auto/* | sed -e "s/.* //g" | stats
> x <- read.csv("stdin", header = F); quantile(x[ ,1] , c(0, .5, .90, .99, 1)); sd(x[ , 1])
         0%         50%         90%         99%        100%
0.002358675 0.038644910 0.150891209 0.788010578 1.731230021
[1] 0.1442929
You can see that for the average case the performance is nearly unchanged, whereas the performance of the top 10 percent of the slowest routes is something like 10-15% worse. @genadz i tried to update your other branch where we don't recost the whole route but rather just the parts that need it as they are recovered. can you use the same testing method to see what kind of results you get?
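For anyone who wants to repeat this comparison without R, the percentile step can be sketched in C++. This is an illustration under assumptions: the function names are made up for this note, and the interpolation follows the default method (type 7) of R's quantile(), which is what the alias above relies on.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quantile with linear interpolation between order statistics, matching the
// default method (type 7) of R's quantile(). p must be in [0, 1].
inline double quantile(std::vector<double> samples, double p) {
  assert(!samples.empty() && p >= 0.0 && p <= 1.0);
  std::sort(samples.begin(), samples.end());
  const double h = static_cast<double>(samples.size() - 1) * p;
  const std::size_t lo = static_cast<std::size_t>(std::floor(h));
  const std::size_t hi = std::min(lo + 1, samples.size() - 1);
  return samples[lo] + (h - static_cast<double>(lo)) * (samples[hi] - samples[lo]);
}

// Relative change of a candidate measurement against a baseline,
// e.g. 0.10 means the candidate is 10% slower than the baseline.
inline double relative_change(double baseline, double candidate) {
  assert(baseline > 0.0);
  return (candidate - baseline) / baseline;
}
```

Loading the response_time values of each run into a vector and calling relative_change(quantile(master_times, 0.9), quantile(branch_times, 0.9)) gives the slowdown at a given percentile; master_times and branch_times are placeholder names for the two runs.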
on my machine I got the following results:
master (route not found for 76 requests)
recost_shortcuts_new (second approach) (route not found for 126 requests)
recost_shortcuts (first approach) (route not found for 1183 requests)
hm, that's interesting. despite the fact that the times for the first approach are closer to master than those for the second approach, I got approximately the same total time for both recosting branches (~25% slower).
first thing i would do is figure out which routes are failing and fix them in both branches so that at least correctness-wise the code is shippable. then we'll have to ponder about performance.
fixed the first branch. it's ~15% slower than master.
recost_shortcuts (first approach)
master
d583af7
to
9470f5b
Compare
@genadz testing the latest code here i still see a very large performance difference: in absolute terms it takes me about 23% more time to complete a 14k set of routes. also there are about 10% diffs in a RAD of the same route set. i would have expected some diffs but 10% is staggering. we should check that we are seeing the diffs we expect. @dgearhart would you be able to take a look? the summary here is that the code now turns shortcut edges in the path into the list of underlying edges. it doesn't add the intersecting edges at the nodes that were previously non-existent. at first i thought that at sharp turns and stuff the narrative might get an extra maneuver, but without the intersecting edges that seems highly unlikely (i am pretty sure it won't happen). this leads me to believe we are somehow getting different paths, but i fail to see how that is possible considering the change to recover the shortcuts is after the path is found!
@kevinkreiser I can kick off in the background - i am assuming master vs this branch?
@kevinkreiser I am seeing diffs like the following:
Why are we not adding the intersecting edge info? That would help with some missing maneuvers.
I see a 32% delta for user routes. I do not have time to review all of them, but someone should make a good pass at reviewing the diffs.
@dgearhart i would love to add all the edges too but it turns out that costs a ton of CPU and therefore really makes the request slower. that would be the ultimate goal, but the original goal of this work is just to be able to get the right costing/time for the edges along the path. i recently got my perf tools working in my IDE so i might have a look here. I'll say this: one of the big contributors is the state shield verbal regexes in odin. They alone are about 7% of the total request latency. I have an idea how to refactor them to make them less expensive but haven't worked on it yet.
208daf8
to
f5a6dcd
Compare
@genadz i'll test it again today, maybe i didn't get the updates for some reason!
thanks!
ok looking at the 50th percentile, yeah it's about 5%. oddly the 90th is something like 15% slower but the 99th and 100th are 5% and 0% respectively. frankly it's hard to tell what i should understand from this result. i guess the 50th percentile is the most important since the bulk of requests will be in this range. when i look at the total time to complete the benchmark though, it's about a 7.5% performance drop. i think if we merge the optimizations branch and do a couple more optimizations we'll be able to pay for this with those 😄 i personally don't think we have to wait to merge this before we merge those, does anyone else have an opinion? @purew or @danpat ?
(!costing.Allowed(edge, label, tile, edge_id, localtime, offset_time.timezone_index,
                  time_restrictions_TODO) &&
 !ignore_access)) {
move the flag up in the if so that Allowed doesn't get called at all
-(!costing.Allowed(edge, label, tile, edge_id, localtime, offset_time.timezone_index,
-                  time_restrictions_TODO) &&
- !ignore_access)) {
+!ignore_access && !costing.Allowed(edge, label, tile, edge_id, localtime, offset_time.timezone_index,
+                                   time_restrictions_TODO)) {
we can't do this because we need to evaluate time restrictions.
or we can explicitly call costing.EvaluateRestrictions in case ignore_access = true
the restrictions are based on access as well though, is there really a point in checking them? i guess i'm more thinking we should name the boolean throw_if_not_allowed or something more generic like strict, to just completely turn off checking Allowed at all. maybe we can flip the meaning and say something like bool allow_all = false?
https://github.com/valhalla/valhalla/blob/master/test/astar.cc#L784 - this test fails if we skip Allowed here
hm. what if we go back to the first approach where we recost only shortcuts? in this case we will keep all the time_restrictions that we calculated for regular edges.
and, when recosting shortcuts, we can 1) use a flag to turn off checking Allowed, or 2) not use this flag, but in case recosting fails we just don't expand the shortcut edge and add the shortcut itself to the final path
i personally prefer this implementation as it's a lot less complex (no splicing). you have a good point about showing the restrictions. i'd say just leave it as it is and maybe make a note that we have to put the flag second so we can get restriction information.
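The ordering point in this thread comes down to short-circuit evaluation of &&. Here is a minimal sketch with a hypothetical allowed() function standing in for the real access check; its signature and the 0xFF sentinel are assumptions made for illustration, not valhalla's API.

```cpp
#include <cstdint>

// Hypothetical stand-in for an access check that also reports, through an
// output parameter, which time restriction (if any) applies to the edge.
inline bool allowed(bool has_access, bool has_time_restriction, uint8_t& restriction_idx) {
  // 0xFF is a sentinel chosen for this sketch meaning "no restriction".
  restriction_idx = has_time_restriction ? 1 : 0xFF;
  return has_access;
}

// Flag first: when ignore_access is true, && short-circuits and allowed()
// is never called, so restriction_idx keeps whatever value it had before.
inline bool rejected_flag_first(bool ignore_access, bool has_access,
                                bool has_time_restriction, uint8_t& restriction_idx) {
  return !ignore_access && !allowed(has_access, has_time_restriction, restriction_idx);
}

// Flag second: allowed() always runs and fills restriction_idx, and the flag
// only decides whether a failed check rejects the edge.
inline bool rejected_flag_second(bool ignore_access, bool has_access,
                                 bool has_time_restriction, uint8_t& restriction_idx) {
  return !allowed(has_access, has_time_restriction, restriction_idx) && !ignore_access;
}
```

With ignore_access set, both variants accept the edge, but only the flag-second variant leaves the restriction output populated, which is the reason given above for keeping the flag second.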
@mandeepsandhu the main slowdown in odin is running over these regexes for every street name: https://github.com/valhalla/valhalla/blob/master/valhalla/baldr/verbal_text_formatter_us.h#L37
i think we can rewrite this to do our own form of matching that isn't as smart as regex to speed this up.
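As a rough illustration of that idea, the sketch below replaces a regex with a hand-written matcher. The pattern (two uppercase letters, a space, then digits, as in "PA 283") is made up for this example and is not copied from verbal_text_formatter_us.h; the function names are likewise hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Regex form: general, but each call runs the regex engine over the name.
inline bool matches_with_regex(const std::string& name) {
  static const std::regex pattern("^[A-Z]{2} [0-9]+$");
  return std::regex_match(name, pattern);
}

// Hand-rolled form: a few character comparisons, no regex engine involved.
inline bool matches_by_hand(const std::string& name) {
  // Need at least two letters, one space and one digit.
  if (name.size() < 4) {
    return false;
  }
  const auto is_upper = [](char c) { return c >= 'A' && c <= 'Z'; };
  const auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
  if (!is_upper(name[0]) || !is_upper(name[1]) || name[2] != ' ') {
    return false;
  }
  for (std::size_t i = 3; i < name.size(); ++i) {
    if (!is_digit(name[i])) {
      return false;
    }
  }
  return true;
}
```

A cheap prefilter like matches_by_hand could also be used to skip the regex for names that cannot match; whether either change recovers the latency mentioned above would need to be measured.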
11a4177
to
aa81388
Compare
* use functor instead of vector to get next edge
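The commit message above describes getting the next edge from a functor rather than from a vector. The general shape of such a change might look like the following sketch; all names here (EdgeId, kNoMoreEdges, the two functions) are hypothetical and this is not the actual recosting signature.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

using EdgeId = uint64_t; // hypothetical id type for this sketch
// Sentinel the callback returns once the path is exhausted.
constexpr EdgeId kNoMoreEdges = std::numeric_limits<EdgeId>::max();

// Vector form: the caller must materialize the whole path up front.
inline uint64_t path_cost_from_vector(const std::vector<EdgeId>& path,
                                      const std::function<uint32_t(EdgeId)>& edge_cost) {
  uint64_t total = 0;
  for (EdgeId id : path) {
    total += edge_cost(id);
  }
  return total;
}

// Functor form: the path is pulled one edge at a time from a callback, so the
// caller can walk any container (or generate ids lazily) without copying.
inline uint64_t path_cost_from_callback(const std::function<EdgeId()>& next_edge,
                                        const std::function<uint32_t(EdgeId)>& edge_cost) {
  uint64_t total = 0;
  for (EdgeId id = next_edge(); id != kNoMoreEdges; id = next_edge()) {
    total += edge_cost(id);
  }
  return total;
}
```

The callback form lets the same routine serve callers that hold their path in different structures, at the price of an indirect call per edge.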
a1644be
to
9982ec8
Compare
thank you again so much for the very long path to getting here. i hope we can make a few more performance improvements and then even remove the optimization for not adding intersecting edges.
Tasklist
Requirements / Relations
Link any requirements here. Other pull requests this PR is based on?