-
-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements: Faster Overpass queries #1514
Comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@mmd-osm Regarding putting the name later: You said in #1457 (comment)
If the heuristic is there to cut down the number of objects as early as possible, why does it not work in the cases mentioned above? Is it because the grouping by filter type comes first and only then within these groups, the filters are sorted by that heuristic? If I understand correctly, this query would not be very optimized because it pulls in first all elements with no name, even though pulling in only elements with
Did I understand this correctly? |
No, what I meant by the heuristic is a certain sequence in which those different groups are being evaluated. Although a bit dated and somewhat high level, Roland gave a presentation at FOSSGIS2013 on how the evaluation works, starting on slide 23: https://www.fossgis.de/konferenz/2013/programm/attachments/428_workshop.odp / https://www.fossgis.de/konferenz/2013/programm/events/520.de.html Some of the filters take size of the bounding box into account, or switch evaluation strategy on the fly, if one approach produces too much data during evaluation. I'm not aware of a concise description on all the details (and there are many more, which I didn't mention yet), except for the source code.
The major issue here is that there's no statistical information available on local or even global tag distribution and frequency, i.e. it is not known up front that superspecialkey is quite rare, and name is very frequent. By forcing a different evaluation sequence, I'm bringing in my own knowledge about tag frequency - which should be fairly universal in case of a |
But that could be done, in the future, right? A server-local taginfo of the like. After all, the data is all already there and overpass itself has the best means to acquire this data (quickly). |
Uhm, did I miss something or how did you extract the compiled overpass queries now?
… but then you just came up with the results (out of nowhere). I also checked the |
I wrote it, but did not commit it yet, work in progress. If you intend to copy&paste the queries to the wiki page, better wait until I am through with the optimizations. Minus imports, it is less than 30 lines of code. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
It's quite a brain teaser to find an algorithm to convert any boolean expression to overpass syntax. Though I may have cracked it now. Here is a teaser of an automatically generated query: AddBusStopNameTag Filters
|
Both queries look surprisingly similar 😎 A quick comment,though: You really need to double check if moving the second line (producing .n1) towards the end helps avoiding excessive memory consumption, see #1514 (comment) Example:
.n1 has 172850 nodes in this case... this will not really fly. vs:
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Alrighty, this is the final syntax: AddRoadName
AddPlaceName
AddOnewayAddBusStopNameTag Filters
AddIsBuildingUndergroundTag Filters
AddHousenumber
MarkCompletedHighwayConstruction
AddReligionToPlaceOfWorshipTag Filters
AddParkingAccessTag Filters
AddRecyclingTypeTag Filters
AddSportTag Filters
AddRoadSurfaceTag Filters
AddMaxSpeedTag Filters
AddMaxHeight
AddRailwayCrossingBarrier
AddPostboxCollectionTimesTag Filters
AddOpeningHoursTag Filters
AddBikeParkingCapacityTag Filters
AddOrchardProduceTag Filters
AddCycleway
AddSidewalk
AddProhibitedForPedestriansTag Filters
AddCrossingTypeTag Filters
AddBuildingLevelsTag Filters
AddBusStopShelterTag Filters
AddVegetarianTag Filters
AddVeganTag Filters
AddInternetAccessTag Filters
AddParkingFeeTag Filters
AddMotorcycleParkingCapacityTag Filters
AddPathSurfaceTag Filters
AddTracktypeTag Filters
AddForestLeafType
AddBikeParkingTypeTag Filters
AddWheelchairAccessToiletsTag Filters
AddPlaygroundAccessTag Filters
AddWheelchairAccessBusinessTag Filters
AddToiletAvailabilityTag Filters
AddFerryAccessPedestrianTag Filters
AddFerryAccessMotorVehicleTag Filters
AddBuildingTypeTag Filters
AddWayLitTag Filters
AddToiletsFeeTag Filters
AddBabyChangingTableTag Filters
AddBikeParkingCoverTag Filters
AddTrafficSignalsSoundTag Filters
AddRoofShapeTag Filters
AddWheelchairAccessPublicTransportTag Filters
AddWheelchairAccessOutsideTag Filters
AddTactilePavingBusStopTag Filters
AddTactilePavingCrosswalkTag Filters
AddBridgeStructureTag Filters
AddReligionToWaysideShrineTag Filters
AddCyclewaySegregationTag Filters
MarkCompletedBuildingConstruction
AddCyclewayPartSurfaceTag Filters
AddFootwayPartSurfaceTag Filters
AddMotorcycleParkingCoverTag Filters
AddFireHydrantTypeTag Filters
AddParkingTypeTag Filters
AddPowerPolesMaterialTag Filters
AddCarWashTypeTag Filters
AddBenchBackrestTag Filters
AddTrafficSignalsButtonTag Filters
|
The following times are from https://overpass.maptime.in/api/ which has little load, so the times are more stable (and probably less than one can expect from the main instance). The bbox used was
So, this is in waiting for the response time alone, almost 1.5 minutes, or overall, 53% faster.
By the way, how come that the pull request you mentioned drolbr/Overpass-API#167 has not been merged after more than 4 years? That's a pretty long time. |
By the way, do you know who hosts https://overpass.maptime.in/api/ ? I would like to ask, if it is okay to add it to StreetComplete's alternative overpass servers list |
AFAIK it was set up for OSMCha by the Mapbox team in India. I don't know exactly if the team is stilll around or who currently maintains the box. I think @willemarcel should have more details. |
I did the above performance test also on other servers, the results are:
|
So even simply switching away from the default overpass instance, of which at least z.overpass-api.de seems to be hopelessly overloaded, will lead to a massive time reduction in download. Something between 3x to 6x faster. |
lz4 has seen a massive increase in load during the last two weeks and has higher than expected response times., z.* uses a slower compression, and is expected to be slower. one of the 4 kumi boxes lags behind by 7 years, another one is currently unavailable. |
overpass-api.de uses DNS round robin rather than a load balancer, i.e. the server has two DNS entries, one pointing to the z.* IP address, the other one to lz4.*. And yes, this might change... I think both servers have rate limiting in place: http://lz4.overpass-api.de/api/status |
I figure they have both rate limiting in place, but the test did not run into any rate limiting, I guess because it is has enough free capacities. So, can you confirm that lz4 may be used and is also intended to be used directly, without the DNS round robin / load balancer in between? Not querying the overloaded server every ~second time would already increase the speed the quests are downloaded three-fold. |
StreetComplete always correctly sets the user agent for every request it makes. Can it be ascertained from munin or other statistics for how many % of requests / for how much load the app is actually responsible for? |
If you manage a list of servers in your app and fall back to other servers if one turns out to be unavailable, that should be ok. Ideally, that would be part of some configuration that won't require users to download a new version of the app, in case some changes are needed. I can't really help with the question re. production, you need to get in touch with Roland by email. |
So may I use the queries from #1514 (comment) for updating the wiki now? |
If you change your script a bit, you can create a wiki page with links to overpass turbo: https://wiki.openstreetmap.org/wiki/User:Mmd/Test_SC (btw: dev branch runs all queries in 24s) |
Cool, nice Page! @rugk keeps links to all the overpass queries on this page https://wiki.openstreetmap.org/wiki/StreetComplete/Quests What, you mean, all of them together in 24s?? |
Yeah, I know this page, I thought it might be easier to just generate the content based on the existing quests in the code. That’s the “Total response time for downloading all quests” as before. |
How can the dev instance be 12 times faster than lz4? |
It runs a different fork/branch with performance fixes. |
That's pretty incredible
…On September 17, 2019 7:22:04 AM GMT+02:00, mmd ***@***.***> wrote:
It runs a different codeline with performance fixes.
|
So would not it be useful to contribute these back upstream? 😄 |
Sure, that’s ongoing... |
I've noticed that in the resulting regular expression syntax, some have the ORs in parentheses, most don't. For example, AddRoadName has both "^(primary|secondary|tertiary|unclassified|residential|living_street|pedestrian)$" and '^private|no$', and AddCycleway has them explicitly but AddWheelchairAccessBusiness doesn't have them. Is there a benefit to including the parentheses or detriment to avoiding them? |
Hm, I think |
Right, the overpass query was generated without the parentheses. Should be fixed now. |
Anyhow, a big thanks for noticing and notifying me about it! That's quite a big bug! |
Testing the |
NB: It seems this is entirely unrelated to the Performance optimization in this issue. The original queries already had the same issue. As an example: 6d234a4 |
Thanks. Any possibility if you could yet again update the final syntax? |
Some Overpass queries are very slow and clog up the download process by pitching the app in a long queue, waiting to replenish its Overpass download quota. If the queries were executed faster, the app would reach the quota later.
So:
Find out which quests take long and why. First step here is to create a little program in the
tests/
directory that iterates through the quest type list and outputs the compiled overpass query for each for some bbox to console. See Initial quest scan may not found any quests before running out of query quota #1457 (comment). The output of this program would also be interesting for @rugk who is maintaining the quests wiki page and links to the used overpass queries for each quest type.Executing each query on some location should give an idea which quest types are problematic.
Implement optimizations based on these observations. In the predecessor ticket to this one, a few suggestions have already been made:
or
s in OQL withif: <condition>
blocks if they don't contain regexThe text was updated successfully, but these errors were encountered: