Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve database performance #3741

Merged
merged 15 commits into from
Feb 11, 2022

Conversation

Helium314
Copy link
Collaborator

ElementGeometryDao is optimized in this PR, see #3609 (comment) for performance comparison with current version.

The element geometry table is split into separate tables for ways and relations to avoid slow queryIn in getAllEntries(keys).
Nodes do not have a separate geometry table, as nodes have a simple point geometry that can be created from the node position.

A spatial index is added to NodeTable, and some functions for querying nodes inside a bbox are added to NodeDao (equivalents to functions in ElementGeometryDao).

MapDataController.getMapDataWithGeometry makes use of the the new table layout by not querying node geometry and data separatly, instead point geometry is created from node positions.

Still a draft, because database upgrade is missing and tests have not been updated yet.

@Helium314 Helium314 marked this pull request as ready for review February 10, 2022 15:00
@Helium314
Copy link
Collaborator Author

Tests are added, so I marked the PR as ready.

I tested the database upgrade by installing a current version of SC and downloaded some area in addition to initial scan. Then I installed a build from this PR and SC worked as expected when opening a few quests. Is that enough?

Upgrading the DB gave a black screen at start, about 15 seconds for a 20 MB database (14 MB after upgrade) on my S4 mini. This is not great, but I would expect the average phone to be considerably faster than mine.

Copy link
Member

@westnordost westnordost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I am going to attempt some more optimizations on top of this branch so you can leave the processing of my comments to me


fun getAll(bbox: BoundingBox): List<Node> {
return db.query(NAME, where = inBoundsSql(bbox)) { cursor ->
Node(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplication of creating Node from cursor, same with ElementGeometryEntry

@Helium314
Copy link
Collaborator Author

So is there anything left for me, or are you doing the rest?

@westnordost
Copy link
Member

I did the rest. However, I created the branch db_improvements_improvements which is basically the changes from fetch-by-bbox on top of this branch. I'd prefer this because I find it cleaner (also: less code) and the behavior would be like the OSM API functions.

However, I am not sure about performance.

You still have this nice test setup, don't you? Could you test / compare the performance of db_improvements and db_improvements_improvements? If it doesn't make the performance clearly slower, I'd like to merge db_improvements_improvements into db_improvements

@westnordost
Copy link
Member

Helium314#334

@Helium314
Copy link
Collaborator Author

I remember it was faster, but returned fewer elements: #2803 (comment)

Is this still the case? Then I'll try make the testing a bit more exhaustive and compare the number of returned elements as well.

@westnordost
Copy link
Member

Yes, it is deliberate that it returns less elements. This is how the OSM API works. It returns all elements who have at least one node inside the given bbox. Current StreetComplete behavior is that it returns all elements whose bbox intersect with the bbox given.

E.g. imagine a bicycle route relation that goes from the town center out to the suburbs in the North-West. The bbox of this relation would be quite large, so the route relation would be included in every bbox-query between the center and anywhere in the north or the west or northwest. When querying the OSM API, the route relation would only be included if a node of a way to which it refers is included in the bbox.

@westnordost
Copy link
Member

Thank you for the link to #2803. I've read it again and see there is the issue with #2803 (comment)

But on the other hand, we already add a padding of 20m to each side of the bbox plus it is only used for fuzzy filters to not ask for certain elements. Fuzzy filters are fuzzy anyways, it's OK if the user instead answers e.g. "cycleway displayed separately on map" because one might argue that this info was missing anyway.

@smichel17
Copy link
Member

It seems like the main performance problem comes from fetching relations that intersect the bbox. However, the use case in #2803 (comment) seems like it applies primarily to ways.

Would it make sense to use the current SC approach (show if intersecting) for ways and the osmapi approach (show only if a node is contained) for relations? Or is that not possible for technical reasons?

@westnordost
Copy link
Member

Technical reasons that speak against this is that it is more complex (more code, possibly hacky-looking code) and looks inconsistent, plus is inconsistent with the OSM API. For me, this is reason enough to not consider it.

@smichel17
Copy link
Member

In my opinion, performance optimizations are possibly the best justification for hacky code.

I imagine an interface like

fun fetchByBbox(bbox: Bbox, includeIntersectingWays: Boolean = false, includeInteresctingRelations: Boolean = false)

Under the hood, the initial, cleaner implementation would perform a fetch-by-bbox query that works like the osm api, then separate call(s) to find intersecting ways or relations. I suspect doing separate calls would be a performance hit, although I'm not sure by how much (maybe still better than always looking for relations). Then we could decide whether combining the queries would be worth the messier code.

But anyway, I do not have a very strong opinion. Just wanted to share the idea. If your opinion remains the same, no need to respond.

@Helium314
Copy link
Collaborator Author

action / function this PR (s) fetch-by-bbox (s)
ElementDao.putAll 204 204
ElementGeometryDao.putAll 16 12
query bench run 530 477
MapDataController.getMapDataWithGeometry 129 81
ElementGeometryDao.getAllEntries(keys) 9 9
ElementGeometryDao.getAllKeys 4 does not exist
NodeDao.get 9 8
NodeDao.getAll 83 83
WayDao.get 122 122
OsmQuestDao.getAllInBBox 14 14
OsmQuestDao.getAllForElements 108 107

By a very short look, getMapDataWithGeometry on this PR usually returns 10-20 elements more than on fetch-by-bbox. This appears to be more accurate than some %, as it holds for 371 vs 359 and for 25 vs 11 elements

@westnordost
Copy link
Member

Thank you! Nice, so it is not slower, probably actually a little faster (if only because it returns less elements, who knows).

So, can you merge it into this branch and I merge this branch into master?

Also, compared to current situation, what would be the speed gain, overall in %? Would be nice to have a % for the changelog

implement changes from "fetch-by-bbox" branch
@Helium314
Copy link
Collaborator Author

Helium314 commented Feb 11, 2022

Comparing to the current state:

  • database is ca 30% smaller
  • persisting elements is ca 30% faster (whole download + persisting + quest creation is ca 20% faster)
  • DB access in "average use" (as in the logged queries for the bench) is ca 130% faster

@westnordost
Copy link
Member

westnordost commented Feb 11, 2022

130%?! Wow, I completely underestimated the difference this makes.... how? ElementGeometryDao had an index on type+id didn't it?

Edit: Yeah, a primary key is automatically an index as far as I know -
https://github.com/streetcomplete/StreetComplete/pull/3741/files#diff-fa800f65682c1415b95f276d039e7a6d6f982782d59f323732eb9e77693f9d02L31-L34

@Helium314
Copy link
Collaborator Author

My test showed that

fun getAllEntries(keys: Collection<ElementKey>): List<ElementGeometryEntry> {
if (keys.isEmpty()) return emptyList()
return db.queryIn(NAME,
whereColumns = arrayOf(ELEMENT_TYPE, ELEMENT_ID),
whereArgs = keys.map { arrayOf(it.type.name, it.id) }

was quite slow (500 ms+).
When I replaced the queryIn by querying a single column with my hacky single index, ElementGeometryDao.getAllEntries(keys) was a lot faster. That's what started trying the optimizations on the normal database, because in that query there is no R-tree index involved.

@westnordost
Copy link
Member

Aaahh, right. This is because it is actually not really supported by SQLite, queryIn is a workaround:

https://github.com/streetcomplete/StreetComplete/blob/master/app/src/main/java/de/westnordost/streetcomplete/data/DatabaseXt.kt#L28

@westnordost westnordost merged commit d27f5d0 into streetcomplete:master Feb 11, 2022
@westnordost westnordost deleted the db_improvements branch February 11, 2022 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants