
Multiplatform OSM API client #5686

Merged: 51 commits merged into master from osmapi on Aug 15, 2024

Conversation

@westnordost (Member) commented Jun 13, 2024

Resolves #5410


Also took the opportunity to make the implementations and naming consistent for all the clients that use the multiplatform Ktor client for HTTP (which is now all of them).


I did a bit of performance testing. Unfortunately, the results are very unsatisfying.

Downloading city center of Hamburg (~35000 elements)

(Done on a Samsung S10e, release APK)

[performance comparison screenshot]

Edit: The download+parse time for ktor+xmlutil has now been reduced to about 5.2 s for this particular test.

The previously used Java library osmapi...

  • uses Java's HttpURLConnection to do the HTTP request
  • the response InputStream is consumed by a streaming parser which produces one OSM element after another
  • each of which is then copied into the app's own data structure
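
Roughly, that code path looks like this (a minimal sketch, not the actual osmapi code; the parsing step is only hinted at):

import java.net.HttpURLConnection
import java.net.URL

// Sketch only: the response InputStream is handed to a streaming (pull) parser,
// so parsing happens while bytes are still arriving over the network.
fun downloadWithStreaming(url: String) {
    val connection = URL(url).openConnection() as HttpURLConnection
    connection.inputStream.use { stream ->
        // a streaming XML parser would consume `stream` here,
        // emitting one OSM element at a time
    }
}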

With this PR, we use

  • Ktor-client with CIO engine
  • response body is copied into a string
  • the string is parsed by xmlutil into a (DOM) data structure
  • we copy the data structure into our own
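
Roughly, the new code path looks like this (a minimal sketch assuming Ktor 2.x with the CIO engine; not the actual StreetComplete code, and the xmlutil parsing is only hinted at):

import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.request.get
import io.ktor.client.statement.bodyAsText

// Sketch only: the whole response body is buffered into a String first,
// and only then parsed and copied into our own data structures.
suspend fun downloadWholeBody(url: String): String {
    val client = HttpClient(CIO)
    val xml = client.get(url).bodyAsText() // the entire body is held in memory here
    // parse `xml` with xmlutil into a DOM-like structure, then copy it
    // into the app's own Element classes (omitted)
    return xml
}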

It is clear that the second approach must be slower, especially for large data, because we don't stream anything (not really supported by Ktor and xmlutil just yet) but always copy the whole data. On the other hand, getting the response body as a string (rather than as a stream) and copying all the data at the end does not seem to be the cause of why it is so slow.
I am (even more) surprised how slow the Ktor client (and/or CIO) is. I can see in the log that the garbage collector is quite busy. What is it doing?

westnordost and others added 30 commits June 6, 2024 01:19
# Conflicts:
#	app/src/main/java/de/westnordost/streetcomplete/data/osm/edits/upload/changesets/OpenChangesetsManager.kt
#	app/src/main/java/de/westnordost/streetcomplete/data/osmnotes/NotesApi.kt
#	app/src/main/java/de/westnordost/streetcomplete/data/osmnotes/NotesApiImpl.kt
#	app/src/main/java/de/westnordost/streetcomplete/data/osmnotes/edits/NoteEditsUploader.kt
#	app/src/main/java/de/westnordost/streetcomplete/data/osmtracks/TracksApi.kt
#	app/src/main/java/de/westnordost/streetcomplete/data/osmtracks/TracksApiImpl.kt
#	app/src/main/java/de/westnordost/streetcomplete/data/user/UserDataController.kt
#	app/src/main/java/de/westnordost/streetcomplete/data/user/UserUpdater.kt
#	app/src/test/java/de/westnordost/streetcomplete/data/osmnotes/edits/NoteEditsUploaderTest.kt
@westnordost (Member, Author)

I was informed that Ktor is working as intended and that streaming is possible via another interface, documented here. For that to work properly in a multiplatform way, I'll first have to wait until both xmlutil and Ktor are based on the same IO library (kotlinx-io).
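
For reference, that streaming interface looks roughly like this (a sketch; the ByteReadChannel-to-InputStream bridge is JVM-only, which is exactly why this is not a multiplatform solution yet):

import io.ktor.client.HttpClient
import io.ktor.client.request.prepareGet
import io.ktor.client.statement.bodyAsChannel
import io.ktor.utils.io.jvm.javaio.toInputStream

// Sketch only: access the response while the body is still being received,
// instead of buffering it into a String first.
suspend fun downloadStreaming(client: HttpClient, url: String) {
    client.prepareGet(url).execute { response ->
        val inputStream = response.bodyAsChannel().toInputStream() // JVM-only bridge
        // a streaming XML parser could consume `inputStream` here
    }
}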

# Conflicts:
#	app/src/main/java/de/westnordost/streetcomplete/data/osm/edits/upload/ElementEditUploader.kt
#	app/src/main/java/de/westnordost/streetcomplete/data/user/UserDataController.kt
#	app/src/main/java/de/westnordost/streetcomplete/data/user/UserLoginController.kt
@Helium314 (Collaborator)

I just did a comparison with my old S4 Mini, instead of the A3 I had with me the past few days.
For 57k elements, the download takes about 25 s on master and 52 s on osmapi. The difference is much bigger than on the A3.

@westnordost (Member, Author)

Thanks for testing! In debug mode?

@Helium314 (Collaborator)

Yes, debug mode.

@westnordost (Member, Author)

I'll first have to wait until both xmlutil and Ktor are based on the same IO library (kotlinx-io).

The beta of Ktor is now based on kotlinx-io. However, it does not expose any Source that could be consumed by a parser. I'm not sure how they intend their interface to be used, so I asked in https://youtrack.jetbrains.com/issue/KTOR-7224

@pdvrieze commented Aug 1, 2024

I had a look at the parsing speed with the optimized library (11,813,680 bytes read). When parsing the document from a string, it parses in 351 ms. When reading from an InputStream, it parses in 2000 ms (2829 ms on the first parse). The manual parsing took about 200 ms from a string. Obviously the manual parsing does less (it doesn't check for unexpected items). Btw, it is better to loop through the attribute indices rather than look them up by name - the latter is repetitive.

@westnordost (Member, Author) commented Aug 1, 2024

Oh, interesting! How cool that you checked out this project to test it!

I am assuming that you did the test to compare with #5686 (comment) - going from 1.6 s to 0.35 s + 0.2 s = 0.55 s (if I read this right) is a pretty huge improvement.
How come there doesn't seem to be any improvement when reading from an input stream?

btw. it is better to loop through the attribute indices rather than look them up by name - the latter is repetitive.

You mean e.g. like this? (Pseudo-code)

- "member" -> members.add(RelationMember(
-   type = ElementType.valueOf(attribute("type").uppercase()),
-   ref = attribute("ref").toLong(),
-   role = attribute("role")
- ))
+ "member" -> {
+ var type: ElementType? = null
+ var ref: Long? = null
+ var role: String? = null
+ for (attribute in attributes) {
+   when (attribute.name) {
+     "type" -> type = ElementType.valueOf(attribute.value.uppercase())
+     "ref" -> ref = attribute.value.toLong()
+     "role" -> role = attribute.value
+   }
+ }
+ members.add(RelationMember(type!!, ref!!, role!!))
+ }

@pdvrieze commented Aug 2, 2024

I checked out your project, but wrote my own little test class. Reading from an InputStream appears to be network-constrained. What I mean is: yes, that pseudo-code should be faster (although it is somewhat lax - it checks neither namespaces nor duplicates).

@westnordost (Member, Author) commented Aug 8, 2024

What I mean is yes, that pseudocode should be faster

...because there are fewer iterations. I rewrote the code so that I iterate over the attributes myself and tested the performance. There is almost no difference, though - actually, it is about 5% slower.

My guess as to why this is marginally slower: for every element parsed, I have to initialize a few new local variables. These are allocated on the heap, which is not free and, in sum, slightly more costly than the extra iterations (see the pseudo-code above).

@westnordost (Member, Author)

I intend to merge this now into v59.0.

I have been waiting on this because I assumed that download+parsing speed could be improved by letting the XML parsing library (xmlutil) parse the stream of bytes as it arrives over the network, i.e. do the parsing in parallel with the IO rather than after it. Currently, in this branch, all bytes are first streamed into a string, and only then is that string parsed. This is potentially slow for large data.

There is no Kotlin Multiplatform replacement for Java's InputStream in the standard library. The light at the end of the tunnel is Source from the kotlinx-io multiplatform library, which is just that. kotlinx-io is on its way to being stabilized and will then form part of the set of official libraries maintained by JetBrains (the creators of Kotlin).
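
For context, this is roughly how a kotlinx-io Source is used (a sketch; the API is still being stabilized, so details may change):

import kotlinx.io.Buffer
import kotlinx.io.readString
import kotlinx.io.writeString

// Sketch only: a Buffer is both a Sink and a Source, i.e. a multiplatform
// "stream of bytes" playing the role InputStream plays on the JVM.
fun sourceDemo() {
    val buffer = Buffer()
    buffer.writeString("<osm version=\"0.6\"></osm>")
    // a parser accepting a Source could read from it incrementally;
    // here we just read everything back as one string
    val xml: String = buffer.readString()
    println(xml)
}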

For this to work, xmlutil needs to be able to consume a Source, and Ktor (the multiplatform HTTP library) needs to produce a Source representing the HTTP response. Ktor has now migrated to kotlinx-io under the hood, so things were looking promising.

However, long story short, Ktor uses kotlinx-io only under the hood, i.e. it does not expose a Source on its API that could be consumed by a parser. This is deliberate, as was explained to me in issue KTOR-7224, because Ktor's API is suspending while Source is a blocking API.
I posted a comment in kotlinx-io#163, the kotlinx-io ticket in which a possible future async API is discussed, suggesting that the Ktor devs exchange ideas with the kotlinx-io devs. But I don't think things will move quickly there, as an async API still seems to be somewhat in the brainstorming phase, and I suspect their priority is first to stabilize the API that's already there.
And this is the reason why I am merging this now, because otherwise it could just as well lie around for another year and generate maintenance work (merging, keeping it up to date in parallel with the current code).

The caveat of using this code now is, as previously mentioned, a performance loss for downloads of large areas. Since the download process also includes persisting to the database and creating the quests, it comes down to an overall performance loss of up to 25%.

# Conflicts:
#	app/build.gradle.kts
@westnordost removed the "blocked" (blocked by another issue) label on Aug 15, 2024
@westnordost merged commit 6090c26 into master on Aug 15, 2024
@westnordost deleted the osmapi branch on August 15, 2024 at 21:31
Labels: iOS (necessary for iOS port)
Project status: Released