Fetch OSM for any region from Protomaps extract API #842

Draft: wants to merge 1 commit into base: dev

abyrd (Member) commented Nov 25, 2022

Fetch OSM data with minutely updates applied using the OSM extract service from https://protomaps.com. This is a minimum viable endpoint and backend component with background TaskAction progress reporting. A successful download yields an OSMPBF DataSource. This is very similar to what we were doing in the past with Vanilla Extract, but uses a much more mature tool, https://github.com/protomaps/OSMExpress. (In fact, there's an experimental branch of Vanilla Extract taking some cues from OSMExpress at https://github.com/conveyal/vanilla-extract/tree/lmdb.)

This is not really intended to be used as-is, but as a sketch of how we might eventually treat OSM (and later GTFS) as both uploadable and auto-fetchable data sources, which can then be selected and combined into networks. It also serves as working example code for interacting with this particular extract service.

This requires a Protomaps API token from https://app.protomaps.com/dashboard. I believe the sign-up process is not public yet. I have a key and have tested this out; it seems to work exactly as intended.

[Screenshot: Screen Shot 2022-11-25 at 18 05 05]

[Screenshot: Screen Shot 2022-11-25 at 18 05 42]

Some important caveats:

  • The extracted data seems to behave like --strategy complete_ways in Osmium, so long ways may extend very far outside the bounding box. This can be relatively harmless if it's a few short segments, but could also lead to very oversized bounds or analysis areas, or at the very least the inclusion of lots of data that don't actually affect the analysis at hand.
  • Extracted data include all types of ways, such as buildings, coastlines, and administrative boundaries. In some places, these constitute the majority of the data, yet they have no impact on routing. These are also the ways that tend to be very large and highly detailed. Just changing an extract so it doesn't touch the coastline, or filtering out buildings and land use polygons, can have an extreme effect on file size and geographic extent, reducing sizes by an order of magnitude in some cases. This means less data to transfer and load into database files before processing into networks.
  • Simply skipping the really vast ways like coastline polygons reduces the negative impact of the complete_ways behavior, as long roads like highways are typically broken into smaller segments.
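
To make the caveats above concrete, here is a minimal Python sketch of how a single long way can inflate the extent of an extract well beyond the requested bounding box. It is not code from this PR and does not touch the Protomaps API or OSMExpress; the way dictionaries, the `extent` and `overshoot` helpers, and the sample coordinates are all hypothetical.

```python
from typing import List, Tuple

# (min_lon, min_lat, max_lon, max_lat), all in degrees
BBox = Tuple[float, float, float, float]


def extent(ways: List[dict]) -> BBox:
    """Bounding box covering every node of every way."""
    lons = [lon for w in ways for lon, _ in w["coords"]]
    lats = [lat for w in ways for _, lat in w["coords"]]
    return (min(lons), min(lats), max(lons), max(lats))


def overshoot(requested: BBox, actual: BBox) -> float:
    """Ratio of actual to requested area, in squared degrees (a rough measure)."""
    def area(b: BBox) -> float:
        return (b[2] - b[0]) * (b[3] - b[1])
    return area(actual) / area(requested)


# Hypothetical extract: complete-ways behavior keeps every node of any way
# that touches the requested box, so the coastline drags the extent outward.
requested = (0.0, 0.0, 1.0, 1.0)
ways = [
    {"tags": {"highway": "residential"}, "coords": [(0.2, 0.2), (0.4, 0.3)]},
    {"tags": {"highway": "motorway"}, "coords": [(0.9, 0.5), (1.1, 0.5)]},
    {"tags": {"natural": "coastline"},
     "coords": [(0.5, 0.9), (0.5, 3.0), (4.0, 3.0)]},
]

all_extent = extent(ways)
no_coast = extent([w for w in ways if w["tags"].get("natural") != "coastline"])

print("with coastline:   ", all_extent, overshoot(requested, all_extent))
print("without coastline:", no_coast, overshoot(requested, no_coast))
```

In this toy data the coastline alone makes the extent more than ten times the requested area, while the road segments overshoot the box only slightly.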

For production use, it would be advantageous to perform some simple filtering on the fly in OSMExpress, skipping over ways with certain tags, and possibly truncating ways that extend too far outside the bounding box.
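
The filtering and truncation suggested above could look roughly like the following Python sketch. It operates on plain in-memory dictionaries rather than on OSMExpress itself, and everything in it is an assumption made for illustration: the skip lists, the `margin` parameter, and the helper names. Truncation here keeps only segments with at least one endpoint inside the padded box, so a segment that crosses the box with both endpoints outside would be dropped; a real implementation would need to decide how to handle that case.

```python
from typing import Dict, Iterator, List, Tuple

Coord = Tuple[float, float]  # (lon, lat)
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)

# Assumed skip lists for illustration only; the right set of tags is a
# design decision that this PR leaves open.
SKIP_KEYS = {"building", "landuse"}
SKIP_TAGS = {("natural", "coastline"), ("boundary", "administrative")}


def is_skipped(tags: Dict[str, str]) -> bool:
    """True if the way carries a tag we assume is irrelevant to routing."""
    if any(key in tags for key in SKIP_KEYS):
        return True
    return any(tags.get(k) == v for k, v in SKIP_TAGS)


def pad(bbox: BBox, margin: float) -> BBox:
    """Grow the box by `margin` degrees on every side."""
    return (bbox[0] - margin, bbox[1] - margin, bbox[2] + margin, bbox[3] + margin)


def inside(c: Coord, bbox: BBox) -> bool:
    return bbox[0] <= c[0] <= bbox[2] and bbox[1] <= c[1] <= bbox[3]


def truncate(coords: List[Coord], bbox: BBox) -> List[List[Coord]]:
    """Split a way into pieces made of segments with an endpoint inside bbox."""
    pieces: List[List[Coord]] = []
    current: List[Coord] = []
    for a, b in zip(coords, coords[1:]):
        if inside(a, bbox) or inside(b, bbox):
            if not current:
                current.append(a)
            current.append(b)
        elif current:
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces


def filter_and_truncate(ways: List[dict], bbox: BBox, margin: float) -> Iterator[dict]:
    """Drop skipped ways, then cut the rest down to the padded box."""
    padded = pad(bbox, margin)
    for way in ways:
        if is_skipped(way["tags"]):
            continue
        for piece in truncate(way["coords"], padded):
            yield {"tags": way["tags"], "coords": piece}


# Hypothetical sample data
sample = [
    {"tags": {"highway": "residential"}, "coords": [(0.2, 0.2), (0.4, 0.3)]},
    {"tags": {"highway": "motorway"},
     "coords": [(0.9, 0.5), (1.5, 0.5), (2.5, 0.5), (3.5, 0.5)]},
    {"tags": {"natural": "coastline"}, "coords": [(0.5, 0.9), (0.5, 3.0), (4.0, 3.0)]},
    {"tags": {"building": "yes"}, "coords": [(0.1, 0.1), (0.1, 0.2), (0.1, 0.1)]},
]
result = list(filter_and_truncate(sample, (0.0, 0.0, 1.0, 1.0), margin=0.1))
for way in result:
    print(way)
```

With this sample, the coastline and building are skipped entirely, the residential way is kept whole, and the motorway is cut after its first segment leaves the padded box.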

minimum viable endpoint with progress reporting