This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

Schema for offline database #3373

Closed
jfirebaugh opened this issue Dec 21, 2015 · 9 comments

Comments

@jfirebaugh
Contributor

We need to write a good schema for the database used for offline access. Because the offline database is a persistent object, any format changes we make in the future will need to be supported with a mechanism for runtime migration of existing offline databases to the new format. That's complicated and error-prone, so let's try to get the format right to begin with.
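A common way to implement this kind of runtime migration is to stamp the schema version into the SQLite file itself (`PRAGMA user_version`) and apply numbered upgrade steps when the database is opened. The following is a minimal sketch that assumes Python's `sqlite3` module and a connection opened with `isolation_level=None`, so transactions are explicit. The `tiles` table and both migration steps are hypothetical.

```python
import sqlite3

# Hypothetical migration steps: MIGRATIONS[i] upgrades the schema from version i to i + 1.
MIGRATIONS = [
    # v0 -> v1: initial (illustrative) schema.
    "CREATE TABLE tiles (url_template TEXT NOT NULL, z INTEGER NOT NULL, "
    "x INTEGER NOT NULL, y INTEGER NOT NULL, data BLOB, "
    "PRIMARY KEY (url_template, z, x, y))",
    # v1 -> v2: record the pixel ratio of raster tiles.
    "ALTER TABLE tiles ADD COLUMN pixel_ratio INTEGER NOT NULL DEFAULT 1",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Upgrade the database to len(MIGRATIONS) and return the resulting version.

    Assumes conn was opened with isolation_level=None so BEGIN/COMMIT are ours.
    """
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    if version > len(MIGRATIONS):
        # A newer app wrote this file; refuse rather than misread it.
        raise RuntimeError(f"database version {version} is newer than supported")
    for target in range(version + 1, len(MIGRATIONS) + 1):
        # One transaction per step: a failure leaves the last fully applied version.
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[target - 1])
            conn.execute(f"PRAGMA user_version = {target}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running `migrate` on every open is idempotent. A fresh file goes through every step, and an up-to-date file goes through none.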

Requirements

From most required to most flexible:

  • The database needs to store:
    • Vector tiles, from multiple sources
    • Raster tiles, from multiple sources
    • Glyphs, from multiple URL templates and for multiple font stacks
    • Source TileJSON, for multiple sources
    • Sprite JSON and images, for multiple styles
    • GeoJSON data, for multiple GeoJSON sources
  • The database needs to record pixel ratio information about raster tiles and sprites, and possibly support storage in multiple pixel ratios simultaneously.
  • The database should support the management of multiple "datasets" (using that as a generic term for the result of populating with bbox+z-range, "tiles along a path", or whatever other population strategies we support).
    • Management means downloading a dataset piecemeal, bulk adding a pre-compiled dataset, and deleting a dataset.
  • Where two datasets overlap, they should ideally share storage rather than duplicating common elements (mainly tiles). A shared element should remain extant until the last dataset that includes it is deleted.
  • The format is stable and well-documented, so that third parties can supply their own precompiled datasets in the format.
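The bbox+z-range population strategy mentioned in the requirements comes down to enumerating the tile indices that cover a bounding box at each zoom level. The following minimal sketch makes several assumptions: it uses Web Mercator XYZ tiles (the standard slippy-map scheme), it expects a bbox in WGS-84 degrees that does not cross the antimeridian, and its function names are hypothetical.

```python
import math
from typing import Iterator, Tuple

MAX_LAT = 85.0511287798  # latitude limit of the Web Mercator square


def lonlat_to_tile(lon: float, lat: float, z: int) -> Tuple[int, int]:
    """Return the XYZ tile (x, y) containing a WGS-84 point at zoom z."""
    n = 1 << z
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Points exactly on the east/south edge would otherwise land one tile outside.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_region(west: float, south: float, east: float, north: float,
                     min_z: int, max_z: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (z, x, y) for every tile covering the bbox over [min_z, max_z]."""
    for z in range(min_z, max_z + 1):
        x0, y0 = lonlat_to_tile(west, north, z)  # top-left tile
        x1, y1 = lonlat_to_tile(east, south, z)  # bottom-right tile
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                yield z, x, y
```

For example, the whole world from z0 to z2 is 1 + 4 + 16 = 21 tiles. Tile counts grow by about 4× per zoom level, which is why overlapping datasets benefit from sharing storage.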

cc @adam-mapbox @incanus

@jfirebaugh
Contributor Author

See #2939 (comment) for some related discussion.

MBTiles has the significant flaw that it's a single-tileset schema -- there's no ability to store tiles from multiple tilesets within the specification, which is a requirement for offline.
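To make the limitation concrete: the MBTiles `tiles` table is keyed only by zoom level, column, and row, and `metadata` describes a single tileset. A tile from a second source at the same coordinates therefore has nowhere to go. The following small sketch uses those tables plus the unique index on `(zoom_level, tile_column, tile_row)` that MBTiles files typically carry. The `try_store` and `tms_row` helpers are illustrative. Note that MBTiles rows use TMS order, so the XYZ `y` is flipped.

```python
import sqlite3

# Core MBTiles tables: one tiles table, keyed only by (z, column, row).
MBTILES_SCHEMA = """
CREATE TABLE metadata (name TEXT, value TEXT);
CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER,
                    tile_row INTEGER, tile_data BLOB);
CREATE UNIQUE INDEX tile_index ON tiles (zoom_level, tile_column, tile_row);
"""


def tms_row(z: int, y: int) -> int:
    """MBTiles rows count from the bottom (TMS), so flip the XYZ y index."""
    return (1 << z) - 1 - y


def try_store(conn: sqlite3.Connection, z: int, x: int, y: int, data: bytes) -> bool:
    """Return False when some tileset already occupies this (z, x, y) slot."""
    try:
        conn.execute("INSERT INTO tiles VALUES (?, ?, ?, ?)",
                     (z, x, tms_row(z, y), data))
        return True
    except sqlite3.IntegrityError:
        return False
```

An offline schema that needs several tilesets in one file would have to add a tileset identifier (for example, the source's URL template) to that key.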

@jfirebaugh
Contributor Author

Work on #3374 made clear some more additions:

  • Raster tiles may need to be stored at multiple pixel ratios (@1x, @2x) -- or at the very least, the pixel ratio of the tile should be recorded.
  • Same for sprite images / JSON

Added to the description.
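One way to satisfy both the pixel-ratio additions and the requirement that overlapping datasets share storage is to make the pixel ratio part of each tile's key and to link tiles to datasets through a join table. The following is a hypothetical illustration: the table names `regions`, `tiles`, and `region_tiles` and the helper functions are invented here, not taken from mbgl. Each tile row is written once, and deleting a region removes only the tiles that no other region still references. Sprites, glyphs, TileJSON, and GeoJSON could follow the same pattern in a generic resources table.

```python
import sqlite3

# Hypothetical schema; names are illustrative only.
SCHEMA = """
CREATE TABLE regions (
    id         INTEGER PRIMARY KEY,
    definition TEXT NOT NULL          -- e.g. style URL + bbox + zoom range, serialized
);
CREATE TABLE tiles (
    id           INTEGER PRIMARY KEY,
    url_template TEXT NOT NULL,       -- distinguishes tiles from different sources
    pixel_ratio  INTEGER NOT NULL,    -- 1 for @1x, 2 for @2x
    z INTEGER NOT NULL, x INTEGER NOT NULL, y INTEGER NOT NULL,
    data         BLOB,
    UNIQUE (url_template, pixel_ratio, z, x, y)
);
CREATE TABLE region_tiles (
    region_id INTEGER NOT NULL REFERENCES regions(id),
    tile_id   INTEGER NOT NULL REFERENCES tiles(id),
    PRIMARY KEY (region_id, tile_id)
);
"""


def add_tile(conn, region_id, url_template, pixel_ratio, z, x, y, data):
    """Store a tile once and link it to a region; overlapping regions share the row."""
    key = (url_template, pixel_ratio, z, x, y)
    conn.execute("INSERT OR IGNORE INTO tiles (url_template, pixel_ratio, z, x, y, data) "
                 "VALUES (?, ?, ?, ?, ?, ?)", key + (data,))
    (tile_id,) = conn.execute(
        "SELECT id FROM tiles WHERE url_template = ? AND pixel_ratio = ? "
        "AND z = ? AND x = ? AND y = ?", key).fetchone()
    conn.execute("INSERT OR IGNORE INTO region_tiles VALUES (?, ?)", (region_id, tile_id))


def delete_region(conn, region_id):
    """Delete a region, then drop only the tiles no remaining region references."""
    conn.execute("DELETE FROM region_tiles WHERE region_id = ?", (region_id,))
    conn.execute("DELETE FROM regions WHERE id = ?", (region_id,))
    conn.execute("DELETE FROM tiles WHERE id NOT IN (SELECT tile_id FROM region_tiles)")
```

The orphan sweep in `delete_region` implements the "remain extant until the last dataset that includes it is deleted" rule without storing an explicit reference count.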

@benstadin

We are also currently looking to define a schema for our own rendering engine (a native 3D map, but with quite different internals and data compared to Mapbox GL). I wonder if it isn't time to move towards GeoPackage [1], which would allow a wider range of features and integration with existing GIS tools. The spec [2] allows the schema to be customized. Even though the spec mentions there is "no applicable existing consensus" [3] about tiling, my gut feeling is that a Mapbox "branded" GeoPackage data container (and a "Heidelberg Mobil" one for our data) is worth investigating, simply because of the growing adoption rate and the integration possibilities.

For schema changes, maybe SQLite's new diff tool, sqldiff [4], is enough to apply data and schema changes to the GeoPackage SQLite database. If any extra work is needed to migrate between versions, a simple post-migration task could be implemented fairly easily after the database changes are applied.

[1] http://www.geopackage.org
[2] http://www.geopackage.org/spec/
[3] http://www.geopackage.org/spec/#_tile_matrix_introduction
[4] https://www.sqlite.org/sqldiff.html

@benstadin

About custom data: I think it is well defined for features in GeoPackage via gpkg_data_columns, but not for associating data with tiles. Maybe this could be achieved by using the data description from gpkg_data_columns (MIME type etc.) and adding a BLOB column "associated_data" to the tile table. That column could contain a protocol buffer built from the gpkg_data_columns description and the data from the origin table/column it defines. With some optimization for duplicates, this could be a very fast yet generic format.
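The following is a sketch of that proposal. It assumes a tile pyramid user data table laid out as in the GeoPackage spec (`roads_tiles` is a hypothetical name). Because the thread defines no message format, JSON stands in for the protocol buffer encoding. The `associated_data` column is not part of the GeoPackage standard, and the duplicate optimization is left out.

```python
import json
import sqlite3

# Tile pyramid user data table as laid out in the GeoPackage spec, plus the
# proposed (non-standard) associated_data column.
TILE_TABLE = """
CREATE TABLE roads_tiles (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    zoom_level  INTEGER NOT NULL,
    tile_column INTEGER NOT NULL,
    tile_row    INTEGER NOT NULL,
    tile_data   BLOB NOT NULL,
    UNIQUE (zoom_level, tile_column, tile_row)
);
ALTER TABLE roads_tiles ADD COLUMN associated_data BLOB;
"""


def put_tile(conn, z, col, row, image: bytes, attributes: dict) -> None:
    """Store a tile image together with its encoded attribute payload."""
    # JSON is a stand-in for the protocol buffer encoding suggested in the thread.
    payload = json.dumps(attributes, sort_keys=True).encode("utf-8")
    conn.execute(
        "INSERT INTO roads_tiles (zoom_level, tile_column, tile_row, tile_data, "
        "associated_data) VALUES (?, ?, ?, ?, ?)", (z, col, row, image, payload))


def get_attributes(conn, z, col, row) -> dict:
    """Decode the attribute payload for one tile (empty if none was stored)."""
    (payload,) = conn.execute(
        "SELECT associated_data FROM roads_tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, col, row)).fetchone()
    return json.loads(payload) if payload is not None else {}
```

Because the column is added rather than replacing any standard column, tools that only understand plain GeoPackage tiles can still read `tile_data` and ignore the rest.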

@jfirebaugh
Contributor Author

GeoPackage could be a possible approach. That would provide us a lot of the necessary specification groundwork.

We'd primarily use the Tiles option of GeoPackage combined with Mapbox extensions. Off the top of my head, Mapbox would define GeoPackage extensions for the following:

  • Storing vector tile data in Tile Pyramid User Data Tables. This would be an extension very similar to the WebP extension.
  • Extensions defining tables storing style, source, sprite, and glyph content.
  • Extensions defining tables for the storage of "Mapbox packages" consisting of all data needed to render a particular region of interest, route, flyTo path, etc.: a style, its associated sources, sprite, glyphs, and tiles. This extension would also specify cross-tables needed to deduplicate overlap between multiple "Mapbox packages". It's possible this extension could come later and the "v1" implementation of offline in mbgl could enforce a 1-1 relationship between a GeoPackage file and a "Mapbox package". But I'm confident that we'll eventually want a one-to-many relationship, with overlapping data between multiple "Mapbox packages" deduplicated in a single GeoPackage database.
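GeoPackage's own mechanism for declaring extensions is the `gpkg_extensions` table. Each row names an extension (by convention `<author>_<extension_name>`, like the spec's `gpkg_webp`) and, optionally, the table and column it applies to. The following sketch registers the three extensions listed above. The `mapbox_*` names and description strings are hypothetical placeholders, since no such extensions were ever specified; a real `definition` would point to the extension's documentation.

```python
import sqlite3

# gpkg_extensions table as defined by the GeoPackage spec.
GPKG_EXTENSIONS = """
CREATE TABLE IF NOT EXISTS gpkg_extensions (
    table_name     TEXT,
    column_name    TEXT,
    extension_name TEXT NOT NULL,
    definition     TEXT NOT NULL,
    scope          TEXT NOT NULL,
    CONSTRAINT ge_tce UNIQUE (table_name, column_name, extension_name)
);
"""


def register_extension(conn, name, definition, table_name=None, column_name=None):
    """Record that this file uses an extension; a no-op if already registered."""
    # "IS" compares NULLs as equal, which the UNIQUE constraint alone does not.
    exists = conn.execute(
        "SELECT 1 FROM gpkg_extensions WHERE table_name IS ? AND column_name IS ? "
        "AND extension_name = ?", (table_name, column_name, name)).fetchone()
    if exists is None:
        conn.execute("INSERT INTO gpkg_extensions VALUES (?, ?, ?, ?, 'read-write')",
                     (table_name, column_name, name, definition))


def register_mapbox_extensions(conn, tile_table):
    """Register the (hypothetical) Mapbox extensions sketched in this comment."""
    register_extension(conn, "mapbox_vector_tiles", "placeholder: vector tile blobs",
                       tile_table, "tile_data")
    register_extension(conn, "mapbox_styles", "placeholder: style/source/sprite/glyph tables")
    register_extension(conn, "mapbox_packages", "placeholder: deduplicated packages")
```

Registering the vector tile extension against a specific table and column lets a generic GeoPackage reader see that `tile_data` in that table is not an image format it knows.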

We'd likely ignore the following portions of the GeoPackage specification:

  • Features. Our initial offline implementation would store all feature data as vector tiles. GeoJSON source data would be stored pre-tiled. Perhaps eventually we'd investigate the use of GeoPackage Feature storage for GeoJSON source data, or as a read-only option for providing data to render. But for now we'd ignore this portion of the specification.
  • Metadata extensions. We'd instead develop our own extensions as sketched above.
  • Support for SRSs other than WGS-84.

@benstadin

Very close to what I have in mind for our purpose. We also need a different tile scheme (or a cube, TBD), which would require a custom extension.

Some comments/questions:

  • I couldn't find out how to describe a "custom" GeoPackage in a standard way, e.g. declaring a file to be a "GeoPackage with HDM extensions v1", so that other tools could recognize what it is and interpret the non-GeoPackage extensions if they want to. But I haven't read the whole spec in detail yet.
  • If you define the srs_id to be EPSG:4326 / WGS-84 for your dataset, that should be all that is required.
  • If there were a way (an external tool) to translate Mapbox tiles within the gpkg file to and from feature tables within it, together with dynamically created columns in the feature tables and a proper gpkg_data_columns description, this would allow maximum integration with existing tools that support gpkg.
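On the first question: GeoPackage 1.2 and later identify a file by the value 0x47504B47 ("GPKG" in ASCII) in the SQLite header's `application_id` field and record the spec version in `user_version`. Any extensions in use are listed in `gpkg_extensions`. The following sketch shows how a reader could recognize a "GeoPackage with HDM extensions". Both `describe_package` and the `hdm_cube_tiles` extension name in the usage are hypothetical.

```python
import sqlite3
from typing import List, Tuple

GPKG_APPLICATION_ID = 0x47504B47  # "GPKG" in ASCII (GeoPackage 1.2+)


def describe_package(conn: sqlite3.Connection) -> Tuple[bool, int, List[str]]:
    """Return (is_geopackage, user_version, sorted registered extension names)."""
    app_id = conn.execute("PRAGMA application_id").fetchone()[0]
    user_version = conn.execute("PRAGMA user_version").fetchone()[0]
    has_extensions = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'gpkg_extensions'"
    ).fetchone() is not None
    names: List[str] = []
    if has_extensions:
        names = sorted({row[0] for row in conn.execute(
            "SELECT extension_name FROM gpkg_extensions")})
    return app_id == GPKG_APPLICATION_ID, user_version, names
```

A tool that finds an extension name it doesn't understand can still read the standard parts of the file and skip the tables the extension covers.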

@jfirebaugh
Contributor Author

Based on internal discussions, we've ruled out using GeoPackage. We're going to use an MBTiles-influenced schema, with extensions to support multiple offline regions, multiple tilesets, and HTTP revalidation. However, for the initial release of offline, the exact schema will remain a private implementation detail. In future releases we'll look to publicly spec the format and add an import API for external offline databases.
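The thread doesn't say what the HTTP revalidation extension stores. One plausible minimal reading, offered as an assumption: each resource keeps its validators (`ETag`, `Last-Modified`) and its expiry next to its data. A fresh resource is served without a request. Otherwise the client sends a conditional request, so that a `304 Not Modified` reply can reuse the stored bytes. The field and function names below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from email.utils import format_datetime
from typing import Dict, Optional


@dataclass
class CachedResource:
    # Hypothetical per-row revalidation fields stored alongside the data blob.
    etag: Optional[str]          # from the ETag response header
    modified: Optional[datetime]  # from Last-Modified (timezone-aware, UTC)
    expires: Optional[datetime]   # from Expires / Cache-Control max-age (UTC)


def is_fresh(res: CachedResource, now: datetime) -> bool:
    """True if the stored copy can be used without contacting the server."""
    return res.expires is not None and now < res.expires


def revalidation_headers(res: CachedResource) -> Dict[str, str]:
    """Conditional request headers; a 304 reply means the stored blob is still valid."""
    headers: Dict[str, str] = {}
    if res.etag is not None:
        headers["If-None-Match"] = res.etag
    if res.modified is not None:
        # usegmt=True yields the HTTP-date form and requires a UTC-aware datetime.
        headers["If-Modified-Since"] = format_datetime(res.modified, usegmt=True)
    return headers
```

For offline use, an expired resource can still be rendered while the device has no connectivity. Revalidation only decides whether to refresh it once a connection is available.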

@georgbachmann

Hey @jfirebaugh. I am currently trying to migrate an existing offline raster MBTiles database to Mapbox. I don't want my users to have to re-download all their offline tiles; I'd rather they keep using them until they decide it's time to go fully vector.
So this is why your last sentence here gives me hope :)

add an import API for external offline databases

Does this exist on iOS and Android? Am I somehow able to migrate my raster tiles into the caches.db?
Also, if I may ask here: I first thought I would just have to rebuild the caches.db myself. So I looked at it, but discovered that the data stored in tiles.data is not just plain JPG/PNG. (I tried with the Mapbox Satellite map, which should be raster-only?) How would I have to encode my images for it to work?
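This thread doesn't say how the SDK encodes stored tile data, so the following is only a way to investigate, not a description of the format. First, check the blob's leading magic bytes for PNG, JPEG, or WebP. If none match, try decompressing it and check again; trying zlib or gzip here is itself an assumption. The helper names are hypothetical.

```python
import zlib

# Leading "magic" bytes of common tile image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff(blob: bytes) -> str:
    """Identify a raw image by its signature bytes."""
    for magic, kind in SIGNATURES.items():
        if blob.startswith(magic):
            return kind
    if blob[:4] == b"RIFF" and blob[8:12] == b"WEBP":
        return "webp"
    return "unknown"


def inspect_tile(blob: bytes) -> str:
    """Identify a stored blob, trying decompression if it isn't a raw image."""
    kind = sniff(blob)
    if kind != "unknown":
        return kind
    try:
        # Assumption to test, not documented behaviour: the blob may be compressed.
        # MAX_WBITS | 32 lets zlib auto-detect a zlib or gzip header.
        inner = zlib.decompress(blob, zlib.MAX_WBITS | 32)
    except zlib.error:
        return "unknown"
    return "compressed+" + sniff(inner)
```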

@friedbunny
Contributor

Hi @georgbachmann — thanks for using Mapbox. This repo is for reporting bugs, requesting features, and coordinating work related to the various Mapbox Map SDKs. For help with “how do I” questions, please ask other users on StackOverflow or reach out to our excellent support team.

mapbox locked as resolved and limited conversation to collaborators Apr 9, 2019