Make osrm-backend work with GDAL 2 #1738

Closed
daniel-j-h opened this issue Oct 17, 2015 · 12 comments

Comments

@daniel-j-h
Member

@joto just mentioned to me that Debian wants to update their GDAL package to version 2. It seems like this breaks osrm-backend.

From a quick glance at the code, we only need GDAL for the tools (only for the components actually?).

Investigate.

@TheMarex
Member

@daniel-j-h Yes only used for the osrm-components tool. Actually I would like to remove this dependency. Can we maybe switch to GeoJSON or something trivial like that? We don't really need shapefiles.

@wilhelmberg
Contributor

> Can we maybe switch to GeoJSON or something trivial like that?

My $0.02, just to let you know what's generally happening on the shapefile vs. GeoJSON front.
We are still investigating whether GeoJSON is a feasible format, as it can grow really big really fast and is a nightmare to parse.
To be processed it has to be read into memory as a whole and parsed sequentially, although @artemp is working on making this problem go away with custom spatial indices.

@freenerd provided a 1GB shapefile that I think came out of the osrm-components tool.
Converting it results in a 1.7GB to 2.3GB GeoJSON (depending on formatting: line breaks, indentation, decimal places).
A 2GB GeoJSON is almost always too big to be opened by conventional means (GIS) unless there is a lot of memory to burn, and even if it can be opened, display might be exceptionally slow.

@daniel-j-h
Member Author

@BergWerkGIS what about osrm-backend outputting GeoJSON, which we then convert to a shapefile before giving it out? I understand the problems with large GeoJSON files, but as an intermediate format to convert from, it should suffice I guess. Would this be feasible?

@daniel-j-h
Member Author

More updates in this regard:

The old demo server uses the components tool to produce a shapefile and pushes that to a Geofabrik server. We then query this Geofabrik server e.g. here: http://project-osrm.org/osrm-frontend-v2/ (components layer, bottom left).

That is, the components layer is still the outdated one from months ago, because the demo server does not do any updates any more!

/cc @freenerd @danpat re. the new demo server: is it possible that we generate the components (shapefile, GeoJSON, ...) ourselves? I talked to Fred from Geofabrik, and he is open to integrating this, e.g. into their inspector, which a lot of OSM people are using, at http://tools.geofabrik.de/osmi/.

@daniel-j-h
Member Author

Even more updates:

The Debian GIS guys seem to package an OSRM 4.7 version (which e.g. depends on libpng) and seem to have all the outdated CPack information in it.

I think we should get in touch with them. Maybe opening a separate ticket for this.

@freenerd
Member

@daniel-j-h I broke this out into two new tickets, let's look at GDAL only here. I think it would make sense to export as GeoJSON, since that would remove the dependency on GDAL. I have not tried it, but I assume GDAL's ogr2ogr would allow users to transform the GeoJSON to shapefiles if needed.
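For readers who want to try the conversion route mentioned above, here is a minimal sketch of driving ogr2ogr from Python. The file names and the helper names `ogr2ogr_command` and `convert` are hypothetical; only the `-f "ESRI Shapefile"` driver option and the destination-before-source argument order come from ogr2ogr itself. It says nothing about whether the conversion is practical for large inputs.

```python
import shutil
import subprocess


def ogr2ogr_command(geojson_path, shapefile_path):
    """Build the ogr2ogr argument list for a GeoJSON -> Shapefile conversion."""
    # ogr2ogr expects the destination data source before the source.
    return ["ogr2ogr", "-f", "ESRI Shapefile", shapefile_path, geojson_path]


def convert(geojson_path, shapefile_path):
    """Run the conversion; requires GDAL's command line tools on PATH."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found on PATH")
    subprocess.run(ogr2ogr_command(geojson_path, shapefile_path), check=True)
```

Calling `convert("components.geojson", "components.shp")` would then run the conversion, assuming both paths are placeholders for real files.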

@TheMarex could we then also remove the -DBUILD_TOOLS=1 flag?

@wilhelmberg
Contributor

> I have not tried it but assume GDAL's ogr2ogr would allow users to transform the Geojson to Shapefiles if needed.

Just tried ogr2ogr (it's still running) with the 1.7GB GeoJSON that I created from the small-components.shp:
The process filled the free 14GB RAM and the 35GB swap immediately and is now idling at ~9% CPU with more than 57GB of committed memory.

Don't know how this would work on Linux, but on Windows I don't think that's an option for the average user.


@TheMarex
Member

@BergWerkGIS yeah, I don't think ogr2ogr implements the streaming "extension" of GeoJSON that is used in tippecanoe (no outer feature collection, just an array). Not sure how hard it is to contribute to GDAL, but we should consider this anyway. (Also, this is kind of easy to implement: just check if the file starts with [ instead of {.) @ericfischer did you pursue this route already and have some insight?
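The "starts with [ instead of {" check could look roughly like the sketch below. The function name `geojson_container` is hypothetical and this is not GDAL or tippecanoe code; it only inspects the first non-whitespace byte and does not validate the rest of the file.

```python
def geojson_container(path):
    """Return "array" or "object" based on the first non-whitespace byte."""
    with open(path, "rb") as f:
        while True:
            byte = f.read(1)
            if not byte:
                raise ValueError("file is empty or contains only whitespace")
            if byte.isspace():
                continue  # skip leading whitespace
            if byte == b"[":
                return "array"   # bare array of features (streaming style)
            if byte == b"{":
                return "object"  # regular GeoJSON object, e.g. a FeatureCollection
            raise ValueError("unexpected leading byte: %r" % byte)
```

A reader would dispatch on the result: stream features one by one for `"array"`, and fall back to the regular whole-document parser for `"object"`.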

I agree with you that GeoJSON (like JSON or plain text formats in general) is a terrible format for bigger datasets, but it is a very convenient intermediate format because it is a much easier serialization target than shapefiles. We could also do geobuf, but I guess that is still somewhat in flux?

@joto
Contributor

joto commented Oct 19, 2015

Maybe it is better to just leave this as it is? The whole point of GDAL/OGR is that it abstracts those decisions away and allows you to easily write any file format you like. And because this isn't really needed for core OSRM functionality but just for the "small components", it is not a show-stopper for any kind of release or so. I doubt many "normal OSRM users" will need this tool. I suggest we leave it as it is for now, upgrade this to GDAL 2.0 whenever we need it for our tool chain and remove the tool from the Debian package.

@e-n-f

e-n-f commented Oct 19, 2015

@TheMarex When I am parsing JSON in tippecanoe I just look for objects with type=Feature without regard to what kind of container they are inside. My streaming JSON parser continues parsing when it reaches the end of a top-level object, so concatenated JSON objects work as a side effect of that.
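The behaviour described above (collect every object with type=Feature regardless of its container, and keep going after a top-level value ends) can be sketched in Python as follows. This is not tippecanoe's parser: the names `iter_features` and `_find_features` are hypothetical, and unlike a true streaming parser it assumes the whole input is already in memory as a string.

```python
import json


def iter_features(text):
    """Yield every object with "type": "Feature" from concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace between top-level values.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        # raw_decode parses one value and reports where it ended,
        # so parsing can resume right after it.
        value, pos = decoder.raw_decode(text, pos)
        yield from _find_features(value)


def _find_features(value):
    """Walk a decoded JSON value and yield Feature objects, whatever wraps them."""
    if isinstance(value, dict):
        if value.get("type") == "Feature":
            yield value
            return
        for child in value.values():
            yield from _find_features(child)
    elif isinstance(value, list):
        for child in value:
            yield from _find_features(child)
```

With this, a bare array of features, a FeatureCollection, and several concatenated objects all produce the same stream of features.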

GeoJSON is much slower to parse than I would like, so it would be great to have another format that is easy for scripts to generate but that isn't so big and slow to work with.

@danpat
Member

danpat commented Oct 19, 2015

GeoPBF anyone? Although there is already WKB in this space, which it would also be fairly easy to output.
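To illustrate how little code WKB output needs, here is a minimal sketch that serializes a 2D LineString as little-endian WKB. The function name `linestring_wkb` is hypothetical, and treating x as longitude and y as latitude is an assumption; this is not code from osrm-backend.

```python
import struct


def linestring_wkb(coords):
    """Encode a list of (x, y) pairs as a little-endian WKB LineString."""
    # Byte order flag 1 = little-endian, geometry type 2 = LineString,
    # followed by the number of points as an unsigned 32-bit integer.
    header = struct.pack("<BII", 1, 2, len(coords))
    # Each point is two IEEE 754 doubles: x then y.
    body = b"".join(struct.pack("<dd", x, y) for x, y in coords)
    return header + body
```

A two-point line takes 9 header bytes plus 16 bytes per point, which is considerably more compact than the same coordinates written out as GeoJSON text.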

@daniel-j-h
Member Author

#3570 rewrote the components tool to output GeoJSON. For the planet we currently generate a ~2 GB GeoJSON file which compresses down to ~100 MB. Closing here.
