Make the Normalized File Format a supported feature #825
I like the idea, but reading/writing arrays of objects one by one carries a significant performance penalty over writing them all in one go.
One of the benefits of this approach is that the API can be decoupled from the underlying implementation. The API just needs a reasonably stable interface, perhaps with some convenience functions for the caller, while the underlying implementation stays hidden. For example, postgresql does this when you fetch query results: you have a simple iterator to access results row by row or all at once, but you have no idea how the database may be caching results or processing the query. Regardless, anything that hides some of the current implementation will make this interface more stable from a third-party point of view, which will increase its usage and make OSRM more widely accessible. I'm willing to help with testing and documentation, and will release FOSS code for postgresql-to-osrm tools based on it.
I see the point. I'll keep this feature request in mind.
Thanks. Let me throw out another alternative approach to supporting this. Define a text file format; it could roughly follow that of the binary format, and then create a reader for the text files. Maybe something like:
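A rough sketch, where the section names and field layout are purely illustrative:

```
# comment lines and blank lines are ignored
NODES
# id  lon        lat
1     13.388860  52.517037
2     13.397634  52.529407
END

EDGES
# source  target  weight
1         2       120
END

RESTRICTIONS
# from  via  to
END
```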
I would ignore lines matching the regex /^\s*#|^\s*$/ (comments and blank lines). While text files are not space efficient for large graphs, this provides a human-readable format that could be bzip2'd to reduce its size. Another benefit of this approach is that one would not need to link to the OSRM libraries, and the files could be generated using Perl, Python, PHP, etc. Regardless, it might be nice to be able to dump an osrm file set as text for debugging purposes. I specifically added the section headers and 'END' to provide a means to verify that the reader is reading the correct section and to assist in validating the data on input, making it a little more user friendly when it encounters an issue with the data.
ASCII/text-based formats are even worse when it comes to performance. But it appears that an easy-to-use library interface would suffice, right?
Yes, an easy-to-use interface would be fine.
OK, cool. Officially on the todo list now.
@DennisOSRM I just started looking at updating my tools to the current master code, and it looks like another reverse-engineering task to sync up. There have been a lot of changes in Extractor/ExtractionContainers.cpp since my last go at this, and #984 is also dealing with this issue. I'm wondering if this is anywhere close to the top of your todo list?
It has been brought to my attention that my osrm-tools pgr2osrm does not convey the polyline information about an edge. Most GIS programs represent an edge as a polyline; the graph only needs the start and end node ids plus the edge id, and using the edge id you can reference the polyline. This seems to be handled differently in OSM, and I'm not sure I understand all the implications. It seems that GIS edges need to be passed to Project-OSRM as a collection of noded straight-line segments. For example, if I have an edge with coordinates A-B-C, then I need to pass A-B and B-C, even though the graph only needs A-C, because there is no way to pass the additional shape vertices in the polyline.

Q: Is this correct? Assuming it is, then in the API it would be good to have the ability to pass a polyline as an edge.

Q: Does OSRM do anything to simplify the graph by eliminating a node with only two edges connecting to it, or does this happen as a natural artifact of the contraction process?

Q: Would separating out the edge shape information into a separate table, so the graph only contained the simplified graph structure, reduce memory requirements and speed up processing?
A way in OSM can have many nodes; it's a polyline with straight lines between nodes. I believe OSRM splits ways into individual pieces at every node. It also applies turn penalties at every node, as can be seen from the following cucumber scenario:

```gherkin
Scenario: Winding roads
    Given the profile "turnbot"
    And the node map
        | a | b | c | d |
        |   |   |   |   |
        | e | f |   |   |
        |   | g | h |   |
    And the ways
        | nodes |
        | abcd  |
        | efgh  |
    When I route I should get
        | from | to | route | distance | time    |
        | a    | d  | abcd  | 300m +-1 | 30s +-1 |
        | e    | h  | efgh  | 300m +-1 | 50s +-1 |
```

The above scenario passes. The ways abcd and efgh have equal length, but efgh takes longer to traverse because it contains two bends.
@DennisOSRM FYI, pgr2osrm does not work with OSRM v0.4 because the file formats have changed again, and I'm not going to have time to reverse engineer and debug the new code on every release. It would be really great if we could come up with a simple table-like API for loading data: a table of points, a table of edges, a table of restrictions. Most shapefile data has an edge as a polyline with only a node_id for each end, i.e. each edge is properly noded but might contain multiple shape points. So if you can provide a stable API, I will write the loading code for shapefiles and a postgresql database, or look into getting GDAL/OGR to support the API.
As mentioned in the other issue: the way to go is to use a structured but efficient serialization protocol like protobuf. It's the same framework that is used to encode the pbf files for OpenStreetMap data. Protobuf has bindings for all major programming languages and is pretty easy to use/read/understand. Let's put this on the roadmap for the 0.4.x release series.
For reference: this is a C and Java implementation that supposedly reads and writes OSM pbf files.
Protobuf is the general framework while the osm binary format is an application of it. We plan to use the framework with our own format spec. |
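As a rough illustration of what the writing side of such a spec could look like (the schema, message names, and file names below are hypothetical, not the planned format), a Protobuf-based writer in C++ might be:

```cpp
// Hypothetical sketch only. Assume a schema file "osrm_format.proto":
//
//   message Node     { int64 id = 1; sint32 lat = 2; sint32 lon = 3; }
//   message NodeList { repeated Node nodes = 1; }
//
// protoc would generate the header included below.
#include <fstream>
#include "osrm_format.pb.h"  // hypothetical generated header

int main() {
    NodeList list;
    Node *n = list.add_nodes();    // generated accessor for the repeated field
    n->set_id(1);
    n->set_lat(52517037);          // illustrative fixed-point degrees * 1e6
    n->set_lon(13388860);

    std::ofstream out("nodes.osrm.pb", std::ios::binary);
    list.SerializeToOstream(&out); // standard protobuf serialization call
    return 0;
}
```

Any language with Protobuf bindings could then read the same file against the same schema, which is what makes the approach attractive for third-party importers.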
Ok, that makes sense. I'll look forward to that and wait for you to publish more on it. Thank you. |
Is there any update on this? |
No progress on this front. In a different project I implemented a zero-copy streaming Protobuf-powered I/O layer for serializing and deserializing huge graph datastructures (think North America), with an optional transparent compression and decompression stream in between. The implementation is not too hard and has a lot of advantages: for example, you could pre-process graphs on a larger machine (think aws ec2) and then move the portable serialized artifact to your laptop, or write tools against the formalized Protobuf schema, even from different languages. The huge disadvantage, and the reason we do not yet have this for osrm-backend, is the overhead from serializing and deserializing, that is, parsing and constructing the Protobuf messages. Byte-wise dumping and reading of datastructures, as bad as it is (and we are aware of this: #1682 #1685), is orders of magnitude faster than Protobuf. This is something where Cap'n Proto or Flatbuffers might shine, but we have not yet explored those options. If you consider this in the context of our recent Traffic effort, where we strive to cut every second possible from subsequent pre-processing steps, it's clear that the current byte-wise dumping is the baseline a portable format has to beat.
Superseded by #2242. @danpat, @ghoshkaj and @duizendnegen specced this out and are currently working on it. The first sweep over the code base abstracted all the manual reads and writes behind types which can be re-used. See #2242 for more details.
Currently osrm-extract extracts data from OSM into a Normalized File Format that is documented on the Wiki. A few people, including myself, have made attempts to use these files to import non-OSM data into OSRM. The problem is that the documentation is not updated when the code changes, which requires users of these files to reverse engineer them after every release. In addition, the UUIDs written in the file headers reflect the state of the OSRM library files that read/write them, and have changed in size from the last version to the current one. Unfortunately, these files are not packaged into an API so that others can easily use them in code intended to work with OSRM.
In an ideal world, it would make sense to minimize the effort of supporting this valuable feature of OSRM by creating a small C++ class and library for reading and writing these files, so that only the class needs documenting. Other parties could then link to that library, and OSRM would use the same one. A high-level API for writing the *.osrm, *.osrm.names and *.osrm.restrictions files might be packaged as something like libOSRMio.h and libOSRMio.so, included in the existing libOSRM.so, or whatever makes sense.
A simple class might look something like:
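A sketch along these lines, where every name, field, and signature is illustrative only, not an existing OSRM API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative record types; real fields would mirror the documented format.
struct Node        { long long id; double lon, lat; };
struct Edge        { long long source, target; int weight; };
struct Restriction { long long from, via, to; };

// One class that hides file headers, UUIDs, and the internal layout
// of the *.osrm, *.osrm.names, and *.osrm.restrictions files.
class OSRMWriter {
public:
    explicit OSRMWriter(const std::string &basename);  // "map" -> map.osrm, map.osrm.names, ...

    // Bulk interface: write each table in one call.
    void WriteNodes(const std::vector<Node> &nodes);
    void WriteEdges(const std::vector<Edge> &edges);
    void WriteRestrictions(const std::vector<Restriction> &restrictions);

    void Close();  // finalize headers and checksums, hidden from the caller
};
```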
For very large graphs we might want to have an interface that is more iterative in nature, like:
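Again with purely illustrative names, a streaming variant so records can be written one at a time:

```cpp
// Streaming counterpart to OSRMWriter above, for graphs too large to
// buffer in memory, or for records coming straight from a database cursor.
class OSRMStreamWriter {
public:
    explicit OSRMStreamWriter(const std::string &basename);

    void BeginNodes(std::size_t expected_count);  // count up front so the section header is written once
    void WriteNode(const Node &node);
    void EndNodes();                              // verifies the promised count, closes the section
    // ...
};
```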
and so on for edges and restrictions. In my use case I would not use an array, but rather records coming out of a database that need to be written.
The idea is to hide the details of file headers, internal file formats, etc. I'm assuming that osrm-extract and osrm-prepare would be able to use this same API so everything stays in sync.