Make shapefile processing multi-threaded #614
Merged
Currently shapefile reading is entirely single-threaded. This PR shunts the geometry processing into separate threads.
The Lua `attribute_function` doesn't do much, only basic tag remapping. The expensive bit is the geometry assembly, which includes our old friends `geom::intersection` and `make_valid`. So we can get away with putting a single mutex around the Lua access rather than creating a Lua state for each thread.

Total processing time is reduced by 6m30 for North America. (The North America .pbf crosses the 180° meridian so it ends up pulling in shapefiles for -180° to 180°.)
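For readers unfamiliar with the pattern, here is a minimal sketch of the single-mutex idea, not the code in this PR. All names (`Shape`, `Output`, `AttributeRemapper`, `assemble_geometry`, `process_all`) are hypothetical stand-ins: `AttributeRemapper` plays the role of the shared Lua state, and `assemble_geometry` stands in for the expensive geometry work that runs without any lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one shapefile record.
struct Shape {
    std::string tag;  // raw attribute value
    int vertices;     // stand-in for geometry size
};

struct Output {
    std::string tag;
    long cost;
};

// Stand-in for the shared Lua state: not thread-safe on its own,
// so every call goes through a single mutex.
struct AttributeRemapper {
    std::mutex mtx;
    int calls = 0;  // shared state that would race without the lock

    std::string remap(const std::string& tag) {
        std::lock_guard<std::mutex> lock(mtx);
        ++calls;
        return "mapped_" + tag;
    }
};

// Stand-in for the expensive geometry assembly. It touches no shared
// state, so threads can run it concurrently without locking.
long assemble_geometry(int vertices) {
    long acc = 0;
    for (int i = 1; i <= vertices; ++i) acc += i % 7;
    return acc;
}

// Process all shapes on num_threads worker threads. Each worker claims
// the next unprocessed index from an atomic counter and writes only to
// its own slot in the results vector.
std::vector<Output> process_all(const std::vector<Shape>& shapes,
                                AttributeRemapper& remapper,
                                unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<Output> results(shapes.size());
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= shapes.size()) break;
            long cost = assemble_geometry(shapes[i].vertices);  // parallel part
            std::string tag = remapper.remap(shapes[i].tag);    // serialised part
            results[i] = Output{tag, cost};
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return results;
}
```

Calling `process_all(shapes, remapper, 4)` returns one `Output` per input shape, in input order. The trade-off is the one described above: the lock is held only for the cheap remapping call, so contention should stay low as long as the geometry work dominates. On most toolchains this needs to be linked with thread support (for example `-pthread`).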
This also moves some geometry error output to verbose mode.
The shapefile reader code is very old and needs tidying up and moving into a `ShpReader` class, but this will do for now. Ideally we should refactor the `StoreShapefileGeometry` calls out of `processShapeGeometry` and put them in the calling lambda, which would make more sense conceptually and mean we didn't have to pass so many parameters around. But that will require a little bit of refactoring work on multilinestrings.