Krawler-based jobs to scrape various data related to administrative entities.
This job relies on:
- osmium to extract administrative boundaries at different levels from OSM PBF files,
- ogr2ogr to generate sequential GeoJSON files to handle large datasets,
- mapshaper to simplify complex geometries,
- tippecanoe to generate MBTiles,
- turfjs to compute the position of toponyms.
Important
The osmium, ogr2ogr, mapshaper and tippecanoe command-line tools must be installed on your system.
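A quick way to verify they are all available (plain shell, nothing specific to this repository):
for tool in osmium ogr2ogr mapshaper tippecanoe; do command -v "$tool" >/dev/null || echo "$tool is missing"; done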
To set up the regions to process, you must export the REGIONS environment variable
with the Geofabrik regions. For instance:
export REGIONS="europe/france;europe/albania"
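Each entry is a Geofabrik extract path; for instance, europe/france should map to the following PBF file, given Geofabrik's standard URL layout:
https://download.geofabrik.de/europe/france-latest.osm.pbf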
If you'd like to simplify geometries, you can set the simplification tolerance and algorithm:
export SIMPLIFICATION_TOLERANCE=500 # defaults to 128
export SIMPLIFICATION_ALGORITHM=visvalingam # defaults to 'dp'
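Under the hood these settings presumably feed a mapshaper simplification step; an illustrative equivalent command (the file names are placeholders, and this is not necessarily the job's exact invocation) might be:
mapshaper boundaries.geojson -simplify visvalingam interval=500 keep-shapes -o simplified.geojson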
Note
The given simplification tolerance is scaled according to the administrative level using this formula:
tolerance at level N = tolerance / 2^(N-2)
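For instance, with a base tolerance of 512:
tolerance at level 2 = 512 / 2^0 = 512
tolerance at level 4 = 512 / 2^2 = 128
tolerance at level 6 = 512 / 2^4 = 32
tolerance at level 8 = 512 / 2^6 = 8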
For testing purposes you can also limit the processed administrative levels using the MIN_LEVEL/MAX_LEVEL
environment variables.
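For instance, to restrict a test run to levels 4 through 6:
export MIN_LEVEL=4
export MAX_LEVEL=6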
To generate the whole planet, first use continent extracts like this to launch the osm-boundaries
job from levels 3 to 8:
export REGIONS="africa;asia;australia-oceania;central-america;europe;north-america;south-america"
As large files are generated, e.g. for Europe, you might have to increase the default Node.js memory limit:
export NODE_OPTIONS=--max-old-space-size=8192
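Assuming a local krawler install and that the job is defined in jobfile-osm-boundaries.js (the jobfile name is an assumption here), it can then be launched with the usual krawler pattern:
node . ../k-atlas/jobfile-osm-boundaries.js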
Then, launch the osm-planet-boundaries
job for level 2, which uses a planet extract and generates planet-wide MBTiles. Indeed, the country level (i.e. administrative level 2) requires a whole-planet file to avoid missing relations between continental and island areas.
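Similarly, assuming a matching jobfile name:
node . ../k-atlas/jobfile-osm-planet-boundaries.js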
Last but not least, launch the generate-osm-boundaries-mbtiles.sh
script to generate an MBTiles file from the GeoJSON files produced by the job.
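For instance, assuming the script can be run as-is from the repository root (check the script itself for any expected arguments or environment variables):
./generate-osm-boundaries-mbtiles.sh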
To avoid generating data multiple times you can easily dump/restore it from/to MongoDB databases:
mongodump --host=localhost --port=27017 --username=user --password=password --db=atlas --collection=osm-boundaries --gzip --out dump
mongorestore --db=atlas --gzip --host=mongodb.example.net --port=27018 --username=user --password=password dump/atlas
This job relies on archived shapefiles from IGN (the Admin Express dataset) and on the mapshaper and 7z tools.
https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#admin-express
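The manual steps these tools cover would look roughly like this, where the archive and layer names are placeholders for the actual Admin Express files:
7z x ADMIN-EXPRESS.7z
mapshaper COMMUNE.shp -o format=geojson communes.geojson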
This job relies on archived shapefiles from IGN (the BD PR dataset) and on the mapshaper and 7z tools.
https://geoservices.ign.fr/documentation/diffusion/telechargement-donnees-libres.html#bdpr
To debug, you can run this command from a local krawler install:
node --inspect . ../k-atlas/jobfile-bdpr.js
To run it on the infrastructure we use Docker images based on the provided Dockerfiles. If you'd like to test it manually, you can clone the repo and then do:
docker build --build-arg KRAWLER_TAG=latest -f dockerfile.bdpr -t k-atlas:bdpr-latest .
docker run --name bdpr --network=host --rm -e S3_ACCESS_KEY -e S3_SECRET_ACCESS_KEY -e S3_ENDPOINT -e S3_BUCKET -e "DEBUG=krawler*" k-atlas:bdpr-latest
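Note that the -e flags given without values forward the variables from your shell environment, so the S3 settings must be exported beforehand, e.g. with placeholder values:
export S3_ACCESS_KEY=xxx
export S3_SECRET_ACCESS_KEY=xxx
export S3_ENDPOINT=https://s3.example.com
export S3_BUCKET=my-bucket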