Configuration to limit the number of layers loaded on start #122

Open
CharlesG-Branch opened this issue Apr 8, 2020 · 11 comments

Comments

@CharlesG-Branch

CharlesG-Branch commented Apr 8, 2020

Use-cases

Starting up the pip-service can take an extremely long time, even when limiting what you load via importPlace. Some layers aren't always needed, and it would be nice to be able to disable loading them to improve startup time. From what I've seen, locality and localadmin in particular take much longer than the other layers.

Proposal

Pass in the layers you want loaded into here https://github.com/pelias/pip-service/blob/master/app.js#L95

As far as I can tell, this feature is already supported in wof-admin-lookup:

I'm happy to implement this myself and file a PR. I just need to know the procedure for updating the config, since it's shared across all Pelias projects and this project currently doesn't load the config.
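
To make the proposal concrete, here is a minimal sketch of how a configured layer list could be applied before loading. It is not the pip-service or wof-admin-lookup implementation: the `selectLayers` helper, the `DEFAULT_LAYERS` list, and the idea of passing a `requested` array are all assumptions made for illustration.

```javascript
// Hypothetical helper: pick which placetype layers to load at startup.
// `defaultLayers` and `requested` are illustrative, not the real pip-service config.
function selectLayers(defaultLayers, requested) {
  // No configuration given: keep the existing behaviour and load everything.
  if (!Array.isArray(requested) || requested.length === 0) {
    return defaultLayers.slice();
  }

  // Fail early on typos rather than silently loading nothing.
  const unknown = requested.filter((layer) => !defaultLayers.includes(layer));
  if (unknown.length > 0) {
    throw new Error(`unknown layers: ${unknown.join(', ')}`);
  }

  // Preserve the default ordering so downstream code sees a stable list.
  return defaultLayers.filter((layer) => requested.includes(layer));
}

// Illustrative subset of Who's On First placetypes (an assumption, not the real default list).
const DEFAULT_LAYERS = ['country', 'region', 'county', 'localadmin', 'locality'];
```

With this sketch, `selectLayers(DEFAULT_LAYERS, ['county', 'country'])` would keep only the two larger layers, while an empty or missing list falls back to loading everything.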

@CharlesG-Branch CharlesG-Branch changed the title Allow configuration to limit the number of layers Configuration to limit the number of layers loaded on start Apr 9, 2020
@missinglink
Member

missinglink commented Apr 9, 2020

Hi @CharlesG-Branch, we have been working on a new system which starts up instantly. Would you be interested in BETA testing that?

If you were to remove locality from the list of layers, this would have a negative effect on quality, since address data would no longer be associated with a locality. Could you please explain more about your specific use case and why this wouldn't matter there?

@CharlesG-Branch
Author

@missinglink I'd be happy to beta test it & I'm curious how that was accomplished (is there a runtime perf hit?)

In my case I'm deploying this service on its own, without the rest of the Pelias stack, since I only need the reverse geocoding component. Only the layers for counties and larger matter for my use case, so I figured locality would be safe to skip, as it's lower in the hierarchy (https://github.com/whosonfirst/whosonfirst-placetypes). Will not loading it affect the accuracy of things higher in the hierarchy?

@missinglink
Member

Well then you're going to love this...

curl -s https://data.geocode.earth/wof/dist/spatial/whosonfirst-data-admin-us-latest.spatial.db.bz2 | lbunzip2 > whosonfirst-data-admin-us-latest.spatial.db
docker run --rm -it -v "${PWD}:/data" -p 3000:3000 pelias/spatial server --db=/data/whosonfirst-data-admin-us-latest.spatial.db

There is a demo on port 3000

@missinglink
Member

Try out these paths locally:

GET /explore/pip#14/37.785240/-122.424624
GET /query/pip?lon=-122.42457937449218&lat=37.78471707419765&role=boundary
GET /query/pip/verbose?lon=-122.42457937449218&lat=37.78471707419765&role=boundary
GET /query/pip/_view/pelias/-122.42457937449218/37.78471707419765

The last of these is a 'reverse compatible' endpoint for this repo, although that's where the BETA comes in.
I would appreciate your feedback.
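
For scripting against those routes, the paths can be built from a coordinate pair. This is a small sketch only: the `pipPaths` helper is hypothetical, and the route shapes are copied from the list above rather than from any API documentation.

```javascript
// Hypothetical helper: build the query paths listed above for one coordinate.
// Coordinates are passed as strings so their precision is preserved exactly.
function pipPaths(lon, lat) {
  // URLSearchParams keeps insertion order: lon, lat, role.
  const query = new URLSearchParams({
    lon: String(lon),
    lat: String(lat),
    role: 'boundary',
  }).toString();

  return {
    pip: `/query/pip?${query}`,
    verbose: `/query/pip/verbose?${query}`,
    // The 'reverse compatible' view takes lon/lat as path segments.
    pelias: `/query/pip/_view/pelias/${lon}/${lat}`,
  };
}
```

Assuming the server from the docker command is listening on port 3000, prefixing these paths with `http://localhost:3000` gives the full request URLs.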

@missinglink
Member

The magic here is that the data is loaded in mmap mode, so the Linux filesystem cache provides an in-memory LRU cache for the 'hot pages'. You don't need to configure anything, but the more memory you have, the faster it is. I can explain more if you find it useful.
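
To illustrate the 'hot pages' idea, here is a toy LRU cache. It is an application-level analogy only: with mmap the operating system manages the cache, its eviction policy is more involved than this, and the `LruCache` class is hypothetical rather than anything from the spatial codebase.

```javascript
// Toy LRU cache keyed by page number. A Map iterates in insertion order,
// so the first key is always the least recently used one.
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.pages = new Map();
  }

  get(key) {
    if (!this.pages.has(key)) return undefined;
    const value = this.pages.get(key);
    // Re-insert to mark the page as most recently used.
    this.pages.delete(key);
    this.pages.set(key, value);
    return value;
  }

  set(key, value) {
    // Remove any existing entry so the key moves to the "most recent" end.
    this.pages.delete(key);
    this.pages.set(key, value);
    if (this.pages.size > this.capacity) {
      // Evict the least recently used page.
      const oldest = this.pages.keys().next().value;
      this.pages.delete(oldest);
    }
  }
}
```

A larger `capacity` keeps more pages resident, which mirrors the point that more memory means fewer reads from disk.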

@CharlesG-Branch
Author

Wow, the startup time & demo page are incredible. Exposing the localization information is also extremely helpful.

I did get some exceptions for the last two links:

2020-04-09T19:44:59.758Z - info: [geometry] ::ffff:172.17.0.1 - GET /query/pip/_view/pelias/-122.42457937449218/37.78471707419765 HTTP/1.1 500 1018 - 17.145 ms
TypeError: Cannot read property 'split' of null
    at rows.forEach.row (/code/server/routes/pip_verbose.js:29:33)
    at Array.forEach (<anonymous>)
    at Object.module.exports (/code/server/routes/pip_verbose.js:28:8)
    at module.exports (/code/server/routes/pip_pelias.js:14:33)
    at Layer.handle [as handle_request] (/code/node_modules/express/lib/router/layer.js:95:5)
    at next (/code/node_modules/express/lib/router/route.js:137:13)
    at Route.dispatch (/code/node_modules/express/lib/router/route.js:112:3)
    at Layer.handle [as handle_request] (/code/node_modules/express/lib/router/layer.js:95:5)
    at /code/node_modules/express/lib/router/index.js:281:22
    at param (/code/node_modules/express/lib/router/index.js:354:14)
2020-04-09T19:45:11.456Z - info: [geometry] ::ffff:172.17.0.1 - GET /query/pip/verbose?lon=-122.42457937449218&lat=37.78471707419765&role=boundary HTTP/1.1 500 1033 - 23.600 ms
TypeError: Cannot read property 'split' of null
    at rows.forEach.row (/code/server/routes/pip_verbose.js:29:33)
    at Array.forEach (<anonymous>)
    at module.exports (/code/server/routes/pip_verbose.js:28:8)
    at Layer.handle [as handle_request] (/code/node_modules/express/lib/router/layer.js:95:5)
    at next (/code/node_modules/express/lib/router/route.js:137:13)
    at Route.dispatch (/code/node_modules/express/lib/router/route.js:112:3)
    at Layer.handle [as handle_request] (/code/node_modules/express/lib/router/layer.js:95:5)
    at /code/node_modules/express/lib/router/index.js:281:22
    at Function.process_params (/code/node_modules/express/lib/router/index.js:335:12)
    at next (/code/node_modules/express/lib/router/index.js:275:10)

I'll play around with loading and using the full WOF dataset later today. Limiting the place IDs loaded (currently done with imports.whosonfirst.importPlace) might still be useful, since it would keep unneeded places from filling up the cache, though the cost of a failed lookup may outweigh that. I'll have to check.

@missinglink
Member

Looks like a bug, thanks. Easily fixed.
I'm opening up pelias/spatial#47 for further feedback, please add any more beta testing notes over there so I can track them in one place.
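
The `Cannot read property 'split' of null` error in the trace above is typical of calling `split` on a column that can be null. A minimal sketch of the usual guard follows; the row shape and the `names` field are assumptions made for illustration, and this is not the actual patch.

```javascript
// Hypothetical row shape: `names` is a delimiter-joined string that may be null.
function splitNames(rows, delimiter = ',') {
  return rows.map((row) => ({
    ...row,
    // Guard before calling split so a null column yields an empty list
    // instead of throwing a TypeError.
    names: typeof row.names === 'string' ? row.names.split(delimiter) : [],
  }));
}
```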

@missinglink
Member

More download options from our site https://geocode.earth/data

@missinglink
Member

If y'all would like commercial support we'd be happy to supply other data such as OSM and US CENSUS data for your business as seen in our demo https://spatial.demo.geocode.earth/explore/pip

@missinglink
Member

bug resolved in pelias/spatial#48

@bradjones1

FWIW I came to this issue after having serious performance issues starting pip-service in development (using the shipped Docker image). Spatial does the trick and starts immediately. I generally find the documentation on which datasets apply to which products and why, and on the proper way to import them, really confusing. BUT, with the examples in issues and by reading the code, I was able to make it work. Thanks for making these projects open source. I would recommend that anyone needing PIP go straight to Spatial.


3 participants