Cache calls #159

Open
mpadge opened this issue Nov 26, 2018 · 0 comments

mpadge (Member) commented Nov 26, 2018

Instead of repeatedly bombarding the Overpass API, it would be fairly easy to implement a local caching system that records each call and stores the pre-processed data returned from the API. Subsequent identical calls would then simply reload the local data and return it again, without querying the API.

The R.cache package has a hard-coded default that only allows persistent storage in "~/.Rcache/", set up in its .onLoad call. R.cache stores a few settings in options(), but does not use any environment variables.

Environment variables would add a bit more flexibility here: default to ~/.Rosmdata (or perhaps piggyback on ~/.Rcache if it exists?), but allow an override whenever Sys.getenv("OSMDATA_CACHE_DIR") returns a non-empty value.
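A minimal sketch of that lookup order, written as a hypothetical helper `osmdata_cache_dir()` (not an existing osmdata function). It assumes that piggybacking on ~/.Rcache means using an `osmdata` subdirectory inside it, and it only resolves the path; creating the directory is left to whatever writes to the cache.

```r
# Hypothetical helper (not part of osmdata): resolve the cache directory.
osmdata_cache_dir <- function () {
    # An explicit override via the environment variable takes precedence.
    # Sys.getenv() returns "" when the variable is unset.
    dir <- Sys.getenv ("OSMDATA_CACHE_DIR")
    if (!nzchar (dir)) {
        rcache <- path.expand ("~/.Rcache")
        # Assumption: piggyback on an existing ~/.Rcache via a subdirectory,
        # otherwise fall back to ~/.Rosmdata.
        dir <- if (dir.exists (rcache)) {
            file.path (rcache, "osmdata")
        } else {
            path.expand ("~/.Rosmdata")
        }
    }
    dir
}
```

A user could then set `OSMDATA_CACHE_DIR=/path/to/cache` once in `~/.Renviron` (or call `Sys.setenv()` in a session) without having to pass anything to individual functions.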

cache duration

Because OSM data are constantly updated, it will be important to allow control over the cache duration, so that local copies are automatically refreshed at some point. This could also be handled via an environment variable, "OSMDATA_CACHE_DURATION", but a user would have to set it explicitly for it to take effect, which imposes an additional burden.

A less burdensome option would be an equivalent function parameter. This would best be placed in overpass_query(), because it is the Overpass calls themselves that will actually be cached. The problem is that this function is not exported. The general workflow is

opq(bbox) %>%
    add_osm_feature(key, value) %>%
    osmdata_sf()  # or osmdata_sp(), osmdata_sc(), osmdata_xml(), osmdata_pbf()

A cache_duration parameter could potentially be set in the initial opq() call, but that call does not contain the full Overpass query, so the parameter would then have to be passed on to any and all subsequent functions. That suggests the end-point calls are the best place for such a parameter. These currently take only two primary parameters (q and doc), so they would not suffer from one more.

If caching is determined at that point, it will likely be better to cache the full processed result rather than just the direct result of the API call. The call itself could be hashed with digest, while the cached object would be the final processed end-point result. The cache file's timestamp could simply be read (file.info()$mtime): if difftime(...) > cache_duration, the cache is updated; otherwise the cached version is just reloaded.
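A minimal sketch of that timestamp-based logic, written as a hypothetical helper `cached_result()` (not part of osmdata). Assumptions: `cache_duration` is in seconds; `process` stands in for whatever an end-point such as osmdata_sf() does with `q`; the end-point name (`type`) is hashed together with the query, because different end-points produce different processed objects from the same query; and base R's `tools::md5sum()` replaces the digest package so the sketch has no extra dependencies.

```r
# Hypothetical sketch: cache the *processed* result of an end-point call.
cached_result <- function (q, process, type = "sf",
                           cache_duration = 24 * 3600,
                           cache_dir = tempdir ()) {
    dir.create (cache_dir, showWarnings = FALSE, recursive = TRUE)

    # Hash end-point type + full query to get a stable cache file name.
    # (The issue suggests digest::digest(); base R's tools::md5sum() on a
    # temporary file is used here instead to avoid extra dependencies.)
    tmp <- tempfile ()
    writeLines (c (type, q), tmp)
    key <- unname (tools::md5sum (tmp))
    unlink (tmp)
    f <- file.path (cache_dir, paste0 (key, ".rds"))

    if (file.exists (f)) {
        age <- difftime (Sys.time (), file.info (f)$mtime, units = "secs")
        if (as.numeric (age) <= cache_duration)
            return (readRDS (f))  # still fresh: re-load cached version
    }
    res <- process (q)            # missing or stale: recompute ...
    saveRDS (res, f)              # ... and refresh the cache
    res
}
```

An end-point might then accept a `cache_duration` argument alongside `q` and `doc` and wrap its processing in something like this; that signature is likewise hypothetical.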
