Managing and visualizing movement data with PostGIS and R
Recent technological progress has allowed ecologists to collect animal movement data sets of increasing size, diversity, and spatial and temporal resolution, together with complex associated information on the environmental context, such as habitat types derived from remote sensing, population density, and weather. The most advanced movement data management now relies on an integrated database system based on PostGIS, an extension of the open-source database management system PostgreSQL that adds support for spatial data.
Storing spatial objects in a PostGIS-enabled database is particularly useful for movement data (usually from wildlife collars/sensors), which are often very large, regularly updated, and require cleaning and manipulation before being used in research. At the other end of the process, the development of a theoretical framework for movement ecology has led to an unprecedented growth of new analytical tools and methods, mostly available in the R statistical environment.
This project focuses on streamlining the workflow for biologists storing/processing movement data in PostGIS and analyzing it in R, and aims at providing the tools to transparently benefit from the power of the most advanced database and statistical systems available for movement data.
Four other packages are worth mentioning here:
- rgdal: rgdal provides bindings to the Geospatial Data Abstraction Library (GDAL), which allows R to import and export spatial data in the form of points, lines, polygons or rasters. While rgdal (and the underlying GDAL) can really be considered the Swiss Army knife of GIS data handling, its scope is very general and focuses on standard GIS spatial classes. Using it in a specialized context, with specific data such as movements, is fairly cumbersome and tedious, if not simply impossible. Note that rgdal provides limited communication capability with PostGIS.
- RPostgreSQL: RPostgreSQL is a database interface and PostgreSQL driver for R. In other words, RPostgreSQL allows for bidirectional communication between R and PostgreSQL, but does not provide PostGIS features to handle spatial data.
- rpostgis: rpostgis adds functions to RPostgreSQL to enable importing and exporting spatial data between PostGIS and R. The aim of this package is however generic, and it does not provide features to handle movement data.
- adehabitatLT: adehabitatLT is a collection of tools for the analysis of animal movements. In particular, it builds on a dedicated class for animal movement data (ltraj), which abstracts movement to a set of trajectories and their geometrical descriptors. However, adehabitatLT does not provide automated tools to import trajectories.
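To make the ltraj class more concrete, here is a minimal sketch of building one from a plain data frame with as.ltraj() from adehabitatLT. The four locations, the animal name and the column names are made up for illustration; real data would typically come from collars or sensors.

```r
library(adehabitatLT)

# Made-up relocations for a single animal, one fix per hour
locs <- data.frame(
  id   = "animal1",
  x    = c(0, 10, 20, 20),
  y    = c(0, 0, 10, 30),
  date = as.POSIXct("2016-01-01 00:00:00", tz = "UTC") + 3600 * 0:3
)

# Build the trajectory: one burst per animal id
tr <- as.ltraj(xy = locs[, c("x", "y")], date = locs$date, id = locs$id)

# Each burst is a data frame holding the relocations together with the
# step descriptors (dx, dy, dist, dt, R2n, abs.angle, rel.angle)
head(tr[[1]])
```

The descriptors are computed when the object is created, so a database counterpart of ltraj would need either to store them or to recompute them on demand.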
The core objective of this project is to create a new R package (rpostgisLT) which will streamline the processing of location datasets into trajectories, including full integration with the ltraj data type of the R package adehabitatLT. The main end product is thus the publication of rpostgisLT on CRAN, which involves the development of the functions, documentation and examples necessary for a full-fledged R package.
The basic workflow will involve:
- Definition of a new PostgreSQL data structure (schema/views/tables) to store a new PostGIS data type pgltraj (the PostGIS database version of ltraj);
- Creation of a visualization tool for pgltraj objects using Shiny/leaflet/etc. to interactively process in-database pgltraj objects;
- Writing functions for seamless transitioning between in-database pgltraj and in-R ltraj (and vice versa), which will allow pgltraj objects to access the full functionality of the adehabitat suite, and ltraj objects to be consistently stored in PostGIS.
This package (rpostgisLT) will thus require R functions that perform a one-time “installation” on a PostgreSQL database from R, setting up the new PostgreSQL data structure for storing pgltraj objects and their ancillary information (created either from data already in the database or from an existing ltraj). This will involve significant SQL and PL/pgSQL programming in addition to R, and can take advantage of the PostGIS geography data type as the standard for pgltraj.
Incidentally, a first step of this project will involve consolidating the existing rpostgis package, by extending and improving the functions that import spatial datasets from PostGIS into R as sp objects (these functions already exist but will be extended) and export sp objects back to the database. The package rpostgis will also be published on CRAN.
The R community will benefit from this project in two ways:
- The rpostgisLT package will provide a unique opportunity to unleash the combined power of PostGIS and R in the study of animal movement, one of the most dynamic fields in ecology.
- The additional development of rpostgis will also provide generic tools to allow bidirectional communication between R and PostGIS for all kinds of spatial data (points, lines, polygons and rasters), hence with a much broader focus.
The student will be mentored by three experts:
- Mathieu Basille (basille@ufl.edu) is an Assistant Professor at the University of Florida, whose main research program deals with animal movement and distribution. As a quantitative ecologist, he brings an extensive knowledge of R, in particular in the context of movement ecology.
- David Bucklin (dbucklin@ufl.edu) is a geographer specialized in spatial technologies (GIS, remote sensing, and GPS), with an emphasis on their application in conservation biology and ecology; he is an expert in spatial databases and spatial data management, and develops tools and techniques for geo-processing workflows.
- Clement Calenge (clement.calenge@oncfs.gouv.fr) is a biometrician at the intersection of three scientific domains: biology, statistics and computer science. He developed the adehabitat suite to provide adequate mathematical models and statistical methods to analyse biological data structures, such as animal location or movement data.
Note that all three mentors are generally well versed in all aspects involved in the project (PostGIS, R, movement data).
We are looking for a motivated student who shows fluency in SQL and strong familiarity with R. In terms of advanced skills, the emphasis is on SQL, and particularly on indexes (for spatial and temporal data). Familiarity with R is also required, but the skills needed to complete the project (such as building an R package) can be learned along the way.
We propose the following test:
- From R, take a SpatialPointsDataFrame with a time column, and export it to PostGIS;
- Build the necessary spatial and temporal indexes;
- Write a SQL function that selects the points within a spatio-temporal window (i.e. given X and Y boundaries and time limits).
Here is a starting point that can be used as example:
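    library("spacetime")
    library("sp")
    data(fires)
    fires$X <- fires$X * 100000
    fires$Y <- fires$Y * 100000
    # Convert the day counter to dates, with day 1 = 1960-01-01
    fires$Time <- as.POSIXct(as.Date("1960-01-01") + (fires$Time - 1))
    # Promote the data frame to a SpatialPointsDataFrame
    coordinates(fires) <- c("X", "Y")
    # Assign the coordinate reference system (EPSG:2229)
    proj4string(fires) <- CRS("+init=epsg:2229 +ellps=GRS80")
    plot(fires, pch = 3)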
The expected outcome for points with X between 6400000 and 6500000, Y between 1950000 and 2050000, and recorded during the 1990s is:
    (subfires <- subset(fires,
        coordinates(fires)[, 1] >= 6400000 & coordinates(fires)[, 1] <= 6500000 &
        coordinates(fires)[, 2] >= 1950000 & coordinates(fires)[, 2] <= 2050000 &
        fires$Time >= as.POSIXct("1990-01-01") &
        fires$Time < as.POSIXct("2000-01-01")))
    rect(6400000, 1950000, 6500000, 2050000, border = "red", lwd = 2)
    points(subfires, col = "red")
The SQL function should be able to retrieve the points in red making good use of indexes.
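One possible shape for the index and function steps of the test is sketched below, with the SQL kept as R strings so that it can be sent from the same session. Everything named here is an assumption made for illustration, not a reference solution: the table fires with a point geometry column geom in SRID 2229 and a timestamp column time, the index names, the SQL function fires_in_window, and the R helper setup_window_query(). The argument conn stands for any open DBI connection (for instance one created with RPostgreSQL), and referring to parameters by name in a SQL-language function assumes PostgreSQL 9.2 or later.

```r
# Hypothetical table layout:
#   fires(geom geometry(Point, 2229), "time" timestamptz)

# GiST index for the bounding-box search, B-tree index for the time range
sql_indexes <- c(
  'CREATE INDEX fires_geom_idx ON fires USING GIST (geom);',
  'CREATE INDEX fires_time_idx ON fires ("time");'
)

# SQL function returning the rows inside a spatio-temporal window.
# Parameter names avoid xmin/xmax, which are PostgreSQL system columns.
# The && operator compares bounding boxes, so it can use the GiST index.
sql_function <- '
CREATE OR REPLACE FUNCTION fires_in_window(
    x_min double precision, x_max double precision,
    y_min double precision, y_max double precision,
    t_start timestamptz, t_end timestamptz)
RETURNS SETOF fires AS $$
    SELECT *
    FROM fires
    WHERE geom && ST_MakeEnvelope(x_min, y_min, x_max, y_max, 2229)
      AND "time" >= t_start
      AND "time" < t_end;
$$ LANGUAGE sql STABLE;'

# Same window as the R subset above: the given X/Y box during the 1990s
sql_query <- "
SELECT * FROM fires_in_window(
    6400000, 6500000, 1950000, 2050000,
    '1990-01-01', '2000-01-01');"

# Hypothetical helper: create the indexes and the function, then run the query
setup_window_query <- function(conn) {
  for (stmt in c(sql_indexes, sql_function)) {
    DBI::dbExecute(conn, stmt)
  }
  DBI::dbGetQuery(conn, sql_query)
}
```

For point geometries the bounding-box test is equivalent to the inclusive X and Y bounds used in the R subset. Prefixing the query with EXPLAIN ANALYZE is one way to check whether the planner actually uses the indexes on a given table.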
Students, please post a link to your test results here.
https://github.com/balazsdukai/rpostgisLT_test/blob/master/test.R
https://github.com/TrigonaMinima/rpostgisLT-tests
https://github.com/nistara/gsoc2016_movement/blob/master/gsoc_nistara.R
https://github.com/liyujiao1026/gsoc2016/blob/master/Yujiao_gsoc.R