A dedicated Slack channel has been created for announcements and support, and to help build a community of practice around this open source package. To request an invitation to join, email jonathan.callahan@dri.com.
Utility functions for working with environmental time series data from known
locations. The compact data model is structured as a list with two dataframes. A
'meta' dataframe contains spatial and measuring device metadata associated with
deployments at known locations. A 'data' dataframe contains a 'datetime' column
followed by columns of measurements associated with each "device-deployment".
This package supports data management activities associated with environmental time series collected at fixed locations in space. The motivating fields include both air and water quality monitoring where fixed sensors report at regular time intervals.
The most compact format for time series data collected at fixed locations is a
list including two tables. MazamaTimeSeries stores time series measurements in a
data table where each row is a synoptic record containing all measurements
associated with a particular UTC time stamp and each column contains data
measured by a single sensor (aka "device"). Any time-invariant metadata
associated with a sensor at a known location (aka a "device-deployment") is
stored in a separate meta table. A unique deviceDeploymentID connects the two
tables.
In the language of relational databases, this "normalizes" the database and can
greatly reduce the disk space and memory needed to store and work with the data.
Time series data from a single environmental sensor typically consists of
multiple parameters measured at successive times. This data is stored in an R
list containing two dataframes. The package refers to this structure as an sts
object for SingleTimeSeries:
sts$meta -- 1 row = unique device-deployment; cols = device/location metadata
sts$data -- rows = UTC times; cols = measured parameters (plus an additional datetime column)
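As a rough illustration of this layout, the following base R sketch builds an sts-like list by hand. The identifier, coordinates, parameter names and values are all invented for the example, and the only column names taken from the description above are deviceDeploymentID and datetime. Objects created by the package itself may carry additional metadata columns.

```r
# Hypothetical single-sensor example; all values are invented for illustration.
meta <- data.frame(
  deviceDeploymentID = "site001_sensorA",  # hypothetical identifier
  longitude = -122.3,
  latitude = 47.6,
  stringsAsFactors = FALSE
)

# Measurement times are deliberately irregular: an sts time axis need not be uniform.
data <- data.frame(
  datetime = as.POSIXct(
    c("2023-07-01 00:00:00", "2023-07-01 00:07:00", "2023-07-01 00:31:00"),
    tz = "UTC"
  ),
  temperature = c(18.2, 18.0, 17.5),
  humidity = c(55, 57, 60)
)

sts <- list(meta = meta, data = data)

nrow(sts$meta)    # number of device-deployments described
names(sts$data)   # 'datetime' followed by the measured parameters
```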
sts objects can support the following types of time series data:
- stationary device-deployments only (no "mobile" sensors)
- single sensor only
- regular or irregular time axes
- multiple parameters
Raw "engineering data" containing uncalibrated measurements, instrument voltages and QC flags may be stored in this format. This format is also appropriate for processed and QC'ed data whenever multiple parameters are measured by a single device.
Note: The sts object time axis specified in data$datetime reflects device
measurement times and is not required to have uniform spacing. (It may be
regular but it need not be.)
Working with time series data from multiple sensors at once is often challenging
because of the amount of memory required to store all the data from each
sensor. However, a common situation is to have time series that share a
common time axis -- e.g. hourly measurements. In this case, it is possible to
create single-parameter data dataframes that contain all data for all
sensors for a single parameter of interest. In air quality applications, common
parameters of interest include PM2.5 and Ozone.
Multi-sensor, single-parameter time series data is stored in an R list with two
dataframes. The package refers to this structure as an mts object for
MultipleTimeSeries:
mts$meta -- N rows = unique device-deployments; cols = device/location metadata
mts$data -- rows = UTC times; N cols = device-deployments (plus an additional datetime column)
A key feature of mts objects is the use of the deviceDeploymentID as a
"foreign key" that allows sensor data columns to be mapped onto the associated
spatial and sensor metadata in a meta row. The following will always be true:
identical(names(mts$data), c('datetime', mts$meta$deviceDeploymentID))
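To make the foreign-key relationship concrete, here is a minimal base R sketch that assembles an mts-like list by hand and then runs the check above. The identifiers, coordinates and values are invented for illustration. Only the meta/data layout and the deviceDeploymentID and datetime names come from the description above.

```r
# Hypothetical two-sensor example; IDs, locations and values are invented.
meta <- data.frame(
  deviceDeploymentID = c("site001_sensorA", "site002_sensorB"),
  longitude = c(-122.3, -120.5),
  latitude = c(47.6, 46.6),
  stringsAsFactors = FALSE
)

data <- data.frame(
  datetime = seq(
    as.POSIXct("2023-07-01 00:00:00", tz = "UTC"),
    by = "1 hour",
    length.out = 3
  ),
  site001_sensorA = c(5.1, 6.3, NA),
  site002_sensorB = c(12.0, 11.4, 10.9)
)

mts <- list(meta = meta, data = data)

# The column names of 'data' line up with the rows of 'meta'
keysMatch <- identical(names(mts$data), c("datetime", mts$meta$deviceDeploymentID))

# Use a data column name to look up its metadata row
id <- names(mts$data)[3]
sensorMeta <- mts$meta[mts$meta$deviceDeploymentID == id, ]
```

Because the order of the data columns follows the order of the meta rows, a column position can also be used to index meta directly, although matching on the ID is the more defensive choice.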
mts objects can support the following types of time series data:
- stationary device-deployments only (no "mobile" sensors)
- multiple sensors
- regular (shared) hourly time axes only
- single parameter only
Each column of mts$data represents a time series associated with a particular
device-deployment while each row represents a synoptic snapshot of all
measurements made at a particular time.
In this manner, software can create both time series plots and maps from a
single mts object in memory.
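The two views can be sketched in base R as follows. This is an illustration under assumed data, not the package's own plotting or mapping code: the IDs, coordinates and values are hypothetical, and the longitude and latitude column names are assumptions about what meta might contain.

```r
# Hypothetical data; IDs, coordinates and values are invented for illustration.
mts <- list(
  meta = data.frame(
    deviceDeploymentID = c("a1", "b2"),
    longitude = c(-122.3, -120.5),
    latitude = c(47.6, 46.6),
    stringsAsFactors = FALSE
  ),
  data = data.frame(
    datetime = seq(as.POSIXct("2023-07-01 00:00:00", tz = "UTC"),
                   by = "1 hour", length.out = 3),
    a1 = c(5.1, 6.3, 7.0),
    b2 = c(12.0, 11.4, 10.9)
  )
)

# Time series view: one column against the shared time axis
series <- mts$data[, c("datetime", "a1")]

# Map view: one row (a single time) paired with each sensor's location.
# This relies on data columns being in the same order as meta rows.
snapshot <- data.frame(
  mts$meta[, c("deviceDeploymentID", "longitude", "latitude")],
  value = unlist(mts$data[2, -1], use.names = FALSE)
)
```

Here series could be passed to a line plot, while snapshot holds one value per location and is ready to be drawn as points on a map.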
Note: The mts object time axis specified in data$datetime is guaranteed to be a
regular hourly axis with no gaps.
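One way to check this property on an arbitrary time axis is to inspect the differences between successive times. The sketch below uses only base R and a made-up datetime vector. It is not the package's own validation routine.

```r
# Hypothetical hourly time axis, invented for illustration
datetime <- seq(as.POSIXct("2023-07-01 00:00:00", tz = "UTC"),
                by = "1 hour", length.out = 24)

# Differences between successive times, expressed in hours
steps <- as.numeric(diff(datetime), units = "hours")

# TRUE when every step is exactly one hour (regular, no gaps)
isRegularHourly <- all(steps == 1)
```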
This R package was created with funding from the USFS AirFire Research Team.