InvTraitR

InvTraitR is a pipeline that assigns body length-dry biomass allometric equations and other functional traits to taxonomic names of freshwater invertebrates, based on taxonomic hierarchy. The name-matching pipeline can, however, technically be used for any taxonomic group.

Installation

The package is WIP. It will, hopefully, be available on CRAN soon.
You can try to install it from GitHub, but as it's a WIP, it may or may not work depending on the tides, temperature, the color of your socks and what you had for dinner the day before yesterday.

To date, we transitively depend on terra, which may require additional installation steps (GDAL) on your OS. Please see the corresponding install section of their README first.

Now, if you feel lucky, try:

# requires the remotes package: install.packages("remotes")
remotes::install_github("haganjam/InvTraitR")

Usage

The package exports a single function get_trait_from_taxon:

get_trait_from_taxon(
    data,                   # data.frame with at least five columns: target taxon, life stage, latitude (dd), longitude (dd) and body size (mm) if trait == "equation"
    target_taxon,           # character string with the column name containing the taxon names
    life_stage,             # character string with the column name containing the life stages
    latitude_dd,            # character string with the column name containing the latitude in decimal degrees
    longitude_dd,           # character string with the column name containing the longitude in decimal degrees
    body_size,              # character string with the column name containing the body size data if trait == "equation"
    workflow = "workflow2", # options are "workflow1" or "workflow2" (default = "workflow2")
    max_tax_dist = 3,       # maximum taxonomic distance acceptable between the target and the taxa in the database (default = 3)
    trait = "equation",     # trait to be searched for (default = "equation")
    gen_sp_dist = 0.5       # taxonomic distance between a genus and a species (default = 0.5)
)

See the docs for more details.
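As a rough illustration of how the arguments above fit together, here is a minimal sketch of a call. The column names (`taxon`, `stage`, `lat`, `lon`, `length_mm`) and all values are made up for this example, and the life-stage labels are an assumption; check the docs for the values the package actually accepts. The call only runs if InvTraitR is installed.

```r
# Hypothetical input data: column names and values are invented for illustration.
samples <- data.frame(
  taxon     = c("Daphnia magna", "Gammarus", "Chironomidae"),
  stage     = c("adult", "adult", "larva"),  # assumed labels, see the docs
  lat       = c(50.85, 51.05, 47.37),        # latitude in decimal degrees
  lon       = c(4.35, 3.72, 8.54),           # longitude in decimal degrees
  length_mm = c(2.1, 8.4, 5.0),              # body size in mm
  stringsAsFactors = FALSE
)

# Each column is passed by name, as a character string.
if (requireNamespace("InvTraitR", quietly = TRUE)) {
  result <- InvTraitR::get_trait_from_taxon(
    data         = samples,
    target_taxon = "taxon",
    life_stage   = "stage",
    latitude_dd  = "lat",
    longitude_dd = "lon",
    body_size    = "length_mm",
    workflow     = "workflow2",
    max_tax_dist = 3,
    trait        = "equation",
    gen_sp_dist  = 0.5
  )
  str(result)
}
```

Note that the column arguments take the column names as strings, not the columns themselves.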

Companion Scripts

companion_scripts contains all the scripts used to create, access and analyse the database. The different folders hold scripts for different tasks. The numbering of the folders, and of the scripts within them, indicates the order in which the scripts should be run.

There is one helper script which contains a customised plotting theme that is used throughout the analyses performed:

  • helper-plot-theme.R

01_data_cleaning

The data cleaning folder holds scripts that are used to clean the raw data, which were compiled in Excel files. The raw data files are then saved as .rds files and stored in the database folder.

02_create_database

There are three scripts in this folder. The first is the script that we use to create the higher-level taxonomic graphs. It works by first harmonising all taxon names in the equation database to three different taxonomic backbones: COL, GBIF and ITIS. Once the names are harmonised, we extract either the family or the order of each taxon name. Descendant taxa of all unique families and orders are then extracted and compiled into igraph objects that describe how the different taxon names (i.e. species, genera, families, orders etc.) relate to each other. These igraph objects are exported as .rds files and stored in the database folder.

  • 01_create_taxon_databases.R
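To make the idea of a taxonomic distance over such a hierarchy more concrete, here is a minimal base R sketch. It is not the package's igraph-based implementation. It assumes that every parent-child step costs 1, except a species-to-genus step, which costs `gen_sp_dist`. The taxon names, the `taxa` table and the helper functions `edge_cost`, `ancestor_dist` and `tax_dist` are all hypothetical.

```r
# Toy hierarchy: each taxon points to its parent (NA marks the root).
# Placeholder names only; not taken from any taxonomic backbone.
taxa <- data.frame(
  name   = c("Order A", "Family A", "Genus A", "Genus A sp1", "Genus A sp2", "Genus B"),
  rank   = c("order", "family", "genus", "species", "species", "genus"),
  parent = c(NA, "Order A", "Family A", "Genus A", "Genus A", "Family A"),
  stringsAsFactors = FALSE
)

# Cost of the step from a taxon up to its parent: gen_sp_dist for a
# species-to-genus step, 1 for every other step (an assumption).
edge_cost <- function(name, taxa, gen_sp_dist = 0.5) {
  i <- match(name, taxa$name)
  p <- match(taxa$parent[i], taxa$name)
  if (taxa$rank[i] == "species" && taxa$rank[p] == "genus") gen_sp_dist else 1
}

# Cumulative distance from a taxon to itself (0) and to each of its ancestors.
ancestor_dist <- function(name, taxa, gen_sp_dist = 0.5) {
  out <- c(0)
  names(out) <- name
  current <- name
  total <- 0
  repeat {
    parent <- taxa$parent[match(current, taxa$name)]
    if (is.na(parent)) break
    total <- total + edge_cost(current, taxa, gen_sp_dist)
    out[parent] <- total
    current <- parent
  }
  out
}

# Distance between two taxa: the cheapest path through a shared ancestor.
tax_dist <- function(a, b, taxa, gen_sp_dist = 0.5) {
  da <- ancestor_dist(a, taxa, gen_sp_dist)
  db <- ancestor_dist(b, taxa, gen_sp_dist)
  shared <- intersect(names(da), names(db))
  min(da[shared] + db[shared])
}

d_sister  <- tax_dist("Genus A sp1", "Genus A sp2", taxa)  # two species of one genus
d_genus   <- tax_dist("Genus A sp1", "Genus A", taxa)      # species to its own genus
d_related <- tax_dist("Genus A sp1", "Genus B", taxa)      # species to another genus in the family
```

Under these assumptions, a candidate would be accepted only if its distance to the target does not exceed `max_tax_dist`.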

The second script is used to add biogeographical realm, major habitat type and ecoregion information to each equation in the database using the latitude and longitude data associated with each equation and Abell et al.'s (2008) global ecoregion map:

  • 02_set_freshwater_ecoregion_data.R

The third script is a helper function used to generate the higher-level taxonomic graphs:

  • helper-taxon-matrix-function.R

03_accuracy_analysis

This folder contains the scripts where we test the accuracy of our method for matching names to appropriate equations. Specifically, we compare the biomass estimated from equations selected from the database with measured biomass that we compiled from the literature, and with biomass estimated from equations selected by experts.
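A comparison of this kind can be sketched as follows. The numbers are invented for illustration and are not results from the accuracy analysis, and absolute percentage error is only one possible error metric, chosen here as an assumption rather than taken from the analysis scripts.

```r
# Made-up dry biomass values (mg) for illustration only.
measured  <- c(0.12, 0.85, 2.40)  # measured biomass from the literature
estimated <- c(0.10, 1.02, 2.16)  # biomass from database-selected equations

# Absolute percentage error of each estimate relative to the measured value.
abs_pct_error <- abs(estimated - measured) / measured * 100

# Summarise the error across all test observations.
mean_abs_pct_error <- mean(abs_pct_error)
print(round(abs_pct_error, 1))
```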

First, we use a script to prepare the test data that we compiled from the literature, all the files of which are stored in the database folder:

  • 01_prep_test_data.R

Next, we use these test data to test the accuracy of workflow2, which is our automated method for selecting appropriate equations based on a taxonomic name and the geographic/habitat similarity with the equations in the database.

Third, scripts 3 and 4 are used to examine the sources of error variation that we get from workflow2:

  • 03_analyse_error_variation.R
  • 04_model_error_variation.stan

The final script contains helper functions used in the analyses:

  • helper-miscellaneous.R

04_database_characteristics

This script is used to examine the taxonomic and geographical coverage of the equations in our database.

  • 01_descrobe_database.R

Development

There's a devcontainer setup included. If you use VS Code, you should be prompted to open the project in a container automatically.

devtools is bundled with the devcontainer. Load library(devtools) and you have load_all(), test() and check() ready at hand.

We use renv to provide reproducibility as far as it gets with R. Use renv::snapshot() after changing dependencies, renv::restore() to install declared versions of the dependencies and renv::update() to update to latest CRAN versions (before pushing to CRAN).

The database files are put into an appdata dir (given by rappdirs) when tests are executed or when the package is loaded. If you have changed the DB files and need to update the copies in the appdata dir, use the utility function update_user_db().