This is a collection of R functions to facilitate the analysis of structure in biological communities. We originally developed these functions to describe and compare parental structure for dispersed seeds (Scofield et al. 2012 American Naturalist 180: 719-732), but their usefulness extents to biodiversity data or any other diversity-like data that can be expressed with this same data structure.
When genotypes are available for individuals under study, allelic diversity may be described and compared using functions having similar conceptual and statistical principles, as described in Sork et al. (in press).
NOTE: The version used for Scofield et al. (2012) has a different interface and lacks an R package structure. To use this version, get the release tagged 0.1, and simply source
the files of interest.
These statistical tools were developed in collaboration with Peter Smouse (Rutgers University) and Victoria Sork (UCLA) and were funded in part by U.S. National Science Foundation awards NSF-DEB-0514956 and NSF-DEB-0516529.
The Mann-Whitney-Wilcoxon nested ranks test we originally provided here has been made into an R package and is available in its own repository. It can be installed via Github:
devtools::install_github("douglasgscofield/nestedRanksTest")
vignette("nestedRanksTest")
Install from github using devtools
:
install.packages("devtools")
devtools::install_github("douglasgscofield/dispersalDiversity")
library(dispersalDiversity)
This package requires the readGenalex
package v1.0 available from CRAN, which is
a prerequisite for this package and will be installed by default.
For producing plots, many find aesthetic colour choices to be available through
palettes in the RColorBrewer
package. membershipPlot
(see below) will use
this package if it is available. If it is not installed, get it from CRAN with:
install.package("RColorBrewer")
Most functions take as input a simple data structure, an object of class
divtable
which is a table of site (rows) by
source (columns) counts, which each cell representing the number of times that
source/species/type was observed at that site.
One method to create a random divtable
:
n.sites <- 5
n.sources <- 10
n.samples <- 120
## data frame of site-source pairs
t <- data.frame(site = sample(n.sites, n.samples, replace = TRUE),
source = round(runif(n.samples) * n.sources + 0.5))
## site-by-source matrix
m1 <- do.call(rbind, lapply(split(t, t$site), function(x) table(x$source)))
# this creates a class c("matrix")
m1 <- as.divtable(m1)
## or use xtabs
m2 <- xtabs(data = t)
## this creates a class c("xtabs","table")
m2 <- as.divtable(m2)
One of the first things you may want to do is produce a membership plot. Membership plots provide a visual representation of the site-by-source table. Note that the plot shown below does not match that produced by the example statements.
data(granaries_2004_qlob)
plot(granaries_2004_qlob)
If fill.method = "colour"
and the library RColorBrewer
is available, then
it is used to pick the colours used for filling groups.
Singleton sources (those that appear just once in just one site) are
distinguished using a white background, while multiton sources (those that
appear multiple times but still in just one site) can be distinguished with a
gray background using the option distinguish.multiton = TRUE
. Other options
are provided for controlling labelling of the plot and producing output to PDF
or PostScript files.
Both plots show that if the number of categories requiring colour/fill is
sufficiently large, hatching is combined with colours. Both plots were
produced with distinguish.multiton = FALSE
.
Further published examples of membership plots produced by this code or earlier versions of this code can be seen Scofield et al. 2010, 2011, 2012.
The function diversity
calculates PMI statistics (Grivet et al.
2005, Scofield et al.
2010, Scofield et al.
2011) and dispersal diversity
statistics(Scofield et al.
2012) statistics
(qgg, αg, etc.).
For a divtable
table holding site-by-species counts, the function
calculates PMI and diversity statistics three different ways, each of
which treats possible bias differently and each of which is returned in
an otherwise identical sublist:
-
q
sublist, containing qgg-based statistics, known to be biased (Grivet et al. 2005) -
q.nielsen
sublist, q*gg-based, which apply the transformation developed by Nielsen et al. (2003) to be unbiased and seem to perform well (Scofield et al. 2010, Scofield et al. 2011, Scofield et al. 2012). -
r
sublist, rgg-based, unbiased but poor performers at low sample sizes (Grivet et al. 2005, Scofield et al. 2012)
Several additional functions are provided.
Function | Description |
---|---|
alphaDiversityTest(tab) |
Test for differences in α diversity among sites within a single dataset |
alphaContrastTest(tab.a, tab.b) |
Test whether there is a difference in the α diversity between two datasets |
alphaContrastTest3(tab.a, tab.b, tab.c) |
Test whether there is a difference in the α diversity among three datasets |
pairwiseMeanTest(tab) |
Test whether mean pairwise divergence/overlap among sites is different from the null espectation |
gammaContrastTest(tab.a, tab.b) |
Test whether there is a difference in the γ diversity between two datasets |
gammaContrastTest3(tab.a, tab.b, tab.c) |
Test whether there is a difference in the γ diversity among three datasets |
gammaAccum(tab) |
Permute species among sites and accumulate gamma diversity across permutations |
plot()
methods are provided for the results of the above tests.
For example, using plot()
on the result of alphaDiversityTest()
might look like
Using plot()
on the result of pairwiseMeanTest()
might look like:
Provides a function for plotting pairwise diversity matrices as returned by the
diversity()
function, examples of which can be seen in Figure 4A-C of
Scofield et al. Am Nat.
plotPairwiseMatrix()
: Create a visual plot of pairwise divergence or overlap values as calculated by
pmiDiversity()
For example, with tab
defined as above, plot the divergence matrix based on
rgg calculations, labelling the axes "Seed Pool", using the
following code:
d <- diversity(tab)
plotPairwiseMatrix(d$r$divergence.mat,
pairwise.mean = d$r$divergence,
statistic = "divergence",
axis.label = "Seed Pool")
dispersalDiversity provides functions for calculating γ accumulation across sites, and plotting the result, examples of which can be seen in Figure 4D-F of Scofield et al. Am Nat.
A typical workflow using these functions would be:
rga.result <- gammaAccum(tab)
plot(rga.result)
Perform a γ diversity accumulation on the site-by-source data in tab
.
The result is returned in a list, which may be passed to plot
to
plot the result. Several arguments control the method of accumulation and
value of γ calculated. Only the defaults have been tested; the others were
developed while exploring the data and must be considered experimental.
tab
: Site-by-source table, same format as that passed to pmiDiversity()
gamma.method
: Calculate γ using "r"
(default), "q.nielsen"
or "q"
method (see paper)
resample.method
: "permute"
(default) or "bootstrap"
; whether to resample
sites without ("permute"
) or with ("bootstrap"
) replacement
accum.method
: "random"
(default) or "proximity"
. If proximity
is
used, then distance.file
must be supplied
distance.file
: A file or data.frame containing three columns of data, with
the header/column names being pool
, X
, and Y
, containing the spatial
locations of the seed pools named in the row names of tab; only used with
accum.method="proximity"
Create a visual plot of γ accumulation results from gammaAccum()
.
The following functions typically won't be used separately, use runGammaAccum()
instead.
gammaAccum()
: Workhorse function for γ accumulation
gammaAccumStats()
: Extracts stats from the result of gammaAccum()
runGammaAccumSimple()
: Wrapper that runs and then returns stats from gammaAccum()
These functions calculate allelic alpha, beta and gamma diversity as described by Sork et al. (unpublished), following the same conceptual and statistical principles described in Scofield et al. (2012). These have been used to calculate the structure of allelic diversity for complete progeny genotypes as well as their decomposed male and female gametes, and to compare alpha and gamma diversity of progeny dispersed relatively short vs. long distances.
Input begins as a file of genotypes in GenAlEx format, which is read using the [readGenalex
] R package (https://github.com/douglasgscofield/readGenalex) available via Github:
devtools::install_github("douglasgscofield/readGenalex")
The augmented data.frame
returned by readGenalex()
is then processed into a list of tables, one per locus, using allele.createTableList()
. If the true ploidy of the input data does not the ploidy of the GenAlEx file, for example if haploid gametes are represented by a pair of homozygous alleles, then use the new.ploidy=1
argument to allele.createTableList()
to reduce the ploidy to its true level.
The list of tables is analysed as a unit by allele.pmiDiversity()
, and can be passed to one of the contrast functions for testing against another list of tables.
The workflow to calculate basic allelic diversity statistics:
dat <- readGenalex("GenAlEx-format-file-of-genotypes.txt")
gt <- createAlleleTables(dat)
div <- diversityMultilocus(gt)
For comparing allele diversity between two different samples:
dat1 <- readGenalex("file-of-genotypes-sample-1.txt")
dat2 <- readGenalex("file-of-genotypes-sample-2.txt")
gt1 <- createAlleleTables(dat1)
gt2 <- createAlleleTables(dat2)
alpha.contrast <- alphaContrastTest(gt1, gt2)
gamma.contrast <- gammaContrastTest(gt1, gt2)
For calculating and plotting gamma accumulation curves across all loci:
dat <- readGenalex("genotypes.txt")
lst <- createAlleleTables(dat)
allele.rga.result <- gammaAccum(lst)
plot(allele.rga.result)
diversityMultilocus()
: The function calculating diversity for a set of loci. The single argument is a list produced by createAlleleTables()
, and it uses the function diversitySingleLocus()
.
createAlleleTables()
: Take genotypes read by readGenalex(), produce a list of allele count tables used by the other functions. Each entry of the list is, for each locus, a table of site x allele counts, with row names being the site names, and column names being the names given to the individual alleles.
diversitySingleLocus()
: The single argument is, for a single locus, a table of site × allele counts, with row names being the site names, and column names being the names given to the individual alleles.
alphaDiversityTest(lst)
: Test whether there is a difference in the alpha diversity among patches in an allele diversity dataset, that is, whether β = 1 or δ = 0 across a collection of patches at a site (see Sork et al.).
alphaContrastTest(lst.a, lst.b)
: Test whether there is a difference in the alpha diversity between two lists of allele diversity datasets.
gammaContrastTest(lst.a, lst.b)
: Test whether there is a difference in the gamma diversity between two allele diversity datasets.
gammaAccum(lst)
: Perform a gamma diversity accumulation on the site-by-source data in tab. Several arguments control the method of accumulation and value of gamma calculated. Other arguments are identical to gammaAccum()
. Only the defaults have been tested; the others were developed while exploring the data and must be considered experimental. The result is returned in a custom class which may be passed to plot()
to plot the result.
We also provide the datasets granaries_2002_Qlob
and granaries_2004_Qlob
,
which include assignments of Quercus lobata acorns harvested from acorn
woodpecker granaries in 2002 and 2004 to seed source trees.
Grivet D, Smouse PE, Sork VL. 2005. A novel approach to an old problem: tracking dispersed seeds. Molecular Ecology 14: 3585-3595.
Nielsen R, Tarpy DR, Reeve HK. 2003. Estimating effective paternity number in social insects and the effective number of alleles in a population. Molecular Ecology 12: 3157-3164.
Scofield DG, Smouse PE, Karubian J, Sork VL. 2012. Use of alpha, beta, and gamma diversity measures to characterize seed dispersal by animals. American Naturalist 180: 719-732, supplement, data.
Scofield DG, Alfaro VR, Sork VL, Grivet D, Martinez E, Papp J, Pluess AR, Koenig WD, Smouse PE. 2011. Foraging patterns of acorn woodpeckers (Melanerpes formicivorus) on valley oak (Quercus lobata Née) in two California oak savanna-woodlands. Oecologia 166: 187-196, supplement.
Scofield DG, Sork VL, Smouse PE. 2010. Influence of acorn woodpecker social behaviour on transport of coast live oak (Quercus agrifolia) acorns in a southern California oak savanna. Journal of Ecology 98: 561-572, supplement.
Sork VL, Smouse PE, Grivet D, Scofield DG. (In press) Impact of asymmetric male and female gamete dispersal on allelic diversity and spatial genetic structure in valley oak (Quercus lobata Née). Evolutionary Ecology.