Note: Package tests may fail on Julia v1.6 or later due to an update in random number generation. I need to update the tests in this package so they don't depend on the specifics of the underlying random number generator in Julia, which I will do in due course. For now, don't worry if the tests fail, nothing of note has been changed in this package in years.
A module for the Julia language that implements several varieties of the dependent statistical bootstrap as well as the corresponding block-length selection procedures.
This package is compatible with julia v1.0+. If you are running v0.6, you will need to use Pkg.pin("DependentBootstrap", v"0.1.1")
at the REPL, and if you are running v0.5, use Pkg.pin("DependentBootstrap", v"0.0.1")
. Compability with versions before v0.5 is not available.
This module allows Julia users to estimate the distribution of a test statistic using any of several varieties of the dependent bootstrap.
The following bootstrap methods are implemented:
- the iid bootstrap proposed in Efron (1979) "Bootstrap Methods: Another Look at the Jackknife",
- the stationary bootstrap proposed in Politis, Romano (1994) "The Stationary Bootstrap"
- the moving block bootstrap proposed in Kunsch (1989) "The jackknife and the bootstrap for general stationary observations" and (independently) Liu, Singh (1992) "Moving blocks jackknife and bootstrap capture weak dependence",
- the circular block bootstrap proposed in Politis, Romano (1992) "A circular block resampling procedure for stationary data", and
- the non-overlapping block bootstrap described in Lahiri (1999) Resampling Methods for Dependent Data (this method is not usually used and is included mainly as a curiosity).
The module also implements the following block length selection procedures:
- the block length selection procedure proposed in Politis, White (2004) "Automatic Block Length Selection For The Dependent Bootstrap", including the correction provided in Patton, Politis, and White (2009)
Bandwidth selection for the block length procedures is implemented using the method proposed in Politis (2003) "Adaptive Bandwidth Choice".
Some work has been done to implement the tapered block bootstrap of Paparoditis, Politis (2002) "The tapered block bootstrap for general statistics from stationary sequences", along with corresponding block-length selection procedures, but it is not yet complete.
The module is implemented entirely in Julia.
I have not included any procedures for bootstrapping confidence intervals in a linear regression framework, or other parametric models. This functionality is provided by Bootstrap.jl, and work is currently under way to add bootstrap methods from this package to the Bootstrap
API.
I also have not included support for the jackknife, wild bootstrap, or subsampling procedures. I would be quite open to pull requests that add these methods to the present package, but have not had time to implement them myself. Work is ongoing to include the tapered block bootstrap, and ideally, the package will also eventually include the extended tapered block bootstrap. If you are interested in working on any of these projects, please feel free to contact me.
This package should be added using using Pkg ; Pkg.add("DependentBootstrap")
, and can then be called with using DependentBootstrap
. The package depends on StatsBase
and Distributions
for some functionality, and on DataFrames
and TimeSeries
so that DataFrame
and TimeArray
datasets can be supported by this packages methods.
In what follows, I use the terminology from Lahiri (1999) Resampling Methods for Dependent Data and refer to the underlying test statistic of interest as a level 1 statistic, and the distribution parameter of the test statistic that is of interest as a level 2 parameter. For example, the user might have some dataset x
of type T_data
, and be interested in the variance of the sample mean of x
. In this case, the level 1 statistic is the sample mean function mean
, and the level 2 parameter is the sample variance function var
.
I use T_data
to refer to the type of the users dataset, T_level1
to refer to the output type obtained by applying the level 1 statistic function to the dataset, and T_level2
to refer to the output type obtained by applying the level 2 statistic to a Vector{T_level1}
(i.e. a vector of resampled level 1 statistics).
The package exports the following functions, all of which have docstrings that can be called interactively at the REPL:
dbootinds(...)::Vector{Vector{Int}}
-> Returns indices that can be used to index into the underlying data to obtain bootstrapped data. Note, each inner vector of the output corresponds to a single re-sample for the underlying data.dbootdata(...)::Vector{T_data}
-> Returns the bootstrapped data. Each element of the output vector corresponds to one re-sampled dataset, and the output vector will have length equal tonumresample
(a parameter discussed later).dbootlevel1(...)::Vector{T_level1}
-> Returns a vector of bootstrapped level 1 statistics, where the output vector will have length equal tonumresample
.dbootlevel2(...)::T_level2
-> Returns the bootstrapped distribution parameter of the level 1 statistic.dboot(...)::T_level2
-> Identical to dbootlevel2. Most users will want to use this function.dbootvar(...)::Float64
-> Identical todboot
but automatically setsflevel2
tovar
(the sample variance function)dbootconf(...)::Vector{Float64}
-> Identical todboot
but automatically setsflevel2
to the anonymous functionx -> quantile(x, [0.025, 0.975])
, so the level 2 distribution parameter is a 95% confidence interval. In addition to the usual keywords, the keyword version of this function also accepts the keywordalpha::Float64=0.05
, which controls the width of the confidence interval. Note,0.05
corresponds to a 95% confidence interval,0.1
to a 90% interval, and0.01
to a 99% interval (and so on).optblocklength(...)::Float64
-> Returns the optimal block length.
The function bandwidth_politis_2003{T<:Number}(x::AbstractVector{T})::Tuple{Int, Float64, Vector{Float64}}
is not exported, but the docstrings can be accessed using ?DependentBootstrap.bandwidth_politis_2003
at the REPL. This function implements the bandwidth selection procedure from Politis (2003) discussed above, and may be of independent interest to some users.
All of the above functions exhibit the following two core methods:
f(data ; kwargs...)
f(data, bi::BootInput)
where data
is the users underlying dataset, kwargs
is a collection of keyword arguments, and bi::BootInput
is a core type exported by the module that will be discussed later (but can be safely ignored by most users). The following types for data
are currently accepted:
Vector{<:Number}
,Matrix{<:Number}
where rows are observations and columns are variables,Vector{Vector{<:Number}}
where each inner vector is a variable,DataFrame
TimeArray
Of the two core methods, most users will want the kwargs
method. A list of valid keyword arguments and their default values follows:
blocklength
<- Block length for bootstrapping procedure. The default value is0
. Set to <= 0 to auto-estimate the optimal block length from the dataset.Float64
inputs are allowed.numresample
<- Number of times to resample the input dataset. The default value is the module constantNUM_RESAMPLE
, currently set to1000
.bootmethod
<- Bootstrapping methodology to use. The default value is:stationary
(for the stationary bootstrap).blocklengthmethod
<- Block length selection procedure to use if user wishes to auto-estimate the block length. Default value is:ppw2009
(use the method described in Patton, Politis, and White (2009)).flevel1
<- A function that converts the input dataset to the estimator that the user wishes to bootstrap. The default value ismean
.flevel2
<- A function that converts a vector of estimators constructed byflevel1
into a distributional parameter. The default value isvar
.numobsperresample
<- Number of observations to be drawn (with replacement) per resample. The default value is the number of observations in the dataset (the vast majority of users will want this default value).fblocklengthcombine
<- A function for converting aVector{Float64}
of estimated blocklengths to a singleFloat64
blocklength estimate, which is necessary when the input dataset is a multivariate type. The default value ismedian
.
A list of acceptable keyword arguments for bootmethod
and blocklengthmethod
follows. Note you can use either String
or Symbol
when specifying these arguments. For bootmethod
we have:
:iid
or:efron
<- IID bootstrap:stationary
<- Stationary bootstrap:movingblock
or:moving
<- Moving block bootstrap:nonoverlappingblock
or:nooverlap
<- Nonoverlapping block bootstrap:circularblock
orcircular
<- Circular block bootstrap
For blocklengthmethod
we have:
:ppw2009
<- Block length selection method of Patton, Politis, and White (2009)
Acceptable arguments can also be examined interactively by examining the keys of the module dictionaries BOOT_METHOD_DICT
and BLOCKLENGTH_METHOD_DICT
.
In practice, the keyword argument method f(data ; kwargs...)
actually just wraps a call to f(data, BootInput(kwargs...))
under the hood. However, most users will not need to concern themselves with this level of detail.
For those who wish more fine-grained control, please use ?BootInput
at the REPL to get more information on this core module type.
Let data::Vector{Float64}
.
The variance of the sample mean of data
can be bootstrapped using a stationary bootstrap with optimally estimated block length using dboot(data)
or dbootvar(data)
.
A 90% confidence interval for the sample median using a circular block bootsrap with block length of 5 can be estimated using dboot(data, blocklength=5, bootmethod=:circular, flevel1=median, flevel2=(x -> quantile(x, [0.05, 0.95])))
or dbootconf(data, blocklength=5, bootmethod=:circular, flevel1=median, alpha=0.1)
.
Moving block bootstrap indices for generating bootstrapped data with optimally estimated block length can be obtained using dbootinds(data, bootmethod=:moving)
, or if the user wants the bootstrapped data not the indices, dbootdata(data, bootmethod=:moving)
. If the user wants bootstrapped sample medians of data
, then use dbootlevel1(data, bootmethod=:moving, flevel1=median)
.
If the user wants the optimal block length using the method proposed in Patton, Politis, and White (2009), use optblocklength(data, blmethod=:ppw2009)
.
Now let data::Matrix{Float64}
.
If the user wants the median optimal block length from each column of data
, use optblocklength(data, blmethod=:ppw2009)
. If the user wants the average optimal block length use optblocklength(data, blmethod=:ppw2009, fblocklengthcombine=mean)
.
If the user wants the median of the test statistic that is the maximum of the sample mean of each column, using a stationary bootstrap with optimal block length, then use dboot(data, flevel1=(x -> maximum(mean(x, dims=1))), flevel2=median)
. If data::Vector{Vector{Float64}}
instead, and the user wanted the 95% confidence interval, use dbootconf(data, flevel1=(x -> maximum([ mean(x[k]) for k = 1:length(x) ])))
.