bibliography |
---|
readme.bib |
Author: Alberto Quaini
The intrinsicFRP
library implements functions designed for comprehensive evaluation and testing of linear asset pricing models, focusing on the estimation and assessment of factor risk premia,
factor SDF coefficients,
selection of "useful" risk factors (those displaying non-zero population correlation with test asset returns), examination of model misspecification, validation of model identification, and heteroskedasticity and autocorrelation robust covariance matrix estimation.
Given
The vector of compensations for risk factor exposures
For estimating and testing factor risk premia, our toolkit incorporates the classic two-pass approach outlined by [@fama1973risk], the misspecification-robust methodology proposed by [@kan2013pricing], and the tradable and "Oracle" tradable approaches introduced by [@quaini2023tradable].
The classic two-pass factor risk premia are given by:
If the law of one price holds, then there is a Stochastic Discount Factor (SDF)
Many problems arise when some factors are useless or weak, that is, when their
population correlations with test asset excess returns are zero or near-zero,
or more generally when the matrix collecting such population correlations is
reduced-rank. In that case, the matrix of regression coefficients
For selecting the set of useful risk factors, our toolkit implements the iterative factor screening procedure of [@gospodinov2014misspecification], the one-step Oracle selection procedure of [@quaini2023tradable], and the three-step procedure of [@feng2020taming].
The screening methodology in [@quaini2023tradable] removes the factors associated to a zero Oracle tradable factor risk premia estimates, which arise from the one-step
closed-form estimator:
$$\check{\lambda}_k = sign(\hat\lambda_k)\max{|\hat{\lambda}_k|-\tau/||\rho_k||2^2,0},$$
where $\hat{\lambda}=\widehat{Cov}[R_t,F_t]\widehat{Var}[R_t]^{-1}\widehat{E}[R_t]$ is the sample
tradable factor risk premia estimator, which simply replaces population moments with
empirical moments, $\tau$ is a penalty parameter that can be tuned via, e.g., cross validation, and $\rho_k=\widehat{Corr}[F{tk}, R_t]$ is the estimated correlation
between factor
[@feng2020taming] propose a three-steps testing procedure that evaluates the contribution to cross-sectional pricing of any new factors on top of a set of control factors. The third step is a OLS regression of average returns on the covariances between asset returns and the new factors, as well as the control factors selected in either one of the first two steps. The first stwo steps consists in (i) a Lasso regression of average returns on the ovariances between asset returns and all control factors and (ii) a Lasso regression of the covariances between asset returns and the new factors on the ovariances between asset returns and all control factors. The second selection aims at correcting for potential omitted variables in the first selection.
For evaluating model misspecification, the toolkit implements point and
confidence interval estimation for the HJ model misspecification distance
formulated by [@kan2008specification],
which is a modification of the prominent Hansen-Jagannathan misspecification
distance of [@hansen1997assessing].
Such (squared) distance
Lastly, the functions for testing model identification are specialized versions of the rank tests proposed by [@kleibergen2006generalized] and [@chen2019improved]. These tests are specifically tailored to assess the regression coefficient matrix of test asset returns on risk factors.
Factor risk premia
For heteroskedasticity and autocorrelation robust covariance matrix estimation
of a centred time series
To cite intrinsicFRP
in publications, please use:
Quaini, A. (2023).
intrinsicFRP
: An R Package for Factor Model Asset Pricing.R
package version 2.0.0. URL: https://CRAN.R-project.org/package=intrinsicFRP.
Package intrinsicFRP
is on CRAN (The Comprehensive R Archive Network),
hence the latest release can be easily installed from the R
command
line via
install.packages("intrinsicFRP")
To install the latest (possibly unstable) development version from
GitHub, you can pull this repository and install it from the R
command
line via
# if you already have package `devtools` installed, you can skip the next line
install.packages("devtools")
devtools::install_github("a91quaini/intrinsicFRP")
Package intrinsicFRP
contains C++
code that needs to be
compiled, so you may need to download and install the necessary tools
for MacOS or the
necessary tools for
Windows.
R package intrinsicFRP
implements the following functions:
FRP()
: Computes the [@fama1973risk] factor risk premia:FMFRP = (beta' * beta)^{-1} * beta' * E[R]
wherebeta = Cov[R, F] * V[F]^{-1}
or the misspecification-robust factor risk premia of [@kan2013pricing]:KRSFRP = (beta' * V[R]^{-1} * beta)^{-1} * beta' * V[R]^{-1} * E[R]
, from data on factorsF
and test asset excess returnsR
. These notions of factor risk premia are by construction the negative covariance of factorsF
with candidate SDFM = 1 - d' * (F - E[F])
, where SDF coefficientsd
are obtained by minimizing pricing errors:argmin_{d} (E[R] - Cov[R,F] * d)' * (E[R] - Cov[R,F] * d)
andargmin_{d} (E[R] - Cov[R,F] * d)' * V[R]^{-1} * (E[R] - Cov[R,F] * d)
, respectively. Optionally computes the corresponding heteroskedasticity and autocorrelation robust standard errors (accounting for a potential model misspecification) using the [@newey1994automatic] plug-in procedure to select the number of relevant lags, i.e.,n_lags = 4 * (n_observations/100)^(2/9)
. For the standard error computations, the function allows to internally pre-whiten the series by fitting a VAR(1), i.e., a vector autoregressive model of order 1. All the details can be found in [@kan2013pricing].TFRP()
: Computes tradable factor risk premia from data on factorsF
and test asset excess returnsR
:TFRP = Cov[F, R] * Var[R]^{-1} * E[R]
; which are by construction the negative covariance of factorsF
with the SDF projection on asset returns, i.e., the minimum variance SDF. Optionally computes the corresponding heteroskedasticity and autocorrelation robust standard errors using the [@newey1994automatic] plug-in procedure to select the number of relevant lags, i.e.,n_lags = 4 * (n_observations/100)^(2/9)
. For the standard error computations, the function allows to internally pre-whiten the series by fitting a VAR(1), i.e., a vector autoregressive model of order 1. All details are found in [@quaini2023tradable].OracleTFRP()
: Computes Oracle tradable factor risk premia of [@quaini2023tradable] from data onK
factorsF = [F_1,...,F_K]'
and test asset excess returnsR
:OTFRP = argmin_x ||TFRP - x||_2^2 + tau * sum_{k=1}^K w_k * |x_k|
, whereTFRP
is the tradable factor risk premia estimator,tau > 0
is a penalty parameter, and the Oracle weights are given byw_k = 1 / ||corr[F_k, R]||_2^2
. This estimator is called "Oracle" in the sense that the probability that the index set of its nonzero estimated risk premia equals the index set of the true strong factors tends to 1 (Oracle selection), and that on the strong factors, the estimator reaches the optimal asymptotic Normal distribution. Here, strong factors are those that have a nonzero population marginal correlation with asset excess returns. Tuning of the penalty parametertau
is performed via Generalized Cross Validation (GCV), Cross Validation (CV) or Rolling Validation (RV). GCV tunes parametertau
by minimizing the criterium:||PE(tau)||_2^2 / (1-df(tau)/T)^2
wherePE(tau) = E[R] - beta_{S(tau)} * OTFRP(tau)
are the pricing errors of the model for given tuning parametertau
, withS(tau)
being the index set of the nonzero Oracle TFRP computed with tuning parametertau
, andbeta_{S(tau)} = Cov[R, F_{S(tau)}] * (Cov[F_{S(tau)}, R] * V[R]^{-1} * Cov[R, F_{S(tau)}])^{-1}
the regression coefficients of the test assets excess returns on the factor mimicking portfolios, anddf(tau) = |S(tau)|
are the degrees of freedom of the model, given by the number of nonzero Oracle TFRP. CV and RV, instead, choose the value oftau
that minimize the criterium:PE(tau)' * V[PE(tau)]^{-1} PE(tau)
whereV[PE(tau)]
is the diagonal matrix collecting the marginal variances of pricing errorsPE(tau)
, and each of these components are aggregated over k-fold cross-validated data or over rolling windows of data, respectively. Oracle weights can be based on the correlation between factors and returns (suggested approach), on the regression coefficients of returns on factors or on the first-step tradable risk premia estimator. Optionally computes the corresponding heteroskedasticity and autocorrelation robust standard errors using the [@newey1994automatic] plug-in procedure to select the number of relevant lags, i.e.,n_lags = 4 * (n_observations/100)^(2/9)
. For the standard error computations, the function allows to internally pre-whiten the series by fitting a VAR(1), i.e., a vector autoregressive model of order 1. All details are found in [@quaini2023tradable].SDFCoefficients
: Computes the SDF coefficients of [@fama1973risk]:FMSDFcoefficients = (C' * C)^{-1} * C' * E[R]
, or the misspecification-robust SDF coefficients of [@gospodinov2014misspecification]:GKRSDFcoefficients = (C' * V[R]^{-1} * C)^{-1} * C' * V[R]^{-1} * E[R]
from data on factorsF
and test asset excess returnsR
. These notions of SDF coefficients minimize pricing errors:argmin_{d} (E[R] - Cov[R,F] * d)' * W * (E[R] - Cov[R,F] * d)
, withW=I
, i.e., the identity, andW=V[R]^{-1}
, respectively. Optionally computes the corresponding heteroskedasticity and autocorrelation robust standard errors (accounting for a potential model misspecification) using the [@newey1994automatic] plug-in procedure to select the number of relevant lags, i.e.,n_lags = 4 * (n_observations/100)^(2/9)
.FGXFactorsTest()
: Computes the three-pass procedure of [@feng2020taming], which evaluates the contribution to cross-sectional pricing of any new factors on top of a set of control factors. The third step is a OLS regression of average returns on the covariances between asset returns and the new factors, as well as the control factors selected in either one of the first two steps. The first stwo steps consists in (i) a Lasso regression of average returns on the ovariances between asset returns and all control factors and (ii) a Lasso regression of the covariances between asset returns and the new factors on the ovariances between asset returns and all control factors. The second selection aims at correcting for potential omitted variables in the first selection. Tuning of the penalty parameters in the Lasso regressions is performed via Cross Validation (CV). Standard errors are computed following Feng Giglio and Xiu (2020) using the [@newey1994automatic] plug-in procedure to select the number of relevant lags, i.e.,n_lags = 4 * (n_observations/100)^(2/9)
. For the standard error computations, the function allows to internally pre-whiten the series by fitting a VAR(1), i.e., a vector autoregressive model of order 1.ChenFang2019BetaRankTest()
: Tests the null hypothesis of reduced rank in the matrix of regression loadings for test asset excess returns on risk factors using the [@chen2019improved] beta rank test. The test applies the [@kleibergen2006generalized] iterative rank test for initial rank estimation whentarget_level_kp2006_rank_test > 0
, with an adjustment tolevel = target_level_kp2006_rank_test / n_factors
. Whentarget_level_kp2006_rank_test <= 0
, the number of singular values aboven_observations^(-1/4)
is used instead. It presumes that the number of factors is less than the number of returns (n_factors < n_returns
). All the details can be found in [@chen2019improved].IterativeKleibergenPaap2006BetaRankTest()
: Evaluates the rank of regression loadings in an asset pricing model using the iterative [@kleibergen2006generalized] beta rank test. It systematically tests the null hypothesis for each potential rankq = 0, ..., n_factors - 1
and estimates the rank as the smallestq
that has a p-value below the significance level, adjusted for the number of factors. The function presupposes more returns than factors (n_factors < n_returns
). All the details can be found in [@kleibergen2006generalized].HACcovariance()
: estimates the long-run covariance matrix of a multivariate centred time series accounting for heteroskedasticity and autocorrelation using the [@newey1994automatic] estimator. If the number of lags is not provided, they are selected using the Newey-West plug-in procedure, wheren_lags = 4 * (n_observations/100)^(2/9)
. The function allows to internally prewhiten the series by fitting a VAR(1). All the details can be found in [@newey1994automatic].
For usage details, type ?FunctionName
in the R console, e.g.:
?TFRP
The intrinsicFRP
R package includes a dataset comprising following
test asset excess returns and risk factors frequently used
in the asset pricing literature:
returns
: Monthly observations from July 1963 to February 2024, containing excess returns data for 25 Size/Book-to-Market portfolios and 17 industry portfolios.factors
: Monthly observations from July 1963 to February 2024, containing data for the Fama-French 5 factors and the momentum factor.risk_free
: Monthly observations from July 1963 to February 2024, containing returns data for the US T-bill, used as proxy for the risk free asset.
This dataset was sourced from the Kenneth French data library and processed so that the observations on the test assets and the factors are not expressed in percentage points, and that returns on the test assets are in excess of the risk-free rate.
Let us ompute various factor risk premia estimates and corresponding 95% confidence intervals for the Fama-French 6 factors and a (simulated) "useless" factor.
# if you already have package `stats` installed you can skip the next line
install.packages("stats")
# import package data on 6 risk factors and 42 test asset excess returns
# remove the first column containing the date
factors = intrinsicFRP::factors[,-1]
returns = intrinsicFRP::returns[,-1]
RF = intrinsicFRP::risk_free[,-1]
# simulate a useless factor and add it to the matrix of factors
set.seed(23)
factors = cbind(
factors,
stats::rnorm(n = nrow(factors), sd = stats::sd(factors[,3]))
)
colnames(factors) = c(colnames(intrinsicFRP::factors[,2:7]), "Useless")
# index set of specific factor models
# Fama-French 3 factor model
ff3 = 1:3
# Fama-French 6 factor model
ff6 = 1:6 # "Mkt-RF" "SMB" "HML" "RMW" "CMA" "Mom"
# model comprising the Fama-French 6 factors and the simulated useless factor
ff6usl = 1:7 # "Mkt-RF" "SMB" "HML" "RMW" "CMA" "Mom" "Useless"
# compute the factor SDF coefficients and their standard errors
# for the Fama-MacBeth two-pass procedure
fm_sdf = intrinsicFRP::SDFCoefficients(
returns,
factors[,ff6usl],
misspecification_robust = FALSE,
include_standard_errors = TRUE
)
# and for the misspecification-robust procedure of Gospodinov Kan and Robottu
gkr_sdf = intrinsicFRP::SDFCoefficients(
returns, factors[,ff6usl],
include_standard_errors = TRUE
)
Visualization of the Fama MacBeth (FM) and the misspecification-robust (GKR) factor SDF coefficients estimators:
# compute tradable factor risk premia and their standard errors
tfrp = intrinsicFRP::TFRP(returns, factors[,ff6usl], include_standard_errors = TRUE)
# compute the GLS factor risk premia of Kan Robotti and Shanken (2013) and their
# standard errors
krs_frp = intrinsicFRP::FRP(returns, factors[,ff6usl], include_standard_errors = TRUE)
# set penalty parameters
penalty_parameters = seq(1e-4, 4e-3, length.out = 1000)
# compute Oracle tradable factor risk premia and their standard errors
# for low factor models, no need for the "one standard deviation" tuning rule
oracle_tfrp = intrinsicFRP::OracleTFRP(
returns,
factors[,ff6usl],
penalty_parameters,
include_standard_errors = TRUE,
one_stddev_rule = FALSE
)
Tuning model score of the Oracle TFRP estimator:
Visualization of the Fama MacBeth (FM), misspecification-robust (KRS), tradable (TFRP) and Oracle TFRP (O-TFRP) factor risk premia estimates:
In this plot, we notice a number of features:
- The point estimates and associated confidence intervals under the misspecification-robust and the tradable risk premia notions for the market, SMB, HML and RMW factors are similar.
- The point estimates (confidence intervals) of the misspecification-robust risk premia for the RMW and CMA factors are mildly larger (wider) than the point estimates (confidence intervals) their corresponding tradable risk premia.
- The point estimates (confidence intervals) of the misspecification-robust risk premia for the Mom and the simulated useless factor are much larger (wider) than the point estimates (confidence intervals) their corresponding tradable risk premia.
- The only factors featuring a confidence interval for misspecification-robust (tradable) risk premium that does not include zero is the market, CMA and Mom (market and Mom) factors.
- The Oracle factor risk premium of the simulated useless factor is exactly zero.
Let us now focus on the factor screening procedures under the Oracle TFRP estimator and the approach of [@gospodinov2014misspecification].
# recover the indices of the factors selected by the Oracle TFRP estimator
which(oracle_tfrp$risk_premia != 0)
# compute the GKR factor screening procedure
intrinsicFRP::GKRFactorScreening(returns, factors[,ff6])
The results are:
# factor indices of the factors selected by the Oracle TFRP estimator
[1] 1 2 3 4 5 6
# results of the GKR factor screening procedure
$sdf_coefficients
[,1]
[1,] 6.401245
[2,] 8.597950
[3,] 12.483170
$standard_errors
[,1]
[1,] 1.600189
[2,] 2.657617
[3,] 4.173861
$t_statistics
[,1]
[1,] 4.000306
[2,] 3.235211
[3,] 2.990797
$selected_factor_indices
[,1]
[1,] 1
[2,] 3
[3,] 6
While the Oracle TFRP only removes the simulated useless factor, the procedure by [@gospodinov2014misspecification] only retains the market factor.
Let us compute the HJ misspecification distance of the Fama-French 3 and 6 factor models.
# compute the HJ misspecification distance of the Fama-French 3 and 6 factor models
intrinsicFRP::HJMisspecificationDistance(returns, factors[,ff3])
intrinsicFRP::HJMisspecificationDistance(returns, factors[,ff6])
The result of the HJ misspecification distance is:
# HJ misspecification test p-value for the Fama-French 3 factor model
$squared_distance
[1] 0.2488529
$lower_bound
[1] 0.1559376
$upper_bound
[1] 0.3417683
# HJ misspecification test p-value for the Fama-French 6 factor model
$squared_distance
[1] 0.1619658
$lower_bound
[1] 0.08727944
$upper_bound
[1] 0.2366521
Since the p-value of both HJ misspecification tests is below the standard thresholds of
Let us compute the iterative [@kleibergen2006generalized] and the [@chen2019improved] beta rank identification tests for the Fama-French 6 factor model.
# compute identification tests of the Fama-French 6 factor model
intrinsicFRP::IterativeKleibergenPaap2006BetaRankTest(returns, factors[,ff6])
intrinsicFRP::ChenFang2019BetaRankTest(returns, factors[,ff6])
The results of the identification tests for the Fama-French 6 factor model are:
# output of the Iteraive Kleibergen Paap (2006) Beta Rank Test
$rank
[1] 6
$q
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 1 2 3 4 5
$statistics
[1] 108162.2473 10291.4280 4710.5822 1015.9157 376.7892 112.8450
$pvalues
[1] 0.000000e+00 0.000000e+00 0.000000e+00 3.510246e-143 2.062222e-41
[6] 1.340701e-09
# p-value of the Chen Fang (2019) Beta Rank Test
$statistic
[1] 143.1936
$`p-value`
[1] 0
Since the largest p-value of the Iteraive [@kleibergen2006generalized] Beta Rank Test and the p-value of the [@chen2019improved] Beta Rank Test are below the standard thresholds of
For sanity check, let us compute the same identification tests for the unidentified model comprising the Fama-French 6 factors and a (simulated) "useless" factor.
# compute identification tests of unidentified factor model comprising the
# Fama-French 6 factors and the simulated useless factor
intrinsicFRP::IterativeKleibergenPaap2006BetaRankTest(returns, factors[,ff6usl])
intrinsicFRP::ChenFang2019BetaRankTest(returns, factors[,ff6usl])
The results are:
# output of the Iteraive Kleibergen Paap (2006) Beta Rank Test
$rank
[1] 6
$q
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0 1 2 3 4 5 6
$statistics
[1] 128024.51364 11595.52592 5480.53412 1127.71644 475.16366
[6] 190.55705 57.75449
$pvalues
[1] 0.000000e+00 0.000000e+00 0.000000e+00 7.265767e-147 1.346504e-45
[6] 3.134998e-12 1.216120e-02
# p-value of the Chen Fang (2019) Beta Rank Test
$statistic
[1] 41.87344
$`p-value`
[1] 0.188
Since the largest p-value of the Iteraive [@kleibergen2006generalized] Beta Rank Test and the p-value of the [@chen2019improved] Beta Rank Test are above the standard thresholds of
To optimize computational performance, all methods implemented in package intrinsicFRP
are written in C++ and make use of the Armadillo [@sanderson2016armadillo] library for efficient linear algebra calculations. However, for user convenience, the interface of package intrinsicFRP
is entirely implemented in R, with minimal dependencies, including:
Rcpp
[@eddelbuettel2018extending] andRcppArmadillo
[@eddelbuettel2014rcpparmadillo]: They facilitate seamless integration between R, C++, and the armadillo C++ library.graphics
: It provides R functions for creating basic graphics.
You can raise issues, report bugs, seek for further help, or submit your contribution to the R package intrinsicFRP
at the
github repository a91quaini/intrinsicFRP.
For bug reports, you are kindly asked to make a small and self-contained program which exposes the bug.