23 July, 2024
- Introduction
- Number of Items and Sample size
- Assumption Checks
- Item list
- chronic pain variable
- Missing data
- 2PL Model estimation
- ICC,TCC, and information plots
- Model fit
- Assumption 1: Unidimesionality
- Assumption 2: Local independence
- Assumption 3: monotonicity
- Fscore distributions
- References
library(tidyverse) # For data wrangling and basic visualizations
library(psych) # For descriptive stats and assumption checks
library(mirt) # IRT modeling
# devtools::install_github("masurp/ggmirt")
library(ggmirt) # Extension for 'mirt'
library(cowplot) # for the personfitplot
library(difR) # for difMH Performs DIF detection using Mantel-Haenszel method.
library(gridExtra)
par(ask = FALSE) # don't ask before updating plot
set.seed(200720)
The 1PL model, also known as the One-Parameter Logistic Model or the
Rasch model (Rasch 1993), is a widely used item response
theory (IRT) model. It assumes that the probability of a correct
response (
Item Difficulty (
The 2PL (Two-Parameter Logistic) model, like the Rasch model, is a
commonly used IRT model for analyzing item responses in psychometric
measurement (Paek and Cole 2019). In the 2PL model we
relax the restriction on the
The discrimination parameter
In this pipeline we calculate fscores using the EAP (Expected a posteriori with item-pattern scoring) as an excellent alternative (in some case superior) to Maximum Likelyhood estimation (ML). In this particular case ML imposes significant limitations because it cannot provide estimates for respondents with an “all-zero” or with an “all-one” score pattern, which is often the case for the chosen variables.
Alternative estimators to ML are Weighted maximum likelyhood (WLE), and
two Bayesian estimator EAP (default in mirt) and MAP (Embretson &
Reise, 2000). Both EAP and MAP consider that the posterior
distribution of the latent trait
In summary, both ML and EAP should not be used when the respondents that
we expect to assess are likely to endorse all items at once, or exactly
none of them. In these cases the value of
In dichotomous IRT models, both sample size and test lenght (i.e. n. of items selected) can influence the accuracy of item-paramenter estimation, and they do so differently in 1PL, 2PL and 3PL models (Baur & Lukes 2009). In general, the estimation of item parameters tends to be proportionally more biased in smaller samples (N <500) (Finch & French 2019). In addition, shorter tests require a much larger sample to reduce the bias in the estimation of item paramenters - e.g. Baur & Lukes (2009) suggest a minimum of 500 individuals to estimate a 2PL model with 15 items.
However, sample size cannot compensate for the sparcity of information
determined by a short test. Baur & Lukes (2009) showed
that given a sample of N=5000, the correlation between the IRT model
estimate of
IRT models have three basic assumptions (Paek & Cole 2019):
- unidimensionality
- local independence
- monotonicity
The most common way to evaluate dimensionality is through factor analysis. We have done so in the previous step of the analysis.
The second assumption is local independence, which means that we assume that there is no additional relationship between items that is not already accounted for by the IRT model - this is evaluated by looking at the linear relationship that may exist between the residual variance of each possible couple of items.
The third assumption states that the highest response rates to the set
of items should correspond to higher latent trait level. A failure to
meet this assumption would be in the case in which endorsing all items
of would correspond to a lower level of the latent trait than the case
in which a respondent endorses (e.g.) only half the same items. A
monotonic item is typically not endorsed until the latent trait reaches
a certain level
LV_items.irt<-read.csv("C:/Users/d.vitali/Desktop/Github/CRIISP-WP5/data/clean/cfa_w2_all")[-1]
Excluding NA pain responses given by a proxy of the respondent.
sc_df<- LV_items.irt[which(LV_items.irt$r2painlv!= -16),]
varlist.irt<-c("idauniq",
"r2shopa", "r2housewka", "r2walkra", "r2mealsa",
"r2clim1a", "r2lifta", "r2beda",
"r2hlthlm")#, "hlmact")# hlmact has local dependence with r2hlthlm and r2shopa
sc_df<-sc_df[varlist.irt]
# three levels: no pain, acute pain, chronic pain
LV_items.irt$lv3chrpain<-ifelse(LV_items.irt$r2painlv<0,
NA, # unkown level
ifelse(LV_items.irt$r2painfr==0,
0, # no pain
ifelse(LV_items.irt$r2chrpain==1,
2, # chronic
1))) # pain but not chronic or unkown level
# two levels: no pain, chronic pain
LV_items.irt$r2painfr %>% is.na %>% sum
## [1] 261
LV_items.irt$lv2chrpain<-ifelse(LV_items.irt$r2painlv<0,
NA,
ifelse(LV_items.irt$r2painfr==0, #this variable has no NA
0, # no pain
ifelse(LV_items.irt$r2chrpain==1,
1, # chronic
0) # pain but not chronic
)
)
sc_df[sc_df < 0] <- NA
library(naniar)
vis_miss(sc_df[varlist.irt],sort_miss = T)
sc_df_NA<-sc_df
n.items<-length(varlist.irt[-1])
paste("full pattern missing:",sum(rowSums(is.na(sc_df_NA[-1]))==n.items))
## [1] "full pattern missing: 1"
paste("Some item missing:",sum(rowSums(is.na(sc_df_NA[-1]))>0 & rowSums(is.na(sc_df_NA[-1])) < n.items))
## [1] "Some item missing: 19"
[from imputemissing() doc]
Given an estimated IRT model and an estimate of the latent trait (Fscores), we can impute plausible missing data values (Chalmers 2012).
After taking the fit of our 2PL model on the data as it is (i.e. with the missingness):
# Model estimation. Here we indicate that we are estimating a 2PL model, and standard errors for parameters are estimated.
sc.2pl.NA.fit <- mirt::mirt(sc_df[-1],
1, # indicates a unidimensional model will be fit
itemtype = "2PL",
full.scores.SE = T,
full.scores = T,
verbose = T)
## Warning: data contains response patterns with only NAs
## Iteration: 1, Log-Lik: -22778.037, Max-Change: 2.16023Iteration: 2, Log-Lik: -20420.311, Max-Change: 1.98502Iteration: 3, Log-Lik: -19821.865, Max-Change: 0.76721Iteration: 4, Log-Lik: -19662.779, Max-Change: 0.75208Iteration: 5, Log-Lik: -19595.808, Max-Change: 0.41809Iteration: 6, Log-Lik: -19564.447, Max-Change: 0.23134Iteration: 7, Log-Lik: -19548.274, Max-Change: 0.18031Iteration: 8, Log-Lik: -19539.209, Max-Change: 0.11914Iteration: 9, Log-Lik: -19534.028, Max-Change: 0.09662Iteration: 10, Log-Lik: -19530.961, Max-Change: 0.05478Iteration: 11, Log-Lik: -19529.104, Max-Change: 0.05836Iteration: 12, Log-Lik: -19527.991, Max-Change: 0.02434Iteration: 13, Log-Lik: -19527.497, Max-Change: 0.02037Iteration: 14, Log-Lik: -19526.915, Max-Change: 0.01642Iteration: 15, Log-Lik: -19526.496, Max-Change: 0.01281Iteration: 16, Log-Lik: -19525.616, Max-Change: 0.00536Iteration: 17, Log-Lik: -19525.492, Max-Change: 0.00439Iteration: 18, Log-Lik: -19525.397, Max-Change: 0.00356Iteration: 19, Log-Lik: -19525.225, Max-Change: 0.00243Iteration: 20, Log-Lik: -19525.200, Max-Change: 0.00257Iteration: 21, Log-Lik: -19525.173, Max-Change: 0.00202Iteration: 22, Log-Lik: -19525.112, Max-Change: 0.00100Iteration: 23, Log-Lik: -19525.107, Max-Change: 0.00081Iteration: 24, Log-Lik: -19525.103, Max-Change: 0.00073Iteration: 25, Log-Lik: -19525.089, Max-Change: 0.00050Iteration: 26, Log-Lik: -19525.088, Max-Change: 0.00038Iteration: 27, Log-Lik: -19525.088, Max-Change: 0.00028Iteration: 28, Log-Lik: -19525.087, Max-Change: 0.00027Iteration: 29, Log-Lik: -19525.087, Max-Change: 0.00038Iteration: 30, Log-Lik: -19525.087, Max-Change: 0.00024Iteration: 31, Log-Lik: -19525.087, Max-Change: 0.00015Iteration: 32, Log-Lik: -19525.087, Max-Change: 0.00011Iteration: 33, Log-Lik: -19525.087, Max-Change: 0.00013Iteration: 34, Log-Lik: -19525.087, Max-Change: 0.00009
We can then use Maximum A Posteri (MAP) estimation to estimate the latent trait scores for individuals with missing responses. The fscores (factor scores or latent scores) estimated with MAP can be used to impute the missing item responses based on the relationship between the latent trait and the observed responses. This approach assumes that individuals with similar trait scores have similar item response patterns.
This means that we can use the irt model parameters (a,b) to estimate the probability of each them being scored in any given response pattern. Given an estimated latent score for a respondent we can estimate the probability of on item being endorsed: e.g. if the respondent have endorse the item for “difficulty lifting a 10lb weight” and that of “difficulty shopping for groceries”, this methods estimates the probability of the respondent endorsing the item about experiencing “difficulty doing work”.
Thus:
For the task of imputing missing data Chalmers suggest the (Maximum (mode) a posteriori with item-pattern scoring). Both EAO and MAP methods are Bayesian.
Lscores <- fscores(sc.2pl.NA.fit, method = 'MAP') # "MAP" for the maximum a-posteriori (i.e, Bayes modal)
# see: https://philchalmers.github.io/mirt/html/imputeMissing.html
fulldata <- imputeMissing(sc.2pl.NA.fit, Lscores) %>%as.data.frame()
fulldata$idauniq<-sc_df$idauniq
fulldata<-fulldata[c(ncol(fulldata),1:n.items)]
In comparison with the original dataset there is one row that cannot be inputed
paste("full pattern missing:",sum(rowSums(is.na(fulldata[-1]))==n.items))
## [1] "full pattern missing: 1"
paste("Some item missing:",sum(rowSums(is.na(fulldata[-1]))>0 & rowSums(is.na(fulldata[-1]))<n.items))
## [1] "Some item missing: 0"
fulldata$idauniq[which(rowSums(is.na(fulldata[-1]))>0)]
## [1] 121320
sc_df<-fulldata
sc_df<-na.omit(sc_df)
Here we indicate that we are estimating a 2PL model, and standard errors for parameters are estimated.
sc.2pl.fit <- mirt::mirt(sc_df[-1],
1,
itemtype = "2PL",
full.scores.SE = T,
full.scores = T,
verbose = T)
## Iteration: 1, Log-Lik: -22791.160, Max-Change: 2.15955Iteration: 2, Log-Lik: -20431.035, Max-Change: 2.00044Iteration: 3, Log-Lik: -19831.083, Max-Change: 0.77327Iteration: 4, Log-Lik: -19672.871, Max-Change: 0.48385Iteration: 5, Log-Lik: -19605.147, Max-Change: 0.32563Iteration: 6, Log-Lik: -19574.213, Max-Change: 0.23128Iteration: 7, Log-Lik: -19557.208, Max-Change: 0.15839Iteration: 8, Log-Lik: -19548.167, Max-Change: 0.11597Iteration: 9, Log-Lik: -19543.083, Max-Change: 0.07018Iteration: 10, Log-Lik: -19540.153, Max-Change: 0.05800Iteration: 11, Log-Lik: -19538.395, Max-Change: 0.04815Iteration: 12, Log-Lik: -19537.315, Max-Change: 0.04327Iteration: 13, Log-Lik: -19536.568, Max-Change: 0.02581Iteration: 14, Log-Lik: -19536.041, Max-Change: 0.01686Iteration: 15, Log-Lik: -19535.656, Max-Change: 0.01215Iteration: 16, Log-Lik: -19535.232, Max-Change: 0.00957Iteration: 17, Log-Lik: -19535.041, Max-Change: 0.00674Iteration: 18, Log-Lik: -19534.894, Max-Change: 0.00497Iteration: 19, Log-Lik: -19534.580, Max-Change: 0.00320Iteration: 20, Log-Lik: -19534.541, Max-Change: 0.00254Iteration: 21, Log-Lik: -19534.510, Max-Change: 0.00224Iteration: 22, Log-Lik: -19534.399, Max-Change: 0.00094Iteration: 23, Log-Lik: -19534.395, Max-Change: 0.00094Iteration: 24, Log-Lik: -19534.391, Max-Change: 0.00072Iteration: 25, Log-Lik: -19534.382, Max-Change: 0.00045Iteration: 26, Log-Lik: -19534.382, Max-Change: 0.00039Iteration: 27, Log-Lik: -19534.381, Max-Change: 0.00030Iteration: 28, Log-Lik: -19534.380, Max-Change: 0.00019Iteration: 29, Log-Lik: -19534.380, Max-Change: 0.00017Iteration: 30, Log-Lik: -19534.380, Max-Change: 0.00012Iteration: 31, Log-Lik: -19534.380, Max-Change: 0.00010
below are our estimated item-specificic coefficients described in the
equation above (where a =
coef(sc.2pl.fit, IRTpars = T, simplify = T)
## $items
## a b g u
## r2shopa 4.539 1.417 0 1
## r2housewka 4.134 1.059 0 1
## r2walkra 4.458 1.974 0 1
## r2mealsa 3.715 1.863 0 1
## r2clim1a 2.955 1.209 0 1
## r2lifta 2.792 0.783 0 1
## r2beda 2.641 1.858 0 1
## r2hlthlm 2.341 0.548 0 1
##
## $means
## F1
## 0
##
## $cov
## F1
## F1 1
this is the estimate of each person level of the latent variable
fscores(sc.2pl.fit, method = "EAP", full.scores = T, full.scores.SE = T) %>% head()
## F1 SE_F1
## [1,] -0.6194326 0.7176575
## [2,] -0.6194326 0.7176575
## [3,] 0.5092272 0.3802625
## [4,] -0.6194326 0.7176575
## [5,] 0.8185876 0.3090807
## [6,] -0.6194326 0.7176575
Like in a Rasch model, the item characteristic curves depict the
relationship between the probability of a correct response to an item
and the underlying latent trait (
Discrimination Power (a or
Difficulty Level (b or
ggmirt::tracePlot(sc.2pl.fit, facet = F, legend = T,theta_range = c(-1, 3.5)) + scale_color_brewer(palette = "Set1")
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
By looking at all the item characteristic’s curves, it is possible to
evaluate the
Similarly to ICC, item information curves can be used to assess the
quality of each item in terms of its contribution to score estimation
precision in IRT models. The item info plot visualizes how the
information varies across the range of the latent trait. The curves have
a bell-shaped pattern with the peak representing the region of
Steep IICs: Steep IICs indicate that an item is highly informative
and discriminative in a narrow range of the latent trait (
Flat IICs: Flat IICs indicate that an item provides relatively
consistent and stable information across a broader range of the latent
trait. These items are less good at discrimitating a specific level of
Moderate IICs: Some items may have IICs with intermediate steepness, meaning they provide moderate levels of information over a moderate range of abilities. These items offer a balance between the precision of measurement and coverage of ability levels.
ggmirt::itemInfoPlot(sc.2pl.fit, legend = T, theta_range = c(-1, 3.5)) + scale_color_brewer(palette = "Set1")
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
The test Characteristic curve (TCC) summarises each items’ information characteristics in a single line. Bell-shaped, the TCC shows how the information captured by the items varies across different levels of the latent trait.
-
Height of the TCC: A higher test information curve indicates greater measurement precision and reliability, meaning that the test is more accurate in estimating individuals’ latent trait levels. The peak of the curve represents the region where the test provides the highest amount of information
-
Width of the TCC: The width of the test information curve (in combination with the dotted line representing the standard error) represents the range of the latent trait over which the test provides reasonably good measurement precision. The dotted line (SE) crosses the TCC showing at which levels of the latent trait the the measures of the latent scores are eccessively biased by measurement error.
ggmirt::testInfoPlot(sc.2pl.fit,theta_range = c(-1, 3.5), adj_factor = .7) # adj_factor modifies the scale of the SE on the y axis
The scale characteristic curve represent the relationship between the
construct we are trying to measure (pain impact) and the Fscores
(T(
p.scc<-ggmirt::scaleCharPlot(sc.2pl.fit, theta_range = c(-1,3))
p.scc
In IRT, one commonly used fit index is the M2 statistic, also known as the weighted root mean square residual (WRMR). The M2 statistic assesses the fit of the observed item response patterns to the expected response patterns based on the fitted IRT model. Lower values of the M2 statistic indicate better model fit.
mirt::M2(sc.2pl.fit)
## M2 df p RMSEA RMSEA_5 RMSEA_95 SRMSR
## stats 95.75073 20 7.198575e-12 0.02045418 0.01644532 0.02465418 0.0208496
## TLI CFI
## stats 0.9975401 0.998243
The output provides various fit statistics (M2, RMSEA, SRMSR, TLI, CFI) that indicate the goodness of fit for the IRT model.
. M2: M2 is an absolute indicator of fit of the model to the observed data. Lower values indicate better model fit. . df: The degrees of freedom (df) associated with the model fit. . p: The p-value associated with the M2 statistic
Hu and Bentler (1999) empirically examine various cutoffs for many of these measures, and their data suggest that to minimize Type I and Type II errors under various conditions, one should use a combination of one of the above relative fit indexes, such as the CFI or IFI, with values greater than approximately .95, in combination with the SRMR (good models < .08) or the RMSEA (good models < .06). (from Newsom 2023). RMSEA_5 and RMSEA_95: RMSEA_5 and RMSEA_95 represent respectively the lower and upper bound of the 90% confidence interval for RMSEA. TLI stands for Tucker-Lewis Index - this is relative and compare the chi-square for the model to a null model based on variables that are uncorrelated. CFI or Comparative Fit Index is a non-centrality fit parameter and uses a chi-square equal to the df for the model as having a perfect fit (as opposed to chi-square equal to 0 seen for TLI). RMSEA is another non centrality parameter.
Hu and Bentler (1999) suggested that an RMSEA smaller than .06 and a CFI and TLI larger than .95 indicate relatively good model fit
With itemfit() we can evaluate the goodness of fit at item level. The
fit index is
Items with lower S_X2 values and higher p.S_X2 values are considered to have better fit to the model.
mirt::itemfit(sc.2pl.fit)
## item S_X2 df.S_X2 RMSEA.S_X2 p.S_X2
## 1 r2shopa 8.729 5 0.009 0.120
## 2 r2housewka 12.165 5 0.013 0.033
## 3 r2walkra 6.061 5 0.005 0.300
## 4 r2mealsa 5.643 5 0.004 0.342
## 5 r2clim1a 4.908 5 0.000 0.427
## 6 r2lifta 7.329 5 0.007 0.197
## 7 r2beda 4.484 5 0.000 0.482
## 8 r2hlthlm 10.002 5 0.011 0.075
Person fit assess the fit of individual response patterns to the model. Better person fit statistics indicate that each respondent endorsed items using patterns that align with the IRT model’s predictions. E.g. individuals with higher levels of pain impact (our latent trait) should appear as more likely to endorse the items that are endorsed by respondents with less degree of pain impact. Similarly, individuals with lower levels of pain impact should appear as unlikely to endorse items that are more specific to greater levels of pain impact. Therefore, by assessing person fit we look at individuals’ responses and we expect thees to align with the model’s expectations based on their estimated latent levels of the abilities.
In Rash model, we can use fit statistics such as infit and outfit to assess person fit. In a 2PL model it is more helpful to look at “Zh” measure from Drasgow, Levine and Williams (1985) for unidimensional and multidimensional models. A |Zh| score greater than 3 may indicate the possibility of aberrant response patters (Paek and Cole 2019). Less relevant fot a 2PL model, infit and outfit values close to 1 indicate good person fit, while values significantly higher or lower than 1 suggest potential misfit.
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
p.fit<-cbind(1:nrow(person_fit),sc_df[-1],mirt::personfit(sc.2pl.fit, method = "EAP"))
#p.fit[order(p.fit[,15]),]
paste0("N. respondents with aberrant patterns: ", sum(person_fit$Zh < -3), " of ", nrow(person_fit), " (", round(sum(person_fit$Zh < -3)/nrow(person_fit)*100,1), "%)")
## [1] "N. respondents with aberrant patterns: 23 of 9054 (0.3%)"
paste0("N. respondents with aberrant patterns: ", sum(person_fit$Zh < -2 & person_fit$Zh >= -3), " of ", nrow(person_fit), " (", round(sum(person_fit$Zh < -2 & person_fit$Zh >= -3)/nrow(person_fit)*100,1), "%)")
## [1] "N. respondents with aberrant patterns: 85 of 9054 (0.9%)"
The output of the following indicates the percentages of individuals falling within and beyond the specified threshold for standardized infit and outfit values. These percentages provide information to assess the fit the IRT model to the data. It is desireable to detect a small percentage of individuals beyond the thresholds for both infit and outfit to considere the model acceptable.
t0<-table(person_fit$Zh > 3 | person_fit$Zh < -3) %>% prop.table()
prop_misfit<-rbind(t0)*100
colnames(prop_misfit)<-c("% within threshold", "% beyond threshold")
rownames(prop_misfit)<-c("zh")
prop_misfit %>% round(2)
## % within threshold % beyond threshold
## zh 99.75 0.25
We evaluated dimensionality via factor analysis in the previous step of the analysis.
The latent trait is presumed to be responsible for the associations between the items. After fitting the model, there should not be any further associations between items. These residual associations are called local dependences. The residual correlations between pain of items should be negligible (less than .1) - see Kline (2016). To check the local independence assumption we can use the residuals() function. This function calculates residuals and local dependence indices, which can help identify potential issues of local dependence among the items.
LD-X2 test
residuals(sc.2pl.fit, type = "LD")
## LD matrix (lower triangle) and standardized values.
##
## Upper triangle summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.033 -0.020 -0.007 -0.006 0.007 0.025
##
## r2shopa r2housewka r2walkra r2mealsa r2clim1a r2lifta r2beda
## r2shopa NA 0.007 -0.029 0.021 -0.019 0.005 -0.029
## r2housewka 0.439 NA -0.017 -0.009 -0.016 0.007 0.010
## r2walkra 7.663 2.499 NA -0.018 0.025 -0.023 0.007
## r2mealsa 3.899 0.804 2.974 NA -0.027 -0.033 -0.024
## r2clim1a 3.138 2.415 5.645 6.639 NA 0.013 -0.006
## r2lifta 0.269 0.463 4.770 9.765 1.513 NA -0.005
## r2beda 7.783 0.869 0.469 5.221 0.362 0.245 NA
## r2hlthlm 1.793 0.465 5.063 0.680 1.503 0.209 0.193
## r2hlthlm
## r2shopa -0.014
## r2housewka 0.007
## r2walkra -0.024
## r2mealsa -0.009
## r2clim1a 0.013
## r2lifta 0.005
## r2beda 0.005
## r2hlthlm NA
This indices is proposed by (Chen & Thissen 1997) is a test for local independence between item pairs. It is calculated via the residuals() function type “LD”.
For (e.g.) 9 items, the result is a 9x9 matrix with the lower triangle showing the signed LD-X2 values and the upper triangle showing the signed Cramer V correlation coefficient values. These correlation coef range between -1 and 1 and absolute small scores for both set of signed coefficients should be preferable (Paek & Cole 2019). Marais and Andrich (2008) suggested a critical residual correlation value of 0.1, but a value of 0.3 has also often been used (see e.g. La Porta et al., (2011); Das Nair et al., (2011); Ramp et al. (2009); Røe, et al. (2014)).
LD<-residuals(sc.2pl.fit, type = "LD") %>% as.table %>% round(2)
## LD matrix (lower triangle) and standardized values.
##
## Upper triangle summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.033 -0.020 -0.007 -0.006 0.007 0.025
##
## r2shopa r2housewka r2walkra r2mealsa r2clim1a r2lifta r2beda
## r2shopa NA 0.007 -0.029 0.021 -0.019 0.005 -0.029
## r2housewka 0.439 NA -0.017 -0.009 -0.016 0.007 0.010
## r2walkra 7.663 2.499 NA -0.018 0.025 -0.023 0.007
## r2mealsa 3.899 0.804 2.974 NA -0.027 -0.033 -0.024
## r2clim1a 3.138 2.415 5.645 6.639 NA 0.013 -0.006
## r2lifta 0.269 0.463 4.770 9.765 1.513 NA -0.005
## r2beda 7.783 0.869 0.469 5.221 0.362 0.245 NA
## r2hlthlm 1.793 0.465 5.063 0.680 1.503 0.209 0.193
## r2hlthlm
## r2shopa -0.014
## r2housewka 0.007
## r2walkra -0.024
## r2mealsa -0.009
## r2clim1a 0.013
## r2lifta 0.005
## r2beda 0.005
## r2hlthlm NA
LD[abs(LD) < 0.05] <- "-"
paste("Signed cramer Cramer-V, values <.05 are omitted (-)")
## [1] "Signed cramer Cramer-V, values <.05 are omitted (-)"
LD
## r2shopa r2housewka r2walkra r2mealsa r2clim1a r2lifta r2beda
## r2shopa - - - - - -
## r2housewka 0.44 - - - - -
## r2walkra 7.66 2.5 - - - -
## r2mealsa 3.9 0.8 2.97 - - -
## r2clim1a 3.14 2.41 5.65 6.64 - -
## r2lifta 0.27 0.46 4.77 9.76 1.51 -
## r2beda 7.78 0.87 0.47 5.22 0.36 0.25
## r2hlthlm 1.79 0.47 5.06 0.68 1.5 0.21 0.19
## r2hlthlm
## r2shopa -
## r2housewka -
## r2walkra -
## r2mealsa -
## r2clim1a -
## r2lifta -
## r2beda -
## r2hlthlm
Above is the same LD matrix but with values smaller than .05 are omitted.
How can we interpret the LD ? one method is the Standadised LD-X2
**Standardised LD-X2 ** Assuming that LD-X2 is an approximation of a chi-square comparing two items and thus with one degree of freedom, we can transform the LD-X2 matrix as suggested by Paek and Cole (2018) and look for item pairs that show a value greater than 10 in the lower matrix
LD<-residuals(sc.2pl.fit, type = "LD")
## LD matrix (lower triangle) and standardized values.
##
## Upper triangle summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.033 -0.020 -0.007 -0.006 0.007 0.025
##
## r2shopa r2housewka r2walkra r2mealsa r2clim1a r2lifta r2beda
## r2shopa NA 0.007 -0.029 0.021 -0.019 0.005 -0.029
## r2housewka 0.439 NA -0.017 -0.009 -0.016 0.007 0.010
## r2walkra 7.663 2.499 NA -0.018 0.025 -0.023 0.007
## r2mealsa 3.899 0.804 2.974 NA -0.027 -0.033 -0.024
## r2clim1a 3.138 2.415 5.645 6.639 NA 0.013 -0.006
## r2lifta 0.269 0.463 4.770 9.765 1.513 NA -0.005
## r2beda 7.783 0.869 0.469 5.221 0.362 0.245 NA
## r2hlthlm 1.793 0.465 5.063 0.680 1.503 0.209 0.193
## r2hlthlm
## r2shopa -0.014
## r2housewka 0.007
## r2walkra -0.024
## r2mealsa -0.009
## r2clim1a 0.013
## r2lifta 0.005
## r2beda 0.005
## r2hlthlm NA
(abs(LD)-1)/sqrt(2)
## r2shopa r2housewka r2walkra r2mealsa r2clim1a r2lifta r2beda
## r2shopa NA -0.702 -0.687 -0.692 -0.694 -0.703 -0.686
## r2housewka -0.397 NA -0.695 -0.700 -0.696 -0.702 -0.700
## r2walkra 4.711 1.060 NA -0.694 -0.689 -0.691 -0.702
## r2mealsa 2.050 -0.139 1.396 NA -0.688 -0.684 -0.690
## r2clim1a 1.512 1.000 3.285 3.987 NA -0.698 -0.703
## r2lifta -0.517 -0.380 2.665 6.197 0.362 NA -0.703
## r2beda 4.796 -0.092 -0.376 2.984 -0.451 -0.534 NA
## r2hlthlm 0.561 -0.378 2.873 -0.226 0.356 -0.559 -0.571
## r2hlthlm
## r2shopa -0.697
## r2housewka -0.702
## r2walkra -0.690
## r2mealsa -0.701
## r2clim1a -0.698
## r2lifta -0.704
## r2beda -0.704
## r2hlthlm NA
In this case it seems that a pair of items “hlmact” and “r2hlthlm” are affected by local dependence and hlmact was removed reducind the number of items from 9 to 8.
mirt::itemfit(sc.2pl.fit)
## item S_X2 df.S_X2 RMSEA.S_X2 p.S_X2
## 1 r2shopa 8.729 5 0.009 0.120
## 2 r2housewka 12.165 5 0.013 0.033
## 3 r2walkra 6.061 5 0.005 0.300
## 4 r2mealsa 5.643 5 0.004 0.342
## 5 r2clim1a 4.908 5 0.000 0.427
## 6 r2lifta 7.329 5 0.007 0.197
## 7 r2beda 4.484 5 0.000 0.482
## 8 r2hlthlm 10.002 5 0.011 0.075
itemtype <- c(rep('2PL',1), 'spline',rep('2PL',6))
mod2 <- mirt(sc_df[-1], 1, itemtype=itemtype, TOL=2e-4)
## Iteration: 1, Log-Lik: -24819.334, Max-Change: 5.57379Iteration: 2, Log-Lik: -20517.699, Max-Change: 1.92977Iteration: 3, Log-Lik: -19884.307, Max-Change: 1.62057Iteration: 4, Log-Lik: -19699.761, Max-Change: 1.40246Iteration: 5, Log-Lik: -19611.038, Max-Change: 0.94081Iteration: 6, Log-Lik: -19572.966, Max-Change: 0.77103Iteration: 7, Log-Lik: -19553.523, Max-Change: 1.25271Iteration: 8, Log-Lik: -19542.444, Max-Change: 0.31875Iteration: 9, Log-Lik: -19537.856, Max-Change: 0.46807Iteration: 10, Log-Lik: -19535.707, Max-Change: 0.08784Iteration: 11, Log-Lik: -19535.076, Max-Change: 0.15241Iteration: 12, Log-Lik: -19534.774, Max-Change: 5.50290Iteration: 13, Log-Lik: -19533.802, Max-Change: 0.50091Iteration: 14, Log-Lik: -19533.687, Max-Change: 0.01431Iteration: 15, Log-Lik: -19533.666, Max-Change: 0.00211Iteration: 16, Log-Lik: -19533.665, Max-Change: 0.00474Iteration: 17, Log-Lik: -19533.660, Max-Change: 0.00150Iteration: 18, Log-Lik: -19533.659, Max-Change: 0.00053Iteration: 19, Log-Lik: -19533.658, Max-Change: 0.00037Iteration: 20, Log-Lik: -19533.658, Max-Change: 0.00240Iteration: 21, Log-Lik: -19533.657, Max-Change: 0.00047Iteration: 22, Log-Lik: -19533.657, Max-Change: 0.00036Iteration: 23, Log-Lik: -19533.657, Max-Change: 0.00025Iteration: 24, Log-Lik: -19533.657, Max-Change: 0.00022Iteration: 25, Log-Lik: -19533.657, Max-Change: 0.00102Iteration: 26, Log-Lik: -19533.657, Max-Change: 0.00061Iteration: 27, Log-Lik: -19533.656, Max-Change: 0.00055Iteration: 28, Log-Lik: -19533.656, Max-Change: 0.00836Iteration: 29, Log-Lik: -19533.654, Max-Change: 0.00038Iteration: 30, Log-Lik: -19533.654, Max-Change: 0.00024Iteration: 31, Log-Lik: -19533.654, Max-Change: 0.00110Iteration: 32, Log-Lik: -19533.653, Max-Change: 0.00019
print(itemplot(mod2, 2, main = "r2housewka"))
itemtype <- c(rep('2PL',7), 'spline')
mod2 <- mirt(sc_df[-1], 1, itemtype=itemtype, TOL=2e-4)
## Iteration: 1, Log-Lik: -23000.680, Max-Change: 2.01169Iteration: 2, Log-Lik: -20418.686, Max-Change: 1.43286Iteration: 3, Log-Lik: -19859.064, Max-Change: 0.57470Iteration: 4, Log-Lik: -19694.797, Max-Change: 0.64284Iteration: 5, Log-Lik: -19615.554, Max-Change: 0.29048Iteration: 6, Log-Lik: -19576.230, Max-Change: 0.34102Iteration: 7, Log-Lik: -19555.957, Max-Change: 0.25494Iteration: 8, Log-Lik: -19545.732, Max-Change: 0.34889Iteration: 9, Log-Lik: -19540.572, Max-Change: 0.16788Iteration: 10, Log-Lik: -19537.915, Max-Change: 0.16037Iteration: 11, Log-Lik: -19536.391, Max-Change: 0.34810Iteration: 12, Log-Lik: -19535.469, Max-Change: 1.99337Iteration: 13, Log-Lik: -19534.052, Max-Change: 0.29278Iteration: 14, Log-Lik: -19533.717, Max-Change: 9.24277Iteration: 15, Log-Lik: -19532.456, Max-Change: 0.10642Iteration: 16, Log-Lik: -19532.454, Max-Change: 5.54899Iteration: 17, Log-Lik: -19531.839, Max-Change: 3.72112Iteration: 18, Log-Lik: -19531.494, Max-Change: 2.45274Iteration: 19, Log-Lik: -19531.144, Max-Change: 0.00424Iteration: 20, Log-Lik: -19531.079, Max-Change: 0.00390Iteration: 21, Log-Lik: -19531.029, Max-Change: 0.00304Iteration: 22, Log-Lik: -19530.933, Max-Change: 0.00291Iteration: 23, Log-Lik: -19530.907, Max-Change: 0.00196Iteration: 24, Log-Lik: -19530.891, Max-Change: 0.00202Iteration: 25, Log-Lik: -19530.868, Max-Change: 0.00221Iteration: 26, Log-Lik: -19530.860, Max-Change: 0.00185Iteration: 27, Log-Lik: -19530.855, Max-Change: 0.00117Iteration: 28, Log-Lik: -19530.849, Max-Change: 0.00094Iteration: 29, Log-Lik: -19530.847, Max-Change: 0.00076Iteration: 30, Log-Lik: -19530.845, Max-Change: 0.00069Iteration: 31, Log-Lik: -19530.841, Max-Change: 0.00088Iteration: 32, Log-Lik: -19530.840, Max-Change: 0.00050Iteration: 33, Log-Lik: -19530.840, Max-Change: 0.00031Iteration: 34, Log-Lik: -19530.840, Max-Change: 0.00026Iteration: 35, Log-Lik: -19530.840, Max-Change: 0.00026Iteration: 36, Log-Lik: -19530.839, Max-Change: 0.00024Iteration: 37, Log-Lik: -19530.839, Max-Change: 0.00022Iteration: 38, Log-Lik: -19530.839, Max-Change: 0.00020Iteration: 39, Log-Lik: -19530.839, Max-Change: 0.00019
print(itemplot(mod2, 2, main = "r2hlthlm"))
There is no suggestion of pattern departure for r2housewka or r2hlthlm.
The range of
As this is a secondary dataset it very imporant to describe sampling methods and thus to say how participants were selected, what were the inclusion/exclusion criteria, and report the demographic characteristics available of the sample.
This information is important for the interpretation of the
In IRT models, measurement invariance is generally described as a property of the IRM by which the items of the test exhibit the same parameter estimates across different groups of respondents . However, in this context it is helpful to make a distinction between internal and external invariance.
The model parameters attained in any IRT analysis are specific to the
items analysed and the population sample from which the participant
responses are drawn, and thus all the characteristics of our model
refers to and are specific to the sample used. This means that given a
sample
Therefore, it is important to consider:
- what is the target population of this measurement model ? (e.g. general population of England that is 50yo or older). This is important to clarify what is the population to which this Item Response Model aims to be calibrated.
- how well the target population is represented by the sample at hand?
for example:
- Geographical area: does the sample include respondents from different areas of the country or are they from a local area?
- Demographics: how representative are the demographics characteristics of respondents to those of the target population?
- ..
The visualisation of theta scores can often reveal the presence of a zero-inflated distribution. This is particularly evident in this case as a substantial number of respondents endorsed very few to none of the items proposed. If we consider that this set of items aimed to measure different levels of “high impact”, the zero-inflation of this distribution shows that most respondents did not experience “high impact”. If - instead of using the Fscore - we aggregate the items raw scoring we may obscure zero-inflation and thereby biasing the our understanding of the data distribution. By making this zero-inflation evident, the IRT measurement model allows us to think at ways in which we can take into consideration this bias ass we proceed in further analyses (e.g. considering zero-inflated regression models).
The fscores() function is extracting the estimated latent trait scores (factor scores) for each respondent from the fitted IRT model stored in the sc.2pl.fit object.
# Extract latent trait scores into a dataframe for charting
sc_df$LVscore<- mirt::fscores(sc.2pl.fit, method = "EAP")[, "F1"]
median_score<-sc_df$LVscore %>% median()
mean_score<-sc_df$LVscore %>% mean()
latent_scores <- sc_df[c("idauniq","LVscore")]
merge_list<-c("idauniq","lv2chrpain","lv3chrpain","r2painlv","r2painfr","r2chrpain")
latent_scores <- merge(LV_items.irt[merge_list], latent_scores, by = "idauniq", all.x = F)
latent_scores[merge_list][latent_scores[merge_list]<0]<-NA
expl_df<-latent_scores
#table(expl_df$r2chrpain)
# selecting cases that can be discriminated between noPAIN, CP and Acute
expl_df<- expl_df %>% dplyr::filter(expl_df$lv3chrpain>=0)
# Create the histogram plot with annotations
expl_df$lv3chrpain <- factor(expl_df$lv3chrpain, levels = c("0", "1","2"), labels = c("No Pain","Acute Pain", "Chronic Pain"))
ggplot(expl_df, aes(x = LVscore, fill = lv3chrpain)) +
geom_histogram(binwidth = .5,color = "black", position = "dodge") +
labs(x = "Latent Trait Score", y = "Frequency", title = "Histogram of Latent Trait Scores")
ggplot(expl_df, aes(x = LVscore, fill = lv3chrpain)) +
geom_density(alpha = .5)+
scale_fill_discrete(name = "Pain group")+
labs(x = "Latent Trait Score", y = "Density", title = "Density plot of Latent Trait Scores")
library(boot)
##
## Attaching package: 'boot'
## The following object is masked from 'package:lattice':
##
## melanoma
## The following object is masked from 'package:psych':
##
## logit
mean.fun <- function(data, idx) {
array <- data[idx]
mean(array)
}
set.seed(720)
data<-expl_df$LVscore[which(expl_df$lv3chrpain=="Chronic Pain")]
bootstrap <- boot(data, mean.fun, R = 10000)
plot(bootstrap, main = paste0("boostrap mean Fscore for Chronic pain subset"))
bootstrap
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = data, statistic = mean.fun, R = 10000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 0.6920469 0.0002710725 0.01799007
boot.ci(boot.out = bootstrap, type = c("norm"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 10000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = bootstrap, type = c("norm"))
##
## Intervals :
## Level Normal
## 95% ( 0.6565, 0.7270 )
## Calculations and Intervals on Original Scale
data<-expl_df$LVscore[which(expl_df$lv3chrpain=="Acute Pain")]
bootstrap <- boot(data, mean.fun, R = 10000)
plot(bootstrap, main = paste0("boostrap mean Fscore for Acute pain subset"))
bootstrap
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = data, statistic = mean.fun, R = 10000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 0.0930746 0.000118584 0.02180647
boot.ci(boot.out = bootstrap, type = c("norm"))
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 10000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = bootstrap, type = c("norm"))
##
## Intervals :
## Level Normal
## 95% ( 0.0502, 0.1357 )
## Calculations and Intervals on Original Scale
data_a<-expl_df[which(expl_df$LVscore < 1),]
table(data_a$r2painlv) %>% addmargins()
##
## 0 1 2 3 Sum
## 5316 809 1200 327 7652
table(data_a$r2painlv) %>% prop.table()
##
## 0 1 2 3
## 0.69472033 0.10572399 0.15682175 0.04273393
data_b<-expl_df[which(expl_df$LVscore >= 1),]
table(data_b$r2painlv) %>% addmargins()
##
## 0 1 2 3 Sum
## 294 127 590 376 1387
table(data_b$r2painlv) %>% prop.table()
##
## 0 1 2 3
## 0.21196828 0.09156453 0.42537851 0.27108868
df_plot<-merge(LV_items.irt[c("idauniq","r2agey","r2painlv","ragender","lv3chrpain")], sc_df, by = "idauniq", all.x = F)
#sum(is.na(df_test$r2chrpain))/nrow(df_test)
#table(is.na(LV_items.irt$r2chrpain),LV_items.irt$r2painlv) %>% prop.table(margin = 1 )
df_plot$lv3chrpain<-factor(df_plot$lv3chrpain, levels = c("0", "1","2"), labels = c("No pain","Acute Pain", "Chronic Pain"))
df_plot$age_group<-cut(df_plot$r2agey, c(0, 49, 69, 89, Inf), c("0-49" ,"50-69","70-89","90+"), include.lowest=TRUE) %>% factor
df_plot$lvLVcat<-factor(ifelse(df_plot$LV<0, "lv<0",
ifelse(df_plot$LV>=1, "lv>1","0<=lv<1")))
count_data <- filter(df_plot,!is.na(lv3chrpain)) %>%
group_by(lv3chrpain, age_group) %>%
summarize(count = n())
## `summarise()` has grouped output by 'lv3chrpain'. You can override using the
## `.groups` argument.
ggplot(filter(df_plot,!is.na(lv3chrpain)), aes(x = age_group, y = LVscore, fill = lv3chrpain)) +
geom_boxplot(alpha = .8) +
xlab("Age group") +
ylab(expression(theta)) +
scale_fill_discrete(name = "Pain group") +
scale_y_continuous(breaks = seq(round(min(df_plot$LVscore),1),
round(max(df_plot$LVscore),1), by = .5))+
facet_grid(. ~ lv3chrpain) +
geom_text(data = count_data, aes(label = paste("N=",count), x = age_group, y = 2.3, group = lv3chrpain),
position = position_dodge(width = 0.75), vjust = -0.5, size = 4)+
theme_minimal()
- Asún, R. A., Rdz-Navarro, K., & Alvarado, J. M. (2017). The sirens’ call in psychometrics: the invariance of IRT models. Theory & Psychology, 27(3), 389-406.
- Baur, T., & Lukes, D. (2009). An evaluation of the IRT models through Monte Carlo simulation. UW-L Journal of Undergraduate Research, 12(1-7).
- Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied psychological measurement, 6(4), 431-444.
- Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of statistical Software, 48, 1-29.
- Chen, W. H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265-289.
- Das Nair, R., Moreton, B. J., & Lincoln, N. B. (2011). Rasch analysis of the Nottingham extended activities of daily living scale.
- Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical * Psychology, 38, 67-86.
- Embretson, S. E. & Reise, S. P. (2000). Item Response Theory for Psychologists. Erlbaum.
- Finch, H., & French, B. F. (2019). A comparison of estimation techniques for IRT models with small samples. Applied Measurement in Education, 32(2), 77-96.
- Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives.
- Kang, T., & Chen, T. T. (2008). Performance of the generalized S‐X2 item fit index for polytomous IRT models. Journal of Educational Measurement, 45(4), 391-406.
- Kim, S., Moses, T., & Yoo, H. H. (2015). Effectiveness of item response theory (IRT) proficiency estimation methods under adaptive multistage testing. ETS Research Report Series, 2015(1), 1-19.
- Kim, J.K., Nicewander, W.A. Ability estimation for conventional tests. Psychometrika 58, 587–599 (1993).
- Kline, R. B. (2023). Principles and practice of structural equation modeling. Guilford publications.
- La Porta, F., Franceschini, M., Caselli, S., Cavallini, P., Susassi, S., & Tennant, A. (2011). Unified balance scale: an activity-based, bed to community, and aetiology-independent measure of * balance calibrated with Rasch analysis.
- Lee, W. C., & Ban, J. C. (2009). A comparison of IRT linking procedures. Applied measurement in education, 23(1), 23-48.
- Luo, Y., & Dimitrov, D. M. (2019). A short note on obtaining point estimates of the IRT ability parameter with MCMC estimation in Mplus: How many plausible values are needed?. Educational and * Psychological Measurement, 79(2), 272-287.
- Marais, I., & Andrich, D. (2008b). Effects of varying magnitude and patterns of local dependence in the unidimensional Rasch model.
- Newsom, J. (2023) Structural equation modelling
- Orlando, M. & Thissen, D. (2000). Likelihood-based item fit indices for dichotomous item response theory models. Applied Psychological Measurement, 24, 50-64.
- Orlando, M., & Thissen, D. (2003). Further investigation of the performance of S-X2: An item fit index for use with dichotomous item response theory models. Applied Psychological Measurement, 27(* 4), 289-298.
- Paek, I., & Cole, K. (2019). Using R for item response theory model applications. Routledge.
- Rasch, G. (1993). Probabilistic models for some intelligence and attainment tests. MESA Press, 5835 S. Kimbark Ave., Chicago, IL 60637; e-mail: MESA@ uchicago. edu; web address: www. rasch. org; * tele.
- Ramp, M., Khan, F., Misajon, R. A., & Pallant, J. F. (2009). Rasch analysis of the Multiple Sclerosis Impact Scale MSIS-29.
- Røe, C., Damsgård, E., Fors, T., & Anke, A.. (2014). Psychometric properties of the pain stages of change questionnaire as evaluated by Rasch analysis in patients with chronic musculoskeletal pain.
- Şahin, A., & Anıl, D. (2017). The effects of test length and sample size on item parameters in item response theory. Educational Sciences: Theory & Practice, 17(1).
- Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450.