-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPCA_FA.Rmd
58 lines (42 loc) · 1.89 KB
/
PCA_FA.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
---
title: "Principal Components and Factor Analysis"
author: "Bud Talbot"
output:
pdf_document: default
html_notebook: default
html_document: default
---
*18 May 2017*
##Initial Principal Components and Factor Analysis of GCA data.
The X1516GCA_FA data set is all Fall 15 and Spring 16 pre and post GCA scores (2346 case).
```{r}
X1516GCA_FA <- read.csv(file = "GCApost.csv", header = TRUE)
```
First look at the correlaton matrix for GCA items:
```{r}
library(psych)
library(GPArotation)
corPlot(X1516GCA_FA)
```
One way to determine the number of factors is to compare the solution to a set of simulated random data with properties similar to the GCA data set (a parallel analysis). Running this parallel analysis also produces the scree plot:
```{r}
fa.parallel((X1516GCA_FA))
```
The parallel analysis suggest 3 components and 8(!) factors and the scree plot shows 3-4 components with eigenvalue > 1, so run PCA (descriptive model) with 3 factors, varimax rotation
```{r}
principal(X1516GCA_FA, nfactors=3, rotate = "varimax")
```
Now compare to PCA with 4 factors:
```{r}
principal(X1516GCA_FA, nfactors=4, rotate = "varimax")
```
4 factors only accounts for 4% more variance than the 3 factor PCA. But look at communalities for GCA3 in each solution (0.081 in 3 factor, 0.75 in 4 factor)?
Now compare the PCAs to a factor analysis (structural model) specifying 3 factors, varimax rotation, do not impute values for missing, use minimum residual factoring method (default) and view loading matrix
```{r}
fa(X1516GCA_FA, nfactors = 3, rotate = "varimax")
```
And finally run the FA with 4 factors:
```{r}
fa(X1516GCA_FA, nfactors = 4, rotate = "varimax")
```
The 4 factor FA only accounts for 1% more variance than the 3 factor FA, and is likely harder to interpret. Also the communalities (variance accounted for in each item by all factors in the solution) does not increase much for 4 factors versus 3.