FEGLM Adding one of the fixed effects as dummy gives different results than TWFE. #523

Open
YavuzMehmet2 opened this issue Aug 20, 2024 · 0 comments

YavuzMehmet2 commented Aug 20, 2024

Hi all,

First of all, thank you for a fantastic package, I will make sure to explicitly cite it in my papers!

I'm estimating a two-way fixed effects (TWFE) logit model in R with the fixest package. The model interacts several event indicators with a categorical variable called group, and absorbs city and wave as fixed effects. To report the wave coefficients, I moved wave out of the fixed-effects part of the formula and included it as a regular set of dummies instead.

The coefficients are identical across the two specifications, but the standard errors and p-values differ from those of the standard TWFE model, even though both models use the same clustering and the same ssc settings.

library(fixest)

# Simulate data
set.seed(12345)
n <- 1000  
data <- data.frame(
  outcome = rbinom(n, 1, 0.5),
  group = factor(sample(c("Group1", "Group2", "Group3"), n, replace = TRUE)),
  event1 = rbinom(n, 1, 0.3),
  event2 = rbinom(n, 1, 0.2),
  event3 = rbinom(n, 1, 0.4),
  event4 = rbinom(n, 1, 0.25),
  age_group = factor(sample(2:4, n, replace = TRUE)),
  gender = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  education = factor(sample(c("Low", "Medium", "High"), n, replace = TRUE)),
  ethnicity = factor(sample(c("Ethnic1", "Ethnic2"), n, replace = TRUE)),
  income_level = factor(sample(1:4, n, replace = TRUE)),
  perception = factor(sample(1:3, n, replace = TRUE)),
  previous_vote = rbinom(n, 1, 0.4),
  struggle = factor(sample(1:3, n, replace = TRUE)),
  news_type = factor(sample(c("News1", "News2", "News3", "NoNews"), n, replace = TRUE)),
  location_type = factor(sample(c("Urban", "Rural"), n, replace = TRUE)),
  city = factor(sample(1:10, n, replace = TRUE)),
  wave = factor(sample(paste0("Wave", 1:5), n, replace = TRUE)),
  weight = runif(n, 0.5, 2)
)

# Display results in a table, running the models directly inside etable
etable(
  feglm(
    outcome ~ i(group, ref = "Group1") + 
      i(group, event1, ref = "Group1", ref2 = "0") + 
      i(group, event2, ref = "Group1", ref2 = "0") + 
      i(group, event3, ref = "Group1", ref2 = "0") + 
      i(group, event4, ref = "Group1", ref2 = "0") + 
      i(age_group, ref = "2") + gender + education + 
      i(ethnicity, ref = "Ethnic1") + income_level + 
      i(perception, ref = "2") + as.factor(previous_vote) + 
      i(struggle, ref = "2") + as.factor(news_type) + location_type | city + wave,
    data = data, 
    family = binomial("logit"), 
    weights = data$weight,
    cluster = ~city + wave,
    ssc = ssc(adj = FALSE, cluster.adj = FALSE)
  ),
  feglm(
    outcome ~ i(group, ref = "Group1") + i(wave, ref = "Wave5") + 
      i(group, event1, ref = "Group1", ref2 = "0") + 
      i(group, event2, ref = "Group1", ref2 = "0") + 
      i(group, event3, ref = "Group1", ref2 = "0") + 
      i(group, event4, ref = "Group1", ref2 = "0") + 
      i(age_group, ref = "2") + gender + education + 
      i(ethnicity, ref = "Ethnic1") + income_level + 
      i(perception, ref = "2") + as.factor(previous_vote) + 
      i(struggle, ref = "2") + as.factor(news_type) + location_type | city,
    data = data, 
    family = binomial("logit"), 
    weights = data$weight,
    cluster = ~city + wave,
    ssc = ssc(adj = FALSE, cluster.adj = FALSE)
  )
)

Variance contained negative values in the diagonal and was 'fixed' (a la Cameron, Gelbach & Miller 2011).
                                      feglm(outcome ~.. feglm(outcome ~...1
Dependent Var.:                                 outcome             outcome
                                                                           
group = Group2                         -0.2409 (0.2054)    -0.2409 (0.2411)
group = Group3                          0.1775 (0.2043)     0.1775 (0.2193)
event1 x group = Group2                -0.0144 (0.2281)    -0.0144 (0.2793)
event1 x group = Group3               -0.5420* (0.2496)   -0.5420. (0.3037)
event2 x group = Group2                 0.4310 (0.3427)     0.4310 (0.3645)
event2 x group = Group3                -0.3085 (0.2689)    -0.3085 (0.3193)
event3 x group = Group2                 0.2083 (0.2124)     0.2083 (0.2618)
event3 x group = Group3                -0.0349 (0.2878)    -0.0349 (0.3135)
event4 x group = Group2                -0.2356 (0.2701)    -0.2356 (0.3173)
event4 x group = Group3                -0.0930 (0.2075)    -0.0930 (0.2846)
age_group = 3                          -0.0434 (0.2539)    -0.0434 (0.2769)
age_group = 4                           0.1120 (0.2049)     0.1120 (0.2359)
genderMale                              0.0899 (0.1181)     0.0899 (0.1707)
educationLow                           -0.0650 (0.1390)    -0.0650 (0.1930)
educationMedium                        -0.0057 (0.1607)    -0.0057 (0.2067)
i(factor_var=ethnicity,ref="Ethnic1") -0.1301. (0.0769)    -0.1301 (0.1324)
income_level2                          -0.1411 (0.1032)    -0.1411 (0.1836)
income_level3                          -0.0132 (0.1947)    -0.0132 (0.2517)
income_level4                          -0.2512 (0.2213)    -0.2512 (0.2510)
perception = 1                        -0.1894* (0.0793)    -0.1894 (0.1572)
perception = 3                         -0.0761 (0.1769)    -0.0761 (0.2251)
as.factor(previous_vote)1               0.1698 (0.1279)     0.1698 (0.1768)
struggle = 1                           -0.1089 (0.1668)    -0.1089 (0.2044)
struggle = 3                           -0.0745 (0.2287)    -0.0745 (0.2482)
as.factor(news_type)News2               0.0170 (0.1297)     0.0170 (0.1911)
as.factor(news_type)News3              -0.0712 (0.2071)    -0.0712 (0.2337)
as.factor(news_type)NoNews              0.0709 (0.2078)     0.0709 (0.2248)
location_typeUrban                     -0.0313 (0.1415)    -0.0313 (0.1784)
wave = Wave1                                                0.2194 (0.1759)
wave = Wave2                                               -0.0798 (0.1669)
wave = Wave3                                                0.0673 (0.1890)
wave = Wave4                                                0.2126 (0.3022)
Fixed-Effects:                        -----------------   -----------------
city                                                Yes                 Yes
wave                                                Yes                  No
_____________________________________ _________________   _________________
S.E.: Clustered                         by: city & wave     by: city & wave
Observations                                      1,000               1,000
Squared Cor.                                    0.03632             0.03632
Pseudo R2                                      -0.02464            -0.02464
BIC                                             2,047.4             2,047.4
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Questions:

  1. Is the difference in standard errors and p-values between the two models due to the adjustment of degrees of freedom?

  2. Is this a statistical problem (e.g. incidental parameter problem) or purely a software/package related issue? If it is just a software issue, how can I fix it?
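
One possible way to narrow this down is to compare the two specifications under one-way and two-way clustering, with the small-sample corrections switched off in both cases. The warning in the output above says the two-way clustered variance had negative diagonal values and was 'fixed'; whether that adjustment is what drives the gap is only a guess, and the sketch below is meant to help check it, not to confirm it. The data, the variable names x1 and x2, and the reduced formula are hypothetical stand-ins for the full model.

library(fixest)

# Hypothetical, smaller stand-in for the data above
set.seed(1)
n <- 1000
d <- data.frame(
  y    = rbinom(n, 1, 0.5),
  x1   = rnorm(n),
  x2   = rbinom(n, 1, 0.3),
  city = factor(sample(1:10, n, replace = TRUE)),
  wave = factor(sample(paste0("Wave", 1:5), n, replace = TRUE))
)

# Same small-sample settings as in the models above
no_adj <- ssc(adj = FALSE, cluster.adj = FALSE)

# wave absorbed as a fixed effect vs. wave entered as dummies
m_fe  <- feglm(y ~ x1 + x2 | city + wave, data = d, family = binomial("logit"))
m_dum <- feglm(y ~ x1 + x2 + i(wave, ref = "Wave5") | city, data = d,
               family = binomial("logit"))

# Coefficients that appear in both models
shared <- intersect(names(coef(m_fe)), names(coef(m_dum)))

# Standard errors under one-way clustering (city only)
one_way <- cbind(
  fe  = se(m_fe,  cluster = ~city, ssc = no_adj)[shared],
  dum = se(m_dum, cluster = ~city, ssc = no_adj)[shared]
)

# Standard errors under two-way clustering (city and wave)
two_way <- cbind(
  fe  = se(m_fe,  cluster = ~city + wave, ssc = no_adj)[shared],
  dum = se(m_dum, cluster = ~city + wave, ssc = no_adj)[shared]
)

print(one_way)
print(two_way)

If the two columns agree under one-way clustering but diverge under two-way clustering, that would point toward the two-way variance computation (and possibly the fix mentioned in the warning) rather than toward the degrees-of-freedom adjustment. If they already diverge under one-way clustering, the cause lies elsewhere.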
