[PS2_Q2] Linear regression in Python #8

Open
frederickz opened this issue Jan 27, 2020 · 2 comments
Comments

@frederickz

I am pretty interested in adapting a regression method for the initial guesses. Is sklearn a good package? Thanks.

@rickecon
Owner

@zhuyuming96. Python has three primary packages for running linear regressions, aside from the direct linear algebra approach of computing inv(X'X)(X'y). You can find examples of how to use the first two methods in QuantEcon's linear regression notebook sections on Simple Linear Regression and on Endogeneity. For the third method, I have some examples in my classification (discrete choice) notebook from MACS 30150.

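For completeness, here is a minimal sketch of that direct linear algebra approach in NumPy, using synthetic data purely for illustration (none of these variable names come from the problem set):

import numpy as np

# Synthetic data: y = 1 + 2x + noise
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the constant term
X = np.column_stack((np.ones(n), x))

# OLS coefficients: (X'X)^{-1} X'y. Solving the normal equations with
# np.linalg.solve is more numerically stable than explicitly inverting X'X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # roughly [1.0, 2.0]
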
  1. The statsmodels.api package is probably the most similar to STATA. See the Simple Linear Regression section of the QuantEcon notebook on "Linear Regression in Python".
import statsmodels.api as sm

# Create object that sets up the regression
reg1 = sm.OLS(endog=df1['logpgp95'], exog=df1[['const', 'avexpr']], missing='drop')
# Actually estimate the coefficients
results = reg1.fit()
# Print STATA-like regression output
print(results.summary())
  2. Similar to statsmodels.api is the linearmodels.iv package for instrumental variables models. See the Endogeneity section of the QuantEcon notebook on "Linear Regression in Python".
from linearmodels.iv import IV2SLS

iv = IV2SLS(dependent=df4['logpgp95'],
            exog=df4['const'],
            endog=df4['avexpr'],
            instruments=df4['logem4']).fit(cov_type='unadjusted')

print(iv.summary)
  3. scikit-learn also has some regression commands, but it is geared toward point estimates and prediction rather than inference, so getting standard errors and doing hypothesis testing is harder. See the classification (discrete choice) notebook from Dr. Evans' MACS 30150 class. A sketch of the train/test split that creates the X_train, y_train, and X_test objects used below follows after this list.
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression

# Fit a linear regression on the training data and predict on the test data
LinReg = LinearRegression()
LinReg.fit(X_train, y_train)
y_LinReg_pred = LinReg.predict(X_test)

# Fit a logistic regression (for a binary/categorical outcome) the same way
LogReg = LogisticRegression()
LogReg.fit(X_train, y_train)
y_LogReg_pred = LogReg.predict(X_test)
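As referenced in item 3, here is a minimal sketch of how the X_train, y_train, X_test, and y_test objects above might be created with scikit-learn's train_test_split; the feature matrix X and target y here are placeholders, not data from the problem set:

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data purely for illustration: 200 observations, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hold out 25% of the observations as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)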

@frederickz
Author

Thank you so much, especially for all the resources you provided! These really help.
