linear models for classification: adult solution gives weird coefficients!
3.2: grid-searching pipeline non-deterministic bad result? 3.1: bank campaign data too big for this grid-search?
handle_unknown="ignore" for cross-validating the one-hot encoder!
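Quick sketch of why this matters, assuming the problem is a CV test fold containing a category the training fold never saw (toy data, not the adult or bank data):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for one CV split: the "test fold" holds a category
# ("c") that never appears in the "training fold".
X_train = np.array([["a"], ["b"], ["a"]], dtype=object)
X_test = np.array([["c"]], dtype=object)

# Default handle_unknown="error": transform raises on the unseen category.
strict = OneHotEncoder().fit(X_train)
try:
    strict.transform(X_test)
    strict_failed = False
except ValueError:
    strict_failed = True

# handle_unknown="ignore": the unseen category becomes an all-zero row.
lenient = OneHotEncoder(handle_unknown="ignore").fit(X_train)
encoded = lenient.transform(X_test).toarray()
print(strict_failed, encoded)
```

Inside a pipeline passed to cross_val_score the same thing happens per fold, so the encoder in the solutions probably wants handle_unknown="ignore" everywhere.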
Adult classification solution 1.3 using ColumnTransformer - actually not? Leave for day 2? 2.2: cross_validate instead of cross_val_score? Cross-validation slides too complicated. Linear regression: delete most of the notebook. Remove most of the linear regression slides. Host the slides so I can link to them. handle_unknown in OneHotEncoder in CV. Didn't get to imbalanced data. SGDClassifier exercise not interesting without an explanation of SGD. CSV iterator typo on slide. Change OpenOffice slides to rst? Bad error message on changing parameters in __init__ (clone breaks weirdly). Text data notebook and slides redundant. Many slides and notebooks redundant?! Clean up writing own estimator.
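Possible demo for the "writing own estimator" cleanup, showing the clone failure when __init__ touches its arguments. BadSelector and GoodSelector are made-up names for illustration, and this is only a sketch of one way the error can be triggered:

```python
from sklearn.base import BaseEstimator, clone


class BadSelector(BaseEstimator):
    """Hypothetical estimator that copies its argument in __init__."""

    def __init__(self, columns=None):
        # list(...) creates a new object, so the stored parameter is no
        # longer the object that was passed in; clone checks for that.
        self.columns = list(columns) if columns is not None else None


class GoodSelector(BaseEstimator):
    """Same idea, but __init__ only stores the argument untouched."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        # Any validation or copying belongs in fit, not in __init__.
        self.columns_ = list(self.columns) if self.columns is not None else []
        return self


try:
    clone(BadSelector(columns=["age"]))
    bad_clone_failed = False
except RuntimeError as exc:
    bad_clone_failed = True
    print(exc)

good_copy = clone(GoodSelector(columns=["age"]))
print(good_copy.columns)
```

GridSearchCV and cross_val_score clone the estimator internally, which is presumably where attendees hit the confusing message.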
make slides work offline? add PowerTransformer, ranking transformer? better preprocessing? make sure the requirements include imblearn and xgboost. replace Boston housing with Ames?
For 1/4 preprocessing: sync notebook and slides! story! exercise for imputation
For 2/4: Exercise for review: use ColumnTransformer! BROKEN? Grid search: use data that makes it instantaneous? No notebooks for forests?!?! Add ElasticNet to linear models? Linear models for classification notebook is bleak.
For 3/4: No solutions at all? Evaluation metrics notebook comes from the book, use updated book code? It's kinda weird, make sure to sync with slides. 03-model-evaluation.html#28 -> update report with averages!
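For the report update, assuming "averages" means the macro avg / weighted avg rows that current classification_report prints (labels below are toy values, not from the notebook):

```python
from sklearn.metrics import classification_report

# Toy labels only, to regenerate the report shown on the slide.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

# Text version, as it would appear on the slide.
print(classification_report(y_true, y_pred))

# Dict version, handy for checking which average rows exist.
report = classification_report(y_true, y_pred, output_dict=True)
print(sorted(report.keys()))
```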
For 4/4 structure? efficient parameter tuning? text data: make sure it runs quickly enough!
1/4: preprocessing and imputation need some syncing; all need a check for images
.format is the enemy
EXERCISE & NOTEBOOK FOR FORESTS!!!