- Added "project config" files for consolidating and storing most of the necessary parameters for tailoring general functionality to individual schools' needs (PR #44 #50 #56 #57)
- Extended PDP template notebooks to cover model training and inference (PR #42 #46 #48 #59 #60 #61 #64)
- Improved structure and accessibility of client-facing data output files (PR #43 #53 #58)
- Improved modeling dataset standardization and cleanup with cleaner dtypes, consistent column names, and fewer opportunities for accidental data leakage (PR #38 #45 #49)
- Extended data schemas to cover post-raw transformations of the datasets (PR #36)
- Added new features, including "pre-cohort" and "num courses in study area" (PR #39)
- Added functionality for modeling-adjacent tasks, such as splitting datasets and computing sample weights (PR #41)
- Fixed various bugs and weirdness in PDP synthetic data generation (PR #47 #52 #68)
- Updated key dependencies and added support for PY3.12 (PR #40)
- Ported school-agnostic code from private repo, with some refactoring of structure and modest code quality improvements (PR #1 #2 #3 #6 #10)
- Set up Python packaging with
uv
and updated CI workflows (PR #5 #8 #13 #17 #29 #30 #32) - Extended and improved featurization functionality, including better course grade handling, term- and year-level features, "term diff" features over time (PR: #4 #7 #11 #12 #15 #20 #21 #22 #23)
- Extended and improved target variable functionality, including a new "failure to retain" target and higher-level
make_labeled_dataset()
entry points for each target for developer convenience (PR #24 #33) - Refactored and better generalized PDP raw data schemas (PR #19 #28)
- Added functionality for generating synthetic PDP and "sample platform" data (PR #9)
- Added generalized "pairwise association" function for comparing variables of various data types (PR #31)
- Added template notebooks for the data assessment/EDA and modeling dataset prep steps of the SST process (PR #26)
- Various minor bugfixes