- Remove group_col_name (dw_ek_borger) in training data in split trainer (3b3ee4a)
- Allow passing custom populate registry fn to hparam search (244231d)
- Preprocess additional data in selective cv (f50d22a)
- Add selective cross validator trainer (fd9b713)
- Examine LDL sampling (1ae827b)
- F20/F25 filter works (5eaf0b7)
- Figure output formats (91f80ca)
- Add patchwork (a5d4d3f) - Boxplot added (b80ca1b) - Add density plot (b5d810d) - Update plot (c8653ea) - Update plot (74e787a) - Add wip plot code (265947f) - Add new eval plots (703d8e7) - Add eval (de8dafa) - Add model training (3cb7517) - Add feature generation (c199724)
- Better parameters for plot axis/title texts (#877) (9e5c735) - Rewind eval split args (963190f) - Better params for plot texts (1c4f2f4) - Remove old test (87a5536) - Remove old files (5cd3072) - Update feature gen args (33e30ed)
- Add runpath suggester and remove run path from projectinfo (faf7d2d)
- Upload pred df to mlflow (af15191)
- Unify naming classification model steps (31fc24a)
- Study creation before parallelization (a4126af)
- Update perf table (354021a)
- Subset tfidf columns (c28353a)
- Add dataset logging (ae662d3)
- #834: T2d add MCC and F1 to results table (a306f8b) - Create descriptive_stats_by_outcome.py (fe9d23e)
- Generate features with tsflattener v2 (759fcc0)
- Migrate to v2 (c18450c)
- Add pse keyword count embedding (6cbdb42)
- Feature gen for sczbp text experiment (13a3622)
- Lightgbm suggester (7b9569a) - Joblib based hyperparameter search (4df2dec) - #789: CVD, hyperparam tune for layer 2 (075313d) - Lightgbm suggester (feb6a68)
- Base table works (d5d6645)
- Minor bugs (29c3a19)
- #787: Add LightGBM (fef71b9)
- #780: Terminallogger pretty print should print non-flattened cfg (#783) (7abd94a) - #780: Terminallogger pretty print should print non-flattened cfg (356ca28) - Fix c/b plot (2594b54)
- #772: Use rich for pretty printing in terminallogger (#777) (c924a95) - #772: Use rich for pretty printing in terminallogger (48cff23)
- #775: Log outcome column as string, not Index (a4ff66b)
- Do not init mlflow until first logging operation (9085833)
- Test effect of interval (4dbafb5)
- Mlflow logger on overtaci (a0eef17)
- #700: Pretrain and finetune a sequential model for T2D (470b0bc)
- Use namedtemporaryfile (19fcf49)
- #717: Remove-PredictionTimeFilterer (#719) (a6e3a2d) - Remove PredictionTimeFilterer (bd3e3f0) - #717: Remove-PredictionTimeFilterer (0bcf8bd)
- Runtime_checkable (6048d8a)
- Match suggester regex in hparam tuning (be443f4) - Hyperparameter tuning with optuna (679dffb) - #699: Optuna-hparam-optimization (7773099)
- Test on Ovartaci (57364ce)
- #702: Allow filtering when generating patient slices (5970ec0)
- #709: Support temporary uuid in quarantine filter (1d3058a)
- Correct splitting of text data for sentence transformers (7040115)
- Update (fd53ac7)
- Decrease joins (cd5fb9d)
- Circular import (5ae8d43)
- #671: Improve-error-when-cannot-parse-registered-config (#672) (4aa5cff) - #671: Improve-error-when-cannot-parse-registered-config (15ca60b)
- Literal type hint parsing. Fixes #667. (19dd234)
- Pretty-annotation-extraction-in-confection-placeholder-configs (2ab73a6)
- Imports (f817aad)
- Update gain table to be more flexible (9e8cc4d) - Update perf by ppr (8d1084c) - Update perf by ppr table (5711773) - Update performance table (16183d5) - Add calibration curve (f5719cf) - Map vocab to tfidf indxs in gain table (279f404) - Save vocabs from text models (a084da1)
- Comment instruction (3cbb09b) - Add decimal on roc_auc plot (a54c283) - Tfidf matching in gain table (6a954c0) - Lint (cd63ad1) - Gain table flexibility (ab70e4d) - Add arg to aggr eval (db8f828)
- Use SplitIDs to generate splits in (issue #593) (590cce6)
- Fill defaults from function signatures into .cfg (issue #426) (#638) (cae3021) - Fill defaults from function signatures into .cfg (issue #426) (a821915) - Fill defaults from function signatures into .cfg (issue #426) (f96b839)
- RegionalFilter should not load at init (7fa1a34)
- Registry testing project-specific registered functions (#637) (fa09dfe) - Only check registered functions in common (a27bfd8) - Move sczbp specific estimator steps to own registry (bb9981e)
- Typo (7dd3ced)
- Validated frame (07b5f03)
- Sanitise dict keys for mlflow (fc5eac9) - Remove unused extensions (a50d5e7) - Disable graphite user pager (dea1c75)
- Joint interface for loading different split ids (b6b2be2)
- Add cleanlab processing step (498403b) - Add missforest imputer (05100b3) - Synthetic data augmenter estimator step (8712caf) - Simple imputer estimator step (1f72094) - Standardscaler that infers numeric columns (4be5abc) - Imblearn pipeline and constructor (c81f98b)
- Snyk syntax (751c764)
- Replace disallowed symbols in config before logging to mlflow (a49faa2)
- Analysis of time from first contact to outcome (14380bd)
- First pretrain (22e7eef)
- Finish apply (e850710)
- Type ignore (6685920)
- Generate new checkpoint (558666e)
- Convert patient_slice_getters to classes (d2771d4) - Convert patient_slice_getters to classes (ec5fe84)
- Pretrain with mlflow (#563) (a0d21ba) - Install invoke (e5377a7) - Finetune trainer from pretrained checkpoint (#560) (eec5816) - Pretraining with mlflow (7033842) - Create finetuning-trainer from pretrained checkpoint (62dbb8d)
- Filter based on insufficient lookahead (#541) (9624405) - Filter based on insufficient lookahead (21e545a) - Filter based on insufficient lookahead (ab51a6e)
- Improvements (1562df8)
- Add SplitDataset generic (d4b99ea)
- Add support for mlflow experiment tracker (#432) (9922000) - Add support for mlflow experiment tracker (7048cb0) - Add support for mlflow experiment tracker (db8db5b)
- Actually apply all filters (5aaa13c)
- Minor rename and documentation (428d2ff)
- Speed up cohort definers (3a7bfab) - Use lazyframes for filter prediction times (b9972b2) - Speed up cohort definers (fac749a)
- Add washout as an arg during feature gen (82ec387)
- Overwrite log on new run (c60e5d5)
- Add multilogger to registry (972b929)
- Add disk logger (fcd1b46)
- Run linter (6c30f8e)
- Add snyk workflow (#493) (72381b3) - Add snyk workflow (1114d13) - Load geographical split (45917e2) - Make geographical split ids file (f206bce) - Add dataloaderfilterer (3e2768c) - Update of full eval pipeline (a408542) - Add full eval pipeline (08fa21c) - Add auroc by time to event (2026ee3) - Main performance figure (d0ba595) - Finish adapting robustness (2199f01) - Adapt robustness eval (7f9fcff)
- Arg updates (066471b) - Update run arg type (460b851) - Run full eval pipeline (5beefdf) - Naming and etc. (88be891) - Robustness figs (2290be0)
- Updated based on new pre-commit (e58adfb)
- Log text representation of config (#486) (5866cd2) - Log text representation of config (8fc4b91) - Log text representation of config (dc1585a)
- Small improvements to docstrings (f659c96) - Small improvements to docstrings (f2e019b) - Add relevant docstring (987f4d8)
- Add RegexColumnBlacklist to common (#472) (a34066a) - Add RegexColumnBlacklist to common (12f30f9) - Add cfg (3fe31ed) - Add RegexColumnBlacklist to common (a67b1bc)
- Better scaffolding cfg generation (349a5f4)
- Train cvd with v2 (#455) (6f66569) - Add xgboost defaults (51bde84) - Add vertical concatenator (607cf3a)
- Add todo (01fe0e7)
- Add xgboost defaults (2ead123)
- Prefix count validator (c42178c)
- More info on config update. (5e546e9)
- Override type (c2638a6)
- Comment readability (590a946)
- Ran pre-commit (7ba17a4) - Update test to match changes in test patients (814b525) - Ran pre-commit (06d849d) - Ensure mapping data is only loaded once (389adca) - Ran pre-commit (9c7f7db) - Updated behrt embedder with filtering (9b3ca61) - Update output of create config to ensure correct type (c336a34) - Convert output of resolve config to configschema (2771693)
- Ensure that evaluation can run with different outcome col names (#418) (c236d58) - Support spaces within column names (abfb810) - Add support for spaces in header titles (6109822)
- Filter cols by lookbehind combination filter (92b6998)
- Change BinaryClassificationPipeline to take sklearn Pipeline instead of Sequence[ModelStep] (aa5615e)
- Change BinaryClassificationPipeline to take sklearn Pipeline instead of Sequence[ModelStep] (b23e6e0)
- Pass pred time uuid to binaryclassification task (078cb92)
- Invalid imports (4d27c03)
- Add multilogger (dc52e55)
- Collect lazyframe and return pl.series (ffca056)
- Performance by lookahead (671f4e9) - Different lookaheads (d7538bf) - Add script for train test on diff lookaheads (41a29d8) - Add script for train test on diff lookaheads (ce8f59f) - Add shap (4029cd8)
- Refactor tasks structure (12f2632) - Allow training from overtaci remote desktop (7e9b1d7) - Remove warnings (66c529e) - Added hotfix for wandb folder during debugging (52fb95e) - Error made by pl lightning when saving hp (e1f507d) - Added callbacks (983908a) - Removed hotfix for behrt embedder (7112773) - Fix based on pr comments (4e71238) - Undo edit (af04d12) - Removed todo comment (17c463a)
- Allow list of data dirs for multirun (d77186e) - Update fa subset feature fns (7197efe) - Allow list of data dirs in cfg (064f8cb) - Remove redundant quotation marks (f61209f)
- Pydantic requires types to be callable. Removed subscripting of pd.Series. (95d89a6)
- Add cls token to behrt embedder (568224f)
- Error-handling (c5f8997)
- Add smoking data (6547223)
- Improve documentation (0fdca6f)
- Add todo (5eab729)
- Merge multiple feature sets (6526bdd) - Test xgboost assumption (5a3222a) - Add test of xgboost hyperparams assumption (0768f62)
- Misc (b63d22d) - Misc (f8e35a0) - Correct checking whether dfs can be joined (7630572) - Move feature merging to data_loader (a31c9ca)
- Add devcontainer.json (b9230b4) - Allow levels of granularity in diagnosis mapping (a06fd75) - Add subsetting script (dcd10ee)
- Update train val descriptive comp script (b1f3e72)
- Comment test (7dbeb53)
- Extract runs to functions, to avoid instantiation on import (afc94cb)
- Renames (5ea1fe5)
- How to install cuda enabled pytorch on overtaci (25608d2)
- Create plot when training xgboost hba1c only (cd52ec8)
- Change typehint for patient colnames (22d9317) - Do not import get_best_eval_pipeline unless main (d5da51f) - Fixed mutable default error in config (a2d8294) - Source subtype filtering works (1259203)
- Add overwrite eval warnings (6d5657f)
- Added fine-tuning script (bef7c88)
- Ran precommit (6e6bf80) - Don't run bf16 on tests (5a624ca) - Ran pre-commit (11536be) - Ran pre-commit (42b8e29) - Added description of how to create checkpoint (453c5a9) - Updated the checkpoint (b04f37c) - Ran pre-commit (7e40ed1) - Added test for multilabel (1f29927) - Based on review from @MartinBernstorff (643c1e4) - Ran pre-commit (b2a98ec) - Remove .conftest antipattern (5e5a693) - Ran pre-commit (fab110a) - Ran pre-commit (ff41ad2)
- Added test documentation (448ce4d)
- Add tasks.json (84546d1) - Add vscode dev task (8a349b8) - Create diagnosis mapping (icd10->caliber) (e27af96)
- Delete unused file (1e4cab7)
- Add procedure codes (002d488)
- Define cohort (7434bac)
- Gradient accumulation fix OutOfMemoryError? (7cfc47b) - Lr scheduler linear with warm-up (0f2d433) - Pretrain version (5e9091c) - Ready for training (1218794) - Expand test to cover model checkpointing (1f0bece) - Lightning module saves hyperparams (59e7ac6) - Adapt sequence training script to pytorch lightning (005cbdf) - Initial changes to pytorch lightning module (d919009) - Added training script for sequence model (7d2c2bb)
- Configs should be initialised with factories (7ddbf9b) - Ruff (626e0fc) - Fully transitioned to pl (d3a8d32) - Replaced print with logging statements (769530f) - Make sure parameters are actually moved to the gpu (131e1f0)
- Typo (0dfeb69)
- Add dev container (84301b8) - Add corr plot (d0c116f) - Add feature outcome corrs (4376b19) - Add hist (708ca45) - Descriptive stats (8c0b038)
- Time from first pos pred to next hba1c (b0d805d) - Ned script for retraining model with new cv (eba749f) - Add feature importance table (423cc23) - Add baseline table one (8695e29) - Adding eval plots (5b81f26) - Add new eval branch (5e8aea3) - Wip new eval structure (8728545)
- Lint (5bbdd7c) - Missing path arg (f4fe5e1) - Spelling in comments (58de461) - Lint (0eeb957) - Lint (f084a83) - Various minor changes (4a8f6de) - Missing return type annotation (cd2f8f9) - Formatting (b8521e4) - Delete old eval folder (5e5cc6a) - Eval paths (6f1f1a1) - Selected runs (a448ab4) - Configs (80d1907)
- Implement classifierchain (097cc0a)
- Unpack dataframe to series in eval df (dcf9b93)
- Main test passes 🥳 (622adf7) - Add missing methods from PSYCOPModule to BEHRTForMaskedLM (79f8a26) - Update trainer to match checkpoint savers (ab9a234) - Add wandb logger (7eef76a) - Flesh out training (40b1032) - Add dataclass-based vocab (25f7d3e) - Implemented masking task (b0ffbf4) - Embedder skeleton (81479c7)
- Fix error from static type checks (13f5a38) - Updated format of the mask function to allow for testing (d49dadc) - Make sure that the tests test the outer masking_fn (679cd54) - Renamed PsycopModule -> TrainableModule (fadc9a1) - Remove testing assumption from Logger (861c162) - Updated logger to allow logging configs separately (604dc2e) - Moved logger interface to its own script (4c83ff4) - Added vocab_size (3789834) - Added type hints (370aebe) - Forward pass in embedding module works (5cd38d9) - Added patient dataset (2a00c32) - Added behrt embedder (a2bbd8b)
- Rename cohort definition to cvd_definition (27c5e59) - Minor examples (f2d99ad) - Cvd outcome definition (89dcf49) - Add cvd filters (9dcfe33)
- Remove use of hba1c in cvd filters (395487e) - Unneeded newline handling (311d15d) - Strip lines of whitespace before generating dataframes (628db49)
- First version (2fc715c)
- Possibly unbound variable (864f59b)
- Point to patient object tests (d727664)
- Add tfidf (fba845a)
- Config of last model (250d854) - Naming (5bca53b) - Configurations for new tfidf feat set (fe33f0f) - Update configurations of model train and eval (dd0f83c) - Reconfigure text lookbehinds (f6151ad) - Text specs (a0ac8bf)
- Parse date of birth to all patients (c469997)
- Train new tfidf model and encode text (1fdd3ce)
- Don't shadow python builtin (c05d307)
- Get patients from sql (d9ebba4)
- Rename from merge (38257e4) - Type checking block for circular imports (a33a8de) - Typo in shak codes (e2cd184)
- Convert getters to properties (cbe130b) - Handle lookahead-based outcome resolution (489003f) - Remove patient_ids and fix downstream type consequences (2918452) - Misc. (bd3d6ea) - First working unpacker (c65219b) - First stab at unpacking to patient dfs (74ba12a) - Filter prediction sequences (402eee9)
- Rename patient id in tests (da79654) - Spelling errors (114b624) - Missing type import (575689b) - Downstream type fixes (1cd438b)
- Add comments explaining eq (d99c0be)
- Filename check earlier for feature-gen (17b3404)
- Cohort creation for the cancer project (4a408b9)
- Correct type hints for aggregation (ac9fc29) - Reconfigure lab tests (caebc33) - Replaced unused function (75f241c)
- Type annotation (a18cc64) - Wandb back to offline (b67d4f7) - Config (f5e2c73) - Wandb config (761ea6a) - Lookbehind combi config (09919f4) - Update configs (6ad6fef) - Correct type hints for aggregation (d6f4311) - Param changes (650912d) - Add missing arg (500be32) - Minor changes to params (dd1dcea)
- Add new dir param and user prompt (fe46dbf)
- Broken tests due to missing arg (74cd065) - Add arg to general function (6395109) - Update general function (81506ee) - Instructions in README.md (c2de501)
- Ignore type check (17a3b45)
- First stab at chunked feature gen (0a31a61) - Add loader for embedded text (40c8271) - Train sentence transformer code (055a572) - Sentence transformer embedding ready to train (40ce08a) - Sentence transformer embedding (9b998f8) - Vis qc (ec8ff15)
- Misc (df49d9b) - Ignore old import errors (d6a2105) - Reinstate 'prefixes_to_describe' param (761c6f0) - Remove old param (4240440) - Minor changes and typos (2341abd) - Typo in requirements (d97aefe) - Text feat specs resolve mltp to mean (9dd1065) - Paths (b008a95) - Change chunking pipeline (781d442) - Updating scz_bp feature gen (8b1ee14) - Move chunk tests (df7834d) - Move chunk tests (295e4c0) - Don't modify prediction_times_df in PredictionTimeFilterer (e1eae8c) - Type hint for ColNames (0454b42) - Chunked feature gen (d8e2ac7) - Set wandb to offline during feature gen (5ce32be) - Print time taken for sentence embedding (ac4e8a2)
- Multilabel classification (5285b2c)
- Simple qc of text (eff9bfa)
- To polars|pandas method for EvalDataset + fixed threshold (52fe185) - Add loader for first visit to psychiatry (9ee9727)
- Set pythonpath for interactive session (ded552d)
- Change prefix for supplementary outputs (9b2a15d)
- Only print failed checks if there are any (ba645ec) - Only do feature description of columns matching prefix (dc45ab7)
- Add birthdays as default (17bdc01)
- Add loader for therapeutic leave (a9f95ec)
- Correct types for aggregation funcs in t2d specify features (6796f0a)
- Add docs to eventcolumns (e6585a2) - Explain sequence columns (be1f89d) - Define behaviour if lookbehind is none (26a7f0c) - Add docstring (fe5fd2e)
- Add INP and TCH rules (4a3d02a)
- Add plot code (8c0b347) - Remove name and build-system to avoid pip install -e . (63e871b) - Migrate to requirements.txt (cab38a5)
- Add correct new models (fe9d4bb) - New best models (2db0f03) - Adding more flexibility (014ba4e) - Adding more flexibility (fca7264)
- Eval pipeline works (a663436) - Add typehints to feature specs (bf42de0) - Turn wandb off for now in main feature_gen script (2ea9ef3) - Cancer project initial setup (de8f9cf)
- Minor change (41e07b2) - Update readable feature names (a4073c3) - Update readable feature names (114cd24)
- Add careml to monorepo (534400a)
- Move markdown handling to common (bdfeafa)
- Misc. (482db66)
- Guard for newly optional configs (e9ff39e)
- Remove project specific md code (945a0fd)
- Simplify feature describer (a9f9f7b)
- Patchwork grid of size 1 (b159a10)
- Increase size of axis labels in t2d pn theme (71b8dd0) - Increase size of patchwork subpanel labels (384e06d) - Make HbA1c only configurable (d2854a8) - Adopt boolean dataset to featuremodifier (5188047) - Ignore static type checks on Ovartaci (840c015) - Allow disabling of column name checks (ad519be) - Boolean cols in place (8b968e5) - Use native polars column selection (ef25f17)
- Imports (e31e18d)
- Improve docs wording (173807e)
- Correct lookbehind selection (3807a94)
- Implement full supplementary generation (530d972) - Switch to TDD for md_object generation (35b4787) - Create required wandb folder when initialising wandb in WandbHandler (41037d9) - Misc. (0a54195) - Eval run on test_set (76644ee)
- Align plot and table for median warning days (bda3eed)
- Pin wandb version to avoid failing on tests (2a92dda)
- Generate a publication-ready performance_by_ppr table (32c20ed)
- Add thousand separator to conf matrix (fc9b6dc) - Add thousand separator to conf matrix (4f98c0b) - Add thousand separator to plotnine conf matrix (d27d0f7) - Add lines to sens by time to event (b739267) - First stab at sens by time to event plot (91571f9) - Add full performance figure (45c1d6e)
- Do not check for venv for tests, conflicts with CI (672d43f) - Handle uneven number of plots in patchwork_grid (a833eb7)
- Convert auroc to plotnine (80f5cbf)
- Incorrect path (925c94c)
- Create plotnine confusion matrix (5045b67)
- Improve docs (e6a1230)
- Autofix when creating pr (76470cd)
- Add action (0490672)
- First robustness plot (659d30d)
- Split ci after bootstrap (f3e4f6f)
- Fix typo (5b8e3be)
- Create pipeline and unified interface for evaluating the best run (d4fd7f3)
- Better explain utility func (46396a1)
- Add ci to timedelta plots (ce8c63f)
- Handle only one true class (5a90247)
- Increase x-axis text size for base plots (b5ddf0b)
- Missing polars requirement (8e277e1)
- Allow str_to_pl_df (4cd53ac)
- Allow custom splits for training (6e0bf71)
- Do not support multiclass in calc_performance (781692b) - Assign sql cache if on local (2365b65) - Assign sql cache if on local (d57c9fd)
- Add logging and choose sfi types (d5f8e23) - Create example scripts (76e063a) - Initial text model pipelines (1934db0) - Add tests (d7a8bab) - Initial simple preprocessing pipeline for all sfis (f941a4d) - Add include_sfi_name in load_text_split (4605c88) - Include_sfi_name arg (58baf9a) - Fit and load tfidf, bow, and lda models (3d33d9b)
- Preprocess to one regex (c716653) - Remove symbols again (1210b7e) - Based on HLasse's comments (32da48f) - Insert model type in filename (1457387) - Add doc strings to preprocessing functions (4e27650) - Remove log.info and small fixes (84f3cc3) - Ruff fixes (ea9c564) - Return vectorizer and matrix + clean-up (e1c48a0) - Query string (cb7424c) - Naming and doc string update (141e52a) - General clean-up and change corpus in fit functions to list (22b6a9e) - Change ngram default and clean-up (387f845) - Small fixes to logging (c3a3f53) - Remove old comments (4b88514) - Change view name (a9bb0fc) - Move save_text_model_to_dir to utils (469df3b) - Move save_text_model_to_dir to utils (26a80d2) - Renaming in preprocessing (c381768) - Remove stop_words arg and return models (3d29012) - Change arg path to path_str (f781a74) - Enable multiple splits when loading data + add n_rows arg (8ae2d2e) - Remove Path from arg (29b442b)
- Add feature descriptions for text features (84c696a)
- Add readme link (217e550)
- Remove unreasonably high or low bmi values (07f52c2)
- Make sql query executable (e006490) - Str turned into list of characters instead of list of words (0fae478)
- Add unpack args to skema 2 wo nutrition (95c35c8)
- Support new pipe annotation (a1bde17)
- Correct types (5cb0d5d)
- Add skema_2_without_nutrition again (685c5cb)
- Cruft github action (c8f6278) - Bug in cruft action (ec8267a) - Remove psycop-ml-utils, no longer exists (d8fbb65)
- Add more glc loaders (b765e77) - Add type 1 diabetes loaders (b682984) - Make sql loader verbose (602f4f3) - Add caching to sql_load (a68c15d) - Ibid (46da732) - Add support for keeping code col when loading diagnoses (51ca63e) - Add t2d diagnosis loading (6b8231c) - Add ogtt (f6c07a9) - Update current blood sugar measurements (5e8051a)
- Lacking prefix on loading glc (d9bdbcb) - Inappropriate matching (e2409ed) - Poetry formatted dependencies (125500a)
- Disable cache (0242114)
- Add option for which timestamp to get when loading physical visits (ef369b8)
- Drop duplicates in the output_df (636cc48) - Don't load duplicate visits (5028b1d) - Physical visits should only load physical visits (b7c50cf) - Did not rename to timestamp before returning (f43522c)
- Loader names still too long (3321b88)
- Loader names too long for wandb (cc14da2)
- ValueError correction (595479e)
- Adjust function for saving integrity checks (de2577e) - Restructure overarching description func (54c24a2)
- Better function description (7eb9e54)
- Add arg for choosing timestamp and add warning (159a176)
- Make naming scheme consistent (c125b48) - Attempted rename of unspecified df (c266bd8) - Revert logic (ad110ee) - Quarantine_df and quarantine_days can be left as None (f130370)
- Allowed types works again (dbe75ca) - All arg names now congruent, visit_types takes a list of visit types instead of string (e63e9d4)
- Add text loaders (9c7d959)
- Use acute outpatient visits as well (659af23) - Typo, and use newest data (bbbc8f5) - Use end dates for all contacts (d8940c1) - Use end times for all diagnosis loading (4d9e600)
- Remove try/except to avoid debugger getting stuck on it (3884ab8)
- Move all str operations into the if statement (91f9174)
- Move logs next to their dataset (e0ed033)
- Improve quarantine docs (1b23f19)
- Name wandb project_name-feature-generation (b601d80)
- Improve logging in flatten_dataset (63f252f) - Enable minimum specifications (669e3ed) - Enable minimum specifications (523cfd1) - Log rows dropped by PredictionTimeFilterer (7e02d8e) - Add moves loader (0521dd0) - First stab at loader (f9048b8)
- Add pred_time_uuid if not specified when filtering (acca5b9)
- Avoid groupby in filter_prediction_times (a66e361)
- Add rows dropped logging (33ba525) - Allow filtering based on quarantine dates (3deb052) - Improve logging: debug to file, info to stdout (aff10a9) - Move wandb init earlier so wandb_alerts can cover values_df loading (6c153b1) - Generate full feature set (9ba907a) - Wrap as much of main as possible in wandb exception (3b085af) - Allow timestamps only return from visit loaders for use as pred_times (f9534e0) - Migrate some loaders to logging. (f81fd92) - More explicit logging (7969210) - Init changes (f257daa)
- Use lookbehind instead of interval days (7e14ad5) - Only one feature cache per project (cb0b8b0) - Unused input args (fa14461) - Wandb util was missing text kwarg (64c1729)
- Infer CPU cores from logical cores (309e9d2)
- Add wandb alert on exception (3ff6e37)
- Improve create_flattened_dataset docs (637edfe) - Misc. docs (4eac2ba) - Fix github test badge (dffeedc)
- Add n_hba1c_within_n_lookahead_days (e84b591) - Add outcome (cd39dd6) - Add birth year as a predictor (7b186d2) - Allow exclusion of specific atc codes (75619a1)
- Date of birth col name should respect output prefix (6ec6535) - Incorrect column name when adding age as predictor (cdbf25c) - Errors in sql loaders after refactor (28c9f63) - Correct type hinting in load_diagnoses (f2d5c5b)
- Specify that n_rows = None returns all rows. (a4720a8)
- Shuffle feature specs to even out compute vs. IO load (0db9f0f) - Tweak n_workers for more performance (3eeee4d) - Segment feature loading for more parallelisation (9ee5c87) - Rotate feature addition for debugging (76af9c7) - Parallelise temporal predictor loading (8d53f16) - Only create one subprocess per values loader (1a3e5de) - Parallelise groupspec combination creation (9ccba2a)
- At groupspec init, iterate over values_loader and check that they exist in the loader registry (04dfd7e)
- More explanation in error message (b784991) - Better valueerror message formatting (7b3b994) - Better valueerror message (d92f798) - Find invalid loaders (ba2d4c5)
- Allow load_medications to concat a list of medications (d78f465)
- Remove original functions (da59110)
- Improve docs (9aad0af)
- Full run (142212f) - Rename resolve_multiple registry keys to their previous one (3fd3f35) - Reimplement (c99585f) - Use lru cache decorator for values_df loading (4006818) - Add support for loader kwargs (127f821) - Move values_df resolution to AnySpec object (714e83f) - Make date of birth output prefix a param (0ed1198) - Ensure that dfs are sorted and of same length before concat (84a4d65) - Use pandas with set_index for concat (b93290a) - Use pandas with set_index for concat (995da41) - Speed up dask join by using index (3402281) - Require feature name for all features, ensures proper specification (6af454a) - First stab at adapting generate_main (7243130) - Add exclusion timestamp (b02de1a) - Improve dd.concat (429da34) - Handle strs for generate_feature_spec (7d54488) - Convert to dd before concat (06101d8) - Add n hba1c (3780d84) - Add n hba1c (614245e)
- Coerce by default (60adb99) - Output_col_name_override applied at loading, not flattening (95a96ce) - Typo (01240ed) - Incorrect attribute addressing (a6e82b5) - Correctly resolve values_df (def67cd) - MinGroupSpec should take a sequence of names to permute over (f0c8140) - Typo (61c7241) - Remove resolve_multiple_fn_name (617d386) - Old concat resulted in wrong ordering of rows. (3759f71) - Set hba1c as eval (89fe6d2) - Typos (6eac440) - Correct col name inference for static predictors (dfe5dc7) - Misc. fixes (45f8348) - Generate the correct amount of combinations when creating specs (c472b3c) - Typo resulted in cache breaking (fdd47d7) - Correct col naming (bc74ae3) - Do not infer feature name from values_df (150569f) - Misc. errors found from tests (3a1b5db) - Revert flattened dataset to use specs (e4fada7) - Misc. errors after introducing feature specs (0308eca) - Correctly merge dataframes (a907885) - Cache error because of loss of UUID (89d7f6f) - New bugs in resolve_multiple (5714a39) - Rename outcomespec appropriately (41fa220) - Lookbehind_days must be iterable (cc879e9)
- Move pd->dd into subprocesses (dc5f38d)
- Remove shak_code + operator check (f97aee8)
- Ignore cat_features (2052505) - Failing test (f8190b4) - Incorrect 'latest' and handling of NaN in cache (dc33f7e)
- Check for value column prediction_times_df (5356464) - Change variable name (990a848) - More flex loaders (bcad700)
- Use wandb to monitor script errors (67ae9b9)
- Duplicate loading when pre_loading dfs (7f864dc)
- Add variance to resolve multiple functions (8c471df)
- Add variance resolve multiple (7a64c5b)
- Deleted_irritating_blank_space (a4cdfc5)
- Auto inferred cat features (ea0d946) - Auto inferred cat features error (f244715) - Resolves errors caused from auto cat features (667a905)
- Incorrect function argument (33e0a3e) - Expanded test to include outcome, now passes locally (640e7ec) - Passing local tests (6ed4b2e) - First stab at bug fix (339d793)
- Add parents to wandb dir init (5eefe3a)
- Add BMI loader (b6681ea)
- Refactor feature spec generation (17e9f16) - Align arguments with colnames in SQL (09ae5f7) - Refactor feature specification (373b0f0)
- Hardcoded file suffix (0101acc)
- Mismatched version in .toml (292979b)
- Pass value_col only when necessary (dc1019f) - Pass value_col (4674e4a) - Don't remove NaNs, might be informative. (1ad5d81) - Remove parquet default argument except in top level functions (ec3a98b) - Align .toml and release version (80adbde) - Failing tests (b5e4321) - Incorrect feature sets path, linting (605ccb7) - Handle dicts for duplicate checking (34524c0) - Check for duplicates in feature combinations (63ad162) - Remove duplicate alat key which prevented file saving (f0c3e00) - Incorrect argument (b97d54b) - Linting (7406288) - Use suffix instead of string parsing (cfa96f0) - Refactor dataset loading into a separate function (bca8cbf) - More migration to parquet (f1bc2b7) - Mark hf embedding test as slow, only run if passing --runslow to pytest (0e03395)
- Wandb not logging on overtaci. (3baab57)
- Use dask for concatenation, increases perf (4235f5c)
- Use pypi release of psycopmlutils (5283b05)
- First release to pypi (c29aa3c)
- Add test for chunking logic (199ee6b)
- First release! (95a557c) - Add automatic release (a5023e5) - Update dependencies (34efeaf) - First rename (879bde9) - Init commit (cdcab07)