Skip to content

Machine learning tools for running repeated nested cross(-dataset)-validation.

Notifications You must be signed in to change notification settings

LudvigOlsen/generalize

Repository files navigation

Generalize

Author: Ludvig R. Olsen ( r-pkgs@ludvigolsen.dk )

The ultimate goal of training machine learning models is to generalize to new, unseen data. This package contains tools for measuring model performance across multiple datasets via cross-dataset-validation (aka. leave-one-dataset-out).

Under development!

  • Not generalized enough for general usage (ironic, I know)
  • Relies on an old version of scikit-learn, needs updating
  • Linear regression is not currently working
  • Help strings are likely not up-to-date

Main functions and classes

Function Description
nested_cross_validate() Run (repeated) nested cross-validation.
train_full_model() Train model on all data and save to disk.
evaluate_univariate_models() Evaluate prediction potential of every predictor separately.
PipelineDesigner Design a scikit-learn pipeline for use in cross-validation.
ROCCurve, ROCCurves ROC curve containers with various utility methods.
select_samples() Utility for selecting samples based on (collapsed) labels.

About

Machine learning tools for running repeated nested cross(-dataset)-validation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages