diff --git a/README.org b/README.org
index ff92029..1b3f35d 100644
--- a/README.org
+++ b/README.org
@@ -17,6 +17,8 @@ framework in R
 
 *** Algorithm 1: cross-validation for comparing train on same and other
 
+See examples in [[https://cloud.r-project.org/web/packages/mlr3resampling/vignettes/ResamplingSameOtherCV.html][vignette]] and data viz for [[https://tdhock.github.io/2023-12-13-train-predict-subsets-regression/][regression]] and [[https://tdhock.github.io/2023-12-13-train-predict-subsets-classification/][classification]].
+
 A supervised learning algorithm inputs a train set, and outputs a
 prediction function, which can be used on a test set. If each data
 point belongs to a group (such as geographic region, year, etc), then
@@ -45,6 +47,8 @@ List of 1
 
 *** Algorithm 2: cross-validation for comparing different sized train sets
 
+See examples in [[https://cloud.r-project.org/web/packages/mlr3resampling/vignettes/ResamplingVariableSizeTrainCV.html][vignette]] and data viz for [[https://tdhock.github.io/2023-12-26-train-sizes-regression/][regression]] and [[https://tdhock.github.io/2023-12-27-train-sizes-classification/][classification]].
+
 How many train samples are required to get accurate predictions on a
 test set? Cross-validation can be used to answer this question, with
 variable size train sets. This is implemented in
@@ -62,11 +66,15 @@ List of 4
  $ train_sizes : int 5
 #+end_src
 
-*** More Usage Examples
+*** More Usage Examples and Discussion
+
+The examples linked below have examples with larger data sizes than
+the examples in the CRAN vignettes linked above.
 
-See https://tdhock.github.io/blog/2023/R-gen-new-subsets/ and [[https://cloud.r-project.org/web/packages/mlr3resampling/][vignettes on CRAN]].
+* https://tdhock.github.io/blog/2023/R-gen-new-subsets/
+* [[https://tdhock.github.io/blog/2023/variable-size-train/]]
 
-*** Related work
+** Related work
 
 mlr3resampling code was copied/modified from Resampling and
 ResamplingCV classes in the excellent [[https://github.com/mlr-org/mlr3][mlr3]] package.