---
title: "mlr3learners.lightgbm: Multiclass Classification GPU Example"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    keep_md: true
vignette: >
  %\VignetteIndexEntry{mlr3learners_lightgbm_multiclass_gpu}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = FALSE
)
```
# Install the lightgbm R package with GPU support
Before you can use the `mlr3learners.lightgbm` package with GPU acceleration, you need to install the `lightgbm` R package according to [its documentation](https://github.com/microsoft/LightGBM/blob/master/R-package/README.md) (this is necessary since `lightgbm` is neither available on CRAN nor installable via `devtools::install_github`).
You can compile the GPU version on Linux or inside a Docker container with the following commands:
```{bash}
git clone --recursive --branch stable --depth 1 https://github.com/microsoft/LightGBM
cd LightGBM && \
Rscript build_r.R --use-gpu
```
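To verify that the package was built and installed successfully, you can load it and check its version (a minimal sanity check; it does not by itself confirm that GPU support is working):
```{r}
# Check that the compiled lightgbm R package is installed and loadable
library(lightgbm)
packageVersion("lightgbm")
```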
Then you can install the `mlr3learners.lightgbm` package:
```{r}
install.packages("devtools")
devtools::install_github("kapsner/mlr3learners.lightgbm")
```
```{r}
library(mlr3)
library(mlr3learners.lightgbm)
```
# Create the mlr3 task
```{r}
task = mlr3::tsk("iris")
```
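Before modelling, the task can be inspected, e.g. to look at the number of observations, the target classes, and the feature names (a quick check using standard mlr3 task fields):
```{r}
# Overview of the iris task: 150 observations, 4 features, target "Species"
print(task)
task$class_names
task$feature_names
```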
To have independent training data and test data, we further create a list `split` containing the respective row indices.
```{r}
set.seed(17)
split = list(
train_index = sample(seq_len(task$nrow), size = 0.7 * task$nrow)
)
split$test_index = setdiff(seq_len(task$nrow), split$train_index)
table(task$data()[split$train_index, task$target_names, with = FALSE])
table(task$data()[split$test_index, task$target_names, with = FALSE])
```
# Instantiate the lightgbm learner
Then, a `classif.lightgbm` learner needs to be instantiated:
```{r}
learner = mlr3::lrn("classif.lightgbm")
```
# Configure the learner
In the next step, some of the learner's parameters need to be set, e.g. `num_iterations` and `early_stopping_round`. Please refer to the [LightGBM manual](https://lightgbm.readthedocs.io) for further details on these parameters. Almost all of LightGBM's parameters are available in the learner's parameter set. You can inspect them using the following command:
```{r eval=FALSE}
learner$param_set
```
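If you are only interested in the parameter names, the parameter set's `ids()` method returns them as a character vector (a standard accessor of mlr3 parameter sets):
```{r}
# List the names of the parameters exposed by the learner
head(learner$param_set$ids(), 20)
```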
In order to use GPU acceleration, the parameter `device_type = "gpu"` (default: `"cpu"`) needs to be set. According to the [LightGBM parameter manual](https://lightgbm.readthedocs.io/en/latest/Parameters.html), 'it is recommended to use the smaller `max_bin` (e.g. 63) to get the better speed up'.
```{r}
learner$param_set$values = mlr3misc::insert_named(
learner$param_set$values,
list(
"objective" = "multiclass",
"device_type" = "gpu",
"max_bin" = 63L,
"early_stopping_round" = 10,
"learning_rate" = 0.1,
"seed" = 17L,
"metric" = "multi_logloss",
"num_iterations" = 100,
"num_class" = 3
)
)
```
# Train the learner
The learner is now ready to be trained using its `train` function.
```{r results='hide', message=FALSE, warning=FALSE, error=FALSE}
learner$train(task, row_ids = split$train_index)
```
# Evaluate the model performance
Basic information, such as the number of boosting iterations actually performed (relevant when early stopping is used), can be accessed directly from the learner's model:
```{r}
learner$model$current_iter()
```
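If early stopping was triggered during training, the underlying `lgb.Booster` object should also expose the best iteration (a sketch assuming the booster's `best_iter` field is populated, as in the upstream lightgbm R package):
```{r}
# Best iteration according to early stopping (-1 if early stopping did not trigger)
learner$model$best_iter
```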
The learner's `predict` function returns an object of mlr3's class `PredictionClassif`.
```{r}
predictions = learner$predict(task, row_ids = split$test_index)
head(predictions$response)
```
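If the learner is configured to predict probabilities (`predict_type = "prob"`), the per-class probabilities are available as well (shown here under that assumption):
```{r}
# Per-class predicted probabilities (requires predict_type = "prob")
head(predictions$prob)
```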
The predictions object also includes a confusion matrix:
```{r}
predictions$confusion
```
Further metrics can be calculated using mlr3 measures:
```{r}
predictions$score(mlr3::msr("classif.logloss"))
```
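Several measures can also be scored in one call, e.g. the accuracy together with the multiclass log loss (using mlr3's standard `msrs()` helper; note that `classif.logloss` requires probability predictions):
```{r}
# Score multiple measures at once
predictions$score(mlr3::msrs(c("classif.acc", "classif.logloss")))
```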
Variable importance values can be obtained using the learner's `importance` function:
```{r}
importance = learner$importance()
importance
```
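To visualize the importance values, a simple bar plot can be drawn from the returned named vector (a minimal base-R sketch; packages such as `mlr3viz` provide more elaborate plots):
```{r}
# Bar plot of the named importance vector, sorted decreasingly
barplot(
  sort(importance, decreasing = TRUE),
  las = 2,
  main = "LightGBM variable importance"
)
```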