Showing 10 changed files with 1,150 additions and 0 deletions.

---
title: "Test 1"
description: >
  This vignette describes how to train a LightGBM model for binary classification.
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Test 1}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE
  , comment = "#>"
  , warning = FALSE
  , message = FALSE
)
```

## Introduction

Welcome to the world of [LightGBM](https://lightgbm.readthedocs.io/en/latest/), a highly efficient gradient boosting implementation (Ke et al. 2017).

```{r setup}
library(lightgbm)
```

This vignette will guide you through its basic usage. It shows how to build a simple binary classification model on a subset of the `bank` dataset (Moro, Cortez, and Rita 2014), using the two input features "age" and "balance" to predict whether a client has subscribed to a term deposit.

## The dataset

The dataset looks as follows.

```{r}
data(bank, package = "lightgbm")
bank[1L:5L, c("y", "age", "balance")]

# Distribution of the response
table(bank$y)
```

## Training the model

The R package of LightGBM offers two functions to train a model:

- `lgb.train()`: This is the main training logic. It offers full flexibility but requires a `Dataset` object created by the `lgb.Dataset()` function.
- `lightgbm()`: Simpler, but less flexible. Data can be passed without having to bother with `lgb.Dataset()`.

### Using the `lightgbm()` function

As a first step, you need to convert the data to numeric. Afterwards, you are ready to fit the model with the `lightgbm()` function.

```{r}
# Numeric response and feature matrix
y <- as.numeric(bank$y == "yes")
X <- data.matrix(bank[, c("age", "balance")])

# Train
fit <- lightgbm(
    data = X
  , label = y
  , num_leaves = 4L
  , learning_rate = 1.0
  , nrounds = 10L
  , objective = "binary"
  , verbose = -1L
)

# Result
summary(predict(fit, X))
```

It seems to have worked! And the predictions are indeed probabilities between 0 and 1.

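As a quick sanity check, you can turn these probabilities into class labels and compare them with the observed response — a minimal sketch, assuming the usual 0.5 cutoff (a modeling choice, not a package default):

```{r}
# Threshold the predicted probabilities at 0.5 (assumed cutoff)
pred_class <- as.numeric(predict(fit, X) > 0.5)

# In-sample confusion table and accuracy
table(predicted = pred_class, observed = y)
mean(pred_class == y)
```
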
### Using the `lgb.train()` function

Alternatively, you can go for the more flexible interface `lgb.train()`. Here, as an additional step, you need to wrap `y` and `X` in a `Dataset` object via LightGBM's data API, `lgb.Dataset()`. Parameters are passed to `lgb.train()` as a named list.

```{r}
# Data interface
dtrain <- lgb.Dataset(X, label = y)

# Parameters
params <- list(
    objective = "binary"
  , num_leaves = 4L
  , learning_rate = 1.0
)

# Train
fit <- lgb.train(
    params
  , data = dtrain
  , nrounds = 10L
  , verbose = -1L
)
```

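The booster returned by `lgb.train()` can be used for prediction exactly like before — a short sketch, reusing the feature matrix `X` from above:

```{r}
# Predictions are again probabilities on the response scale
summary(predict(fit, X))
```
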
Try it out! If you get stuck, visit LightGBM's [documentation](https://lightgbm.readthedocs.io/en/latest/R/index.html) for more details.

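If you want to reuse a fitted model later, it can be written to disk and restored — a minimal sketch assuming the package's `lgb.save()` and `lgb.load()` helpers; the file name matches the one removed in the cleanup chunk below:

```{r}
# Save the fitted booster to a text file, then load it back
# (assumes lgb.save()/lgb.load(); the file is removed again below)
lgb.save(fit, "lightgbm.model")
fit2 <- lgb.load("lightgbm.model")
summary(predict(fit2, X))
```
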
```{r, echo = FALSE, results = "hide"}
# Cleanup
if (file.exists("lightgbm.model")) {
  file.remove("lightgbm.model")
}
```

## References

Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." In Advances in Neural Information Processing Systems 30 (NIPS 2017).

Moro, Sérgio, Paulo Cortez, and Paulo Rita. 2014. "A Data-Driven Approach to Predict the Success of Bank Telemarketing." Decision Support Systems 62: 22–31.