Add R code to AFT tutorial #5486

Merged 1 commit on Apr 4, 2020

33 changes: 33 additions & 0 deletions doc/tutorials/aft_survival_analysis.rst
@@ -90,6 +90,7 @@ Interval-censored :math:`[a, b]` |tick| |tick|
Collect the lower bound numbers in one array (let's call it ``y_lower_bound``) and the upper bound numbers in another array (call it ``y_upper_bound``). The ranged labels are associated with a data matrix object via calls to :meth:`xgboost.DMatrix.set_float_info`:

.. code-block:: python
:caption: Python

import numpy as np
import xgboost as xgb
@@ -105,10 +106,29 @@ Collect the lower bound numbers in one array
y_upper_bound = np.array([ 2.0, +np.inf, 4.0, 5.0])
dtrain.set_float_info('label_lower_bound', y_lower_bound)
dtrain.set_float_info('label_upper_bound', y_upper_bound)

.. code-block:: r
:caption: R

library(xgboost)

# 4-by-2 Data matrix
X <- matrix(c(1., -1., -1., 1., 0., 1., 1., 0.),
nrow=4, ncol=2, byrow=TRUE)
dtrain <- xgb.DMatrix(X)

# Associate ranged labels with the data matrix.
# This example shows each kind of censored label.
# uncensored right left interval
y_lower_bound <- c( 2., 3., -Inf, 4.)
y_upper_bound <- c( 2., +Inf, 4., 5.)
setinfo(dtrain, 'label_lower_bound', y_lower_bound)
setinfo(dtrain, 'label_upper_bound', y_upper_bound)

Now we are ready to invoke the training API:

.. code-block:: python
:caption: Python

params = {'objective': 'survival:aft',
'eval_metric': 'aft-nloglik',
@@ -118,6 +138,19 @@ Now we are ready to invoke the training API:
bst = xgb.train(params, dtrain, num_boost_round=5,
evals=[(dtrain, 'train'), (dvalid, 'valid')])

.. code-block:: r
:caption: R

params <- list(objective='survival:aft',
eval_metric='aft-nloglik',
aft_loss_distribution='normal',
aft_loss_distribution_scale=1.20,
tree_method='hist',
learning_rate=0.05,
max_depth=2)
watchlist <- list(train = dtrain)
bst <- xgb.train(params, dtrain, nrounds=5, watchlist)
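
Once training finishes, ``predict`` can be used to obtain survival time estimates. A minimal R sketch, reusing the ``bst`` and ``dtrain`` objects from above:

.. code-block:: r
:caption: R

# For survival:aft, predicted values are survival time estimates
pred <- predict(bst, dtrain)
print(pred)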

We set the ``objective`` parameter to ``survival:aft`` and ``eval_metric`` to ``aft-nloglik``, so that the log likelihood of the AFT model is maximized. (XGBoost will actually minimize the negative log likelihood, hence the name ``aft-nloglik``.)

The parameter ``aft_loss_distribution`` corresponds to the distribution of the :math:`Z` term in the AFT model, and ``aft_loss_distribution_scale`` corresponds to the scaling factor :math:`\sigma`.
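
The distribution of :math:`Z` can be changed by setting ``aft_loss_distribution`` accordingly. A minimal R sketch that swaps in the logistic distribution (``normal``, ``logistic``, and ``extreme`` are the options documented for this parameter):

.. code-block:: r
:caption: R

# Same training setup as above, but with Z following a logistic distribution
params <- list(objective='survival:aft',
eval_metric='aft-nloglik',
aft_loss_distribution='logistic',
aft_loss_distribution_scale=1.20,
tree_method='hist',
learning_rate=0.05,
max_depth=2)
bst <- xgb.train(params, dtrain, nrounds=5, watchlist=list(train=dtrain))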