Action rules for survival analysis

This is a JAVA project that can be used for generating action rules based on survival datasets. You can find here also R project, whose purpose is to generate HTML reports with obtained results.

Java project

Java project is placed in directory survival-action-rules. It uses Gradle.

Building applications from sources

To use this project, you must first add JAR file with project adaa.analytics.rules into directory survival-action-rules/libs. You can download it from this repository on GitHub - it is located in "releases": https://github.com/adaa-polsl/survival-action-rules/releases/tag/v0.1.

The description of application building will be described with the example of IDE Eclipse.

From the Eclipse Marketplace you need to install Buildship Gradle Integration 3.0.
Import the survival-action-rules project as a Gradle project.
Then open the Gradle Tasks view and double-click on the build view. After this, the application should be built.
You can start the application with the proper arguments. The main class of the application is pl.polsl.survival.actionrules.ExperimentExecutor. The argument is a path to an XML file, which contains the configuration of the experiment.

The same results can be achieved using IntelliJ IDEA 2019.1.4.

Import the survival-action-rules project as Gradle project.
Build the application by selecting Build -> Build Project.
Specify the path to the configuration file ./datasets/example-config.xml by selecting Run -> Edit Configurations and filling in the Program arguments field.
Start the application by selecting Run -> Run... -> ExperimentExecutor.

Input data

Input data, on the basis of which the action rules are generated, are given to the program in ARFF format. The startup configuration is done using an XML file, which is given as an argument for the main function of the program. The file consists of 2 parts: parameters and data sets. In one file, the user can specify multiple versions of parameters' combinations and multiple data sets. After starting the program, each set is evaluated with each parameter configuration. The basic structure of the file:

<?xml version="1.0"?>
<experiment>
  <parameter_sets>
    <parameter_set name="param_1"> ... </parameter_set>
    <parameter_set name="param_1"> ... </parameter_set>
    ...
  </parameter_sets>
  <datasets>
    <dataset> ... </dataset>
    <dataset> ... </dataset>
    ...
  </datasets>
</experiment>

The following information can be defined for a given parameter set:

name of the parameter set,
minimum number of examples covered by the rule - parameter min_rule_covered, takes a numeric value,
a parameter that determines whether the target rule should have a better, worse or any other survival curve than the left rule. This is the target_survival_curve parameter. It can have the following values:
- any - the survival curve can have any position.
- better - right rule should be the rule of improvement
- worse - right rule should be worsened rule.
max_common_examples_percentage specifying the maximum percentage of examples common to both rules,
max_rule_covering_percentage specifying the maximum rule coverage.

An example section of parameters in an XML file is presented below:

<parameter_set name ="better-target-rule">
  <param name = "min_rule_covered">10</param>
  <param name = "target_survival_curve">better</param>
  <param name = "max_common_examples_percentage">0.1</param>
  <param name = "max_rule_covering_percentage">0.5</param>
</parameter_set>

The second part of the XML file contains information about the data sets:

<dataset>
  <label>survival_status</label>
  <survival_time>survival_time</survival_time>
  <out_directory>./result_dir/</out_directory>
  <stable_attributes>
    <stable_attribute>a</stable_attribute>
    <stable_attribute>b</stable_attribute>
    ...
  </stable_attributes>
  <training><train>
    <in_file>dataset.arff</in_file >
  </train></training>
</dataset >

The user can define the following information here:

label - the name of the attribute that indicates the survival status,
survival_time - the name of the attribute that indicates the survival time,
out_directory - path to the directory where the results are to be saved for a given dataset,
stable_attributes - a set of stable attributes,
in_file - path to the ARFF file with the dataset.

You can find example XML config file in the datasets directory.

Output data

After the experiment, the results are saved in text files and CSV files. They are placed in a directory specified by the parameter out_directory in the configuration file. Subdirectories are created in this file, each for a different type of parameter configuration. The subdirectory names respond to the parameter_set tags in an XML file. The results of the experiment are saved in 4 files:

result_<datasetName>.txt - a file contains basic information about the data set and general statistics about the generated rules. There are here, too, the rules themselves.
eachRuleStatistic_<datasetName>.csv - a file with detailed indicators for each rule.
estimator_<datasetName>.csv - a file with Kaplan-Meier estimator values. For the left and right side of each rule, there are survival probabilities for the given times.
coveredExamples_<datasetName>.txt - a file with example ids, which are covered by the left and right rule of each generated action rule, respectively. The first example has the identifier 0.

Data availability

Medical datasets:

actg320 (HIV-infected patients): ftp://ftp.wiley.com/public/sci_tech_med/survival
BMT-Ch (bone marrow transplant): https://github.com/adaa-polsl/GuideR/blob/master/datasets/bmt/bone-marrow.arff
cancer (advanced lung cancer patients): survival R package
follic (follicular cell lymphoma patients): randomForestSRC R package
GBSG2 (node-positive breast cancer patients): TH.data R package
hd (Hodgkin's disease patients): randomForestSRC R package
lung (early detection of lung cancer): https://www.stats.ox.ac.uk/pub/datasets/csb
Melanoma (malignant melanoma patients after radical operation): riskRegression R package
mgus (patients with monoclonal gammopathy of undetermined significance): survival R package
pbc (primary biliary cirrhosis of the liver): survival R package
std (occurrence of sexually transmitted diseases): KMsurv R package
uis (drug abuse reduction): quantreg R package
wcgs (occurrence of coronary heart disease): epitools R package
whas1 (myocardial infarction patients, 1st book edition): ftp://ftp.wiley.com/public/sci_tech_med/survival
whas500 (myocardial infarction patients, 2nd book edition): ftp://ftp.wiley.com/public/sci_tech_med/survival
zinc (esophageal cancer): NestedCohort R package

Predictive maintenance datasets:

FD001 (Turbofan Engine Degradation): https://github.com/hankroark/Turbofan-Engine-Degradation/tree/master/CMAPSSData
FD002 (Turbofan Engine Degradation): https://github.com/hankroark/Turbofan-Engine-Degradation/tree/master/CMAPSSData
FD003 (Turbofan Engine Degradation): https://github.com/hankroark/Turbofan-Engine-Degradation/tree/master/CMAPSSData
FD004 (Turbofan Engine Degradation): https://github.com/hankroark/Turbofan-Engine-Degradation/tree/master/CMAPSSData
maintenance: https://github.com/square/pysurvival/blob/master/pysurvival/datasets/maintenance.csv
PUL (Predictive Useful Life): https://www.kaggle.com/tiagotgoz/predictive-useful-life-based-into-telemetry

The above datasets are available in the datasets directory.

R package and report generation

A package surv.action.rules written in R is located in the r-package directory. It is used to generate HTML files with a visualization of the results obtained during experiments with data.

In order to use this package, you need to have installed:

R
RStudio

After opening a project in RStudio you should install and build a project (on the right, in the top corner there should be the Install and Restart button). Then just run the generate_reports() function. Reports for the results in the results directory will be created.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
datasets		datasets
r-package/surv.action.rules		r-package/surv.action.rules
survival-action-rules		survival-action-rules
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Action rules for survival analysis

Java project

Building applications from sources

Input data

Output data

Data availability

R package and report generation

About

Releases 2

Packages

Contributors 3

Languages

adaa-polsl/survival-action-rules

Folders and files

Latest commit

History

Repository files navigation

Action rules for survival analysis

Java project

Building applications from sources

Input data

Output data

Data availability

R package and report generation

About

Resources

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 3

Languages

Packages