Keywords: random forest, decision tree, prediction
This project provides a Java implementation of random forests [1, 2]. A random forest is an ensemble of decision trees built from a training set. Given an input whose result is unknown (e.g. a person described by age, gender, medical background, and symptoms), the forest predicts the corresponding result (e.g. a disease).
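The core idea behind that prediction step: every tree in the forest produces its own prediction, and for a discrete target the forest returns the majority vote. A minimal self-contained sketch of the voting step (not this library's code):

```java
// Minimal sketch (not the library's API): a random forest predicts by
// letting every tree vote and returning the majority class label.
public class MajorityVote {
    // Each entry in treeVotes is one tree's predicted class for a single input;
    // in a real forest each tree is built on a bootstrap sample of the data.
    static int predict(int[] treeVotes) {
        // Count votes per class label (labels assumed to be small non-negative ints).
        int maxLabel = 0;
        for (int v : treeVotes) maxLabel = Math.max(maxLabel, v);
        int[] counts = new int[maxLabel + 1];
        for (int v : treeVotes) counts[v]++;
        int best = 0;
        for (int label = 1; label < counts.length; label++) {
            if (counts[label] > counts[best]) best = label;
        }
        return best;
    }

    public static void main(String[] args) {
        // Ten trees vote on whether a passenger survived (1) or not (0).
        int[] votes = {1, 0, 1, 1, 0, 1, 1, 0, 1, 1};
        System.out.println(predict(votes)); // prints the majority label
    }
}
```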
The parameters used to build random forests. The default values are:
int minSamplesSplit = 2;
int maxDepth = Integer.MAX_VALUE;
double minImpurityDecrease = 1e-07;
int minSampleLeaf = 1;
int maxFeatures = Integer.MAX_VALUE;
int nbTrees = 10;
Long seed = null;
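Most of these values act as pre-pruning (stopping) criteria while each tree is grown: a node becomes a leaf once any of them is violated. A rough sketch of that check, with hypothetical names (this is not the library's implementation):

```java
// Illustrative sketch of how the stopping parameters above are typically
// applied while growing a single tree; names are hypothetical.
public class StoppingCriteria {
    int minSamplesSplit = 2;
    int maxDepth = Integer.MAX_VALUE;
    double minImpurityDecrease = 1e-07;

    // A node becomes a leaf when any stopping criterion is met.
    boolean shouldStop(int depth, int nbSamples, double bestImpurityDecrease) {
        return depth >= maxDepth
            || nbSamples < minSamplesSplit
            || bestImpurityDecrease < minImpurityDecrease;
    }

    public static void main(String[] args) {
        StoppingCriteria c = new StoppingCriteria();
        System.out.println(c.shouldStop(5, 1, 0.3));   // too few samples to split
        System.out.println(c.shouldStop(5, 100, 0.3)); // keep splitting
    }
}
```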
Returns a builder to set up the parameters of the random forest. The available functions for updating the default values are:
// Builder example
Parameter p = new Parameter.Builder()
.nbTrees(200)
.maxFeatures(3)
.build();
Constructor of the random forest.
Trains the random forest using a list of tuples D. This function only takes into account the getters of D that are annotated with a Feature whose type is either ORDERED or CATEGORICAL.
The getter of the target (or result) must be annotated with Target and a type which is either CONTINUOUS or DISCRETE.
// Annotation example
@Feature(FeatureType.ORDERED)
public Integer getAge() {
    return age;
}

@Target(TargetType.DISCRETE)
public Integer getSurvived() {
    return survived;
}
Predicts the result R for the given data D.
Get the list of features sorted by decreasing importance.
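A common way to compute such importances in random forests is the mean decrease in impurity: for each feature, sum the (weighted) impurity decrease of every split that uses it [1, 2]. A self-contained sketch of the underlying Gini computation (not necessarily how this library implements it):

```java
// Self-contained sketch (not the library's implementation): Gini impurity
// and the impurity decrease of a split, the quantity typically accumulated
// per feature to rank features by importance.
public class GiniImportance {
    // Gini impurity of a class-count vector: 1 - sum over classes of p_k^2.
    static double gini(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double sumSq = 0.0;
        for (int c : counts) {
            double p = (double) c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    // Impurity decrease of splitting a parent node into left/right children,
    // with each child's impurity weighted by its share of the samples.
    static double impurityDecrease(int[] parent, int[] left, int[] right) {
        int n = 0, nl = 0, nr = 0;
        for (int c : parent) n += c;
        for (int c : left) nl += c;
        for (int c : right) nr += c;
        return gini(parent)
            - ((double) nl / n) * gini(left)
            - ((double) nr / n) * gini(right);
    }

    public static void main(String[] args) {
        // A perfectly separating split: impurity drops from 0.5 to 0.
        System.out.println(impurityDecrease(
            new int[]{5, 5}, new int[]{5, 0}, new int[]{0, 5})); // prints 0.5
    }
}
```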
A usage example based on the Titanic survivors dataset is available at broceliande-example.
[1] Leo Breiman. Random Forests. Machine Learning. vol. 45, p. 5-32. 2001.
[2] Gilles Louppe. Understanding random forests: From theory to practice. arXiv preprint arXiv:1407.7502, 2014.