Parallel Random Forests

A parallelized implementation of the random forest learning algorithm.

Based on Weka's implementation of Breiman's random forest construction.
Supports continuous features, which can be reused for splits at multiple nodes.
Supports information gain or Gini impurity as the split criterion.
2x speedup over Weka's random forests (on high-dimensional datasets).
Scalable speedup through OpenMP and Open MPI parallelization.
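
To make the two split criteria concrete, here is a minimal sketch of how Gini impurity and information gain can be computed from per-class instance counts. The function names (giniImpurity, entropy, infoGain) are illustrative assumptions and are not taken from this repository's sources; the formulas are the standard textbook definitions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative helpers only: these names are assumptions, not the repo's API.

// Total number of instances across all classes.
static std::size_t totalCount(const std::vector<std::size_t>& classCounts) {
    std::size_t total = 0;
    for (std::size_t c : classCounts) total += c;
    return total;
}

// Gini impurity of a node: 1 - sum_k p_k^2.
double giniImpurity(const std::vector<std::size_t>& classCounts) {
    const std::size_t total = totalCount(classCounts);
    if (total == 0) return 0.0;
    double sumSquares = 0.0;
    for (std::size_t c : classCounts) {
        const double p = static_cast<double>(c) / static_cast<double>(total);
        sumSquares += p * p;
    }
    return 1.0 - sumSquares;
}

// Shannon entropy of a node in bits: -sum_k p_k * log2(p_k).
double entropy(const std::vector<std::size_t>& classCounts) {
    const std::size_t total = totalCount(classCounts);
    if (total == 0) return 0.0;
    double h = 0.0;
    for (std::size_t c : classCounts) {
        if (c == 0) continue;  // 0 * log2(0) is treated as 0
        const double p = static_cast<double>(c) / static_cast<double>(total);
        h -= p * std::log2(p);
    }
    return h;
}

// Information gain of a binary split: parent entropy minus the
// size-weighted entropy of the two children.
double infoGain(const std::vector<std::size_t>& parent,
                const std::vector<std::size_t>& left,
                const std::vector<std::size_t>& right) {
    const double nLeft = static_cast<double>(totalCount(left));
    const double nRight = static_cast<double>(totalCount(right));
    const double n = nLeft + nRight;
    assert(n > 0.0);
    return entropy(parent)
         - (nLeft / n) * entropy(left)
         - (nRight / n) * entropy(right);
}
```

With either criterion, a candidate threshold on a continuous feature is scored by partitioning the node's instances into left and right children and comparing the resulting impurity reduction across candidates.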

Configuration

In 'Classifier.h', change the following variables:

  NUM_TREES               // Number of trees to construct
  RANDOM_FEATURE_SET_SIZE // Number of random features to be considered for finding the best split candidates
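
As an illustration of what RANDOM_FEATURE_SET_SIZE controls, the sketch below draws a random subset of feature indices without replacement, which is the usual way a random forest restricts the split search at each node. The function sampleFeatureSubset and its parameters are hypothetical and not taken from this repository; the subset size is passed in as an argument rather than read from 'Classifier.h'.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not the repo's code): pick `subsetSize` distinct
// feature indices out of `numFeatures` using a partial Fisher-Yates shuffle.
std::vector<std::size_t> sampleFeatureSubset(std::size_t numFeatures,
                                             std::size_t subsetSize,
                                             std::mt19937& rng) {
    assert(subsetSize <= numFeatures);

    // Start with the identity permutation 0, 1, ..., numFeatures - 1.
    std::vector<std::size_t> indices(numFeatures);
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    // Only the first `subsetSize` positions need to be shuffled.
    for (std::size_t i = 0; i < subsetSize; ++i) {
        std::uniform_int_distribution<std::size_t> pick(i, numFeatures - 1);
        std::swap(indices[i], indices[pick(rng)]);
    }

    indices.resize(subsetSize);
    return indices;
}
```

A tree builder would call this once per node with subsetSize set to the configured RANDOM_FEATURE_SET_SIZE and then search for the best split only among the returned features.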

In 'TreeBuilder.h', change the following variables:

  MIN_NODE_SIZE           // Minimum size of a node that can be considered as a leaf
  MIN_NODE_SIZE_TO_SPLIT  // Minimum size of a node that can be further split
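
One plausible reading of these two thresholds is sketched below: a node is only considered for splitting when it holds at least MIN_NODE_SIZE_TO_SPLIT instances, and a candidate split is only accepted when both children hold at least MIN_NODE_SIZE instances. This interpretation, the SplitLimits struct, and both function names are assumptions made for illustration; check 'TreeBuilder.cpp' for the actual behavior.

```cpp
#include <cstddef>

// Hypothetical container for the two thresholds described above.
struct SplitLimits {
    std::size_t minNodeSize;         // assumed role of MIN_NODE_SIZE
    std::size_t minNodeSizeToSplit;  // assumed role of MIN_NODE_SIZE_TO_SPLIT
};

// A node with too few instances becomes a leaf without a split search.
bool canAttemptSplit(std::size_t nodeSize, const SplitLimits& limits) {
    return nodeSize >= limits.minNodeSizeToSplit;
}

// A candidate split is rejected if either child would be too small.
bool isSplitAcceptable(std::size_t leftSize, std::size_t rightSize,
                       const SplitLimits& limits) {
    return leftSize >= limits.minNodeSize && rightSize >= limits.minNodeSize;
}
```

Larger values for either threshold produce shallower trees, which typically shortens training time at some cost in how closely each tree fits the training data.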

Dataset and testing

Sentiment analysis of 50,000 movie reviews from IMDb (25,000 for training, 25,000 for testing).
The top 10, 50, 200 and 1000 most frequent words were used as features; every setting achieved the same accuracy.
Test environment: Ubuntu GNOME 16.04; VLSCI clusters (for distributed execution).

Terms of use for the dataset

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011).
Learning Word Vectors for Sentiment Analysis.
The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

