Ensembles

What is this project?

This project contains implementations of the Random Forest and Gradient Boosting algorithms, exposed through a Flask web interface.

Installation

First you need to get the Docker image. There are two ways to do that:

Clone this repository and run the build.sh script to build the Docker image yourself:
```
scripts/build.sh
```
Pull the prebuilt image from Docker Hub and tag it as ml_server.

Choose ml_server-amd64 or ml_server-arm64 according to your architecture.
```
docker pull antonyfrolov/ensembles:ml_server-amd64
docker tag antonyfrolov/ensembles:ml_server-amd64 ml_server
```

To run the Docker container, execute the run.sh script:
```
scripts/run.sh 5000
```
Then connect to port 5000 in your browser.

If port 5000 is not available, pass a different port as the argument to run.sh:
```
scripts/run.sh <port>
```
Make sure you have permission to execute the scripts:
```
chmod +x scripts/build.sh scripts/run.sh
```

Application interface

Model creation page

This page is a simple form for choosing the model and its parameters. For Random Forest, leave the learning rate field blank.

Training page

The next page allows you to upload a training dataset and an optional validation dataset. Alternatively, you can hold out a fraction of the training dataset as validation data by entering a float in the corresponding field.
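To illustrate what holding out a fraction of the training data means, here is a minimal sketch in plain Python. The function name split_train_validation, the seed parameter, and the choice to shuffle before splitting are assumptions made for illustration; the application may split the data differently.

```python
import random
from typing import List, Sequence, Tuple


def split_train_validation(
    rows: Sequence, val_fraction: float, seed: int = 0
) -> Tuple[List, List]:
    """Hold out roughly val_fraction of the rows as validation data.

    Hypothetical helper: shuffling with a fixed seed is an assumption,
    not necessarily what the application does.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Number of rows that go to the validation set.
    n_val = int(round(len(rows) * val_fraction))
    val_idx = set(indices[:n_val])
    train = [row for i, row in enumerate(rows) if i not in val_idx]
    val = [row for i, row in enumerate(rows) if i in val_idx]
    return train, val
```

With 10 rows and a fraction of 0.2, this sketch puts 2 rows in the validation set and leaves 8 for training.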

The default name of the target feature is TARGET, but you can specify another one in the Target feature field.

Auto-preprocessing treats all float features and all integer features with more than two unique values as numeric, binary integer features as binary, and all other features as categorical. You can also specify the feature types yourself in the fields above by providing lists of feature names separated by ', '.
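The auto-preprocessing rule can be sketched in a few lines of plain Python. This is an illustration of the rule as described here, not the project's actual code: the names infer_feature_types and parse_feature_list are hypothetical, columns are assumed to be given as a dict of value lists, and "binary" is assumed to mean exactly two unique integer values.

```python
from typing import Dict, List, Sequence


def parse_feature_list(text: str) -> List[str]:
    """Parse a form field such as 'age, income' into feature names."""
    return [name for name in text.split(", ") if name]


def infer_feature_types(columns: Dict[str, Sequence]) -> Dict[str, List[str]]:
    """Group column names into numeric, binary and categorical features."""
    types: Dict[str, List[str]] = {"numeric": [], "binary": [], "categorical": []}
    for name, values in columns.items():
        unique = set(values)
        # bool is a subclass of int in Python, so exclude it explicitly.
        all_int = all(
            isinstance(v, int) and not isinstance(v, bool) for v in values
        )
        all_float = all(isinstance(v, float) for v in values)
        if all_float or (all_int and len(unique) > 2):
            types["numeric"].append(name)
        elif all_int and len(unique) == 2:
            # Assumption: "binary" means exactly two unique integer values.
            types["binary"].append(name)
        else:
            types["categorical"].append(name)
    return types
```

For example, an integer column with values 0 and 1 lands in the binary group, while a column of strings is treated as categorical.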

Main page

On the main page you can find the model parameters as well as the names of the training and validation datasets.

You can create a new model, train the existing model again, or proceed to the evaluation page through the links.

To make a prediction, first load a test dataset and click the Predict! button. The columns of the test dataset must be exactly the same as the columns of the training dataset, except for the target column, which must not be included. The prediction will be downloaded as a .txt file.
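Before uploading a test dataset, it can help to check its columns against the training columns. The sketch below shows one way to do that in plain Python; check_test_columns is a hypothetical helper, not part of the application, and it compares column names only (column order is not checked).

```python
from typing import List, Sequence


def check_test_columns(
    train_columns: Sequence[str],
    test_columns: Sequence[str],
    target: str = "TARGET",
) -> List[str]:
    """Return a list of problems; an empty list means the columns match."""
    # The test set should contain every training column except the target.
    expected = [c for c in train_columns if c != target]
    problems = []
    if target in test_columns:
        problems.append(f"target column {target!r} must not be included")
    missing = [c for c in expected if c not in test_columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    extra = [c for c in test_columns if c not in expected and c != target]
    if extra:
        problems.append(f"unexpected columns: {extra}")
    return problems
```

An empty result means the test dataset has the same feature columns as the training dataset and no target column.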

Evaluation page

The evaluation page consists of two graphs:

Training and validation RMSE for each number of estimators.
Total training time for each number of estimators.

The page also shows the best training and validation RMSE values and the total training time.
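As a rough illustration of how an RMSE curve over the number of estimators can be computed, the sketch below averages the predictions of the first n trees, which is how a Random Forest regressor combines its trees. The function names and the input layout (one list of predictions per tree) are assumptions for this example; the project's own code may compute the curve differently, and Gradient Boosting accumulates its estimators as a weighted sum rather than an average.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between targets and predictions."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def rmse_per_n_estimators(
    tree_predictions: Sequence[Sequence[float]], y_true: Sequence[float]
) -> List[float]:
    """RMSE of the averaged prediction of the first n trees, for n = 1..K.

    tree_predictions[k][i] is the prediction of tree k for sample i.
    """
    running_sum = [0.0] * len(y_true)
    scores = []
    for n, preds in enumerate(tree_predictions, start=1):
        # Add the new tree, then average over the n trees seen so far.
        running_sum = [s + p for s, p in zip(running_sum, preds)]
        scores.append(rmse(y_true, [s / n for s in running_sum]))
    return scores
```

The best RMSE value is then the minimum of the returned list, and its position plus one is the corresponding number of estimators.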

Data samples

You can find some data samples for training and making predictions in the data folder.
