Ship ML Experiment DDG

Predict the gross tonnage of the biggest cruise ships

This experiment consists of two parts: a Jupyter Notebook and a collection of Python files. The notebook documents the reasoning and shows how the final product was built up; the final product itself lives in the collection of Python files.

All files run in a conda environment called 'ddg'. Create it from environment.yml and activate it:

conda env create -f environment.yml
conda activate ddg

Three models are compared: Linear Regression, Random Forest and Gradient Boosting. The latter two are ensemble methods that combine many decision trees.
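As a rough illustration of such a comparison, the sketch below fits the three model types with scikit-learn and ranks them by validation RMSE. It is not the code from this repository: the data is synthetic, and the `compare_models` helper, the feature layout and all hyperparameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): the real ship dataset is not used here.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 100.0 * X[:, 0] + 50.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 1.0, size=200)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters are illustrative defaults, not tuned values from the project.
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}


def compare_models(models, X_train, y_train, X_val, y_val):
    """Fit each model on the training set and return its validation RMSE."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_val)
        scores[name] = float(np.sqrt(mean_squared_error(y_val, predictions)))
    return scores


scores = compare_models(models, X_train, y_train, X_val, y_val)
# Print the models from lowest to highest validation error.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: validation RMSE = {score:.2f}")
```

Scoring every model on the same held-out validation set keeps the comparison fair, since each model is judged on data it has not seen during fitting.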

Jupyter Notebook: ship.ipynb

I think the notebook speaks for itself. It consists of the following parts:

Explore the dataset and prepare the data
Create new features
Separate the data into train, validation and test sets
Test feature importance
Select the best algorithm and tune its hyperparameters
Test the algorithm
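The three-way split in the steps above could look roughly like the sketch below. The function name `split_dataset`, the fractions and the seed are hypothetical and chosen for illustration; the notebook and helperfunctions.py may split the data differently.

```python
import random


def split_dataset(rows, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into train, validation and test lists."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must sum to less than 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

The validation set is used to choose the algorithm and its hyperparameters, while the test set is held back until the end so that the final error is measured on data that played no part in those choices.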

Thoughts and reasons behind choices are explained briefly but can be further discussed during the demo.

Collection of Python files

The Python files are executed through a single entry point, main.py. Run the following on the command line:

python main.py

This prints the train, validation and test errors of the final model, plus a special bonus.

The collection consists of the following files:

main.py: runs it all, rules it all
helperfunctions.py: helper functions that I wanted to separate out but that were too small to justify files of their own. Contains functions to split the dataset and to show the results
tuning.py: performs the tuning of the Random Forest (RF) and Gradient Boosting (GB) models
cleanup_features.py: contains the functions to clean up the features and perform feature selection
bonus.py: compares a simple equation to the model's performance using RMSE
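To make the RMSE comparison concrete, here is a minimal sketch of scoring a simple equation against model predictions. The equation (tonnage proportional to length times beam), its coefficient and every number below are made up for illustration; the actual equation in bonus.py is not reproduced here.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def simple_equation(length_m, beam_m, coefficient=10.0):
    """Hypothetical rule of thumb: tonnage proportional to length times beam."""
    return coefficient * length_m * beam_m


# Made-up illustrative values (assumption), not taken from the ship dataset.
ships = [(300.0, 40.0), (250.0, 32.0), (200.0, 28.0)]  # (length, beam) in metres
actual_tonnage = [125000.0, 78000.0, 55000.0]
model_predictions = [123000.0, 80000.0, 54000.0]  # stand-in for a fitted model's output

equation_predictions = [simple_equation(length, beam) for length, beam in ships]
equation_rmse = rmse(actual_tonnage, equation_predictions)
model_rmse = rmse(actual_tonnage, model_predictions)

print(f"Simple equation RMSE: {equation_rmse:.1f}")
print(f"Model RMSE: {model_rmse:.1f}")
```

Because RMSE is expressed in the same unit as the target, the two numbers can be compared directly: the lower value is the better predictor on these ships.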

The Python files do not perform the extensive feature search; they only use its results. The search itself is explained in the Jupyter Notebook, and I am happy to walk through it during the demo.
