A template folder structure is attached, and your submission must conform to it. The structure is explained below. All timings are approximate and assume a mid-range laptop (Core i5, 8 GB RAM, no GPU). Split your data as follows: 80% for training, 10% for validation and 10% for testing. Your data should be in *.zip format, as in the template. If your programs need the data unzipped, extract it from within the program (for example with a Python module such as zipfile) into the Tmp folder, and have the program remove the extracted files once it is done.
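The unzip-and-clean-up step described above could be sketched as follows. This is only an illustration, not a required implementation: the helper name extracted and the use of a uniquely named subfolder inside Tmp are assumptions, and only the standard zipfile, tempfile and shutil modules are used.

```python
import os
import shutil
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def extracted(zip_path, tmp_root="Tmp"):
    """Unzip zip_path into a fresh subfolder of tmp_root, then delete that subfolder.

    The name `extracted` and the subfolder layout are hypothetical choices.
    """
    os.makedirs(tmp_root, exist_ok=True)
    # A unique subfolder avoids clashes when several archives are open at once.
    target = Path(tempfile.mkdtemp(dir=tmp_root))
    try:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target)
        yield target
    finally:
        # Remove only what this call extracted; the Tmp folder itself is
        # removed by the last line of the execution script.
        shutil.rmtree(target, ignore_errors=True)
```

Inside a program this would be used as "with extracted(path_to_zip) as folder:", reading the files under folder within the with-block. The extracted files are removed when the block exits, even if an exception is raised.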
Data\data.zip :- Original, unprocessed raw data. This is the only file that you place manually; all of the data files listed below should be generated by data.py. Do not upload this file with your submission, as you have already given it to me.
Data\Train\Under_10_min_training\data.zip :- A subset of the training data on which 5 epochs of training take less than 10 min.
Data\Train\Under_90_min_tuning\data.zip :- A subset of the training data on which 10 epochs of training take less than 90 min. This subset should be used for each hyperparameter combination during tuning.
Data\Train\Best_hyperparameter_80_percent\data.zip :- The 80 percent training data. This should be used for training with the optimal hyperparameter settings. The trained model must be saved so that it can be used separately with the test data.
Data\Validation\3_samples\data.zip :- A 3 sample set for validation
Data\Validation\Validation_10_percent\data.zip :- 10 percent validation data. This should be used to evaluate each hyperparameter combination during tuning.
Data\Test\Test_10_percent\data.zip :- 10 percent test data. This should be used to evaluate performance of your saved model.
tuning_results.txt :- Performance for each hyperparameter combination during tuning
hyperparameter.txt :- Optimal hyperparameters after tuning
model.h5 :- Your saved model in HDF5 format
Results.docx :- Tuning and test results in table format.
script.bat :- Installs any dependencies your program needs.
data.py :- All data preprocessing code
train.py :- All training and model saving code
tune.py :- All tuning, hyperparameter search and validation code. This calls the training module from train.py.
test.py :- All model loading and testing code
Lib\ :- Any other code or libraries you need
Tmp\ :- Created at runtime. All temporary data should reside in this folder. It is deleted at the end of execution.
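The 80/10/10 split that data.py has to produce could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the function name split_80_10_10, the fixed random seed, and the representation of the data as a list of sample identifiers are all hypothetical, and the actual layout of data.zip is not specified here.

```python
import random


def split_80_10_10(samples, seed=0):
    """Shuffle samples and split them into 80% train, 10% validation, 10% test.

    `samples` is assumed to be any sequence of sample identifiers
    (for example file names inside data.zip). The seed is an assumption,
    used only so that the split is reproducible.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)

    n_val = len(items) // 10
    n_test = len(items) // 10
    n_train = len(items) - n_val - n_test  # the remainder goes to training

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

The smaller subsets (Under_10_min_training, Under_90_min_tuning, 3_samples) could then be taken as slices of the train and val lists; how large they can be depends on your model and the timing limits above.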
The execution order is given below. We assume that the current directory is your project folder. The following script is for Windows; on Linux, the python command does not have the .exe extension. For each python command, the command-line arguments listed first are input files, and the arguments that follow them are output files.
###############################################################################################################################################################
md Tmp
script.bat
python.exe data.py .\Data\data.zip .\Data\Train\Best_hyperparameter_80_percent\ .\Data\Validation\Validation_10_percent\ .\Data\Test\Test_10_percent\ .\Data\Train\Under_10_min_training\ .\Data\Train\Under_90_min_tuning\ .\Data\Validation\3_samples\
python.exe tune.py .\Data\Train\Under_90_min_tuning\data.zip .\Data\Validation\Validation_10_percent\data.zip .\tuning_results.txt .\hyperparameter.txt
python.exe train.py .\Data\Train\Best_hyperparameter_80_percent\data.zip .\hyperparameter.txt .\model.h5
python.exe test.py .\Data\Test\Test_10_percent\data.zip .\model.h5
rd Tmp /s /q
###############################################################################################################################################################
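Each program receives its input and output paths as positional arguments, in the order shown in the script above. One possible way to read them in tune.py is sketched here; the function name parse_tune_args and the argument names are hypothetical, and reading sys.argv directly would work just as well.

```python
import argparse


def parse_tune_args(argv=None):
    """Parse the four positional arguments passed to tune.py.

    The argument names are illustrative assumptions; only their order
    (two inputs, then two outputs) is taken from the execution script.
    """
    parser = argparse.ArgumentParser(description="Hyperparameter tuning")
    parser.add_argument("train_zip", help="input: tuning subset of the training data")
    parser.add_argument("validation_zip", help="input: 10 percent validation data")
    parser.add_argument("results_txt", help="output: performance of each combination")
    parser.add_argument("hyperparameter_txt", help="output: optimal hyperparameters")
    # With argv=None, argparse reads the arguments from sys.argv[1:].
    return parser.parse_args(argv)
```

Calling parse_tune_args() at the start of tune.py would then give args.train_zip, args.validation_zip, args.results_txt and args.hyperparameter_txt. The same pattern applies to data.py, train.py and test.py with their own argument lists.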