Skip to content

Latest commit

 

History

History
119 lines (101 loc) · 5.25 KB

README.md

File metadata and controls

119 lines (101 loc) · 5.25 KB

Source code for "Biodiversity Image Quality Metadata Augments Convolutional Neural Network Classification of Fish Species" (Manuscript) presented at MTSR2020.

Organization:
├── HGNN
│   ├── analyze
│   │   ├── analyze_experiments.ipynb                              Confusion matrices
│   │   ├── analyze_trial.ipynb                                    
│   │   ├── calculate_random_accuracy.ipynb                        
│   │   ├── correlation_of_features.ipynb                          
│   │   ├── flashtorch_modified.py                                 
│   │   ├── obtain_misclassified_examples.ipynb                    Extract misclassified and look at quality factors
│   │   └── plot_network.ipynb                                     
│   ├── dataPrep
│   │   ├── Segment_images.ipynb                                   
│   │   ├── copy_image_subset.ipynb                                Useful for subsets (called "suffix" for some reason)
│   │   ├── create_cleaned_metadata.ipynb                          metadata -> cleaned_metadata
│   │   ├── generate_augmented_data.ipynb                          rotate and such
│   └── train
│       ├── CNN.py                                                 Where the rubber meets the road
│       ├── CSV_processor.py                                       Parse a experiment-wide result and metadata file
│       ├── ConfigParserWriter-QualityExperiments.ipynb            Generate config for the quality experiments
│       ├── configparser_all_graded_images.ipynb                   Generate config for the whole corpus experiment
│       ├── cifar_nin.py                                           
│       ├── configParser.py                                        Parse an individual experiment config
│       ├── dataLoader.py                                          Train/test manager
│       ├── resnet_cifar.py                                        
│       └── train.py                                               Run everything
├── metadata
│   ├── HDR-IMG_2020.01.27.txt                                     All INHS teleost taxa
│   ├── hdrwebform_20200728.csv                                    Manually curated quality metadata
│   ├── high_qual_metadata.tsv                                     Taxonomically balanced high quality subset
│   ├── low_qual_metadata.tsv                                      Taxonomically balanced low quality subset
│   └── orig_cleaned_metadata.tsv                                  Fish with at least 10 individuals

Install dependencies: Using pip

virtualenv .
pip install -r pip_requirements.txt

or using Conda

conda create -n hgnn
conda activate hgnn
conda install conda_requirements.yaml

Install myhelpers:

pip install git+ssh://git@github.com/hdr-bgnn/myhelpers.git

Setup paths and parameters:

Alter the config.default.ini to create a config.ini in the HGNN directory

[general]
experimentsPath = /home/elhamod/HGNN/experiments/
dataPath = /home/elhamod/projects/HGNN/data
experimentName = BestModel
fine_csv_fileName = metadata.csv

[dataPrep]
image_path = INHS_cropped
output_directory_name = 52
cleaned_species_csv_fileName = cleaned_metadata.csv
species_csv_fileName = metadata.csv
numberOfImagesPerSpecies = 50

[augmented]
applyHorizontalFlip = True
applyHorizontalFlipProbability = 0.1
applyTranslation = True
applyTranslationAmount = 0.1
applyColorPCA = True
colorPCAperturbationMagnitude = 0.1
numOfAugmentedVersions = 10
cuda = 0
    

[train]
modelFinalCheckpoint = finalModel.pt
statsFileName = stats.csv
timeFileName = time.csv
epochsFileName = epochs.csv
paramsFileName = params.json


# cleaned up metadata file that has no duplicates, invalids, etc
cleaned_fine_csv_fileName = cleaned_metadata.csv

# Saved file names.
statistic_countPerFine = count_per_fine.csv
statistic_countPerFamilyAndGenis = count_per_family_genus.csv

experimentsFileName = experiments.csv

[metadatatableheaders]
fine_csv_fileName_header = fileName
fine_csv_scientificName_header = scientificName
fine_csv_Coarse_header = Genus
fine_csv_Family_header = Family

[dataloader]
testIndexFileName = testIndex.csv
valIndexFileName = valIndex.csv
trainingIndexFileName = trainingIndex.csv
paramsFileName = params.json
  • Have your data with metadata.csv in a directory

  • Create a config file using HGNN/train/ConfigParserWriter-*

  • train a model using train.py. e.g.: python3 train.py --cuda=7 --name="learningRateTest" --experiments="/home/elhamod/HGNN/experiments/" --data="/data/BGNN_data"

  • analyze the data using jupyter notebooks under HGNN/analyze/ .e.g Analyze trial, Analyze experiments.

  • Once trained and analyzed, an experiment folder with name / will have been created, with a "models" folder that has all the trained model, a "results" folder with the analysis, and a "datasplits" folder with the indexes of images used for train/val/test.