
Developed at least 8 models based on at least 7 different datasets to address the problem of depression.


ziqinyeow/Depression-ML-Problem


Abstract and Introduction

There are eight notebooks here, and each one addresses a different type of problem based on a different dataset. Although some datasets are not entirely reliable, they are worth exploring and researching. Ultimately, we approach the depression detection problem with an ensemble technique, combining the best models trained in each notebook.
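As a rough illustration of the ensemble idea, per-model class probabilities can be combined by soft voting. This is a simplified NumPy sketch, not the exact combination used in the notebooks; `soft_vote` is a hypothetical helper.

```python
import numpy as np

def soft_vote(prob_lists, weights=None):
    """Average per-model class-probability arrays and pick the argmax class.

    prob_lists: list of (n_samples, n_classes) arrays, one per model.
    weights: optional per-model weights (e.g. each model's validation accuracy).
    """
    probs = np.average(np.stack(prob_lists), axis=0, weights=weights)
    return probs.argmax(axis=1)

# Two toy "models" voting on 3 samples over 2 classes (class 0 / class 1).
m1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
m2 = np.array([[0.7, 0.3], [0.3, 0.7], [0.1, 0.9]])
print(soft_vote([m1, m2]))  # -> [0 1 1]
```

Weighting each model by its validation accuracy lets stronger models (e.g. the tabular neural network) dominate the vote.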

Notebook 1 - ML/Structured/Tabular/CSV
Notebook 2 - ML/Structured/Tabular/CSV
Notebook 3 - DL/Unstructured/Image/TIFF
Notebook 4 - DL/Unstructured/Image/PNG
Notebook 5 - DL/Unstructured/Image/PNG
Notebook 6 - DL/Unstructured/Text/CSV
Notebook 7 - DL/Unstructured/Text/CSV
Notebook 8 - DL/Unstructured/Text/CSV

Production Deliverables

Web App
Web GitHub

Notebook 1 - Link

Tabular 5-class classification problem - 10 questions to classify whether a person is normal or experiencing mild, moderate, severe, or extremely severe depression.

Steps include:

  1. Data analysis
  2. Feature Engineering
  3. Feature Selection
  4. Data Preparation
  5. Model Experiment
  6. Model Evaluation
  7. Model Export
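Steps 3 through 7 can be sketched as a single scikit-learn pipeline. This is an illustrative sketch on synthetic stand-in data, not the notebook's actual code; the real features are the 10 questionnaire answers, and the file name is arbitrary.

```python
# Sketch of the workflow: feature selection -> data preparation -> model ->
# evaluation -> export, using synthetic data in place of the questionnaire.
import joblib
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))        # 10 question scores (synthetic stand-in)
y = rng.integers(0, 5, size=500)      # 5 severity classes

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),               # data preparation
    ("select", SelectKBest(f_classif, k=8)),   # feature selection
    ("model", SVC()),                          # model experiment
])
pipe.fit(X_train, y_train)
acc = accuracy_score(y_test, pipe.predict(X_test))   # model evaluation
print("accuracy:", acc)
joblib.dump(pipe, "depression_model.joblib")         # model export
```

Bundling scaling and selection into the pipeline keeps the exported artifact self-contained, so the web app only needs one `joblib.load`.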

Six models were built:

  1. Naive Bayes (acc: 0.8722)
  2. K Nearest Neighbour (acc: 0.8983)
  3. Support Vector Machine (acc: 0.9576)
  4. Decision Tree (acc: 0.8066)
  5. Random Forest (acc: 0.9017)
  6. Neural Network (acc: 0.9636)

The accuracy and time of each model were compared.
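That accuracy-and-time comparison can be sketched as follows. This runs on synthetic stand-in data, so the scores will not match the ones above; the model list mirrors the six classifiers named in the notebook.

```python
# Fit each of the six model types and record test accuracy and fit time.
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = rng.integers(0, 5, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "K Nearest Neighbour": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Neural Network": MLPClassifier(max_iter=500),
}
results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    results[name] = (model.score(X_te, y_te), time.perf_counter() - start)
    print(f"{name}: acc={results[name][0]:.4f}, fit time={results[name][1]:.3f}s")
```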

Data source: GitHub

Notebook 2 - Link

Tabular 2-class classification problem - 30 questions to classify whether a person is depressed or not.

One model was built:

  1. Neural Network (acc: 0.893)

Data source: GitHub

Notebook 3 - Link

No implementation yet.

The JAFFE dataset consists of 213 images of different facial expressions from 10 different Japanese female subjects. Each subject was asked to do 7 facial expressions (6 basic facial expressions and neutral) and the images were annotated with average semantic ratings on each facial expression by 60 annotators.

Data source: JAFFE

Notebook 4 - Link

Image binary (2-class) classification problem - Images with 3 color channels (RGB) were used to train a binary classifier over depression and non-depression classes.

Models built - Notebook 4

  1. Self-defined CNN (acc: 0.505)
  2. Fine-tuned EfficientNet (acc: 0.504)

Data source: Kaggle

Note: The dataset was relabeled based on self-experience (e.g., Happy -> Non-depression), so it is not fully reliable.
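The "self-defined CNN" side can be sketched as a small PyTorch module. This is an illustrative architecture only; the notebook's actual layer counts and sizes may differ, and `SmallCNN` is a hypothetical name.

```python
# Minimal self-defined CNN for binary classification of RGB images.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # global average pooling
            nn.Flatten(),
            nn.Linear(32, n_classes),  # depression / non-depression logits
        )

    def forward(self, x):
        return self.head(self.features(x))

model = SmallCNN()
logits = model(torch.randn(4, 3, 64, 64))  # batch of 4 RGB 64x64 images
print(logits.shape)  # torch.Size([4, 2])
```

The near-chance accuracies above (~0.50 on two classes) are consistent with the unreliable relabeling described in the note: the labels carry little learnable signal.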

Notebook 5 - Link

The technique and data are the same as in Notebook 4. The only difference is that the dataset is not relabeled based on self-experience (e.g., Happy -> Happy), so this notebook is geared more toward emotion classification from images.

Models built - Notebook 5

  1. Self-defined CNN (acc: 0.3435)
  2. Fine-tuned EfficientNet (acc: 0.4648)

Notebook 6 - Link

Text classification (28-class) problem - Text loaded from the Go Emotions HuggingFace dataset is fed into a pretrained BERT tokenizer and model that classifies the text's emotion (e.g., fear, embarrassment, happiness...).

This model is a fine-tuned version of microsoft/xtremedistil-l6-h384-uncased on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.1234

Data source: HuggingFace

Notebook 7 - Link

Text classification (2-class) problem - Text loaded from a HuggingFace dataset (self-pushed from Kaggle) is fed into a pretrained BERT tokenizer and model that classifies whether the text indicates depression or not.

This model is a fine-tuned version of microsoft/xtremedistil-l6-h384-uncased on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.1606

Accuracy: 0.9565

I have pushed the model to HuggingFace

Data Source:

  1. Data 1
  2. Data 2
  3. Data 3

Data has been preprocessed (see the preprocessing notebook) and pushed to HuggingFace.
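As a lightweight stand-in for the fine-tuning flow above, the same text -> vectorize -> binary-classify shape can be shown with TF-IDF and logistic regression. This deliberately swaps out BERT (the notebook's actual approach, via HuggingFace) for a classical baseline on tiny toy data.

```python
# Toy text -> vectorize -> classify sketch; not the notebook's BERT pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I feel hopeless and tired all the time",
    "Nothing matters anymore, I can't get out of bed",
    "Had a great day hiking with friends",
    "Excited about my new project at work",
]
labels = [1, 1, 0, 0]  # 1 = depression, 0 = non-depression (toy labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["I am so tired and hopeless"]))
```

A baseline like this is useful for sanity-checking the data split before spending GPU time on BERT fine-tuning.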

Notebook 8 - Link

Unsupervised representational text generation problem - Approximately 100 rows of text were collected from multiple data sources and used to fine-tune a pretrained distilled GPT-2 model from HuggingFace.

This model is a fine-tuned version of distilgpt2. It achieves the following results on the evaluation set:

Loss: 3.3740

This model couldn't be exported after several attempts, so we instead ran the model pipeline to generate 1000 suggestions and saved them to a CSV file.
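The generate-and-save step can be sketched as below. `generate_one` is a hypothetical placeholder for the fine-tuned distilgpt2 pipeline; the real notebook would call the HuggingFace text-generation pipeline there instead.

```python
# Sketch: generate 1000 suggestions and persist them to CSV so the web app
# can serve them without loading the (unexportable) model.
import csv

def generate_one(i):
    # Placeholder for the distilgpt2 pipeline call, e.g. something like
    # generator(prompt, max_length=...)[0]["generated_text"].
    return f"Suggestion {i}: take a short walk and breathe deeply."

with open("generated_suggestions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["suggestion"])
    for i in range(1000):
        writer.writerow([generate_one(i)])
```

Pre-generating to CSV trades freshness for reliability: the app only reads a static file, sidestepping the export problem entirely.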

Generated Suggestions: See Generated Text

The model was pushed to HuggingFace Hub

Data source: GitHub

Data has been preprocessed (see the preprocessing notebook) and pushed to HuggingFace Hub.

Conclusion

Based on the work done:

  1. Tabular - Notebook 1, Notebook 2
  2. Computer Vision - Notebook 3, Notebook 4, Notebook 5
  3. Natural Language Processing - Notebook 6, Notebook 7, Notebook 8

We chose:

  1. Notebook 1 - model 10
  2. Notebook 2 - model 1
  3. Notebook 3 - ❌
  4. Notebook 4 - ❌
  5. Notebook 5 - ❌
  6. Notebook 6 - model 1
  7. Notebook 7 - model 1
  8. Notebook 8 - generated_suggestions (1000 records)

to build our web application.

Some models perform inaccurately, especially the vision models, so we will not use them in the web application. Only the tabular and language models will be used.

We will be using Next.js and related tooling to build the web application.

Web App
Web GitHub

