Autism Spectrum Disorder (ASD) Prediction Using Machine Learning

This project aims to predict Autism Spectrum Disorder (ASD) using machine learning models for both adults and toddlers. By analyzing responses to screening questions, the project leverages Random Forest classifiers to assess ASD risk. Visualizations and performance metrics help in interpreting the models' effectiveness.

Introduction

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition. Early diagnosis is crucial for providing timely intervention and support. This project builds machine learning models, specifically Random Forest classifiers, to predict ASD in both adults and toddlers based on responses to a set of screening questions.

The project also provides a set of visualizations to explore the characteristics of the dataset, as well as the performance of the models.

Dataset

The dataset has two parts:

Adults: ASD screening results for adults.
Toddlers: ASD screening results for toddlers.

Each dataset includes features like:

Age
Gender
Ethnicity
Family history of ASD
Screening results
Previous use of the screening app
History of jaundice (Jundice)

Visualization

Several visualizations were created to explore the dataset and help understand the distribution and relationships between key features.

Age Distribution

Purpose: To visualize the distribution of ages in the adult and toddler datasets.

Method:
- Create histograms for adults and toddlers to show the frequency of different age groups.
- Annotations: Add labels, titles, and grid lines for better interpretation.
- Saving the Plot: Save the histogram as age_distribution.png.
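
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the `age` column name, the inline sample values, and the `plot_age_distribution` helper are assumptions made for the example.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; the project itself would load its CSV files.
adults = pd.DataFrame({"age": [18, 22, 25, 31, 40, 52]})
toddlers = pd.DataFrame({"age": [1, 2, 2, 3, 3, 3]})


def plot_age_distribution(adults, toddlers, path="age_distribution.png"):
    """Draw one age histogram per group and save the figure to `path`."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, (name, df) in zip(axes, [("Adults", adults), ("Toddlers", toddlers)]):
        ax.hist(df["age"], bins=10, edgecolor="black")
        ax.set_title(f"Age Distribution ({name})")
        ax.set_xlabel("Age")
        ax.set_ylabel("Frequency")
        ax.grid(True, alpha=0.3)
    fig.tight_layout()
    out = Path(path)
    fig.savefig(out)
    plt.close(fig)
    return out


saved = plot_age_distribution(adults, toddlers)
```

Separate subplots are used here because adult and toddler ages sit on very different scales; a shared axis would squash one of the two histograms.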

Gender Distribution

Purpose: To compare the gender distribution between adults and toddlers.

Method:
- Use a horizontal bar plot to represent the count of each gender.
- Annotations: Add labels, titles, and grid lines for better interpretation.
- Saving the Plot: Save the bar chart as gender_distribution.png.
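
One possible way to build this chart is shown below. The `gender` column name, the `m`/`f` codes, and the sample rows are assumptions for illustration only.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data with an assumed "gender" column.
adults = pd.DataFrame({"gender": ["m", "f", "m", "m", "f"]})
toddlers = pd.DataFrame({"gender": ["f", "m", "m", "f", "f", "f"]})

# One column per group, one row per gender; missing categories become 0.
counts = pd.DataFrame({
    "Adults": adults["gender"].value_counts(),
    "Toddlers": toddlers["gender"].value_counts(),
}).fillna(0).astype(int)

# Horizontal grouped bars: one bar per group for each gender.
ax = counts.plot(kind="barh", figsize=(8, 4))
ax.set_title("Gender Distribution")
ax.set_xlabel("Count")
ax.set_ylabel("Gender")
ax.grid(axis="x", alpha=0.3)

saved = Path("gender_distribution.png")
ax.figure.tight_layout()
ax.figure.savefig(saved)
plt.close(ax.figure)
```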

Age vs Result

Purpose: To visualize the relationship between age and the screening result for adults and toddlers.

Method:
- Create a scatter plot with age on the x-axis and result on the y-axis.
- Annotations: Add labels, titles, and grid lines for better interpretation.
- Saving the Plot: Save the scatter plot as age_vs_result.png.
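
A minimal sketch of such a scatter plot, assuming columns named `age` and `result` (the numeric screening score); both names and the sample values are hypothetical.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; "result" is assumed to be a numeric score.
adults = pd.DataFrame({"age": [18, 25, 31, 40], "result": [3, 7, 9, 5]})
toddlers = pd.DataFrame({"age": [1, 2, 3, 3], "result": [2, 8, 6, 4]})


def plot_age_vs_result(adults, toddlers, path="age_vs_result.png"):
    """Scatter age (x) against screening result (y) for both groups."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(adults["age"], adults["result"], label="Adults", alpha=0.7)
    ax.scatter(toddlers["age"], toddlers["result"], label="Toddlers", alpha=0.7)
    ax.set_title("Age vs Result")
    ax.set_xlabel("Age")
    ax.set_ylabel("Result")
    ax.legend()
    ax.grid(True, alpha=0.3)
    out = Path(path)
    fig.savefig(out)
    plt.close(fig)
    return out


saved = plot_age_vs_result(adults, toddlers)
```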

Ethnicity Distribution

Purpose: To analyze the distribution of different ethnicities in the adult and toddler datasets.

Method:
- Count the occurrences of each ethnicity using value_counts().
- Create a vertical bar plot with separate bars for adults (blue) and toddlers (orange).
- Rotate x-axis labels to avoid overlapping.
- Annotations: Add labels, titles, and grid lines for better interpretation.
- Saving the Plot: Save the bar plot as ethnicity_distribution.png.
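
The counting and plotting steps might look like the sketch below. The `ethnicity` column name and the category values are placeholders, not taken from the actual data files.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data with an assumed "ethnicity" column.
adults = pd.DataFrame(
    {"ethnicity": ["White-European", "Asian", "Asian", "Latino"]}
)
toddlers = pd.DataFrame(
    {"ethnicity": ["Asian", "Black", "White-European",
                   "White-European", "White-European"]}
)

# Align both value_counts() results on the union of ethnicities.
counts = pd.concat(
    [adults["ethnicity"].value_counts(), toddlers["ethnicity"].value_counts()],
    axis=1,
    keys=["Adults", "Toddlers"],
).fillna(0).astype(int)

# Vertical grouped bars: blue for adults, orange for toddlers.
ax = counts.plot(kind="bar", color=["tab:blue", "tab:orange"], figsize=(10, 5))
ax.set_title("Ethnicity Distribution")
ax.set_xlabel("Ethnicity")
ax.set_ylabel("Count")
ax.tick_params(axis="x", labelrotation=45)  # rotate labels to avoid overlap
ax.grid(axis="y", alpha=0.3)

saved = Path("ethnicity_distribution.png")
ax.figure.tight_layout()
ax.figure.savefig(saved)
plt.close(ax.figure)
```

Filling missing categories with 0 matters here: an ethnicity that appears in only one dataset would otherwise produce a NaN bar.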

Jaundice Distribution

Purpose: To analyze the distribution of jaundice history in the adult and toddler datasets.

Method:
- Count the occurrences of jaundice history (Yes/No).
- Create separate pie charts for adults and toddlers, each representing the proportion of individuals with or without a history of jaundice.
- Annotations: Add a title to each pie chart and a label to each segment.
- Saving the Plot: Save the pie charts as jundice_distribution.png.
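
As a rough sketch of the pie charts, assuming the column is named `jundice` (following the spelling used in the output filename) and holds `yes`/`no` values:

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; the "jundice" column name is an assumption.
adults = pd.DataFrame({"jundice": ["no", "no", "yes", "no"]})
toddlers = pd.DataFrame({"jundice": ["yes", "no", "no", "yes", "no"]})


def plot_jaundice_distribution(adults, toddlers, path="jundice_distribution.png"):
    """Draw one pie chart per group showing the yes/no proportions."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 5))
    for ax, (name, df) in zip(axes, [("Adults", adults), ("Toddlers", toddlers)]):
        counts = df["jundice"].value_counts()
        ax.pie(
            counts.to_numpy(),
            labels=counts.index.tolist(),
            autopct="%1.1f%%",  # print each segment's percentage
        )
        ax.set_title(f"Jaundice History ({name})")
    out = Path(path)
    fig.savefig(out)
    plt.close(fig)
    return out


saved = plot_jaundice_distribution(adults, toddlers)
```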

Used App Before Distribution

Purpose: To analyze the distribution of previous app usage in the adult and toddler datasets.

Method:
- Count the occurrences of previous app usage (Yes/No).
- Create a horizontal bar plot with separate bars for adults (blue) and toddlers (orange).
- Annotations: Add labels, titles, and grid lines for better interpretation.
- Saving the Plot: Save the bar plot as used_app_before_distribution.png.
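
The sketch below shows one way to draw the paired horizontal bars by hand. The `used_app_before` column name and its `yes`/`no` values are assumptions for the example.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data with an assumed "used_app_before" column.
adults = pd.DataFrame({"used_app_before": ["no", "no", "yes", "no"]})
toddlers = pd.DataFrame({"used_app_before": ["no", "yes", "yes"]})

# Fix the category order so both groups always show "no" and "yes".
categories = ["no", "yes"]
adult_counts = (
    adults["used_app_before"].value_counts().reindex(categories, fill_value=0)
)
toddler_counts = (
    toddlers["used_app_before"].value_counts().reindex(categories, fill_value=0)
)

y = np.arange(len(categories))
bar_height = 0.4

fig, ax = plt.subplots(figsize=(8, 4))
# Offset the two groups around each tick so the bars sit side by side.
ax.barh(y - bar_height / 2, adult_counts.to_numpy(), height=bar_height,
        color="tab:blue", label="Adults")
ax.barh(y + bar_height / 2, toddler_counts.to_numpy(), height=bar_height,
        color="tab:orange", label="Toddlers")
ax.set_yticks(y)
ax.set_yticklabels(categories)
ax.set_title("Used App Before Distribution")
ax.set_xlabel("Count")
ax.set_ylabel("Used app before")
ax.legend()
ax.grid(axis="x", alpha=0.3)

saved = Path("used_app_before_distribution.png")
fig.tight_layout()
fig.savefig(saved)
plt.close(fig)
```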

Machine Learning Model

Random Forest Classifier

The Random Forest classifier is used to predict ASD based on the input data. The classifier was trained separately for adults and toddlers, and its performance was evaluated using standard metrics such as accuracy, precision, recall, and F1 score.
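
A minimal sketch of this train-and-evaluate workflow with scikit-learn is shown below. It is not the project's training script: the synthetic data (ten binary answers with a made-up labelling rule), the 75/25 split, and the hyperparameters are all assumptions chosen so the example runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for screening data: 400 respondents, 10 yes/no answers.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 10))
# Hypothetical labelling rule, used only to give the model something to learn.
y = (X.sum(axis=1) > 5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# The four metrics named above, computed on the held-out test split.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The same code would be run once per dataset to obtain separate adult and toddler models.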

Adult Model Performance

Accuracy: 86%
Precision: 89%
Recall: 86%

These results suggest that the model is reasonably effective for the adult population. A precision of 89% means that most positive predictions are correct (few false positives), and a recall of 86% means that most true positive cases are identified (few false negatives).

Toddler Model Performance

Accuracy, Precision, Recall, F1 Score: 97%

The toddler model achieved near-perfect results, which may be due to the size and simplicity of the dataset. Further testing is required to confirm the model's robustness on larger and more complex datasets.

Confusion Matrix Analysis

A confusion matrix was generated for both models, providing deeper insight into their performance by visualizing:

True Positives (TP)
True Negatives (TN)
False Positives (FP)
False Negatives (FN)

This analysis helps identify any potential biases or errors in the model's predictions.
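
To make the four cells concrete, here is a small worked example using scikit-learn's `confusion_matrix`. The label vectors are invented for illustration and are not results from either model.

```python
from sklearn.metrics import confusion_matrix

# Invented labels: 1 = ASD predicted/observed, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# With labels=[0, 1], rows are true classes and columns are predictions:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1

# Precision and recall follow directly from the four cells.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

In a screening context, false negatives (missed cases) are usually the costlier error, so the FN cell deserves particular attention.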

Output

Adult: [confusion matrix image]

Toddler: [confusion matrix image]

Feature Importance

Feature importance analysis was conducted to identify which features were most influential in predicting ASD. This analysis can help in understanding key indicators of ASD and may guide future research.
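
With a fitted scikit-learn Random Forest, one common way to run this analysis is to read the `feature_importances_` attribute. The sketch below uses synthetic data and hypothetical feature names (`q1` to `q5`); it illustrates the mechanism and is not the project's own analysis.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical feature names.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.integers(0, 2, size=(300, 5)),
    columns=["q1", "q2", "q3", "q4", "q5"],
)
# Made-up rule: only q1 and q2 influence the label.
y = ((X["q1"] + X["q2"]) >= 1).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, one per feature, sorted from most to least.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(importances)
```

Impurity-based importances can be biased toward features with many distinct values, so they are best read as a rough ranking rather than a precise measure of each feature's effect.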

Conclusion

This project demonstrates the potential of using machine learning, particularly Random Forest classifiers, to predict Autism Spectrum Disorder. The models show promising performance, especially for the toddler dataset. However, more data and testing are required to generalize these findings to broader populations.

Future work includes:

Refining the models with larger datasets.
Exploring more complex machine learning algorithms.
Investigating the implications of feature importance in understanding ASD risk factors.
