Skip to content

The goal of this project is to predict if a person has any of three medical conditions.

Notifications You must be signed in to change notification settings

ariesiitr/ICR--Identifying-age-related-conditions

Repository files navigation

Identifying Age-Related Conditions (ICR)

Introduction

The objective of this project is to identify age-related conditions using a dataset provided for the "ICR - Identifying Age-Related Conditions" competition on Kaggle. The analysis involves data preprocessing, exploratory data analysis, feature engineering, and model building to predict the target variable related to age-related conditions.

Since our data was cluttered, we had a lot of opportunities in this project, we worked on various preprocessing and analysis methods and found out the best features to rely on, we also discovered the use of a variety of metrics thereby improving our model accuracy .

Data Description

The dataset contains various features, including medical and demographic information. Key columns include:

  • Id: Unique identifier for each entry.
  • target: The target variable indicating the presence of an age-related condition.
  • Multiple numerical and categorical features representing patient data.

Methodology

Data Preprocessing

  1. Loading Data: The data is loaded from CSV files.
  2. Handling Missing Values: Missing values are addressed by imputation or removal.
  3. Encoding Categorical Variables: Categorical variables are encoded using techniques like one-hot encoding.

Exploratory Data Analysis (EDA)

  • Visualization: Histograms, scatter plots, and correlation matrices are used to understand data distribution and relationships.
  • Summary Statistics: Descriptive statistics provide insights into the central tendency and variability of features.

Feature Engineering

  • Scaling: Numerical features are standardized to ensure uniformity.
  • Interaction Features: New features are created based on interactions between existing features to capture complex relationships.

Model Building

  1. Model Selection: Various models, including Logistic Regression, Decision Trees, and Random Forest, are evaluated.
  2. Hyperparameter Tuning: Grid search and cross-validation are used to optimize model parameters.
  3. Evaluation Metrics: Models are evaluated using metrics such as accuracy, precision, recall, and F1-score.

Results

  • Model Performance: The performance of different models is compared, and the best-performing model is selected.
  • Feature Importance: Key features contributing to the model's predictions are identified.

Visualizations

  • ROC Curves: Receiver Operating Characteristic curves to evaluate the model's ability to distinguish between classes.
  • Confusion Matrix: Visualization of the confusion matrix to understand the classification performance.

Conclusion

The project successfully identifies the most significant features and builds a predictive model for age-related conditions.

Made By:

Pranjal Vanjale

Siddhant Gupta

Arihant Aggarwal

Kuldeep

About

The goal of this project is to predict if a person has any of three medical conditions.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •