The objective of this project is to analyze gene expression data to identify pathways and genes involved in bone growth and remodeling. Understanding these pathways can provide insights into bone-related diseases and potential therapeutic targets.
Data Loading:
- Loaded datasets from the Gene Expression Omnibus (GEO).
- Manually downloaded the Annotation, Full, and Series family metadata files.
- Example dataset
- Script: load_data.py
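The loading step can be sketched as follows. The inline `SAMPLE` string only mimics the layout of a GEO series-matrix file; the real `load_data.py` reads the downloaded files instead:

```python
import io
import pandas as pd

# Tiny inline sample mimicking a GEO series-matrix file: metadata lines start
# with '!', the expression table sits between the table begin/end markers.
SAMPLE = """!Series_title\t"Example bone study"
!series_matrix_table_begin
"ID_REF"\t"GSM1"\t"GSM2"
"probe_1"\t7.1\t6.9
"probe_2"\t5.4\t5.6
!series_matrix_table_end
"""

def load_series_matrix(handle):
    """Parse a GEO series-matrix stream into a probes x samples DataFrame."""
    lines = handle.read().splitlines()
    start = lines.index("!series_matrix_table_begin") + 1
    end = lines.index("!series_matrix_table_end")
    table = "\n".join(lines[start:end])
    return pd.read_csv(io.StringIO(table), sep="\t", index_col=0)

df = load_series_matrix(io.StringIO(SAMPLE))
```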
Data Preprocessing:
- Cleaned and normalized the data to make it ready for analysis.
- Script: preprocess_data.py
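A minimal sketch of the cleaning and normalization step, assuming a log2 transform plus quantile normalization (a common microarray choice; the project's actual normalization may differ):

```python
import numpy as np
import pandas as pd

def preprocess(df):
    """Drop probes with missing values, log2-transform, then quantile-normalize."""
    df = df.dropna()
    df = np.log2(df + 1)  # +1 guards against log2(0)
    # Quantile normalization: average the values at each rank across samples,
    # then substitute those averages back via each sample's original ranks.
    sorted_vals = np.sort(df.values, axis=0)
    rank_means = sorted_vals.mean(axis=1)
    ranks = df.rank(method="first").astype(int) - 1  # 0-based per-column ranks
    return df.apply(lambda col: pd.Series(rank_means[ranks[col.name].values],
                                          index=df.index))

# Toy raw intensity matrix (made-up values).
raw = pd.DataFrame({"GSM1": [3.0, 15.0, 1.0], "GSM2": [7.0, 2.0, 31.0]},
                   index=["probe_1", "probe_2", "probe_3"])
norm = preprocess(raw)
```

After normalization every sample shares the same value distribution, which is exactly the property quantile normalization enforces.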
Exploratory Data Analysis:
- Computed descriptive statistics and visualizations to understand the data distribution.
- Generated histograms and box plots for the numerical columns.
- Script: eda.py
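The histogram and box-plot step might look like this with matplotlib; the `demo` frame and output file names are placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_distributions(df, prefix="eda"):
    """Save one histogram + box-plot figure per numeric column; return the paths."""
    paths = []
    for col in df.select_dtypes(include="number").columns:
        fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
        ax_hist.hist(df[col], bins=20)
        ax_hist.set_title(f"{col} histogram")
        ax_box.boxplot(df[col])
        ax_box.set_title(f"{col} box plot")
        path = f"{prefix}_{col}.png"
        fig.savefig(path)
        plt.close(fig)
        paths.append(path)
    return paths

rng = np.random.default_rng(0)
demo = pd.DataFrame({"GSM1": rng.normal(7, 1, 100), "GSM2": rng.normal(7, 1, 100)})
saved = plot_distributions(demo)
```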
Statistical Analysis:
- Conducted correlation analysis to identify relationships between gene expression profiles.
- Performed statistical tests (t-tests and ANOVA) to identify significant differences in gene expression related to bone growth and remodeling.
- Handled large datasets and ensured sufficient sample sizes for the statistical tests.
- Script: analysis.py
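The correlation and t-test steps can be sketched with scipy on a toy matrix; the sample labels and the injected effect below are made up for illustration:

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
# Toy expression matrix: 3 control vs 3 treated samples (labels are made up).
expr = pd.DataFrame(rng.normal(7, 1, size=(5, 6)),
                    index=[f"gene_{i}" for i in range(5)],
                    columns=[f"ctrl_{i}" for i in range(3)]
                            + [f"treat_{i}" for i in range(3)])
expr.loc["gene_0", ["treat_0", "treat_1", "treat_2"]] += 8  # inject a real effect

# Gene-gene correlation across samples.
corr = expr.T.corr()

# Per-gene two-sample t-test, control vs treated; with more than two groups
# the same pattern applies with stats.f_oneway (one-way ANOVA).
ctrl = expr.filter(like="ctrl_")
treat = expr.filter(like="treat_")
tstat, pval = stats.ttest_ind(ctrl, treat, axis=1)
results = pd.DataFrame({"t": tstat, "p": pval}, index=expr.index)
significant = results[results["p"] < 0.05].index.tolist()
```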
Optimization:
- The analysis runs slowly on the full 1.7 GB dataset; speeding it up (e.g. chunked reading, vectorized operations) is an open question.
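One common way to tame a multi-gigabyte table is to stream it in chunks and aggregate incrementally rather than loading it whole. A sketch with pandas' `chunksize` (the inline CSV stands in for the real file):

```python
import io
import pandas as pd

# Inline CSV standing in for the real 1.7 GB expression matrix on disk.
big_csv = "gene,s1,s2\n" + "\n".join(f"g{i},{i % 7},{i % 5}" for i in range(1000))

# Stream the file in fixed-size chunks instead of loading it whole, so peak
# memory is bounded by the chunk size; aggregate running sums as we go.
totals, rows = None, 0
for chunk in pd.read_csv(io.StringIO(big_csv), index_col=0, chunksize=250):
    part = chunk.sum()
    totals = part if totals is None else totals + part
    rows += len(chunk)

means = totals / rows  # per-sample means without a full in-memory load
```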
Gene Ontology (GO) Enrichment Analysis:
- Identify overrepresented GO terms in the significant genes.
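GO over-representation is typically tested with the hypergeometric distribution. A sketch with scipy; the gene IDs and the term's gene set are made up:

```python
from scipy.stats import hypergeom

def go_enrichment(sig_genes, term_genes, background):
    """One-sided hypergeometric p-value that term_genes is over-represented
    among sig_genes, given a background universe of tested genes."""
    N = len(background)               # universe size
    K = len(term_genes & background)  # genes annotated to the term
    n = len(sig_genes)                # significant genes "drawn"
    k = len(sig_genes & term_genes)   # overlap between the two
    return hypergeom.sf(k - 1, N, K, n)  # P(X >= k)

background = {f"g{i}" for i in range(100)}
term = {f"g{i}" for i in range(10)}    # e.g. an "ossification" term (made up)
sig = {"g0", "g1", "g2", "g3", "g50"}  # 4 of the 5 hits fall in the term
p = go_enrichment(sig, term, background)
```

In practice the test is repeated for every GO term, so a multiple-testing correction (e.g. Benjamini-Hochberg) is applied to the resulting p-values.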
Pathway Analysis:
- Use tools like KEGG or Reactome to map gene expression data to biological pathways.
- Perform and interpret these analyses to uncover pathways involved in bone growth and remodeling.
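A minimal sketch of mapping genes onto pathways and scoring each pathway; the pathway memberships and fold changes below are illustrative stand-ins, not real KEGG or Reactome annotations:

```python
import pandas as pd

# Hypothetical gene -> pathway annotations; a real analysis would pull these
# from KEGG or Reactome.
pathways = {
    "Wnt signaling": {"WNT1", "LRP5", "CTNNB1"},
    "Osteoclast differentiation": {"TNFSF11", "TNFRSF11B", "CSF1"},
}

# Per-gene log2 fold changes from a hypothetical differential-expression step.
fold_changes = pd.Series({
    "WNT1": 1.8, "LRP5": 1.2, "CTNNB1": 0.9,
    "TNFSF11": -1.5, "TNFRSF11B": 0.3, "CSF1": -0.8,
    "GAPDH": 0.05,
})

def pathway_scores(fc, pathways):
    """Average fold change of the measured genes in each pathway."""
    return {name: fc[fc.index.isin(genes)].mean()
            for name, genes in pathways.items()}

scores = pathway_scores(fold_changes, pathways)
```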
Predictive Modeling:
- Develop predictive models (e.g., regression, machine learning) to identify key genes and pathways involved in bone growth and remodeling.
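One possible modeling sketch: logistic regression on a synthetic expression matrix, ranking genes by coefficient magnitude. The data, labels, and the "gene 0 drives the phenotype" structure are all simulated:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_genes = 60, 8
X = rng.normal(size=(n_samples, n_genes))
# Hypothetical phenotype driven almost entirely by gene 0.
y = (X[:, 0] + 0.1 * rng.normal(size=n_samples) > 0).astype(int)

model = LogisticRegression().fit(X, y)
# Rank candidate genes by coefficient magnitude (a crude importance measure).
ranking = np.argsort(-np.abs(model.coef_[0]))
top_gene = int(ranking[0])
```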
Validation:
- Validate the models using cross-validation or a separate test dataset.
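Cross-validation can be sketched with scikit-learn's `cross_val_score` on synthetic data; each fold is held out once, giving a less optimistic estimate than training-set accuracy:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(80, 5))
y = (X[:, 1] > 0).astype(int)  # hypothetical phenotype driven by one gene

# 5-fold cross-validation: each sample is held out exactly once.
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
mean_acc = scores.mean()
```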
Report Writing:
- Document the findings, methods, and results in a detailed report.
Presentation Preparation:
- Prepare a presentation summarizing the research, methods, and key findings.
The datasets used in this project can be downloaded from the following link:
- Clone the repository:
git clone https://github.com/saketh-n/bone-io.git
cd bone-io
- Download the datasets from the link above
- Place the Datasets folder in the same directory as the scripts
- Install the required dependencies:
pip3 install -r requirements.txt
- Run the scripts for steps 1-4
- STEP 1
python3 load_data.py
- STEP 2
python3 preprocess_data.py
- STEP 3
python3 eda.py
- STEP 4
python3 analysis.py