This project reads genotype data from a VCF file, converts it into a numeric matrix, runs Principal Component Analysis (PCA) on that matrix, and plots the results. The goal is to show how genetic variation is distributed across samples.
- VCF File Reading: Extracts genotype data, sample IDs, and variant IDs from a VCF file.
- Panel File Reading: Reads sample IDs and population codes from a panel file.
- Matrix Creation: Converts genotype data into a matrix format suitable for PCA.
- PCA: Performs PCA on the genotype matrix to reduce it to a small number of principal components that capture most of the genetic variation.
- DataFrame Creation: Constructs a DataFrame with genotype data, variant IDs, and population codes.
- CSV Export: Saves the resulting genotype matrix to a CSV file.
- Visualization: Plots the PCA results to visualize the distribution of genetic variations.
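
The reading, matrix-creation, and export steps listed above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code. The function names (`read_vcf`, `read_panel`, `to_sample_matrix`, `write_csv`) are hypothetical, the panel file is assumed to be tab-separated with `sample` and `pop` header columns, and genotypes are assumed to be encoded as the number of non-reference alleles per call (0, 1, or 2), with missing calls kept as `None`.

```python
import csv
import io


def read_vcf(handle):
    """Return (sample_ids, variant_ids, genotypes) from a VCF text stream.

    genotypes[i][j] is the non-reference allele count for variant i and
    sample j, or None when the call is missing (assumed encoding).
    """
    sample_ids, variant_ids, genotypes = [], [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_ids = fields[9:]  # samples follow the 9 fixed columns
            continue
        variant_ids.append(fields[2])  # ID column
        gt_index = fields[8].split(":").index("GT")  # locate GT in FORMAT
        row = []
        for call in fields[9:]:
            alleles = call.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles:
                row.append(None)  # missing genotype
            else:
                row.append(sum(1 for a in alleles if a != "0"))
        genotypes.append(row)
    return sample_ids, variant_ids, genotypes


def read_panel(handle):
    """Map sample ID -> population code (assumes 'sample' and 'pop' columns)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["sample"]: row["pop"] for row in reader}


def to_sample_matrix(genotypes):
    """Transpose variants x samples into samples x variants (one row per sample)."""
    return [list(column) for column in zip(*genotypes)]


def write_csv(handle, sample_ids, variant_ids, matrix, populations):
    """Write one row per sample: ID, population code, then genotype values."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "population"] + variant_ids)
    for sample_id, row in zip(sample_ids, matrix):
        writer.writerow([sample_id, populations.get(sample_id, "")] + row)


# Tiny in-memory inputs, made up purely for illustration.
VCF_TEXT = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1\n"
    "1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0/1\t./.\t0/0\n"
)
PANEL_TEXT = "sample\tpop\nS1\tGBR\nS2\tYRI\nS3\tCHB\n"

sample_ids, variant_ids, genotypes = read_vcf(io.StringIO(VCF_TEXT))
populations = read_panel(io.StringIO(PANEL_TEXT))
matrix = to_sample_matrix(genotypes)

out = io.StringIO()
write_csv(out, sample_ids, variant_ids, matrix, populations)
print(out.getvalue())
```

The matrix is oriented with one row per sample because PCA treats rows as observations and columns as features. Before running PCA, missing calls (`None` here) would need to be imputed or the affected variants dropped; which of these the project does is not stated, so the sketch leaves them untouched.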