Welcome to the RNA-Seq Data Analysis repository! This repository contains the scripts and documentation for the analysis of RNA sequencing (RNA-Seq) data using various tools such as FastQC, Trimmomatic, MultiQC, HISAT2, HTSeq-Count, and DESeq2. The purpose of this analysis is to gain insights into gene expression patterns and identify differentially expressed genes in the provided RNA-Seq data.
RNA-Seq is a powerful tool for studying gene expression patterns at the transcriptome level. This repository provides a step-by-step analysis pipeline to process and analyze raw RNA-Seq data. The pipeline involves quality control, read trimming, alignment, read counting, and differential gene expression analysis using popular bioinformatics tools.
Before running the analysis pipeline, make sure you have the following tools and software installed on your system:
- FastQC (version 0.12.0): A quality control tool for high throughput sequence data.
- Trimmomatic (version 0.32): A tool for trimming and filtering sequencing reads.
- MultiQC (version 1.14): A tool for aggregating and visualizing results from multiple bioinformatics tools.
- HISAT2 (version 2.1.0): A fast and sensitive alignment tool for mapping RNA-Seq reads to a reference genome.
- HTSeq-Count (version 2.0.3): A tool for counting aligned reads in specific genomic features.
- DESeq2 (version 1.40.2): A powerful R package for differential gene expression analysis.
If you use different versions of these tools, update the version numbers listed above to match.
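A quick way to confirm that the command-line tools are installed is to check whether each one is on your PATH. The sketch below is only an illustration: `check_tools` is a hypothetical helper, and the command names (`fastqc`, `trimmomatic`, `multiqc`, `hisat2`, `htseq-count`, `Rscript`) are assumptions based on typical installs. Trimmomatic in particular is often run as a Java jar rather than through a `trimmomatic` wrapper.

```bash
# Report which of the given commands are available on PATH.
# Returns 0 if all are present, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Command names below are assumptions; adjust them to your installation.
check_tools fastqc trimmomatic multiqc hisat2 htseq-count Rscript \
    || echo "Install the missing tools before running the pipeline." >&2

# DESeq2 is an R package, so it is checked from R rather than from PATH:
# Rscript -e 'packageVersion("DESeq2")'
```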
The raw RNA-Seq data used in this analysis is not included in this repository. Please ensure you have access to the data and place it in the appropriate input directory before running the pipeline.
The analysis pipeline is divided into several steps:
- Quality Control: FastQC is used to assess the quality of the raw sequencing data.
- Read Trimming: Trimmomatic is employed to remove low-quality bases and adapters from the reads.
- Alignment: HISAT2 is used to align the processed reads to a reference genome.
- Read Counting: HTSeq-Count is used to count the number of reads mapped to each gene.
- Differential Expression Analysis: DESeq2 is used to identify differentially expressed genes between experimental conditions.
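To make the steps above concrete, here is a rough sketch of the commands they typically correspond to for one paired-end sample. It is not taken from `ngs.sh`, which may differ. Everything here is an assumption for illustration: the helper name `plan_sample`, the paired-end layout, the file naming (`_R1`/`_R2`), the directories (`input/`, `output/`, `ref/`), the adapter file, the trimming thresholds, and the unstranded setting (`-s no`). The function only prints the commands and does not run them.

```bash
# Print the commands for one paired-end sample without executing them.
# All paths and parameter values are hypothetical placeholders.
plan_sample() {
    s=$1    # sample name, e.g. "sampleA"
    cat <<EOF
fastqc -o output/fastqc input/${s}_R1.fastq.gz input/${s}_R2.fastq.gz
trimmomatic PE -phred33 input/${s}_R1.fastq.gz input/${s}_R2.fastq.gz \
output/trimmed/${s}_R1.paired.fastq.gz output/trimmed/${s}_R1.unpaired.fastq.gz \
output/trimmed/${s}_R2.paired.fastq.gz output/trimmed/${s}_R2.unpaired.fastq.gz \
ILLUMINACLIP:adapters.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:36
hisat2 -x ref/genome -1 output/trimmed/${s}_R1.paired.fastq.gz \
-2 output/trimmed/${s}_R2.paired.fastq.gz -S output/aligned/${s}.sam
htseq-count -f sam -s no output/aligned/${s}.sam ref/genes.gtf > output/counts/${s}.counts.txt
EOF
}
```

Calling `plan_sample sampleA` prints four commands, one per step up to read counting. MultiQC would usually be run once after all samples have been processed, for example `multiqc output -o output/multiqc`, and the output directories must exist before the tools write to them.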
To run the RNA-Seq data analysis pipeline, follow these steps:
- Clone this repository to your local machine:
git clone https://github.com/abhijitswain09/NGS-data-analysis.git
- Place the raw RNA-Seq data in the appropriate input directory.
- Install the required tools listed above (FastQC, Trimmomatic, MultiQC, HISAT2, HTSeq-Count, and DESeq2).
- Execute the preprocessing, mapping, and expression count script:
bash ngs.sh
- The results will be generated in the output directory.
- To perform the differential expression analysis, run the deseq2.R script using R.
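Before launching the steps above, it can help to confirm that the input directory actually contains FASTQ files. A minimal sketch, assuming a directory named `input` and common FASTQ extensions. Both are assumptions, and `list_fastq` is a hypothetical helper, so adjust them to whatever `ngs.sh` expects.

```bash
# List FASTQ files directly inside a directory (no recursion).
# The extensions checked are common conventions, not a known requirement of ngs.sh.
list_fastq() {
    for f in "$1"/*.fastq "$1"/*.fastq.gz "$1"/*.fq "$1"/*.fq.gz; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Example (commented out so nothing runs by accident):
# if [ -n "$(list_fastq input)" ]; then
#     bash ngs.sh && Rscript deseq2.R
# else
#     echo "No FASTQ files found in input/" >&2
# fi
```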
The results of the preprocessing, mapping, and expression count steps will be stored in the output directory. This will include various files and plots providing insights into the gene expression patterns.
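If the read counting step writes one HTSeq-Count file per sample, those files can be combined into a single gene-by-sample table before differential expression analysis. The sketch below is one possible way to do that and is not necessarily what `ngs.sh` or `deseq2.R` does. The helper name `merge_counts` and the `.counts.txt` suffix are hypothetical. It relies on the standard HTSeq-Count output format: two tab-separated columns, followed by summary rows whose names start with `__` (such as `__no_feature`), which are skipped here.

```bash
# Merge per-sample htseq-count outputs into one tab-separated count matrix.
# Usage: merge_counts sample1.counts.txt sample2.counts.txt ... > matrix.tsv
merge_counts() {
    awk -F'\t' '
        FNR == 1 {
            # New input file: derive a column name from the file name.
            nfile++
            name = FILENAME
            sub(/^.*\//, "", name)            # drop the directory part
            sub(/\.counts\.txt$/, "", name)   # drop the assumed suffix
            names[nfile] = name
        }
        /^__/ { next }                        # skip htseq-count summary rows
        {
            if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
            count[$1, nfile] = $2
        }
        END {
            printf "gene_id"
            for (f = 1; f <= nfile; f++) printf "\t%s", names[f]
            printf "\n"
            for (i = 1; i <= n; i++) {
                g = order[i]
                printf "%s", g
                for (f = 1; f <= nfile; f++)
                    printf "\t%s", (((g, f) in count) ? count[g, f] : 0)
                printf "\n"
            }
        }' "$@"
}
```

Genes keep the order in which they first appear, and a gene missing from one file is reported as 0 for that sample.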
The results of the differential expression analysis will be generated by the deseq2.R script and will be stored separately.
This project is licensed under the MIT License; see license.txt for details.
If you wish to contribute to this project, feel free to open issues, submit pull requests, or suggest improvements. We welcome your contributions!
Thank you for using this RNA-Seq data analysis pipeline. If you have any questions or encounter any issues, please don't hesitate to contact us.
Happy analyzing!