Pre-Alignment QC


Obi Griffith edited this page May 30, 2018 · 41 revisions

RNA-seq Flowchart - Module 1

1-vi. Pre-Alignment QC

You can use FastQC to get a sense of your data quality before alignment:

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Video Tutorial here:

http://www.youtube.com/watch?v=bz93ReOv87Y

Run FastQC on your FASTQ files:

cd $RNA_HOME/data
fastqc *.fastq.gz

Then go to the following URL in your browser:

http://YOUR_DNS_NAME/workspace/rnaseq/data/
Note: you must replace YOUR_DNS_NAME with your own Amazon instance DNS name (e.g., ec2-54-187-159-113.us-west-2.compute.amazonaws.com).
Click on any of the *_fastqc.html files to view the FastQC report.
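
Before browsing the reports, it can help to confirm that every FASTQ file actually produced one. The following is a minimal sketch, not part of the original course material: the function name missing_fastqc_reports is hypothetical, and it assumes FastQC's default output naming, where sample.fastq.gz yields sample_fastqc.html in the same directory.

```shell
# Hypothetical helper: print the sample names in a directory whose
# FASTQ file has no matching FastQC HTML report.
# Assumes the default naming: sample.fastq.gz -> sample_fastqc.html
missing_fastqc_reports() {
    dir="${1:-.}"
    for fq in "$dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue              # glob matched nothing
        base=$(basename "$fq" .fastq.gz)      # strip directory and extension
        [ -e "$dir/${base}_fastqc.html" ] || echo "$base"
    done
}

# Example: check the course data directory (uses ./data if RNA_HOME is unset).
missing_fastqc_reports "${RNA_HOME:-.}/data"
```

If nothing is printed, every *.fastq.gz file in the directory has a report.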

Exercise

Investigate the source/explanation for over-represented sequences:

HINT: Try BLASTing them.
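
One way to prepare the sequences for BLAST is to pull them out of the text report that FastQC stores inside each *_fastqc.zip archive (fastqc_data.txt). The sketch below is an illustration under assumptions, not the course's own method: the function name overrepresented_to_fasta and the *_overrep.fa output names are hypothetical, and it relies on the report marking the section with ">>Overrepresented sequences" and closing it with ">>END_MODULE".

```shell
# Hypothetical helper: read a fastqc_data.txt report on stdin and write
# its overrepresented sequences to stdout as FASTA records.
overrepresented_to_fasta() {
    awk -F'\t' '
        /^>>Overrepresented sequences/ { in_module = 1; next }   # section start
        /^>>END_MODULE/                { in_module = 0 }         # section end
        in_module && !/^#/ {                                     # skip the column header
            n++
            printf(">overrep_%d count=%s\n%s\n", n, $2, $1)
        }
    '
}

# Example: write one FASTA file per FastQC archive in the data directory.
for zip in "${RNA_HOME:-.}"/data/*_fastqc.zip; do
    [ -e "$zip" ] || continue                 # no archives found
    unzip -p "$zip" '*/fastqc_data.txt' \
        | overrepresented_to_fasta > "${zip%_fastqc.zip}_overrep.fa"
done
```

The resulting FASTA files can be pasted into the NCBI BLAST web form. The "Possible Source" column in the FastQC report is also worth reading first, since FastQC already checks against a list of common contaminants.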

PRACTICAL EXERCISE 4

Assignment: Run FastQC on one of the additional FASTQ files you downloaded in the previous practical exercise.

Hint: Remember that you stored this data in a separate working directory called ‘practice’.

Run FastQC on the file 'hcc1395_normal_1.fastq.gz' and answer these questions by examining the output.

Questions

How many total sequences are there?
What is the range (x - y) of read lengths observed?
What is the most common average sequence quality score?
What does the Adapter Content warning tell us?
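
The first two answers can be cross-checked from the command line once you have read them off the HTML report. This is a rough sketch under assumptions: the function name fastqc_basic_stats is hypothetical, the archive is assumed to sit in the current directory, and the report is assumed to contain tab-separated "Total Sequences" and "Sequence length" rows in its Basic Statistics section.

```shell
# Hypothetical helper: read a fastqc_data.txt report on stdin and print
# the total read count and the read length (or length range).
fastqc_basic_stats() {
    awk -F'\t' '$1 == "Total Sequences" || $1 == "Sequence length" { print $1 ": " $2 }'
}

# Example: assumes the FastQC archive for the exercise file is in the
# current directory; adjust the path to wherever you ran FastQC.
report_zip="hcc1395_normal_1_fastqc.zip"
if [ -e "$report_zip" ]; then
    unzip -p "$report_zip" '*/fastqc_data.txt' | fastqc_basic_stats
fi
```

The remaining two questions are best answered from the plots in the HTML report.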

Solution: When you are ready, you can check your approach against the Solutions.

Run MultiQC on your FastQC reports to generate a single summary report across all samples/replicates:

cd $RNA_HOME/data
multiqc .


NOTICE: This resource has been moved to rnabio.org. The version here will be maintained for legacy use only. All future development and maintenance will occur only at rnabio.org. Please proceed to rnabio.org for the current version of this course.
