Affymetrix QC pipeline

magosil86 edited this page Nov 19, 2015 · 10 revisions

SNP chip genotype calling pipeline for GWAS using Affymetrix Power Tools.

Background

The Affymetrix QC pipeline implements a workflow for human GWAS analysis using data from the Affymetrix GenomeWideSNP_6 chip (microarray), starting from the raw .CEL files.

It uses the GenomeWideSNP_6.birdseed-v2 genotype calling algorithm in Affymetrix Power Tools (APT) version 1.15.2.

Following genotype calling, the Affymetrix QC pipeline converts the birdseed calls to PLINK-format (.map and .ped) files.

Handling of missing rs IDs:
Missing rs IDs are replaced with dummy IDs that follow the nomenclature “rs(chromosome)_position”. If a SNP with a dummy ID shows a significant association later in the GWAS analysis, the dummy ID, which encodes the chromosome and position, can be used to search for the actual rs ID.
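
The dummy-ID scheme can be sketched in Python as follows. This is only an illustration and not the pipeline's own code: it assumes the nomenclature expands to the literal prefix rs, the chromosome label, an underscore, and the base-pair position (for example rs7_117559590), and the function names make_dummy_rsid and parse_dummy_rsid are hypothetical.

```python
import re

# Assumed layout: "rs" + chromosome label + "_" + base-pair position.
# The accepted chromosome labels are an assumption made for this sketch.
_DUMMY_ID = re.compile(r"^rs([0-9]{1,2}|X|Y|XY|MT)_([0-9]+)$")


def make_dummy_rsid(chromosome, position):
    """Build a placeholder ID for a SNP that has no rs ID."""
    return "rs{}_{}".format(chromosome, int(position))


def parse_dummy_rsid(snp_id):
    """Return (chromosome, position) for a dummy ID, or None for any other ID."""
    match = _DUMMY_ID.match(snp_id)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

Calling parse_dummy_rsid on a significant hit recovers the chromosome and position needed to look up the actual rs ID. An ordinary rs ID contains no underscore, so the function returns None for it.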

Pipeline files needed to run QC on Affymetrix raw data

  1. pipeline_qcaffymetrix.py
  2. pipeline_qcaffymetrix_config.py
  3. pipeline_qcaffymetrix_stages_config.py
  4. AffymetrixUserInput.py

Note: The above files should be visible from the witsGWAS/ directory.

Update AffymetrixUserInput.py before running the Affymetrix QC pipeline

cd witsGWAS/

Edit the following variables in AffymetrixUserInput.py

emacs AffymetrixUserInput.py

Note: All variables take values of type string

Variable                Value
celfiles_dir            Path to the directory housing the .CEL files
projectname             Name of the project as one word (e.g. phase1_h3data)
author                  Project author
affy_qc_lib             Path to the Affymetrix library files
affy_annotation         Path to the Affymetrix annotation file
plink_phenotypes_cases  Path to a file listing all the samples to be designated as cases, if phenotype information is available; otherwise an empty string
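
For illustration, a filled-in AffymetrixUserInput.py might look roughly like the sketch below. The variable names come from the table above. Every path and the author value are hypothetical placeholders, and the string check at the end is an optional addition for this example, not part of the pipeline.

```python
# AffymetrixUserInput.py (illustrative values only; replace with your own)

# Directory housing the raw .CEL files (hypothetical path)
celfiles_dir = "/data/phase1/celfiles/"
# Project name, written as one word
projectname = "phase1_h3data"
# Project author (hypothetical value)
author = "jdoe"
# Affymetrix library files (hypothetical path)
affy_qc_lib = "/data/affymetrix/lib/"
# Affymetrix annotation file (hypothetical path and file name)
affy_annotation = "/data/affymetrix/annotation/annotation.csv"
# File listing the samples to designate as cases; "" when there is no phenotype data
plink_phenotypes_cases = ""

# Optional sanity check (not part of the pipeline): every value must be a string.
settings = {
    "celfiles_dir": celfiles_dir,
    "projectname": projectname,
    "author": author,
    "affy_qc_lib": affy_qc_lib,
    "affy_annotation": affy_annotation,
    "plink_phenotypes_cases": plink_phenotypes_cases,
}
not_strings = [name for name, value in settings.items() if not isinstance(value, str)]
if not_strings:
    raise TypeError("these variables must be strings: " + ", ".join(not_strings))
```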

A flowchart of tasks in the Affymetrix QC pipeline

[Figure: flowchart of the tasks in the Affymetrix QC pipeline]

The flowchart above can be generated by running the commands below at the Unix prompt. A flowchart.svg file will be written to the current project folder, projects/projectname-qcaffy-author-timestamp/.

cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --style flowchart

Side note for WITS cluster users: you need to log in to a node first, because flowcharts cannot be generated from cream.

qsub -I -q medium

Viewing the inputs and expected outputs of each task/job via a pipeline printout

cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --style print

Running the Affymetrix QC pipeline

cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --style run

Tip: Running pipelines from within screen sessions minimizes the chances of the pipeline run being interrupted by broken network connections.

Running the Affymetrix QC pipeline in the absence of phenotype data

cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --end 'make_celfiles_binary' --style run
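
The rubra invocations shown in the sections above differ only in the --style value and the optional --end stage. As a rough sketch, they could be assembled from Python as shown below. The helper names rubra_command and run_pipeline are hypothetical. Only the flags that appear in the commands above are used, and the default working directory assumes the script is started from the parent of witsGWAS/.

```python
import subprocess

PIPELINE = "pipeline_qcaffymetrix.py"
CONFIG_FILES = [
    "pipeline_qcaffymetrix_config.py",
    "pipeline_qcaffymetrix_stages_config.py",
    "AffymetrixUserInput.py",
]


def rubra_command(style, end_stage=None):
    """Assemble the rubra argument list for one of the styles shown above."""
    if style not in ("flowchart", "print", "run"):
        raise ValueError("unknown style: {!r}".format(style))
    command = ["rubra", PIPELINE, "--config"] + CONFIG_FILES
    if end_stage is not None:
        # Stop the pipeline after the named stage.
        command += ["--end", end_stage]
    return command + ["--style", style]


def run_pipeline(style, end_stage=None, workdir="witsGWAS"):
    """Run rubra inside workdir; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(rubra_command(style, end_stage), cwd=workdir)
```

With these helpers, run_pipeline("run", end_stage="make_celfiles_binary") corresponds to the command for running without phenotype data, and run_pipeline("print") corresponds to the pipeline printout.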

Inspecting the pipeline results

The Affymetrix QC pipeline results will be stored in the current project directory: witsGWAS/projects/projectname-qcaffy-author-timestamp/ with the following directory tree structure

Example dataset with results of analysis

See affyqc and affyqc_no-pheno-info in the example_datasets sub-directory
