Affymetrix QC pipeline

SNP chip genotype calling pipeline for GWAS using Affymetrix Power Tools.

Background

The Affymetrix QC pipeline implements a workflow for human GWAS analysis using data from the Affymetrix GenomeWideSNP_6 chip (microarray), starting from the raw .CEL files.

It uses the GenomeWideSNP_6.birdseed-v2 genotype calling algorithm in Affymetrix Power Tools (APT) version 1.15.2.

Following genotype calling, the Affymetrix QC pipeline converts the birdseed calls to PLINK-format files (.map and .ped).
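
To illustrate the kind of translation this conversion involves, here is a minimal sketch, not the pipeline's actual code. It assumes the usual birdseed call coding in APT output (-1 = no call, 0 = AA, 1 = AB, 2 = BB) and the PLINK convention of writing a missing genotype as "0 0". The function name and the allele arguments are hypothetical.

```python
# Assumed birdseed call codes: -1 = no call, 0 = AA, 1 = AB, 2 = BB.
# PLINK .ped files encode a missing genotype as "0 0".

def call_to_ped_genotype(call, allele_a, allele_b):
    """Translate one birdseed call code into a .ped genotype string."""
    genotypes = {
        0: (allele_a, allele_a),  # homozygous for allele A
        1: (allele_a, allele_b),  # heterozygous
        2: (allele_b, allele_b),  # homozygous for allele B
    }
    # Any other code (including -1) is treated as a missing genotype.
    first, second = genotypes.get(call, ("0", "0"))
    return "{} {}".format(first, second)


if __name__ == "__main__":
    for code in (0, 1, 2, -1):
        print(code, call_to_ped_genotype(code, "A", "G"))
```

In a real conversion the two alleles for each probeset would come from the Affymetrix annotation file.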

Handling of missing rs IDs:
Missing rs IDs are replaced with dummy IDs of the form “rs(chromosome)_position”. If a SNP with a dummy ID shows a significant association later in the GWAS analysis, the chromosome and position encoded in the dummy ID can be used to search for the actual rs ID.
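
The naming scheme can be sketched as follows. This is an illustration only: it assumes the parentheses in the pattern are placeholders rather than literal characters, so that a SNP on chromosome 7 at position 117199644 would become rs7_117199644. Both function names are hypothetical and are not taken from the pipeline.

```python
def make_dummy_rsid(chromosome, position):
    """Build a dummy ID from a chromosome label and a base-pair position."""
    return "rs{}_{}".format(chromosome, position)


def parse_dummy_rsid(dummy_id):
    """Recover (chromosome, position) from a dummy ID for a later lookup."""
    if not dummy_id.startswith("rs") or "_" not in dummy_id:
        raise ValueError("not a dummy rs ID: {!r}".format(dummy_id))
    chromosome, _, position = dummy_id[2:].partition("_")
    return chromosome, int(position)


if __name__ == "__main__":
    dummy = make_dummy_rsid("7", 117199644)
    print(dummy)
    print(parse_dummy_rsid(dummy))
```

The recovered chromosome and position are what one would use to look up the real rs ID in an external SNP database.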

Pipeline files needed to run QC on Affymetrix raw data

pipeline_qcaffymetrix.py
pipeline_qcaffymetrix_config.py
pipeline_qcaffymetrix_stages_config.py
AffymetrixUserInput.py

Note: The above files should be visible from the witsGWAS/ directory

Update the AffymetrixUserInput.py in preparation for running the Affymetrix QC pipeline

cd witsGWAS/

Edit the following variables in AffymetrixUserInput.py

emacs AffymetrixUserInput.py

Note: All variables take values of type string

Variables	Value
celfiles_dir	Path to the directory housing the .CEL files
projectname	Name of the project as one word (e.g. phase1_h3data)
author	Name of the project author
affy_qc_lib	Path to the Affymetrix library files
affy_annotation	Path to the Affymetrix annotation file
plink_phenotypes_cases	Path to a file listing all the samples to be designated as cases, if phenotype information is available; otherwise an empty string
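
As a rough illustration, an edited AffymetrixUserInput.py might look like the sketch below. Every path and the author name are hypothetical placeholders to be replaced with values for your own system. Only the variable names and the example project name come from the table above.

```python
# AffymetrixUserInput.py (illustrative values only; all values are strings)

# Directory housing the raw .CEL files (hypothetical path)
celfiles_dir = "/path/to/celfiles/"

# Project name as one word
projectname = "phase1_h3data"

# Project author (hypothetical name)
author = "jdoe"

# Affymetrix library files and annotation file (hypothetical paths)
affy_qc_lib = "/path/to/affymetrix/library_files/"
affy_annotation = "/path/to/affymetrix/annotation_file"

# Path to the file listing the case samples, or an empty string
# when no phenotype information is available
plink_phenotypes_cases = ""
```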

A flowchart of tasks in the Affymetrix QC pipeline

[Figure: flowchart of the tasks in the Affymetrix QC pipeline]

The flowchart above can be generated by typing the commands below at the Unix prompt. A flowchart.svg file will be generated and stored in the current project folder: projects/projectname-qcaffy-author-timestamp/

cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --style flowchart

Side note for WITS cluster users: log into a node first, because flowcharts cannot be generated from cream.

qsub -I -q medium

Viewing the inputs and expected outputs of each task/job via a pipeline printout

cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --style print

Running the Affymetrix QC pipeline

cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --style run

Tip: Running pipelines from within screen sessions minimizes the chances of the pipeline run being interrupted by broken network connections.

Running the Affymetrix QC pipeline in the absence of phenotype data

cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --end 'make_celfiles_binary' --style run
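
The rubra invocations on this page differ only in the --style value and the optional --end stage. A small helper like the sketch below could assemble them for use in a wrapper script. The helper itself is hypothetical and not part of witsGWAS. It uses only the flags and file names shown in the commands above, and it only builds and prints the command, it does not run it.

```python
PIPELINE = "pipeline_qcaffymetrix.py"
CONFIG_FILES = [
    "pipeline_qcaffymetrix_config.py",
    "pipeline_qcaffymetrix_stages_config.py",
    "AffymetrixUserInput.py",
]


def build_rubra_command(style, end_stage=None):
    """Return the rubra command as an argument list.

    style is one of "flowchart", "print" or "run".
    end_stage optionally names the last stage to run.
    """
    command = ["rubra", PIPELINE, "--config"] + CONFIG_FILES
    if end_stage is not None:
        command += ["--end", end_stage]
    command += ["--style", style]
    return command


if __name__ == "__main__":
    # Run without phenotype data: stop after make_celfiles_binary.
    print(" ".join(build_rubra_command("run", end_stage="make_celfiles_binary")))
```

The resulting list can be passed to subprocess.call from within the witsGWAS/ directory.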

Inspecting the pipeline results

The Affymetrix QC pipeline results are stored in the current project directory: witsGWAS/projects/projectname-qcaffy-author-timestamp/

Example dataset with results of analysis

See affyqc and affyqc_no-pheno-info in the example_datasets sub-directory
