Affymetrix QC pipeline
Background
The Affymetrix QC pipeline implements a workflow for human GWAS analysis using data from the Affymetrix GenomeWideSNP_6 chip/microarray, starting from the raw .CEL file stage.
It uses the GenomeWideSNP_6.birdseed-v2 genotype calling algorithm in Affymetrix Power Tools (APT) version 1.15.2.
Following genotype calling, the pipeline converts the birdseed calls to PLINK format (.map and .ped) files.
Handling of missing rs IDs:
Missing rs IDs are replaced with dummy IDs that follow the nomenclature “rs(chromosome)_position”. If a SNP with a dummy ID shows a significant association later in the GWAS analysis, the dummy ID can be used to search for its rs ID.
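The naming scheme can be sketched as follows. This is a minimal illustration, assuming the dummy ID is the literal prefix `rs`, the chromosome, an underscore, and the position (for example `rs1_12345`); whether the pipeline writes parentheses around the chromosome is not stated here. Both helper names are hypothetical.

```python
# Hypothetical helpers; the exact dummy-ID layout is an assumption.

def make_dummy_rsid(chromosome, position):
    """Build a dummy ID of the assumed form rs<chromosome>_<position>."""
    return f"rs{chromosome}_{position}"


def parse_dummy_rsid(dummy_id):
    """Recover (chromosome, position) from a dummy ID for a later lookup."""
    if not dummy_id.startswith("rs") or "_" not in dummy_id:
        raise ValueError(f"not a dummy ID: {dummy_id!r}")
    chromosome, position = dummy_id[2:].split("_", 1)
    return chromosome, int(position)


if __name__ == "__main__":
    dummy = make_dummy_rsid("1", 12345)  # made-up coordinates
    print(dummy, parse_dummy_rsid(dummy))
```

Parsing the ID back into a chromosome and position is what makes the later rs ID search possible, since the coordinates can be looked up in an annotation source of your choice.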
Pipeline files needed to run QC on Affymetrix raw data
pipeline_qcaffymetrix.py
pipeline_qcaffymetrix_config.py
pipeline_qcaffymetrix_stages_config.py
AffymetrixUserInput.py
Note: The above files should be visible from the witsGWAS/ directory.
Update AffymetrixUserInput.py in preparation for running the Affymetrix QC pipeline
cd witsGWAS/
Edit the following variables in AffymetrixUserInput.py
emacs AffymetrixUserInput.py
Note: All variables take values of type string
Variables | Value |
---|---|
celfiles_dir | Path to the directory housing the .CEL files |
projectname | Name of the project as one word (e.g. phase1_h3data) |
author | Project author |
affy_qc_lib | Path to the Affymetrix library files |
affy_annotation | Path to the Affymetrix annotation file |
plink_phenotypes_cases | Path to a file listing all the samples to be designated as cases, if phenotype information is available; otherwise an empty string |
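For orientation, a filled-in AffymetrixUserInput.py might look something like the sketch below. Every value is a placeholder chosen for illustration (the paths and the author name are hypothetical, and whether directory paths need a trailing slash is an assumption); only the variable names and the string-typed values come from the table above.

```python
# Illustrative contents for AffymetrixUserInput.py.
# Every value below is a placeholder; substitute your own paths and names.
celfiles_dir = "/path/to/celfiles/"
projectname = "phase1_h3data"
author = "jsmith"
affy_qc_lib = "/path/to/affymetrix/library_files/"
affy_annotation = "/path/to/affymetrix/annotation_file"
plink_phenotypes_cases = ""  # empty string: no phenotype information available
```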
The flowchart above can be generated by typing the commands below at the Unix prompt. A flowchart.svg file will be generated and stored in the current project folder, projects/projectname-qcaffy-author-timestamp/.
cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --style flowchart
Side note for WITS cluster users: log into a node first, as flowcharts cannot be generated from cream.
qsub -I -q medium
To print the pipeline stages that would be run, without executing them:
cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --style print
To run the pipeline:
cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --style run
Tip: Running pipelines from within screen sessions minimizes the chances of the pipeline run being interrupted by broken network connections.
To run the pipeline only up to a given stage (here make_celfiles_binary), add the --end option:
cd witsGWAS/
rubra pipeline_qcaffymetrix.py --config pipeline_qcaffymetrix_config.py pipeline_qcaffymetrix_stages_config.py AffymetrixUserInput.py --end 'make_celfiles_binary' --style run
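The rubra invocations above differ only in the `--style` value and the optional `--end` stage, so a small wrapper can save retyping. The sketch below only assembles the argument list; `rubra_command` is a hypothetical helper, not part of witsGWAS or rubra, and it uses only the options that already appear in the commands on this page.

```python
import shlex

# File names as listed on this page; the helper itself is hypothetical.
PIPELINE = "pipeline_qcaffymetrix.py"
CONFIGS = [
    "pipeline_qcaffymetrix_config.py",
    "pipeline_qcaffymetrix_stages_config.py",
    "AffymetrixUserInput.py",
]


def rubra_command(style, end=None):
    """Assemble the rubra argument list for a --style and optional --end."""
    cmd = ["rubra", PIPELINE, "--config", *CONFIGS]
    if end is not None:
        cmd += ["--end", end]
    cmd += ["--style", style]
    return cmd


if __name__ == "__main__":
    # Print the shell-quoted command lines rather than executing them.
    print(shlex.join(rubra_command("print")))
    print(shlex.join(rubra_command("run", end="make_celfiles_binary")))
```

To actually launch a run, the returned list could be passed to `subprocess.run(cmd, check=True)` from inside the witsGWAS/ directory.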
The Affymetrix QC pipeline results will be stored in the current project directory, witsGWAS/projects/projectname-qcaffy-author-timestamp/, with the following directory tree structure:
See affyqc and affyqc_no-pheno-info in the example_datasets sub-directory