PLINK QC pipeline (PLINK versions 1.07 and 1.9)
This program implements a QC workflow for human GWAS analysis using PLINK binary files.
Sample QC tasks include checking for:
- discordant sex information
- individual missingness
- heterozygosity scores
- relatedness
SNP QC tasks include checking:
- minor allele frequencies
- SNP missingness
- differential missingness
- Hardy-Weinberg equilibrium deviations
Assumptions: Case-control status has been specified in the .fam file (phenotype information can be added using the --make-pheno flag in PLINK 1.9).
Pipeline options: For datasets missing sex information, set the sexinfo_available variable in PlinkUserInput.py to False, e.g. sexinfo_available = False
User interaction: To facilitate user interaction, the pipeline tasks have been grouped into smaller sub-pipelines.
pipeline_qcplink_tasks1-5of20.py
pipeline_qcplink_tasks6-14of20.py
pipeline_qcplink_tasks15-20of20.py
Important note: Individuals identified as duplicates or as closely related (IBD > 0.1875) are written to fail_IBD_qcplink.txt but are NOT removed during QC. The user decides at which point to remove those individuals from the dataset. When that point is reached, they can be removed using the PLINK command below:
plink --bfile qced_plink --remove fail_IBD_qcplink.txt --make-bed --out clean_inds_qcplink
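To make the IBD > 0.1875 criterion concrete, here is a minimal sketch of how pairs above the threshold could be picked out of a PLINK `--genome` report, where the `PI_HAT` column holds the estimated proportion of IBD sharing. The function name `flag_related_pairs` and the sample rows are hypothetical, and the pipeline's own logic for building fail_IBD_qcplink.txt may differ (for example, in which member of a pair it lists).

```python
IBD_THRESHOLD = 0.1875  # cutoff quoted in the note above


def flag_related_pairs(genome_lines, threshold=IBD_THRESHOLD):
    """Return (FID1, IID1, FID2, IID2, PI_HAT) for pairs with PI_HAT > threshold.

    genome_lines is any iterable of whitespace-delimited lines in PLINK
    .genome layout, starting with the header row.
    """
    lines = iter(genome_lines)
    header = next(lines).split()
    col = {name: i for i, name in enumerate(header)}
    flagged = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        pi_hat = float(fields[col["PI_HAT"]])
        if pi_hat > threshold:
            flagged.append((fields[col["FID1"]], fields[col["IID1"]],
                            fields[col["FID2"]], fields[col["IID2"]], pi_hat))
    return flagged


# Made-up rows for illustration only (not real data).
sample = [
    "FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO",
    "F1 I1 F2 I2 UN NA 0.0000 0.0000 1.0000 1.0000 -1 1.000000 1.0000 NA",
    "F1 I1 F3 I3 UN NA 0.9000 0.1000 0.0000 0.0500 -1 0.750000 0.5000 2.0000",
]
pairs = flag_related_pairs(sample)
```

In this made-up example only the first pair (a duplicate, with PI_HAT of 1.0) exceeds the threshold. The same function should also accept an open file handle on a real .genome file.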
Reference: Anderson, C. et al. Data quality control in genetic case-control association studies. Nature Protocols. 5, 1564-1573, 2010
Pipeline files needed to run tasks 1-5
pipeline_qcplink_tasks1-5of20.py
pipeline_qcplink_tasks1-5of20_config.py
pipeline_qcplink_tasks1-5of20_stages_config.py
PlinkUserInput.py
Note: The above files should be visible from the witsGWAS/ directory
Update PlinkUserInput.py in preparation for running tasks 1-5 of the PLINK QC pipeline
cd witsGWAS/
Edit the following variables in PlinkUserInput.py
emacs PlinkUserInput.py
Variables | Description | Value type |
---|---|---|
projectname | name of project as one word (e.g. RAW_GWA_DATA) | String |
author | project author | String |
sexinfo_available | Specifies whether sex information is available | Boolean |
plink_binary_files | Path to the PLINK binary files | String |
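As an illustration of the table above, the sketch below shows what the four entries might look like and a small type check against the "Value type" column. All values are placeholders, and `check_user_input` is a hypothetical helper, not part of witsGWAS.

```python
# Placeholder settings mirroring the variables in the table above.
example_settings = {
    "projectname": "RAW_GWA_DATA",                  # one word, no spaces
    "author": "jsmith",                             # hypothetical author name
    "sexinfo_available": True,                      # Boolean, not the string "True"
    "plink_binary_files": "/path/to/plink_files",   # placeholder path
}

EXPECTED_TYPES = {
    "projectname": str,
    "author": str,
    "sexinfo_available": bool,
    "plink_binary_files": str,
}


def check_user_input(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in settings:
            problems.append(f"{name} is missing")
        elif not isinstance(settings[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    project = settings.get("projectname")
    if isinstance(project, str) and len(project.split()) != 1:
        problems.append("projectname must be a single word")
    return problems


problems = check_user_input(example_settings)
```

A common slip this would catch is writing `sexinfo_available = "False"` (a string) instead of the Boolean `False`.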
The pipeline flowchart for tasks 1-5 can be generated by typing the commands below at the Unix prompt. A flowchart.svg file will be generated and stored in the current project folder: projects/projectname-pipeline-author-timestamp/
cd witsGWAS/
rubra pipeline_qcplink_tasks1-5of20.py --config pipeline_qcplink_tasks1-5of20_config.py pipeline_qcplink_tasks1-5of20_stages_config.py PlinkUserInput.py --style flowchart
Side note for WITS cluster users: Log into a node first, as flowcharts cannot be generated from cream.
qsub -I -q medium
cd witsGWAS/
rubra pipeline_qcplink_tasks1-5of20.py --config pipeline_qcplink_tasks1-5of20_config.py pipeline_qcplink_tasks1-5of20_stages_config.py PlinkUserInput.py --style print
A printout of the pipeline tasks will be shown on screen (stdout).
cd witsGWAS/
rubra pipeline_qcplink_tasks1-5of20.py --config pipeline_qcplink_tasks1-5of20_config.py pipeline_qcplink_tasks1-5of20_stages_config.py PlinkUserInput.py --style run
A new folder for the results will be created under the witsGWAS/projects/ directory.
Tip: Running pipelines from within screen sessions minimizes the chances of the pipeline run being interrupted by broken network connections.
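The three rubra invocations above differ only in the `--style` value, and the sub-pipelines for tasks 6-14 and 15-20 follow the same file-naming pattern. The sketch below assembles the command line for a given task range and style. `rubra_command` is a hypothetical helper for illustration only, and it does not cover the extended variant of tasks 15-20.

```python
STYLES = ("flowchart", "print", "run")  # styles used on this page


def rubra_command(task_range, style):
    """Build the rubra argument list for one sub-pipeline.

    task_range is e.g. "1-5", "6-14" or "15-20".
    """
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    base = f"pipeline_qcplink_tasks{task_range}of20"
    return [
        "rubra", f"{base}.py",
        "--config", f"{base}_config.py", f"{base}_stages_config.py",
        "PlinkUserInput.py",
        "--style", style,
    ]


cmd = rubra_command("1-5", "run")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from within the witsGWAS/ directory, assuming rubra is on the PATH.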
Pipeline files needed to run tasks 6-14
pipeline_qcplink_tasks6-14of20.py
pipeline_qcplink_tasks6-14of20_config.py
pipeline_qcplink_tasks6-14of20_stages_config.py
PlinkUserInput.py
Note: The above files should be visible from the witsGWAS/ directory
Update PlinkUserInput.py in preparation for running tasks 6-14 of the PLINK QC pipeline
cd witsGWAS/
Edit the following variables in PlinkUserInput.py
emacs PlinkUserInput.py
Variables | Description | Value type |
---|---|---|
current_dir | path to the directory holding results from running pipeline_qcplink_tasks1-5of20.py | String |
cut_het_high | Specifies upper heterozygosity cutoff | Float |
cut_het_low | Specifies lower heterozygosity cutoff | Float |
cut_miss | Specifies individual missingness cutoff | Float |
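To show how the three cutoffs in the table could interact, here is a minimal sketch that flags samples by heterozygosity rate and missingness. It assumes the heterozygosity rate is computed from a PLINK `--het` report as (N(NM) - O(HOM)) / N(NM) and that missingness is the F_MISS value from a `--missing` report. The function names, sample values and cutoff values are all hypothetical, and the pipeline itself may apply the cutoffs differently.

```python
def heterozygosity_rate(n_nonmissing, n_observed_hom):
    """Observed heterozygosity rate: (N(NM) - O(HOM)) / N(NM)."""
    return (n_nonmissing - n_observed_hom) / n_nonmissing


def flag_samples(samples, cut_het_high, cut_het_low, cut_miss):
    """Map sample ID -> list of failed checks, for samples failing any check.

    samples maps sample ID -> (heterozygosity rate, missingness fraction).
    """
    failed = {}
    for sample_id, (het, f_miss) in samples.items():
        reasons = []
        if het > cut_het_high:
            reasons.append("het_high")
        if het < cut_het_low:
            reasons.append("het_low")
        if f_miss > cut_miss:
            reasons.append("missingness")
        if reasons:
            failed[sample_id] = reasons
    return failed


# Made-up samples and illustrative cutoffs (not recommended values).
samples = {
    "S1": (heterozygosity_rate(1000, 680), 0.01),  # rate 0.32, passes
    "S2": (heterozygosity_rate(1000, 600), 0.01),  # rate 0.40, too high
    "S3": (heterozygosity_rate(1000, 690), 0.10),  # too much missing data
}
failed = flag_samples(samples, cut_het_high=0.35, cut_het_low=0.28, cut_miss=0.05)
```

In practice the cutoffs are usually chosen after inspecting the distribution of heterozygosity and missingness in the output of tasks 1-5.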
As with tasks 1-5, the flowchart for tasks 6-14 can be generated by typing the commands below at the Unix prompt.
cd witsGWAS/
rubra pipeline_qcplink_tasks6-14of20.py --config pipeline_qcplink_tasks6-14of20_config.py pipeline_qcplink_tasks6-14of20_stages_config.py PlinkUserInput.py --style flowchart
Key point: Notice the difference in the keys (task-to-run vs up-to-date task) between the flowchart for tasks 1-5 and the flowchart for tasks 6-14. The flowchart for tasks 1-5 was generated before running tasks 1-5, whereas the flowchart for tasks 6-14 was generated after running tasks 6-14.
cd witsGWAS/
rubra pipeline_qcplink_tasks6-14of20.py --config pipeline_qcplink_tasks6-14of20_config.py pipeline_qcplink_tasks6-14of20_stages_config.py PlinkUserInput.py --style print
A printout of the pipeline tasks will be shown on screen (stdout).
cd witsGWAS/
rubra pipeline_qcplink_tasks6-14of20.py --config pipeline_qcplink_tasks6-14of20_config.py pipeline_qcplink_tasks6-14of20_stages_config.py PlinkUserInput.py --style run
The results from running tasks 6-14 will be added to the project folder created during the PLINK QC pipeline run for tasks 1-5
Pipeline files needed to run tasks 15-20
pipeline_qcplink_tasks15-20of20.py
pipeline_qcplink_tasks15-20of20_config.py
pipeline_qcplink_tasks15-20of20_stages_config.py
PlinkUserInput.py
Note: The above files should be visible from the witsGWAS/ directory
Update PlinkUserInput.py in preparation for running tasks 15-20 of the PLINK QC pipeline
cd witsGWAS/
Edit the following variables in PlinkUserInput.py
emacs PlinkUserInput.py
Note: All the variables take values of type Float
Variables | Description |
---|---|
cut_geno | Specifies SNP missingness cutoff |
cut_diff_miss | Specifies differential missingness cutoff |
cut_hwe | Specifies HWE P-value cutoff |
cut_maf | Specifies minor allele frequency (MAF) cutoff |
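For orientation, the sketch below shows how three of these cutoffs could map onto PLINK's standard `--geno`, `--hwe` and `--maf` filters, with differential missingness handled separately by comparing a per-SNP P-value against cut_diff_miss. The helper names, file prefixes and cutoff values are hypothetical, and the actual commands issued by the pipeline may differ.

```python
def snp_filter_command(bfile, out, cut_geno, cut_hwe, cut_maf, exclude_file=None):
    """Build a PLINK argument list applying SNP-level QC filters."""
    cmd = [
        "plink", "--bfile", bfile,
        "--geno", str(cut_geno),   # drop SNPs with missingness above cut_geno
        "--hwe", str(cut_hwe),     # drop SNPs with HWE P-value below cut_hwe
        "--maf", str(cut_maf),     # drop SNPs with MAF below cut_maf
    ]
    if exclude_file:
        # e.g. a list of SNPs failing the differential missingness check
        cmd += ["--exclude", exclude_file]
    cmd += ["--make-bed", "--out", out]
    return cmd


def fails_diff_missingness(p_value, cut_diff_miss):
    """True if case/control missingness differs significantly for a SNP."""
    return p_value < cut_diff_miss


# Illustrative values only (not recommended thresholds).
cmd = snp_filter_command(
    "input_prefix", "output_prefix",
    cut_geno=0.05, cut_hwe=0.001, cut_maf=0.01,
    exclude_file="fail_diffmiss.txt",
)
```

Keeping the differential missingness check separate reflects that it relies on a per-SNP test (PLINK's `--test-missing` reports one) rather than on a single filtering flag.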
As with tasks 1-5, the flowchart for tasks 15-20 can be generated by typing the commands below at the Unix prompt.
cd witsGWAS/
rubra pipeline_qcplink_tasks15-20of20.py --config pipeline_qcplink_tasks15-20of20_config.py pipeline_qcplink_tasks15-20of20_stages_config.py PlinkUserInput.py --style flowchart
cd witsGWAS/
rubra pipeline_qcplink_tasks15-20of20.py --config pipeline_qcplink_tasks15-20of20_config.py pipeline_qcplink_tasks15-20of20_stages_config.py PlinkUserInput.py --style print
A printout of the pipeline tasks will be shown on screen (stdout).
cd witsGWAS/
rubra pipeline_qcplink_tasks15-20of20.py --config pipeline_qcplink_tasks15-20of20_config.py pipeline_qcplink_tasks15-20of20_stages_config.py PlinkUserInput.py --style run
The results from running tasks 15-20 will be added to the project folder created during the PLINK QC pipeline run for tasks 1-5
To identify individuals with divergent ancestry, use the version of the pipeline_qcplink_tasks15-20of20 sub-pipeline that has the suffix extended:
cd witsGWAS/
rubra pipeline_qcplink_tasks15-20of20_extended.py --config pipeline_qcplink_tasks15-20of20_stages_config_extended.py pipeline_qcplink_tasks15-20of20_config_extended.py PlinkUserInput.py --style run
See plinkqc and plinkqc_no-sex-info in the example_datasets sub-directory
© 2015 witsGWAS