Skip to content

This script should take as inputs a vcf, and the out_pairs file from PHASE software, and combine them into a phased vcf

License

Notifications You must be signed in to change notification settings

alisongonpereira/rscript_phase_to_pvcf

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

This R script is designed to facilitate genetic data analysis by processing VCF files in conjunction with output from the PHASE software. It takes as input a VCF file (Variant Call Format) that only contains the GT field and an output file from PHASE software (out_pairs). The script performs validation checks on these inputs to ensure correct formatting and readability.

Key Features:

  • Input Processing: The script requires a VCF file, which can be prepared using the bcftools command:
bcftools annotate --remove 'FORMAT/[any],FORMAT/[additional],FORMAT/[fields]' multiple_fields.vcf -o gt_only.vcf
  • Phased VCF Generation: The main goal is to generate a phased VCF file from the provided inputs.
  • Quality Control Reporting: Reports samples with imputation and phasing probabilities under a user-defined value, set to 0.8 by default.

This how what GPT4 tried to illustrate the pipeline. I got a bit confused by it, not gonna lie.

image

Reference:

Stephens, M., and Donnelly, P. (2003). "A comparison of Bayesian methods for haplotype reconstruction from population genotype data." American Journal of Human Genetics, 73:1162-1169.
Stephens, M., Smith, N., and Donnelly, P. (2001). "A new statistical method for haplotype reconstruction from population data." American Journal of Human Genetics, 68, 978--989.
Stephens, M., and Scheet, P. (2005). "Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-Data Imputation." American Journal of Human Genetics, 76:449-462.
Li, N., and Stephens, M. (2003). "Modelling Linkage Disequilibrium, and identifying recombination hotspots using SNP data." Genetics, 165:2213-2233.
Crawford et al. (2004). "Evidence for substantial fine-scale variation in recombination rates across the human genome." Nature Genetics, 36: 700-706​

​​​.

About

This script should take as inputs a vcf, and the out_pairs file from PHASE software, and combine them into a phased vcf

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages