gnomAD update #183

Madelinehazel · 2024-01-22T20:20:17Z

Update gnomAD to version 4 in the crg2-hg38 branch.
Include gnomAD_faf95_popmax column.

Look into whether or not there is a GRCh37 version that we can use to update the GRCh37 pipeline.

Please see this document for a summary of a previous gnomAD update, and this associated pull request.

NOTE that you will need to be in branch crg2-hg38, not master, to run the hg38 crg2 pipeline! And for cre, switch to branch hg38 for report generation.

gnomAD is a database of exomes and genomes from (mostly) healthy individuals. We use gnomAD as a control cohort; a variant with a population allele frequency (AF) of 1% or higher is almost certainly not the cause of an extremely rare monogenic disease. The gnomAD AFs allow us to filter down the variants in an individual with rare monogenic disease so that we can more easily identify the variant or variants associated with their phenotype. Here we will be updating the gnomAD SNV/indel annotation source (they also provide SV AFs).
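To make the filtering idea concrete, here is a minimal Python sketch of an AF cutoff at the 1% level mentioned above. It is not the crg2 or cre implementation: the record layout, the `gnomad_af` key, and the choice to treat variants absent from gnomAD as rare are all assumptions made for illustration.

```python
from typing import Optional

AF_THRESHOLD = 0.01  # 1%, the cutoff described in the text above


def is_rare(gnomad_af: Optional[float], threshold: float = AF_THRESHOLD) -> bool:
    """Return True if a variant should stay in the candidate list."""
    # Assumption: a variant with no gnomAD record has no AF and is treated as rare.
    if gnomad_af is None:
        return True
    # "1% or higher" is filtered out, so the comparison is strict.
    return gnomad_af < threshold


# Hypothetical variants; the IDs and AF values are made up for illustration.
variants = [
    {"id": "var1", "gnomad_af": None},
    {"id": "var2", "gnomad_af": 0.00002},
    {"id": "var3", "gnomad_af": 0.01},
    {"id": "var4", "gnomad_af": 0.15},
]

candidates = [v["id"] for v in variants if is_rare(v["gnomad_af"])]
print(candidates)
```

In practice the cutoff and the handling of missing AFs are set by the report-generation code, so treat the values here as placeholders.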

gnomAD AFs are available in a VCF (or per-chromosome VCFs that can be combined). We use vcfanno to add these AFs to the VCF generated by crg2 in this rule (variant allele frequencies). vcfanno requires a config that specifies which fields from one VCF to use to annotate another VCF, and any operations to apply to them. In crg2-hg38, that config is here.

You will need to:

1. Download the gnomAD v4 VCFs for both exomes and genomes (https://gnomad.broadinstitute.org/downloads).
2. Combine the per-chromosome VCFs for exomes, and separately for genomes, resulting in one VCF for gnomAD exomes and one for gnomAD genomes.
3. You will likely need to process each VCF to exclude unwanted fields, normalize, etc., as in this script, the key step being the bcftools command. However, we want to keep FAIL variants, so remove the first part of the command that filters to include only PASS variants.
4. Check that these VCFs have the fields specified in the vcfanno config.
5. Update the filenames in the vcfanno config to point to the v4 VCFs.
6. Run the pipeline to generate small variant reports.

r-varan self-assigned this Feb 19, 2024

anjalijain22 self-assigned this Jun 25, 2024
