I went over the gnomAD v4 SV header and selected a few fields that seem relevant for the annotation step. Here is an overview; we can discuss it during the meeting next week.
All info fields can be selected for the whole dataset, or for subsets (from gnomADv3), using these prefixes:
controls_and_biobanks_ only samples collected specifically as controls for disease studies, or samples belonging to biobanks (e.g. BioMe, Genizon) or general population studies (e.g., 1000 Genomes, HGDP, PAGE)
non_neuro_ only samples that were not collected as part of a neurologic or psychiatric case/control study, or samples collected as part of a neurologic or psychiatric case/control study but designated as controls
All info fields can be selected for the whole dataset, or for populations (from gnomADv3), using these prefixes:
afr_ African/African American
ami_ Amish
amr_ Latino/Admixed American
asj_ Ashkenazi Jewish
eas_ East Asian
fin_ Finnish
nfe_ Non-Finnish European
mid_ Middle Eastern
sas_ South Asian
oth_ Other (population not assigned)
For the subset and population prefixes above: I don't expect us to differentiate at this point, but I'm putting it out there for any future implementation.
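To make the prefix scheme concrete, here is a minimal Python sketch that expands base INFO field names with the prefixes listed above. It assumes the prefix is simply prepended to the base field name (e.g. `afr_` + `AF`), as described here; this should be checked against the actual gnomAD v4 SV header. The function name `prefixed_fields` is only illustrative.

```python
# Prefixes copied from the lists above.
SUBSET_PREFIXES = ["controls_and_biobanks_", "non_neuro_"]
POPULATION_PREFIXES = [
    "afr_", "ami_", "amr_", "asj_", "eas_",
    "fin_", "nfe_", "mid_", "sas_", "oth_",
]


def prefixed_fields(base_fields, prefixes):
    """Return every prefix + base-field combination.

    Assumption: the prefix is prepended directly to the INFO ID.
    """
    return [f"{prefix}{field}" for prefix in prefixes for field in base_fields]


print(prefixed_fields(["AC", "AF"], ["afr_", "non_neuro_"]))
# ['afr_AC', 'afr_AF', 'non_neuro_AC', 'non_neuro_AF']
```

If we ever do want per-population annotation, a list like this could feed straight into the annotation config instead of being typed out by hand.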
For annotation:
CHROM POS REF ALT: use for vcfanno determination of identity
##ALT=<ID=CNV,Description="Copy Number Polymorphism">
Seems like CNVs will have to be determined on positions?!
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate">
##INFO=<ID=POS2,Number=1,Type=Integer,Description="Start position of the structural variant on CHR2">
##INFO=<ID=END2,Number=1,Type=Integer,Description="End position of the structural variant on CHR2">
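One possible way to decide identity "on positions" is a reciprocal-overlap test on CHROM, POS and END. The sketch below is only an illustration of that idea, not something gnomAD or vcfanno prescribes: the `Interval` class, the function names and the 0.8 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    chrom: str
    pos: int  # VCF POS (1-based start)
    end: int  # INFO/END (inclusive)


def reciprocal_overlap(a, b):
    """Fraction of the shared span relative to the longer-penalising interval.

    Returns 0.0 when the intervals are on different chromosomes or disjoint.
    """
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.pos, b.pos) + 1
    if shared <= 0:
        return 0.0
    len_a = a.end - a.pos + 1
    len_b = b.end - b.pos + 1
    return min(shared / len_a, shared / len_b)


def same_cnv(a, b, min_overlap=0.8):
    """Treat two CNVs as the same event if they overlap reciprocally enough.

    The 0.8 default is an arbitrary placeholder, not a recommended value.
    """
    return reciprocal_overlap(a, b) >= min_overlap


ours = Interval("chr1", 1000, 2000)
theirs = Interval("chr1", 1100, 2100)
print(same_cnv(ours, theirs))
```

Whatever threshold we pick would need to be evaluated on real calls; interchromosomal events (CHR2/POS2/END2) are not handled by this sketch.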
ID: rename to gnomad4ID (the ID of the variant according to gnomADv4)
FILTER
##FILTER=<ID=PASS,Description="All filters passed">
Maybe only annotate with SVs after filtering? We could also add an annotation field saying that a variant is present in gnomad4 but doesn't pass filtering. I made a table for the different types of FILTER value.
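As a rough sketch of the "present but doesn't pass filtering" idea, the Python function below turns a FILTER column value into a short status string that could be stored in an extra annotation field. The function name, the output labels and the filter names used in the example (`FILTER_A`, `FILTER_B`) are hypothetical placeholders, not actual gnomAD filter IDs.

```python
def gnomad4_filter_annotation(filter_value):
    """Summarise a VCF FILTER value for use as an annotation.

    "PASS"            -> "PASS"
    "." or empty      -> "UNFILTERED" (no filters were applied)
    anything else     -> "FAIL:<comma-separated filter names>"
    """
    filters = [f for f in filter_value.split(";") if f and f != "."]
    if not filters:
        return "UNFILTERED"
    if filters == ["PASS"]:
        return "PASS"
    return "FAIL:" + ",".join(filters)


# Placeholder filter names, for illustration only.
for value in ["PASS", ".", "FILTER_A;FILTER_B"]:
    print(value, "->", gnomad4_filter_annotation(value))
```

This keeps non-passing gnomAD variants visible in the annotation while still letting downstream steps restrict to PASS if we decide to.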
INFO
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">
rename to gnomad4AC
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency (biallelic sites only).">
rename to gnomad4AF
##INFO=<ID=MALE_AF,Number=A,Type=Float,Description="MALE allele frequency (biallelic sites only).">
rename to gnomad4AF_MALE
##INFO=<ID=FEMALE_AF,Number=A,Type=Float,Description="FEMALE allele frequency (biallelic sites only).">
rename to gnomad4AF_FEMALE
##INFO=<ID=FREQ_HET,Number=1,Type=Float,Description="Heterozygous genotype frequency (biallelic sites only).">
rename to gnomad4HETF
##INFO=<ID=FREQ_HOMALT,Number=1,Type=Float,Description="Homozygous alternate genotype frequency (biallelic sites only).">
rename to gnomad4HOMF
##INFO=<ID=CN_NONREF_FREQ,Number=1,Type=Float,Description="Frequency of samples with non-reference copy states (multiallelic CNVs only).">
rename to gnomad4CNF
##INFO=<ID=CPX_INTERVALS,Number=.,Type=String,Description="Genomic intervals constituting complex variant.">
rename to gnomad4INT
##INFO=<ID=CPX_TYPE,Number=1,Type=String,Description="Class of complex variant.">
rename to gnomad4TYPE (the type of variant it is according to gnomADv4), use the same types as here:
##CPX_TYPE_INS_iDEL="Insertion with deletion at insertion site."
##CPX_TYPE_INVdel="Complex inversion with 3' flanking deletion."
##CPX_TYPE_INVdup="Complex inversion with 3' flanking duplication."
##CPX_TYPE_dDUP="Dispersed duplication."
##CPX_TYPE_dDUP_iDEL="Dispersed duplication with deletion at insertion site."
##CPX_TYPE_delINV="Complex inversion with 5' flanking deletion."
##CPX_TYPE_delINVdel="Complex inversion with 5' and 3' flanking deletions."
##CPX_TYPE_delINVdup="Complex inversion with 5' flanking deletion and 3' flanking duplication."
##CPX_TYPE_dupINV="Complex inversion with 5' flanking duplication."
##CPX_TYPE_dupINVdel="Complex inversion with 5' flanking duplication and 3' flanking deletion."
##CPX_TYPE_dupINVdup="Complex inversion with 5' and 3' flanking duplications."
##CPX_TYPE_piDUP_FR="Palindromic inverted tandem duplication, forward-reverse orientation."
##CPX_TYPE_piDUP_RF="Palindromic inverted tandem duplication, reverse-forward orientation."
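The INFO renames proposed above could be turned into a vcfanno `[[annotation]]` block roughly as follows. This is a minimal sketch: the file name `gnomad.v4.sv.vcf.gz` is a placeholder, and using the `self` op for every field is an assumption that we would want to revisit per field (for example, whether multiple overlapping gnomAD records should be combined differently).

```python
import json

# Rename mapping taken from the INFO fields listed above
# (gnomAD INFO ID -> name in our annotated VCF).
RENAMES = {
    "AC": "gnomad4AC",
    "AF": "gnomad4AF",
    "MALE_AF": "gnomad4AF_MALE",
    "FEMALE_AF": "gnomad4AF_FEMALE",
    "FREQ_HET": "gnomad4HETF",
    "FREQ_HOMALT": "gnomad4HOMF",
    "CN_NONREF_FREQ": "gnomad4CNF",
    "CPX_INTERVALS": "gnomad4INT",
    "CPX_TYPE": "gnomad4TYPE",
}


def vcfanno_block(vcf_path, renames):
    """Render one vcfanno [[annotation]] block as TOML text."""
    fields = list(renames)
    names = [renames[field] for field in fields]
    ops = ["self"] * len(fields)  # assumption: copy each value unchanged
    # json.dumps produces double-quoted strings and arrays that are valid TOML.
    return "\n".join([
        "[[annotation]]",
        f"file = {json.dumps(vcf_path)}",
        f"fields = {json.dumps(fields)}",
        f"names = {json.dumps(names)}",
        f"ops = {json.dumps(ops)}",
    ])


# Placeholder path; replace with the real gnomAD v4 SV file.
print(vcfanno_block("gnomad.v4.sv.vcf.gz", RENAMES))
```

Keeping the mapping in one dict means the naming discussion only has to be settled in one place before the config is regenerated.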
The following seem interesting to me, but should maybe be evaluated first for relevance and performance:
##INFO=<ID=PREDICTED_BREAKEND_EXONIC,Number=.,Type=String,Description="Gene(s) for which the SV breakend is predicted to fall in an exon.">
##INFO=<ID=PREDICTED_COPY_GAIN,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a copy-gain effect.">
##INFO=<ID=PREDICTED_DUP_PARTIAL,Number=.,Type=String,Description="Gene(s) which are partially overlapped by an SV's duplication, but the transcription start site is not duplicated.">
##INFO=<ID=PREDICTED_INTERGENIC,Number=0,Type=Flag,Description="SV does not overlap any protein-coding genes.">
##INFO=<ID=PREDICTED_INTRAGENIC_EXON_DUP,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to result in intragenic exonic duplication without breaking any coding sequences.">
##INFO=<ID=PREDICTED_INTRONIC,Number=.,Type=String,Description="Gene(s) where the SV was found to lie entirely within an intron.">
##INFO=<ID=PREDICTED_INV_SPAN,Number=.,Type=String,Description="Gene(s) which are entirely spanned by an SV's inversion.">
##INFO=<ID=PREDICTED_LOF,Number=.,Type=String,Description="Gene(s) on which the SV is predicted to have a loss-of-function effect.">
##INFO=<ID=PREDICTED_MSV_EXON_OVERLAP,Number=.,Type=String,Description="Gene(s) on which the multiallelic SV would be predicted to have a LOF, INTRAGENIC_EXON_DUP, COPY_GAIN, DUP_PARTIAL, TSS_DUP, or PARTIAL_EXON_DUP annotation if the SV were biallelic.">
##INFO=<ID=PREDICTED_NEAREST_TSS,Number=.,Type=String,Description="Nearest transcription start site to an intergenic variant.">
##INFO=<ID=PREDICTED_PARTIAL_EXON_DUP,Number=.,Type=String,Description="Gene(s) where the duplication SV has one breakpoint in the coding sequence.">
##INFO=<ID=PREDICTED_PROMOTER,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to overlap the promoter region.">
##INFO=<ID=PREDICTED_TSS_DUP,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to duplicate the transcription start site.">
##INFO=<ID=PREDICTED_UTR,Number=.,Type=String,Description="Gene(s) for which the SV is predicted to disrupt a UTR.">
##INFO=<ID=SOURCE,Number=1,Type=String,Description="Source of inserted sequence.">
##INFO=<ID=STRANDS,Number=1,Type=String,Description="Breakpoint strandedness [++,+-,-+,--]">
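To get a first feel for the relevance of the `PREDICTED_*` fields, one option is to count how often each of them is actually populated. The sketch below does this on raw INFO strings; the helper names are illustrative, and the three records are made-up toy examples (the gene names are placeholders), not real gnomAD data.

```python
from collections import Counter


def parse_info(info):
    """Split a VCF INFO string into a dict; flag fields map to True."""
    parsed = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def predicted_field_counts(info_strings):
    """Count the records in which each PREDICTED_* field is present."""
    counts = Counter()
    for info in info_strings:
        for key in parse_info(info):
            if key.startswith("PREDICTED_"):
                counts[key] += 1
    return counts


# Toy INFO strings for illustration only.
toy_records = [
    "SVTYPE=DEL;PREDICTED_LOF=GENE1",
    "SVTYPE=DUP;PREDICTED_INTERGENIC;PREDICTED_NEAREST_TSS=GENE2",
    "SVTYPE=DEL;PREDICTED_LOF=GENE3,GENE4",
]
for field, n in predicted_field_counts(toy_records).most_common():
    print(field, n)
```

Run over the real gnomAD file, counts like these would show which fields are common enough to be worth carrying through the annotation step.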
To be continued...
gnomAD v4 SV fields for annotation