CSQ
slivar expr \
--js js/csq.js \
--info "INFO.gnomad_af < 0.05 && CSQs(INFO.CSQ, VCF.CSQ, ['SIFT']).some(function(csq) { return csq.CONSEQUENCE == 'missense' && csq.SIFT < 0.05 })"
...
The consequence field in the VCF is an unstructured, "|"-delimited field that contains transcript-specific information about a variant. Most commonly, it indicates the effect (consequence) of the variant on each transcript, such as missense, stop_gain, etc.
slivar contains javascript in js/csq.js to facilitate working with these. That code can be concatenated with js/slivar-functions.js or your own javascript to provide the following functionality.
Note that:
- Using this carries a performance cost, so it's best to put it at the end of the --info expression. Because && short-circuits, the CSQs are then only parsed when the earlier conditions pass.
- There will often be multiple CSQs for each variant.
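The ordering advice above relies on the short-circuit behavior of && in javascript. A minimal sketch, using a made-up allele frequency and a hypothetical expensiveCheck function standing in for the CSQ parsing (neither is part of slivar):

```javascript
// Count how often the expensive part of the expression actually runs.
var calls = 0;

// Hypothetical stand-in for a costly check such as parsing CSQs.
function expensiveCheck() {
  calls += 1;
  return true;
}

// Made-up value for illustration: this variant fails the cheap filter.
var gnomad_af = 0.2;

// The cheap comparison is evaluated first; since it is false,
// the right-hand side of && is never evaluated.
var passes = gnomad_af < 0.05 && expensiveCheck();

console.log(passes, calls); // false 0
```

With the operands in the opposite order, expensiveCheck would run for every variant regardless of allele frequency.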
The user must pass INFO.CSQ (or INFO.BCSQ or INFO.ANN) and VCF.CSQ (or VCF.BCSQ ...), which contains the list of field names present in each consequence, to the CSQs function as follows:
CSQs(INFO.CSQ, VCF.CSQ, [])
where the final argument (here []) is an array of fields that should be converted from String to Number. This will likely include any allele frequencies along with scores such as SIFT, but it can be empty if the user does not need to access any of the numeric fields.
CSQs returns an array of CSQ objects. A CSQ object is simply a javascript object with keys as defined in the CSQ header and values from that particular variant, so given a CSQ header from VEP like this:
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: \
Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE">
The keys of the CSQ object are (NOTE all caps for keys): ['CONSEQUENCE', 'CODONS', 'AMINO_ACIDS', 'GENE', 'SYMBOL', 'FEATURE', 'EXON', 'POLYPHEN', 'SIFT', 'PROTEIN_POSITION', 'BIOTYPE']. So we can access, e.g., csq.GENE.
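To make the header-to-keys mapping concrete, here is a small sketch that derives the upper-cased keys from the Description text of the header above. The helper csqKeysFromDescription is hypothetical and written only for illustration; it is not a function provided by slivar, and it assumes the field list follows the literal text "Format: ".

```javascript
// Hypothetical helper (not part of slivar): derive CSQ object keys from the
// Description text of a VEP-style CSQ header line.
function csqKeysFromDescription(description) {
  // Assumption: everything after "Format: " is the "|"-delimited field list.
  var marker = "Format: ";
  var start = description.indexOf(marker);
  if (start === -1) {
    return [];
  }
  return description
    .slice(start + marker.length)
    .split("|")
    .map(function (name) { return name.trim().toUpperCase(); });
}

// Description taken from the example header, joined onto one line.
var description = "Consequence type as predicted by VEP. Format: " +
  "Consequence|Codons|Amino_acids|Gene|SYMBOL|Feature|EXON|PolyPhen|SIFT|Protein_position|BIOTYPE";

var keys = csqKeysFromDescription(description);
console.log(keys.join(","));
```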
Given a variant with a CSQ field like:
CSQ=upstream_gene_variant|||ENSG00000223972|DDX11L1|ENST00000456328|||||processed_transcript,\ # newline added for clarity
downstream_gene_variant|||ENSG00000227232|WASH7P|ENST00000488147|||||unprocessed_pseudogene
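(NOTE there are multiple transcripts separated by ",") then csq.SYMBOL would give DDX11L1 for the first transcript and WASH7P for the 2nd transcript. With the header above, csq.GENE holds the Ensembl gene IDs instead: ENSG00000223972 and ENSG00000227232.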
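The mapping from the raw CSQ string to per-transcript objects can be sketched as follows. parseCsqs is a hypothetical stand-in written for this example, not the code in js/csq.js, and it assumes the field names are supplied as an array of upper-cased strings in header order. Matching values to the header by position, the fourth field (GENE) holds the Ensembl gene ID and the fifth (SYMBOL) holds the gene symbol.

```javascript
// Hypothetical stand-in for illustration only; slivar's real implementation
// lives in js/csq.js and may differ.
function parseCsqs(infoValue, fieldNames, numericFields) {
  // One entry per transcript, separated by ",".
  return infoValue.split(",").map(function (entry) {
    var values = entry.split("|");
    var csq = {};
    fieldNames.forEach(function (name, i) {
      var value = values[i] === undefined ? "" : values[i];
      // Convert only the requested fields, and only when a value is present.
      if (numericFields.indexOf(name) !== -1 && value !== "") {
        value = Number(value);
      }
      csq[name] = value;
    });
    return csq;
  });
}

var fieldNames = ['CONSEQUENCE', 'CODONS', 'AMINO_ACIDS', 'GENE', 'SYMBOL', 'FEATURE',
                  'EXON', 'POLYPHEN', 'SIFT', 'PROTEIN_POSITION', 'BIOTYPE'];

// The CSQ value from the example variant, joined onto one line.
var info =
  "upstream_gene_variant|||ENSG00000223972|DDX11L1|ENST00000456328|||||processed_transcript," +
  "downstream_gene_variant|||ENSG00000227232|WASH7P|ENST00000488147|||||unprocessed_pseudogene";

var csqs = parseCsqs(info, fieldNames, ['SIFT']);
csqs.forEach(function (csq) {
  console.log(csq.GENE, csq.SYMBOL, csq.BIOTYPE);
});
```

Both transcripts here have an empty SIFT field, which this sketch leaves as an empty string rather than converting.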
It's likely we want to check that some of the CSQs for each variant meet a criterion. In javascript, we can do this as:
my_csqs.some(check_fn)
where check_fn is a function that accepts a single CSQ object:
function check_fn(csq) {
return csq.CONSEQUENCE == 'missense' && csq.SIFT < 0.05
}
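As a quick illustration of how some behaves with check_fn, the sketch below applies it to hand-built CSQ-like objects. The consequence names and SIFT values are made up for the example and do not come from a real VCF.

```javascript
function check_fn(csq) {
  return csq.CONSEQUENCE == 'missense' && csq.SIFT < 0.05;
}

// Made-up CSQ-like objects: one transcript passes the check, one does not.
var my_csqs = [
  { CONSEQUENCE: 'synonymous', SIFT: 0.5 },
  { CONSEQUENCE: 'missense', SIFT: 0.01 }
];

// Made-up variant where no transcript passes.
var other_csqs = [
  { CONSEQUENCE: 'missense', SIFT: 0.4 }
];

console.log(my_csqs.some(check_fn));    // true: the second transcript matches
console.log(other_csqs.some(check_fn)); // false: no transcript matches
```

some stops at the first matching transcript, so a variant is kept as soon as any one of its CSQs satisfies the check.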
Or we can put it all into a single expression with an anonymous function to send to slivar:
slivar expr \
--js js/csq.js \
--info "INFO.gnomad_af < 0.05 && CSQs(INFO.CSQ, VCF.CSQ, ['SIFT']).some(function(csq) { return csq.CONSEQUENCE == 'missense' && csq.SIFT < 0.05 })"
...