-
-
Notifications
You must be signed in to change notification settings - Fork 21
VQL language
VQL is a Domain Specific Language to perform several action on a cutevariant project. The main purpose is to filters variants in the same fashion as a SQL query. VQL language can be run from the user interface vql plugin or directly in command line.
SELECT clause is used to choose which fields are displayed.
- Display two fields from variants tables
SELECT chr, pos FROM variants
- Display annotations fields from variants tables
SELECT chr, pos, annotations.impact FROM variants
# You can omit annotations prefix
SELECT chr, pos, impact FROM variants
- Display genotype field from "boby" sample
SELECT sample['boby'].gt FROM variants
# If you omit the field, by default it will takes gt
SELECT sample['boby'] FROM variants
there is special computed fields which allow to perform operation on genotype
- count_hom : count how many homozygous mutant within samples
- count_ref : count how many homozygous wild within samples
- count_het : count how many heterozygous mutant within samples
- count_var : count how many variation (homozygous or heterozygous) within samples
For instance, with a variant dataset of two sample (boby and raymond), the following lines are equivalents :
SELECT chr, pos FROM variants WHERE sample["boby"].gt = 1 AND sample["raymond"].gt = 2
SELECT chr, pos FROM variants WHERE count_hom = 1 AND count_het=1
If your dataset has been imported with a pedigree, you can also filter with case and control status of samples
- case_count_hom : count how many homozygous mutant within samples
- case_count_ref : count how many homozygous wild within samples
- case_count_het : count how many heterozygous mutant within samples
- control_count_hom : count how many homozygous mutant within samples
- control_count_ref : count how many homozygous wild within samples
- control_count_het : count how many heterozygous mutant within samples
For instance, with 4 affected samples and 6 normal samples, look for all variant which are heterozygous in affected samples and homozygous in normal samples :
SELECT chr, pos FROM variants WHERE case_count_ref = 4 AND control_count_hom = 6
WHERE clause is used to keep variants according a condition rules. Here are the list of accepted operator :
-
field = "value"
-
field = 234
-
field != 324
-
field > 34
-
field < 30
-
field IN ("CFTR", "boby")
-
field IN WORDSET["mylist"] # See WORDSETS
-
field LIKE "test"
-
field NOT LIKE "test"
-
field ~ "\d+" # Regular expression
-
field has "exonic"
-
Filters fields using WHERE clause
SELECT chr,pos FROM variants WHERE pos > 3
- Where clause can be a nested condition of OR/AND statements
SELECT chr,pos FROM variants WHERE (pos > 3 AND pos < 100) OR ( impact = 'HIGH')
- Where clause support regular expression . But this can be computing intensive
SELECT chr,pos FROM variants WHERE transcript ~ "NM.+"
- Has operator looks for a word in a list separated by semicolon ( exonic;intronic)
SELECT chr,pos FROM variants WHERE consequence HAS "exonic"
- You can also filter using inline word set or a created one.
SELECT chr,pos FROM variants WHERE gene IN ["CFTR", "GJB2"]
SELECT chr,pos FROM variants WHERE gene IN WORDSET["genelist"]
CREATE clause is used to create a selection or tables of variants. For instance the following code create a new selection named myselection
CREATE myselection FROM variants WHERE gene = 'CFTR'
# You can now select variants from CFTR gene without a filters
SELECT chr,pos FROM myselection
You can also create a selections using set operation performed on 2 different selections. Actually union, intersection and difference are supported
# Intersection between A and B
CREATE myselection = A & B
# Union between A and B
CREATE myselection = A | B
# Difference between A and B
CREATE myselection = A - B
You can also make an intersection with a bed file :
CREATE myselection FROM variants INTERSECT "/home/boby/panel.bed"
VQL supports importation of different features. Actually, it support only word WORDSETS. For example, using a simple text file containing a list of word :
IMPORT WORDSETS "gene.txt" AS mygenes
SELECT chr, pos FROM variants WHERE gene in WORDSET["mygenes"]
This clause remove selection or wordset accoding to their names.
# Remove selection named myselection
DROP SELECTIONS myselection
# Remove wordset named mygenes
DROP WORDSETS mygenes