Skip to content

VQL language

sacha schutz edited this page Nov 13, 2020 · 28 revisions

VQL is a Domain Specific Language to perform several action on a cutevariant project. The main purpose is to filters variants in the same fashion as a SQL query. VQL language can be run from the user interface vql plugin or directly in command line.

SELECT clause

SELECT clause is used to choose which fields are displayed.

  • Display two fields from variants tables
SELECT chr, pos FROM variants
  • Display annotations fields from variants tables
SELECT chr, pos, annotations.impact FROM variants
# You can omit annotations prefix 
SELECT chr, pos, impact FROM variants
  • Display genotype field from "boby" sample
SELECT sample['boby'].gt FROM variants
# If you omit the field, by default it will takes gt 
SELECT sample['boby'] FROM variants

Special fields

there is special computed fields which allow to perform operation on genotype

  • count_hom : count how many homozygous mutant within samples
  • count_ref : count how many homozygous wild within samples
  • count_het : count how many heterozygous mutant within samples
  • count_var : count how many variation (homozygous or heterozygous) within samples

For instance, with a variant dataset of two sample (boby and raymond), the following lines are equivalents :

SELECT chr, pos FROM variants WHERE sample["boby"].gt = 1 AND sample["raymond"].gt = 2
SELECT chr, pos FROM variants WHERE count_hom = 1 AND count_het=1

If your dataset has been imported with a pedigree, you can also filter with case and control status of samples

  • case_count_hom : count how many homozygous mutant within samples
  • case_count_ref : count how many homozygous wild within samples
  • case_count_het : count how many heterozygous mutant within samples
  • control_count_hom : count how many homozygous mutant within samples
  • control_count_ref : count how many homozygous wild within samples
  • control_count_het : count how many heterozygous mutant within samples

For instance, with 4 affected samples and 6 normal samples, look for all variant which are heterozygous in affected samples and homozygous in normal samples :

SELECT chr, pos FROM variants WHERE case_count_ref = 4 AND control_count_hom = 6

WHERE clause

WHERE clause is used to keep variants according a condition rules. Here are the list of accepted operator :

  • field = "value"

  • field = 234

  • field != 324

  • field > 34

  • field < 30

  • field IN ("CFTR", "boby")

  • field IN WORDSET["mylist"] # See WORDSETS

  • field LIKE "test"

  • field NOT LIKE "test"

  • field ~ "\d+" # Regular expression

  • field has "exonic"

  • Filters fields using WHERE clause

SELECT chr,pos FROM variants WHERE pos > 3 
  • Where clause can be a nested condition of OR/AND statements
SELECT chr,pos FROM variants WHERE (pos > 3 AND pos < 100) OR ( impact = 'HIGH')
  • Where clause support regular expression . But this can be computing intensive
SELECT chr,pos FROM variants WHERE transcript ~ "NM.+"
  • Has operator looks for a word in a list separated by semicolon ( exonic;intronic)
SELECT chr,pos FROM variants WHERE consequence HAS "exonic"
  • You can also filter using inline word set or a created one.
SELECT chr,pos FROM variants WHERE gene IN ("CFTR", "GJB2")
SELECT chr,pos FROM variants WHERE gene IN WORDSET["genelist"] 

CREATE clause

CREATE clause is used to create a selection or tables of variants. For instance the following code create a new selection named myselection

CREATE myselection FROM variants WHERE gene = 'CFTR' 
# You can now select variants from CFTR gene without a filters 
SELECT chr,pos FROM myselection

You can also create a selections using set operation performed on 2 different selections. Actually union, intersection and difference are supported

# Intersection between A and B 
CREATE myselection = A & B 
# Union between A and B 
CREATE myselection = A | B 
# Difference between A and B 
CREATE myselection = A - B 

You can also make an intersection with a bed file :

CREATE myselection FROM variants INTERSECT "/home/boby/panel.bed"  

IMPORT clause

VQL supports importation of different features. Actually, it support only word WORDSETS. For example, using a simple text file containing a list of word :

IMPORT WORDSETS "gene.txt" AS mygenes
SELECT chr, pos FROM variants WHERE gene in WORDSET["mygenes"]

DROP clause

This clause remove selection or wordset accoding to their names.

# Remove selection named myselection
DROP SELECTIONS myselection

# Remove wordset named mygenes
DROP WORDSETS mygenes
Clone this wiki locally