
Building genome variation graphs (VGs) with GRAFIMO

ManuelTgn edited this page Dec 3, 2020 · 1 revision

GRAFIMO can also build a genome variation graph from your own data. Constructing the VG requires a reference genome (in FASTA format) and a VCF file containing the genomic variants used to enrich the reference sequence.

GRAFIMO builds the genome variation graph as a set of VGs, one per chromosome. As discussed here, this makes motif searches on the genome variation graph faster and more efficient. The VG developers also recommend this approach.

GRAFIMO also builds the XG and GBWT indexes for each chromosome. These indexes make motif searches on the VG faster and haplotype-aware.
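To illustrate the one-graph-per-chromosome layout, the following Python sketch assembles one `vg construct` command per chromosome. It is not GRAFIMO's internal code. It assumes vg's `-r` (reference FASTA), `-v` (VCF), `-R` (region) and `-C` (treat the region as a whole chromosome) options, and the `<chrom>.vg` output naming is hypothetical. Check `vg construct --help` for your vg version.

```python
import shlex
from typing import List


def vg_construct_commands(reference: str, vcf: str, chroms: List[str]) -> List[str]:
    """Return one shell command per chromosome, each building a separate VG.

    Assumption: vg construct accepts -r (reference), -v (VCF), -R (region)
    and -C (region is a whole chromosome). The '<chrom>.vg' file naming is
    a hypothetical convention chosen for this sketch.
    """
    cmds = []
    for chrom in chroms:
        args = ["vg", "construct", "-r", reference, "-v", vcf, "-R", chrom, "-C"]
        # vg construct writes the graph to stdout, so redirect it per chromosome
        cmds.append(f"{shlex.join(args)} > {shlex.quote(chrom + '.vg')}")
    return cmds


if __name__ == "__main__":
    for cmd in vg_construct_commands("genome.fa", "variants.vcf.gz", ["1", "2", "X"]):
        print(cmd)
```

This sketch covers only graph construction. Building the per-chromosome XG and GBWT indexes is a separate step.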

Note that building the genome VG from 1000 Genomes Project data can take several hours. For the fastest construction and indexing methods, refer to the VG wiki.

Before building the VG, make sure that the chromosome names in the VCF match the sequence headers in the reference FASTA file. For example, if chromosome 1 is named 1 in the VCF, the header of the chromosome 1 sequence in the reference FASTA file must be >1, not >chr1.
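One way to catch naming mismatches before a long build is to compare the two sets of names directly. The Python sketch below is a standalone helper, not part of GRAFIMO. It takes the FASTA sequence name to be the text after `>` up to the first whitespace, reads chromosome names from the CHROM column of VCF data lines, and opens `.gz` files (including bgzip output) with Python's `gzip` module. The function names are hypothetical.

```python
import gzip
from typing import Set


def _open_text(path: str):
    # Handle gzip/bgzip-compressed inputs (e.g. .vcf.gz) transparently
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_chroms(fasta_path: str) -> Set[str]:
    """Sequence names from FASTA headers (text after '>' up to first whitespace)."""
    names = set()
    with _open_text(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                parts = line[1:].split()
                if parts:
                    names.add(parts[0])
    return names


def vcf_chroms(vcf_path: str) -> Set[str]:
    """Distinct values of the CHROM column (first field) of VCF data lines."""
    names = set()
    with _open_text(vcf_path) as handle:
        for line in handle:
            # Skip meta-information lines, the #CHROM header and blank lines
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_in_fasta(fasta_path: str, vcf_path: str) -> Set[str]:
    """VCF chromosome names that have no matching FASTA header."""
    return vcf_chroms(vcf_path) - fasta_chroms(fasta_path)
```

For example, `missing_in_fasta("genome.fa", "variants.vcf.gz")` returning `{"1", "2"}` while the FASTA uses `chr1` and `chr2` signals exactly the mismatch described above. Scanning the whole VCF is slow for very large files, so this is only a pre-flight check.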

To build a VG with GRAFIMO, type

grafimo buildvg -l /path/to/reference/genome -v /path/to/my/vcf

By default, GRAFIMO stores the VGs in the current working directory. To store them in another location, use the -o option. The specified directory must already exist:

grafimo buildvg -l /path/to/reference/genome -v /path/to/my/vcf -o /path/to/another/directory
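Because -o expects an existing directory, a wrapper script can create it first. The Python sketch below is an illustration, not part of GRAFIMO. It assembles the `grafimo buildvg` argument list using only the options described on this page (-l, -v, -o, --reindex). The helper names are hypothetical.

```python
import os
from typing import List, Optional


def buildvg_command(reference: str, vcf: str, outdir: Optional[str] = None,
                    reindex: bool = False) -> List[str]:
    """Build a 'grafimo buildvg' argument list from the options on this page."""
    cmd = ["grafimo", "buildvg", "-l", reference, "-v", vcf]
    if outdir is not None:
        cmd += ["-o", outdir]  # output directory for the per-chromosome VGs
    if reindex:
        cmd.append("--reindex")  # force a fresh Tabix index of the VCF
    return cmd


def prepare_outdir(outdir: str) -> str:
    """Create the output directory (and parents) if needed; return its absolute path."""
    os.makedirs(outdir, exist_ok=True)
    return os.path.abspath(outdir)
```

To run the build, pass the returned list to `subprocess.run(..., check=True)`, for example after `out = prepare_outdir("vgs")`. Using an argument list instead of a shell string avoids quoting problems with paths that contain spaces.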

Before building the genome VG, GRAFIMO indexes the VCF file with Tabix, producing a TBI index file. If an index for the VCF file already exists, GRAFIMO skips the indexing step by default. To force a fresh index (recommended), use the --reindex option. For example, to build the VG and reindex the VCF, run

grafimo buildvg -l /path/to/reference/genome -v /path/to/my/vcf --reindex
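The skip-or-reindex behaviour described above can be mirrored in a small Python sketch, for example to pre-index a VCF yourself. This is not GRAFIMO's own code. It assumes the index sits next to the VCF as `<vcf>.tbi` and that the VCF is bgzip-compressed, as Tabix requires. It uses tabix's `-p vcf` preset and `-f` (overwrite an existing index). The function name is hypothetical.

```python
import os
from typing import List, Optional


def tabix_command(vcf_gz: str, reindex: bool = False) -> Optional[List[str]]:
    """Return the tabix invocation to run, or None if indexing can be skipped.

    Assumption: the index lives at '<vcf_gz>.tbi'. Without reindex, an
    existing index is kept, matching the default behaviour described above.
    """
    index_path = vcf_gz + ".tbi"
    if os.path.exists(index_path) and not reindex:
        return None  # existing index is reused
    cmd = ["tabix", "-p", "vcf"]
    if os.path.exists(index_path):
        cmd.append("-f")  # overwrite the stale index when reindexing
    cmd.append(vcf_gz)
    return cmd
```

Returning the command instead of executing it keeps the decision logic testable. Pass a non-None result to `subprocess.run(..., check=True)` to build the index.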