Building genome variation graphs (VGs) with GRAFIMO
GRAFIMO can also build a genome variation graph from user data. Constructing the VG requires a reference genome (in FASTA format) and a VCF file containing the genomic variants used to enrich the reference sequence.
GRAFIMO builds the genome variation graph by constructing a separate VG for each chromosome. As discussed here, this allows a faster and more efficient motif search on the genome variation graph. This is also the approach recommended by the VG developers.
GRAFIMO also builds the XG and GBWT indexes for each chromosome. These indexes enable a faster, haplotype-aware motif search on the VG.
Note that building the genome VG using 1000 Genomes Project data will take several hours. Please refer to VG's wiki for the fastest construction and indexing methods.
Before attempting to build the VG, it is very important to make sure that the chromosome names in the VCF match the sequence headers in the reference FASTA file. For example, if chromosome 1 is named 1 in the VCF, the header of the chromosome 1 sequence in the reference FASTA file should be >1, not >chr1.
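A quick way to catch naming mismatches before starting a long build is to compare the two sets of names directly. The shell functions below are a minimal sketch and are hypothetical helpers (check_chrom_names and cat_vcf are not part of GRAFIMO). They assume the FASTA sequence name is the header text after > up to the first whitespace, read the CHROM column of a plain or gzip/bgzip-compressed VCF, and print every VCF chromosome name that has no matching FASTA header.

```shell
# Hypothetical helpers (not part of GRAFIMO) for checking that chromosome
# names in a VCF match the sequence headers of a reference FASTA.

# Print a VCF to stdout, decompressing it if the name ends in .gz
# (bgzip output is gzip-compatible, so gzip -dc can read it).
cat_vcf() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac
}

# Print each chromosome name used in the VCF that has no matching FASTA header.
check_chrom_names() {
  fasta="$1"
  vcf="$2"
  # FASTA names: header text after '>' up to the first whitespace
  fasta_names=$(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' | sort -u)
  # VCF names: first (CHROM) column of every non-header line
  vcf_names=$(cat_vcf "$vcf" | grep -v '^#' | cut -f1 | sort -u)
  # Keep VCF names that do not exactly match any FASTA name
  # (-F fixed strings, -x whole line, -v invert; -e takes newline-separated patterns)
  printf '%s\n' "$vcf_names" | grep -vxF -e "$fasta_names"
}
```

Empty output means every chromosome in the VCF has a matching FASTA sequence. If the names differ only by a chr prefix, one possible fix is to strip it from the FASTA headers, for example with sed 's/^>chr/>/' ref.fa > ref.nochr.fa (the file names here are placeholders).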
To build a VG with GRAFIMO, type
grafimo buildvg -l /path/to/reference/genome -v /path/to/my/vcf
By default, GRAFIMO stores the VGs in the current directory. To store them in another location, use the -o option. Note that the specified directory must already exist. To do this, call GRAFIMO from the command line with
grafimo buildvg -l /path/to/reference/genome -v /path/to/my/vcf -o /path/to/another/directory
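Because the -o directory must already exist, it can help to create it in the same step. The function below is a minimal sketch; buildvg_into is a hypothetical name, not a GRAFIMO command. It runs mkdir -p and then calls grafimo buildvg with only the -l, -v and -o options described on this page.

```shell
# Hypothetical wrapper (not part of GRAFIMO): make sure the output directory
# exists, then build the VGs into it.
buildvg_into() {
  ref="$1"
  vcf="$2"
  outdir="$3"
  # mkdir -p creates missing parent directories and succeeds if the
  # directory already exists
  mkdir -p -- "$outdir" || return 1
  grafimo buildvg -l "$ref" -v "$vcf" -o "$outdir"
}
```

Usage: buildvg_into /path/to/reference/genome /path/to/my/vcf /path/to/another/directory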
Before building the genome VG, GRAFIMO indexes the VCF file with Tabix, producing a TBI index file. If an index for the VCF file is already present, GRAFIMO skips the indexing step by default. To build a fresh index instead (recommended), use the --reindex option. For example, to build the VG and reindex the VCF, call GRAFIMO with
grafimo buildvg -l /path/to/reference/genome -v /path/to/my/vcf --reindex
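If you prefer to reindex only when it seems necessary rather than on every run, a simple heuristic is to compare modification times: an index older than the VCF was probably built from a previous version of the file. The helper below is a sketch under stated assumptions; reindex_flag is hypothetical and not part of GRAFIMO, and it assumes the index sits next to the VCF with Tabix's default .tbi suffix. It prints --reindex only when that index is older than the VCF, and prints nothing when no index exists, since GRAFIMO builds one in that case anyway.

```shell
# Hypothetical helper (not part of GRAFIMO): print "--reindex" when the
# Tabix index <vcf>.tbi exists but is older than the VCF (likely stale).
# Prints nothing if the index is missing or up to date.
reindex_flag() {
  vcf="$1"
  tbi="$1.tbi"   # assumption: Tabix's default index file name
  # -nt is true when the left file's modification time is newer
  if [ -e "$tbi" ] && [ "$vcf" -nt "$tbi" ]; then
    printf '%s\n' "--reindex"
  fi
}
```

Usage: grafimo buildvg -l /path/to/reference/genome -v /path/to/my/vcf $(reindex_flag /path/to/my/vcf). The command substitution is left unquoted on purpose, so an empty result adds no argument.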