Lightweight command line implementation of the VMC SHA-512 algorithm to allow rapid digest generation.
The following input types are currently supported:
- Fasta
- VCF
- HGVS
Please keep in mind this tool only implements specific aspects of the entire VMC model. A VMC Bundle and JSON file are not generated or validated.
Usage: vmccl [--fasta FASTA] [--vcf VCF] [--hgvs HGVS] [--logfile LOGFILE] [--length LENGTH]
Options:
--fasta FASTA Will return VMC Sequence digest ID of this fasta file.
--vcf VCF Will take an input VCF file and write an updated copy that includes VMC (sequence|location|allele) digest IDs.
--hgvs HGVS Valid HGVS expression to digest into VMC record. Double quotes are required.
--logfile LOGFILE Filename for output log file. [default: VMCCL.log]
--length LENGTH Length of digest id to return. MAX: 64 [default: 24]
--help, -h display this help and exit
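As a rough illustration of what a truncated SHA-512 digest looks like, the sketch below hashes a byte string, keeps the first `length` bytes (24 by default, 64 at most, mirroring the `--length` option), and base64url-encodes the result. The function names and the choice to upper-case the sequence before hashing are assumptions made for this example; they are not taken from the vmccl source, and the exact bytes vmccl hashes may differ.

```python
import base64
import hashlib


def truncated_digest(data: bytes, length: int = 24) -> str:
    """Return the base64url encoding of the first `length` bytes of SHA-512(data)."""
    if not 1 <= length <= 64:
        # SHA-512 produces 64 bytes, so nothing longer can be returned.
        raise ValueError("length must be between 1 and 64")
    digest = hashlib.sha512(data).digest()
    # Lengths that are not a multiple of 3 end in '=' padding; 24 bytes
    # encodes to exactly 32 characters with no padding.
    return base64.urlsafe_b64encode(digest[:length]).decode("ascii")


def sequence_id(sequence: str, length: int = 24) -> str:
    """Hypothetical helper: prefix the digest the way VMC_GS identifiers appear."""
    # Assumption: the sequence is upper-cased ASCII with no whitespace.
    return "VMC:GS_" + truncated_digest(sequence.upper().encode("ascii"), length)
```

With the default length, the digest portion is 32 characters long, which matches the length of the identifiers shown elsewhere in this document.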
The easiest way to run vmccl is to download the most recent executable for your computer environment.
- Download
OS | Platform | Release |
---|---|---|
darwin | amd64 | darwin |
linux | amd64 | linux |
- Then run (example)
$> wget <release link> .
$> mv vmccl_linux64 vmccl
$> chmod a+x vmccl
Additional builds and features can be requested here
Please review the example section for best-practice instructions on how to run vmccl.
vmccl will run the VMC digest algorithm on each record in the fasta file. It will store the results in a file of the same name, with a .vmc extension added. Subsequent runs of vmccl will check for the presence of the fasta.vmc file in the same location as the original fasta file.
The following is the format of the fasta.vmc file:
Leading Identifier (space separated) | VMC Seq ID | Description line of fasta |
---|---|---|
1 | VMC:GS_jqi61wB_nLCsUMtCXsS0Yau_pKxuS21U | 1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 |
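The Leading Identifier column above is the first whitespace-separated token of the fasta description line. A minimal sketch of that extraction follows; `leading_identifier` is a hypothetical helper written for this example and is not part of vmccl.

```python
def leading_identifier(description_line: str) -> str:
    """Return the first whitespace-separated token of a fasta description line."""
    # Drop the optional '>' marker and surrounding whitespace.
    line = description_line.lstrip(">").strip()
    if not line:
        raise ValueError("empty fasta description line")
    return line.split()[0]
```

For the record shown in the table, `leading_identifier(">1 dna:chromosome chromosome:GRCh37:1:1:249250621:1")` returns `"1"`, which is the value a VCF CHROM column would need to match.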
- The importance of using the correct fasta files to generate VMC_GS cannot be stressed enough, as even a change of a single base will generate a completely different sequence identifier. This is especially important when considering sharing VMC_GA with other institutions.
Please review the example section for best-practice instructions on how to run vmccl.
At this time, updating a VCF file requires an accompanying fasta file with an identical Leading Identifier. If a fasta.vmc file has already been generated, vmccl will look for it in the same location as the original fasta and collect the VMC_GS identifiers.
Note:
- Only VCFs that have been run through vt decompose will be accepted.
- If your VCF file contains sequence identifiers not found in the fasta file, the VCF record is printed to the new file without updated annotations.
- If your fasta file contains records not found in the VCF file they are skipped.
- Uses and implementation of the fasta.vmc file will change as seqrepo becomes more widely available, and if/when vmccl implements a SQL database backend.
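The matching rule in the notes above can be sketched as follows: a record whose CHROM value has a known sequence identifier gets an extra INFO entry, and any other record is passed through unchanged. This is only an illustration under stated assumptions. `annotate_record` and the `seq_ids` mapping are hypothetical names, and only VMCGSID is added because the construction of the location and allele identifiers is not described here.

```python
from typing import Dict


def annotate_record(line: str, seq_ids: Dict[str, str]) -> str:
    """Append a VMCGSID entry to the INFO column when the CHROM value is known."""
    text = line.rstrip("\n")
    fields = text.split("\t")
    # Header lines and malformed records are returned untouched.
    if text.startswith("#") or len(fields) < 8:
        return text
    seq_id = seq_ids.get(fields[0])
    if seq_id is None:
        # Sequence not present in the fasta: keep the record without annotations.
        return text
    addition = f"VMCGSID={seq_id}"
    info = fields[7]
    # An INFO column of "." means "no data", so it is replaced rather than extended.
    fields[7] = addition if info in ("", ".") else f"{info};{addition}"
    return "\t".join(fields)
```

The `seq_ids` mapping stands in for the Leading Identifier to VMC_GS pairs collected from the fasta.vmc file.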
An example of annotations added to the VCF file:
Added to the VCF header:
##INFO=<ID=VMCGAID,Number=1,Type=String,Description="VMC Allele identifier">
##INFO=<ID=VMCGLID,Number=1,Type=String,Description="VMC Location identifier">
##INFO=<ID=VMCGSID,Number=1,Type=String,Description="VMC Sequence identifier">
Added annotations to the VCF INFO field:
1 949523 183381 C T . . ALLELEID=181485;CLNDISDB=MedGen:C4015293,OMIM:616126,Orphanet:ORPHA319563;CLNDN=Immunodeficiency_38_with_basal_ganglia_calcification;CLNHGVS=NC_000001.10:g.949523C>T;CLNREVSTAT=no_assertion_criteria_provided;CLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;CLNVI=OMIM_Allelic_Variant:147571.0003;GENEINFO=ISG15:9636;MC=SO:0001587|nonsense;ORIGIN=1;RS=786201005;VMCGSID=VMC:GS_jqi61wB_nLCsUMtCXsS0Yau_pKxuS21U;VMCGLID=VMC:GL_VMC:GS_UqMzt_PvRNhrFl31m8N7SbCGdDpmAtsp;VMCGAID=VMC:GA_VMC:GS_-sajfzQq1Q_PfOAPMPQRodzFclkX8ksp
Currently vmccl will only digest simple HGVS substitutions using the genomic (g) prefix. Future releases will begin to include a wider range of HGVS types. Please refer to the examples section to see the standard HGVS run.
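To make "simple genomic substitution" concrete, the sketch below recognises expressions of the form `ACCESSION.VERSION:g.POSITIONREF>ALT` and rejects everything else. The regular expression and the `Substitution` type are assumptions written for this example; vmccl's own parser may accept a different set of expressions.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    accession: str
    position: int
    ref: str
    alt: str


# Assumed shape: versioned accession, 'g.' prefix, 1-based position, single-base change.
_SIMPLE_G_SUB = re.compile(
    r"^(?P<accession>[A-Za-z0-9_]+\.\d+):g\."
    r"(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_simple_substitution(expression: str) -> Substitution:
    """Parse a simple genomic substitution or raise ValueError."""
    match = _SIMPLE_G_SUB.match(expression.strip())
    if match is None:
        raise ValueError(f"not a simple g. substitution: {expression!r}")
    return Substitution(
        match["accession"], int(match["position"]), match["ref"], match["alt"]
    )
```

For example, `parse_simple_substitution("NC_000019.10:g.44908684C>T")` yields the accession `NC_000019.10`, position `44908684`, reference `C`, and alternate `T`, while a deletion such as `NC_000019.10:g.44908684del` raises `ValueError`.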
The best-practice method for adding VMC digest IDs to a VCF file is as follows:
- First create a fasta.vmc file that will be used for all future VCF updates.
$> ./vmccl --fasta human_g1k_v37_decoy.fasta
$> ls -l *fasta*
human_g1k_v37_decoy.fasta
human_g1k_v37_decoy.fasta.vmc
  - In general, creating the fasta.vmc file will take longer than adding annotations to a VCF file, so pre-building it will decrease future VCF runtimes.
  - Please keep in mind that the .fasta and .fasta.vmc files need to be in the same location, or vmccl will rebuild the .fasta.vmc file.
- Run vmccl on your VCF file.
$> ./vmccl --fasta human_g1k_v37_decoy.fasta --vcf clinvar_20180701.vcf.gz
$> ls -l *fasta* *vcf*
human_g1k_v37_decoy.fasta
human_g1k_v37_decoy.fasta.vmc
clinvar_20180701.vcf.gz
clinvar_20180701.vmc.vcf.gz
- Output VCF files will always be gzip compressed.
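Because the output is gzip compressed, downstream scripts need to decompress it before reading the added annotations. The following is a minimal sketch using Python's standard `gzip` module to pull the three VMC INFO keys from each record. The function names are hypothetical, and the code assumes a standard tab-separated VCF body.

```python
import gzip
from typing import Dict, Iterator, Tuple

VMC_KEYS = ("VMCGSID", "VMCGLID", "VMCGAID")


def vmc_ids(info: str) -> Dict[str, str]:
    """Collect the VMC identifiers found in a VCF INFO column."""
    ids = {}
    for entry in info.split(";"):
        # Split on the first '=' only, so values containing ':' or '_' stay intact.
        key, sep, value = entry.partition("=")
        if sep and key in VMC_KEYS:
            ids[key] = value
    return ids


def iter_vmc_ids(path: str) -> Iterator[Tuple[str, int, Dict[str, str]]]:
    """Yield (chrom, pos, identifiers) for every record in a gzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 8:
                yield fields[0], int(fields[1]), vmc_ids(fields[7])
```

Records that were written without updated annotations simply yield an empty dictionary.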
As with VCF files, a matching fasta file is currently required. Once run, vmccl will return the corresponding VMC_GA ID.
Ensure double quotes are used.
$> ./vmccl --hgvs "NC_000019.10:g.44908684C>T" --fasta data/NC_000019.10.fasta
VMC:GA_AnJl99FJB5tNPupduz8I4R8CCuwCpIY0
Currently, fasta digest generation runs in parallel. Future releases will incorporate parallel processing for VCF updating.
Please report any bugs or feature requests to the issue tracker
AUTHOR Shawn Rynearson shawn.rynearson@gmail.com
LICENCE AND COPYRIGHT Copyright (c) 2018, Shawn Rynearson shawn.rynearson@gmail.com All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as GO itself.
DISCLAIMER OF WARRANTY BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.