Final project for AS.410.712.81.SP17 Advanced Practical Computer Concepts for Bioinformatics at Johns Hopkins University, completed May 2017.
Website comparing results from two tools identifying HIV hypermutants: Hyperfreq and Hypermut.
Patients suppressed on antiretroviral therapy for years show persistence of HIV-1 proviruses in peripheral blood mononuclear cells [1]. However, many of these proviruses are defective and have been considered silent with respect to HIV pathogenesis [1]. Recently, Imamichi et al. identified defective proviruses capable of producing HIV RNA transcripts [1]. Their results suggest that these defective proviruses are not silent as originally thought and may play a role in HIV pathogenesis by activating host defense pathways [1].

To better understand the population of defective proviruses, in particular hypermutants, I used Hypermut and Hyperfreq. Hypermut is a commonly used tool available at www.hiv.lanl.gov for identifying HIV hypermutants from a list of sequences in FASTA format [2]. Hyperfreq is a command line tool that finds hypermutants using a Bayesian method [3]. According to Matsen et al., the Bayesian method is intended to improve on the statistical methods employed by Hypermut [3]. Since none of my colleagues had heard of or used Hyperfreq, I compared Hyperfreq to Hypermut for my final project. In addition, I characterized the open reading frames of the hypermutants identified by these tools to determine the prevalence of shortened open reading frames. It is speculated that these truncated open reading frames may produce viral proteins that contribute to the persistent antibody response despite suppression and elicit CD4 and CD8 T-cell responses [1].
Installed Hyperfreq and its dependencies following instructions at https://github.com/matsengrp/hyperfreq.
Accessed Hypermut at https://www.hiv.lanl.gov/content/sequence/HYPERMUT/hypermut.html.
Web software (Ajax, Apache) and MySQL were hosted on my school's server.
Below are the scripts and data files used for this project. These are available in the final folder.
+----css
|    +----hypermutants.css
+----js
|    +----search.js
+----hyper_tool.html
+----results
|    +----dates_mysql.py
|    +----hyperfreq_analysis_leohir.xls
|    +----hyerfreq_excel_mysql.py
|    +----hypermut_analysis_leohir.xls
|    +----hypermut_excel_mysql.py
|    +----LEOHIR_dates.xlsx
|    +----LEOHIR_modify.fasta
|    +----orf.py
+----search_hypermutants.cgi
+----sequences
|    +----leohir_all.fasta
|    +----modify_fasta.py
|    +----sequences_mysql.py
Sequence files must be in FASTA format. One may need to modify the file, especially if special characters like dashes appear as a result of alignment.
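As a rough illustration of this kind of cleanup, here is a minimal sketch that removes gap characters from every record in a FASTA file. It is not the contents of modify_fasta.py, which is not shown here; the function names, the choice of gap characters, and the example records are all assumptions made for illustration. Note that removing gaps discards the alignment columns, so it only suits steps that work on unaligned sequence.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def strip_gaps(sequence, gap_chars="-."):
    """Remove gap characters (an assumed set) and upper-case the sequence."""
    return "".join(ch for ch in sequence if ch not in gap_chars).upper()


def clean_fasta(lines):
    """Return FASTA text with gaps removed and one sequence line per record."""
    out = []
    for header, seq in read_fasta(lines):
        out.append(">" + header)
        out.append(strip_gaps(seq))
    return "\n".join(out) + "\n"


# Made-up records, for illustration only.
example = [">REF", "ATG--GCA", "TAA", ">seq1", "atg-a-gcataa"]
print(clean_fasta(example), end="")
```

To apply it to a real file, pass an open file handle in place of `example`, for instance `clean_fasta(open("patient.fasta"))`, and write the returned text to a new file.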
- Install Hyperfreq.
- Obtain a FASTA file and extract the reference sequence into another file (for Hyperfreq).
- Run the following command in the terminal:
  hyperfreq analyze patient.fasta -r reference.fasta -o /path/to/file -p GA
  - -r: the reference sequence file
  - -o: the path where the results are written
  - -p: the mutation pattern, here GA for APOBEC-induced G-to-A transitions
- Modify the output file if necessary and upload it into MySQL using hyperfreq_excel_mysql.py.
- Run the same FASTA file in Hypermut using default options and save the results in Excel. The reference sequence must be the first sequence in the FASTA file.
- Upload the Hypermut results to MySQL using hypermut_excel_mysql.py.
- Get open reading frames from the FASTA file and upload them into MySQL using orf.py.
- For a demo of the website, go to http://bfx.eng.jhu.edu/sjone215/final/hyper_tool.html and enter "LEOHIR" into the search box to view results.
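The open reading frame step above can be sketched as follows. This is not the code in orf.py; it assumes an ORF runs from an ATG to the first in-frame stop codon, considers only the three forward frames (matching the 1, 2, 3 choice offered on the website), and reports the longest ORF per frame. The function names and the demo sequence are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(sequence, frame):
    """Return (start, end) index pairs of ATG-to-stop ORFs in one forward frame.

    frame is 1, 2, or 3; end is exclusive and includes the stop codon.
    """
    seq = sequence.upper()
    found = []
    start = None
    for i in range(frame - 1, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first ATG after the previous stop opens an ORF
        elif start is not None and codon in STOP_CODONS:
            found.append((start, i + 3))
            start = None
    return found


def longest_orf_lengths(sequence):
    """Map each forward frame (1-3) to its longest ORF length in nucleotides."""
    lengths = {}
    for frame in (1, 2, 3):
        spans = orfs_in_frame(sequence, frame)
        lengths[frame] = max((end - start for start, end in spans), default=0)
    return lengths


# Made-up sequence, for illustration only.
demo = "ATGAAATGATTTATGCCCGGGTAGC"
print(longest_orf_lengths(demo))  # {1: 12, 2: 0, 3: 0}
```

An ATG with no downstream in-frame stop is ignored in this sketch; whether such open-ended frames should count is a design choice the original script may have made differently.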
As of 11/16/2018, the website, which is hosted on my school's server, is still working and can be accessed via the link below:
http://bfx.eng.jhu.edu/sjone215/final/hyper_tool.html
Here are screenshots of the website if the above link is not working:
The user must enter a coded patient name (e.g., LEOHIR) and select one of three options: "Hypermut", "Hyperfreq", or "Compare Both". Ideally, the website would offer a list of coded patients for the user to select from. For "Hypermut" and "Hyperfreq", the user can select an open reading frame (1, 2, or 3) from a drop-down menu.
When either "Hypermut" or "Hyperfreq" is selected, the following results are displayed:
- Total number of sequences
- Total number of hypermutants
- Number of hypermutants by tissue type
- All Predicted Hypermutants
- Open Reading Frame Data for each Hypermutant
When "Compare Both" is selected, the following results are displayed:
- Total number of sequences
- Total number of hypermutants in Hyperfreq
- Total number of hypermutants in Hypermut
- Number of matches
- Fisher p-value
- Hyperfreq and Hypermut Comparison
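The comparison counts listed above could be derived from the two sets of sequence IDs that each tool flags. The sketch below is an illustration, not the logic in search_hypermutants.cgi. It assumes that a "match" means both tools agree on a sequence (both flag it or both do not); the function name, dictionary keys, and sequence IDs are hypothetical.

```python
def compare_calls(all_ids, hyperfreq_hits, hypermut_hits):
    """Summarize agreement between two tools' hypermutant calls."""
    all_ids = set(all_ids)
    freq = set(hyperfreq_hits) & all_ids
    mut = set(hypermut_hits) & all_ids
    # Assumed definition: a match is any sequence the tools classify the same way.
    agree = {s for s in all_ids if (s in freq) == (s in mut)}
    return {
        "total_sequences": len(all_ids),
        "hyperfreq_hypermutants": len(freq),
        "hypermut_hypermutants": len(mut),
        "matches": len(agree),
        "hyperfreq_only": sorted(freq - mut),
        "hypermut_only": sorted(mut - freq),
    }


# Made-up IDs, for illustration only.
ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
summary = compare_calls(ids, {"s1", "s2", "s3"}, {"s2", "s3", "s4"})
print(summary)
```

In this toy input the tools agree on four of six sequences (s2 and s3 flagged by both, s5 and s6 by neither) and disagree on s1 and s4.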
For this project, I had HIV sequences collected from a variety of tissue sites from one patient, "LEOHIR". The HIV sequences and the results from Hypermut and Hyperfreq were uploaded into a MySQL database and visualized on my website. Hyperfreq identified 62 hypermutants while Hypermut identified 60 for this patient. Altogether, there were only three mismatches out of a total of 257 sequences. The p-value for Fisher's exact test was 0.92, indicating no statistically significant difference between the two tools' hypermutant calls, at least for this one patient. The open reading frames of the hypermutants exhibited a diverse range of lengths, suggesting that these HIV hypermutants are capable of generating peptides. Further experimentation is required to validate the presence of these peptides.
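For readers who want to see how such a p-value can be computed, here is a minimal pure-Python sketch of a two-sided Fisher's exact test. The text above does not say which 2x2 table was tested, so the layout used here is an assumption: one row per tool, with columns for sequences called hypermutant and not hypermutant, using the counts reported above (62 and 60 out of 257). The function name is made up; SciPy's `scipy.stats.fisher_exact` is a library alternative.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one, using exact integer arithmetic.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    lo = max(0, col1 - row2)
    hi = min(row1, col1)

    def weight(x):
        # Unnormalized hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return float(Fraction(extreme, comb(row1 + row2, col1)))


# Assumed table layout: rows are tools, columns are hypermutant / not.
total = 257
p_value = fisher_exact_two_sided(62, total - 62, 60, total - 60)
print(round(p_value, 2))
```

If this layout is the one that was actually tested, the result should be comparable to the reported value. A different table, such as a cross-tabulation of the two tools' calls per sequence, would give a different p-value.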
[1] Imamichi, H., Dewar, R. L., Adelsberger, J. W., Rehm, C. A., O’Doherty, U., Paxinos, E. E., ... & Lane, H. C. (2016). Defective HIV-1 proviruses produce novel protein-coding RNA species in HIV-infected patients on combination antiretroviral therapy. Proceedings of the National Academy of Sciences, 113(31), 8783-8788.
[2] Rose, P. P., & Korber, B. T. (2000). Detecting hypermutations in viral sequences with an emphasis on G -> A hypermutation. Bioinformatics, 16(4), 400-401.
[3] Matsen IV, F. A., Small, C. T., Soliven, K., Engel, G. A., Feeroz, M. M., Wang, X., ... & Jones-Engel, L. (2014). A novel Bayesian method for detection of APOBEC3-mediated hypermutation and its application to zoonotic transmission of simian foamy viruses. PLoS Computational Biology, 10(2), e1003493.