Construct Human + Virus reference
The "GRCh37_selectVirus_9a" human+virus reference used for TCGA-Virus alignment is a concatenation of the human genome (GRCh37) and 18 virus sequences. The sources and details of the human and virus sequences are listed below:
- GRCh37-lite information
- HPV 6 (NC_001355.1)
- HPV 16 (NC_001526.2)
- HPV 18 (NC_001357.1)
- HPV 31 (J04353.1/PPH31A)
- HPV 33 (M12732.1/PPH33CG)
- HPV 35 (M74117.1)
- HPV 39 (M62849.1/PPHT39)
- HPV 45 (EF202167.1)
- HPV 52 (X74481.1)
- HPV 56 (EF177177.1)
- HPV 58 (D90400.1/PPH58)
- HPV 59 (X77858.1)
- BK Polyoma (NC_001538.1)
- HHV 1 (JQ780693.1)
- HHV 4 (NC_009334.1)
- HHV 5 (AY446894.2)
- Hepatitis B (NC_003977.1)
- Polyomavirus HPyV7 (NC_014407.1)
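
The concatenation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure actually used to build GRCh37_selectVirus_9a: the function name `concatenate_fasta`, the file names, and the toy records are all hypothetical, and the order in which the real sequences were joined is not stated here.

```python
import tempfile
from pathlib import Path


def concatenate_fasta(inputs, output):
    """Append each FASTA file in `inputs` to `output`, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            ends_with_newline = True
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks so large genomes are never held in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    ends_with_newline = chunk.endswith(b"\n")
            if not ends_with_newline:
                # Keep the next '>' header at the start of its own line.
                out.write(b"\n")


# Toy demonstration with made-up records (not real genome data).
with tempfile.TemporaryDirectory() as tmp:
    human = Path(tmp) / "human.fa"       # hypothetical file names
    virus = Path(tmp) / "virus.fa"
    combined = Path(tmp) / "combined.fa"
    human.write_bytes(b">1\nACGT\n")
    virus.write_bytes(b">NC_001526.2\nTTGA")  # deliberately no trailing newline
    concatenate_fasta([human, virus], combined)
    combined_bytes = combined.read_bytes()

print(combined_bytes.decode())
```

The newline check matters because a FASTA file that lacks a final newline would otherwise have the next file's header glued onto its last sequence line.
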
The combined reference is too large to distribute, but the index file dat/GRCh37_selectVirus_9a.fa.fai is available.
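
Because a .fai index is a tab-separated table whose first two columns are the sequence name and its length, the provided index is enough to check which contigs the combined reference contains. The sketch below assumes the standard five-column FASTA index layout (name, length, offset, bases per line, bytes per line); the helper `read_fai` and the toy rows are hypothetical and are not taken from the real index.

```python
def read_fai(lines):
    """Yield (name, length) pairs from the lines of a FASTA index (.fai)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        # Column 1 is the sequence name, column 2 its length in bases.
        yield fields[0], int(fields[1])


# Toy index rows (made up for illustration, not from the real file).
example = [
    "seqA\t8\t6\t4\t5\n",
    "seqB\t5\t22\t4\t5\n",
]
sizes = dict(read_fai(example))
print(sizes)

# With the real index, the same helper could be used as:
# with open("dat/GRCh37_selectVirus_9a.fa.fai") as fh:
#     sizes = dict(read_fai(fh))
```
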
Realignment was performed using BWA v. 0.5.9 with parameters -t 4 -q 5 (for bwa aln, -t sets the number of threads and -q the quality threshold for read trimming)::