This page lists the types of files used in genomic analysis. To work with real (sub-sampled) example genomic files of the types listed below, go to this link.
Type | Name | Phase | Notes | Example File Image |
---|---|---|---|---|
FASTA | sequence file | 1a-from sequencer | often accompanied by dictionary (.dict) & index (.fai) files | |
FASTQ | sequencer file w/quality | 1b-from sequencer | includes a per-base PHRED quality score | |
UBAM | unmapped binary alignment file | 1c-from sequencer (processed) | binary format | no image |
SAM | sequence alignment file | 2a-align to reference | text format | |
BAM | binary alignment file | 2b-align to reference | binary format, can be viewed with IGV, can be accompanied by index (.bai) files | |
CRAM | compressed binary alignment file | 2c-align to reference | binary format | no image |
VCF | variant call format | 3a-find variants | plain text | |
GVCF | genomic variant call format | 3b-find variants | plain text, also records non-variant (reference) sites | |
Other text files - TSV, CSV, BED, BZ2 (compressed text) | text files for genomics | 4-any phase | supporting data such as tables and genomic intervals (BED) | no image |
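
The FASTQ row above mentions PHRED base quality scores. As a rough illustration, here is a minimal Python sketch of how one FASTQ record could be decoded. It assumes the common Phred+33 encoding (ASCII offset 33); the record itself and the helper names `parse_fastq_record` and `error_probability` are made up for this example and do not come from any particular toolkit.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 encoding (ASCII code minus 33)


def parse_fastq_record(lines):
    """Split the four lines of one FASTQ record into (read_id, sequence, scores)."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    # Each quality character encodes one integer PHRED score.
    scores = [ord(ch) - PHRED_OFFSET for ch in quality]
    return header[1:], sequence, scores


def error_probability(score):
    """Convert a PHRED score Q into the base-call error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


# Hypothetical record, written inline so the sketch runs without any input file.
record = ["@read1 hypothetical example", "GATTACA", "+", "IIII5+!"]
read_id, seq, scores = parse_fastq_record(record)

for base, score in zip(seq, scores):
    print(base, score, error_probability(score))
```

For real files, an established parser (for example from a library such as Biopython or pysam) is usually a better choice than hand-written parsing; the sketch only shows what the quality line encodes.
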
- 📘Big List of genomic file types and descriptions - link from The Broad
- 📘IGV (Integrative Genomics Viewer) tool - link from The Broad
- 📘Learning how to work with VCF (Variant Call Format) files - link
- 📘General reference 'How sequencing works' - link
- 📘GATK tools (from The Broad) to convert genomic files - link - from/to common formats (e.g. paired FASTQ to unmapped BAM)
- 📘How to generate a BAM - link & image below from The Broad