FASTQ validation

Nuno Fonseca edited this page May 12, 2021 · 2 revisions

The FASTQ file validator (fastq_info) expects each sequence in a FASTQ file to occupy exactly four lines, in this order:

Sequence Identifier Line
Raw Sequence Line
Plus Line
Quality String Line 
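
As an illustration of the four-line layout, the following Python sketch groups the lines of a FASTQ text into records. It is not taken from fastq_info: the function name `read_records` and the sample reads are hypothetical and exist only for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]

def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (identifier, sequence, plus, quality) tuples, four lines at a time."""
    it = iter(lines)
    while True:
        # Take the next four lines and drop their trailing newlines.
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return  # clean end of input
        if len(chunk) != 4:
            raise ValueError(f"truncated record: expected 4 lines, got {len(chunk)}")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])

# Hypothetical sample data: two reads, the second repeating its identifier after '+'.
sample = "@read1\nACGTN\n+\nIIIII\n@read2\nacgu\n+read2\n!!!!\n"
records = list(read_records(sample.splitlines()))
```

Here `records` holds one tuple per read, so each per-sequence check can be applied to a single tuple.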

The following checks are performed on each sequence:

  • Sequence Identifier Line
      - Line starts with an '@'
      - Should have a unique identifier
      - Line is at least 2 characters long ('@' plus at least 1 character for the sequence identifier)

  • Raw Sequence Line
      - Minimum length: 1
      - Maximum length: 1M
      - Allowable characters: A C T G U N a c t g u n 0 1 2 3
      - U and T are mutually exclusive

  • Plus Line
      - Must exist
      - An optional sequence identifier may be specified after the '+' (it must be exactly the same as the identifier in the Sequence Identifier Line)

  • Quality String Line
      - Sequence and quality should have the same length
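
The checks listed above can be sketched in Python as follows. This is a minimal illustration, not the fastq_info implementation, and it rests on several assumptions: "1M" is read as 1,000,000 bases, the identifier is taken to be everything after the '@', and the U/T exclusivity rule is applied within a single sequence. The names `check_record`, `MAX_SEQ_LEN`, and `ALLOWED` are hypothetical.

```python
from typing import Optional, Set

MAX_SEQ_LEN = 1_000_000           # assumption: "1M" means 1,000,000 bases
ALLOWED = set("ACTGUNactgun0123")  # allowable characters from the list above

def check_record(ident: str, seq: str, plus: str, qual: str,
                 seen: Set[str]) -> Optional[str]:
    """Return a message for the first failed check, or None if all checks pass."""
    # Sequence Identifier Line: starts with '@', at least 2 characters, unique.
    if len(ident) < 2 or not ident.startswith("@"):
        return "identifier line must start with '@' and be at least 2 characters long"
    name = ident[1:]  # assumption: the identifier is the full text after '@'
    if name in seen:
        return f"duplicate identifier: {name}"
    seen.add(name)

    # Raw Sequence Line: length bounds, allowed characters, no U/T mixing.
    if not 1 <= len(seq) <= MAX_SEQ_LEN:
        return "sequence length out of range"
    chars = set(seq)
    if not chars <= ALLOWED:
        return "sequence contains characters outside the allowed set"
    if chars & set("Uu") and chars & set("Tt"):
        return "sequence mixes U and T"

    # Plus Line: must exist; an identifier after '+' must match the one after '@'.
    if not plus.startswith("+"):
        return "plus line is missing"
    if len(plus) > 1 and plus[1:] != name:
        return "identifier on plus line does not match the identifier line"

    # Quality String Line: same length as the sequence.
    if len(qual) != len(seq):
        return "sequence and quality lengths differ"
    return None

# Usage: share one `seen` set across all records of a file to detect duplicates.
seen: Set[str] = set()
first = check_record("@read1", "ACGTN", "+", "IIIII", seen)
second = check_record("@read1", "ACGTN", "+", "IIIII", seen)
```

Returning a message instead of raising keeps the sketch easy to call in a loop; a real validator would likely also report the record or line number.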

Paired reads can be provided as two separate files or in a single file (interleaved). Further information about FASTQ format is available at
