ngs qc

The ngs qc subcommand provides tools for checking the quality of next-generation sequencing files. This subcommand is composed of multiple quality check facets, which are configurable to match the level of QC you wish to generate. Each quality check facet has a dedicated page within the wiki.

Facet	Computational Load	Facet Type	Link
General Metrics	Light	Record-based	Click here
GC Content Metrics	Light	Record-based	Click here
Template Length Metrics	Light	Record-based	Click here
Genomic Features Metrics	Moderate	Record-based	Click here
Quality Score Metrics	Moderate	Record-based	Click here
Coverage Metrics	Moderate	Sequence-based	Click here
Edit Metrics	Heavy	Sequence-based	Click here

Record-based vs. Sequence-based facets

Every facet within the qc subcommand is either a record-based or a sequence-based quality check facet. This is a detail of the implementation and of little relevance to end-users. Briefly, record-based facets are run on every record in the file (though a facet may choose to ignore certain records based on flags, etc.). They are generally quick and have a small memory footprint. Sequence-based facets, on the other hand, run on all of the records for one sequence at a time. There are two primary reasons why this might be desirable:

The computation depends only on records from the same sequence, or it is faster if we can assume which sequence a record is from without checking it for every record. Note that ngs qc does not assume input files are coordinate sorted, so we must otherwise look up the matching sequence for every record (which can be time consuming). In a sequence-based facet, records are queried by sequence before being passed into the facet, so the assumption of which sequence the record falls into is safe.
The memory footprint for computing the statistics in a genome-wide fashion would be too great for our tool to run on readily available hardware (as defined in our guiding principles).

In general, facets should be designed as record-based unless they are required to be sequence-based for one of the reasons above.
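The distinction above can be illustrated with a minimal Python sketch. It is not the ngs implementation: the `Record` type, both facet classes, and the `run` driver are hypothetical names invented for this example, and the in-memory grouping merely stands in for querying records by sequence.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    """Toy stand-in for an alignment record (hypothetical, not an ngs type)."""
    name: str
    sequence: str  # reference sequence the record aligns to, e.g. "chr1"
    start: int     # 0-based start position on that sequence
    length: int    # number of aligned bases


@dataclass
class RecordCounter:
    """Record-based facet: sees every record once, in file order."""
    total: int = 0

    def process(self, record: Record) -> None:
        self.total += 1


@dataclass
class MeanCoverage:
    """Sequence-based facet: sees all records of one sequence at a time."""
    sequence_lengths: Dict[str, int]
    mean_by_sequence: Dict[str, float] = field(default_factory=dict)

    def process_sequence(self, sequence: str, records: Iterable[Record]) -> None:
        # Every record passed in belongs to `sequence`, so no per-record
        # lookup is needed and only one depth array is held in memory.
        seq_len = self.sequence_lengths[sequence]
        depth = [0] * seq_len
        for record in records:
            end = min(record.start + record.length, seq_len)
            for pos in range(record.start, end):
                depth[pos] += 1
        self.mean_by_sequence[sequence] = sum(depth) / seq_len


def run(records: List[Record], counter: RecordCounter, coverage: MeanCoverage) -> None:
    # Record-based pass: input order does not matter.
    for record in records:
        counter.process(record)

    # Sequence-based pass: group by sequence first (assumption: a real tool
    # would query an index instead of holding every record in memory).
    by_sequence: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_sequence[record.sequence].append(record)
    for sequence in coverage.sequence_lengths:
        coverage.process_sequence(sequence, by_sequence[sequence])


records = [
    Record("r1", "chr1", 0, 4),
    Record("r2", "chr2", 2, 2),  # unsorted input is fine
    Record("r3", "chr1", 2, 4),
]
counter = RecordCounter()
coverage = MeanCoverage(sequence_lengths={"chr1": 8, "chr2": 4})
run(records, counter, coverage)
```

In this sketch the coverage facet allocates a depth array for a single sequence at a time, which mirrors the memory argument above, while the counter needs no knowledge of sequences at all.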

Computational load

Each facet has an associated computational load. In this context, a computational load is a mostly subjective determination of how much time a facet adds to the overall run under normal operating conditions (some facets greatly increase the runtime while others do not). We provide this classification so that users can easily determine which QC facets are feasible to run on their infrastructure.

Light computational loads increase the runtime by less than 1 second per million records.
Moderate computational loads increase the runtime by between 1 and 5 seconds per million records.
Heavy computational loads increase the runtime by more than 5 seconds per million records.
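As a rough illustration of these thresholds, the following Python sketch classifies a measured runtime increase. The function name `classify_load` is hypothetical and not part of ngs. Treating exactly 1 and exactly 5 seconds per million records as Moderate is an assumption, since the text above does not say which side the boundaries fall on.

```python
def classify_load(seconds_added: float, records: int) -> str:
    """Classify the extra runtime a facet adds, per million records.

    `seconds_added` is the increase in wall-clock time attributed to the
    facet and `records` is the number of records processed in that run.
    """
    if records <= 0:
        raise ValueError("records must be a positive integer")

    per_million = seconds_added / (records / 1_000_000)

    if per_million < 1.0:
        return "Light"
    if per_million <= 5.0:  # assumption: both boundaries count as Moderate
        return "Moderate"
    return "Heavy"


# Example: three facets timed on a hypothetical file of 50 million records.
light = classify_load(30.0, 50_000_000)      # 0.6 s per million records
moderate = classify_load(150.0, 50_000_000)  # 3.0 s per million records
heavy = classify_load(400.0, 50_000_000)     # 8.0 s per million records
```

Because the classification is described as mostly subjective, a measurement like this is best read as a sanity check against the table rather than a definition of it.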

The quality check facets that are currently supported are listed in the table at the top of this page.
