ngs qc
The `ngs qc` subcommand provides tools for checking the quality of next-generation sequencing files. The subcommand is composed of multiple quality check facets, which can be configured to match the level of QC you wish to generate. Each quality check facet has a dedicated page within the wiki.
Facet | Computational Load | Facet Type | Link |
---|---|---|---|
General Metrics | Light | Record-based | Click here |
GC Content Metrics | Light | Record-based | Click here |
Template Length Metrics | Light | Record-based | Click here |
Genomic Features Metrics | Moderate | Record-based | Click here |
Quality Score Metrics | Moderate | Record-based | Click here |
Coverage Metrics | Moderate | Sequence-based | Click here |
Edit Metrics | Heavy | Sequence-based | Click here |
Every facet within the `qc` subcommand is either a record-based or a sequence-based quality check facet. This is an implementation detail of little relevance to end users. Briefly, record-based facets are run on every record in the file (though a facet may choose to ignore certain records based on flags, etc.). They are generally quick and have a small memory footprint. Sequence-based facets, on the other hand, process all of the records for a given sequence at one time. There are two primary reasons why this might be desirable:
- The computation depends only on records from the same sequence, or is faster if we can assume which sequence a record is from without checking it for every record. Note that `ngs qc` does not assume input files are coordinate sorted, so we must otherwise parse the matching sequence for every record (which can be time consuming). In a sequence-based facet, records are queried by sequence before being passed into the facet, so the assumption of which sequence the record falls into is safe.
- The memory footprint for computing the statistics in a genome-wide fashion would be too great for our tool to run on readily available hardware (as defined in our guiding principles).
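The difference between the two facet types can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: the `Record` type, the `RecordCounter` and `BasesPerSequence` facets, and the `run` driver are invented names, not the tool's actual implementation. An in-memory grouping step stands in for the per-sequence query that a real run would perform.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical, simplified alignment record (not the tool's real type)."""
    name: str
    sequence: Optional[str]  # reference sequence name; None if unmapped
    length: int


class RecordCounter:
    """Record-based facet sketch: sees every record once, in file order."""

    def __init__(self) -> None:
        self.total = 0
        self.unmapped = 0

    def process(self, record: Record) -> None:
        self.total += 1
        if record.sequence is None:
            self.unmapped += 1


class BasesPerSequence:
    """Sequence-based facet sketch: receives all records of one sequence per call."""

    def __init__(self) -> None:
        self.bases: Dict[str, int] = {}

    def process_sequence(self, sequence: str, records: Iterable[Record]) -> None:
        # The sequence is known up front, so no per-record lookup is needed,
        # and any working state only has to cover this one sequence.
        self.bases[sequence] = sum(r.length for r in records)


def run(records: List[Record]) -> Tuple[RecordCounter, BasesPerSequence]:
    # Record-based pass: works regardless of sort order.
    counter = RecordCounter()
    for record in records:
        counter.process(record)

    # Stand-in for querying records by sequence (assumption: a real tool
    # would fetch them per sequence rather than group them in memory).
    by_sequence: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        if record.sequence is not None:
            by_sequence[record.sequence].append(record)

    per_sequence = BasesPerSequence()
    for sequence, group in by_sequence.items():
        per_sequence.process_sequence(sequence, group)
    return counter, per_sequence
```

Note that the input to `run` does not need to be coordinate sorted: the record-based facet is indifferent to order, and the sequence-based facet only ever receives records that have already been grouped by sequence.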
In general, facets should be designed as Record-based unless they are required to be Sequence-based for one of the reasons above.
Each facet has an associated computational load. In this context, a computational load is a mostly subjective determination of how much time a facet adds to the overall run under normal operating conditions (some facets greatly increase the runtime while others do not). We provide this classification so that users can easily determine which QC facets are feasible to run on their infrastructure.
- Light computational loads increase the runtime by less than 1 second per million records.
- Moderate computational loads increase the runtime by between 1 and 5 seconds per million records.
- Heavy computational loads increase the runtime by more than 5 seconds per million records.
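As a rough illustration of these thresholds, the sketch below classifies a facet from the runtime it adds to a run. The function `classify_load` and its parameters are hypothetical and not part of the tool. Treating exactly 1 and exactly 5 seconds per million records as Moderate is an assumption, since the definitions above do not say how the boundaries are handled.

```python
def classify_load(added_seconds: float, records: int) -> str:
    """Classify a facet by the runtime it adds per million records.

    Hypothetical helper: boundary values (exactly 1 or 5 seconds per
    million records) are assumed to fall into the Moderate class.
    """
    if records <= 0:
        raise ValueError("records must be a positive integer")

    seconds_per_million = added_seconds / (records / 1_000_000)

    if seconds_per_million < 1:
        return "Light"
    if seconds_per_million <= 5:
        return "Moderate"
    return "Heavy"
```

For example, a facet that adds 150 seconds to a run over 50 million records costs 3 seconds per million records, which this sketch would classify as Moderate.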
The following quality check facets are currently supported.