GC Content metrics
The GC Content metrics facet reports statistics regarding the GC content for records within the file. The report is delivered under the `gc_content` key within the `results.json` file. You can easily examine the output of this facet by using `jq`:
cat results.json | jq .gc_content
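If `jq` is not available, the same lookup can be done with a few lines of Python. This is a minimal sketch: the function name `load_gc_content` is hypothetical, and the only things assumed about the file are what is stated above (a JSON document named `results.json` with a top-level `gc_content` key).

```python
import json


def load_gc_content(path="results.json"):
    """Return the value stored under the top-level `gc_content` key."""
    with open(path) as handle:
        results = json.load(handle)
    # Equivalent to `jq .gc_content`, except that a missing key raises
    # KeyError here instead of printing null.
    return results["gc_content"]


if __name__ == "__main__":
    print(json.dumps(load_gc_content(), indent=2))
```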
A histogram representing 0% to 100% GC content per record is initialized with a counter per percentage point (all bins start at zero). For every record in the file, the following happens:
- If the record is marked as a duplicate record (`0x0400`) or a secondary record (`0x0100`), the `ignored_flags` counter is incremented by one and the record is ignored. Note that unmapped records are considered here, as we want to include any unmapped records that might be introduced due to contamination.
- If the sequence length is too short (< 100 nucleobases), this can bias our GC content distribution. Thus, the `ignored_too_short` counter is incremented and the record is ignored.
- A random selection of 100 nucleobases is taken from the record for evaluation. The GC content of that selection is calculated as a percentage from 0% to 100%, and the respective bin within the histogram is incremented by one. Further, the `processed` counter is incremented by one.
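The per-record procedure above can be sketched in Python as follows. This is an illustration, not the tool's actual implementation: the function names and the dictionary layout are hypothetical, and the sketch assumes the "random selection" is a contiguous 100-base window at a random offset, which the description does not specify.

```python
import random

DUPLICATE = 0x0400  # SAM flag: duplicate record
SECONDARY = 0x0100  # SAM flag: secondary record
WINDOW = 100        # nucleobases sampled per record


def new_metrics():
    """Create the counters and a histogram with one bin per percentage point."""
    return {
        "histogram": [0] * 101,  # bins for 0% through 100%
        "ignored_flags": 0,
        "ignored_too_short": 0,
        "processed": 0,
    }


def process_record(metrics, flags, sequence, rng=random):
    """Apply the filtering and sampling steps to a single record."""
    # Step 1: skip duplicate and secondary records.
    if flags & (DUPLICATE | SECONDARY):
        metrics["ignored_flags"] += 1
        return
    # Step 2: skip records shorter than the sampling window.
    if len(sequence) < WINDOW:
        metrics["ignored_too_short"] += 1
        return
    # Step 3 (assumption): sample a contiguous window at a random offset.
    start = rng.randint(0, len(sequence) - WINDOW)
    window = sequence[start:start + WINDOW].upper()
    gc = sum(1 for base in window if base in "GC")
    # With a 100-base window the G/C count is already a whole percentage.
    metrics["histogram"][gc * 100 // WINDOW] += 1
    metrics["processed"] += 1
```

Sampling a fixed number of bases from every record keeps each histogram entry on the same scale regardless of read length, which is why records shorter than the window are dropped rather than evaluated in full.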
This facet has the following top-level keys:

| Key | Description |
|---|---|
| `histogram` | Contains a histogram representing the number of records that have 0% to 100% GC content. |
| `records` | Contains metrics related to simple record counting for this facet. Includes details on how many records were processed versus how many were ignored and for what reason. |
| `nucleobases` | Contains metrics related to simple nucleobase counting for this facet. |
| `summary` | Contains summary statistics regarding this QC facet, most notably the mean GC content for this file. |
As described above, the histogram spans a range of 0% to 100% GC content for a particular record. The number within each bin represents the number of records that (a) passed the filtering criteria outlined above and (b) had a GC content of that particular percentage.
The records field contains metrics regarding how many records were processed, how many records were ignored, and the reason for each ignored record.
The nucleobases field counts up G/C, A/T, and other nucleobases contained within the file. This is used in the final determination for the mean GC content of the file.
The summary field contains summary statistics for the file:

- GC Content Percentage (`gc_content_pct`). Mean GC content for all records contained within this file.
- Records ignored because of flags percentage (`ignored_flags_pct`). The percentage of records that were filtered because of disqualifying flags.
- Records ignored because they were too short percentage (`ignored_too_short_pct`). The percentage of records that were ignored because the length of the read was too short.
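As a rough illustration of how these three percentages relate to the counters described earlier, the sketch below derives them from raw counts. The function name and parameter names are hypothetical, and the denominators are assumptions: the GC percentage is taken over all counted nucleobases (G/C, A/T, and other), and the two ignored percentages are taken over all records seen (processed plus ignored). The tool itself may define them differently.

```python
def summarize(gc_count, at_count, other_count,
              processed, ignored_flags, ignored_too_short):
    """Derive summary percentages from raw counters (assumed denominators)."""
    total_bases = gc_count + at_count + other_count
    total_records = processed + ignored_flags + ignored_too_short

    def pct(part, whole):
        # Guard against empty inputs so an empty file yields 0.0, not an error.
        return 100.0 * part / whole if whole else 0.0

    return {
        "gc_content_pct": pct(gc_count, total_bases),
        "ignored_flags_pct": pct(ignored_flags, total_records),
        "ignored_too_short_pct": pct(ignored_too_short, total_records),
    }
```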