General metrics

Clay McLeod edited this page Sep 24, 2022 · 10 revisions

The General Metrics facet reports general statistics about the records contained within the file. The report is delivered under the general key within the results.json file. You can examine the output of the general facet by using jq:

cat results.json | jq .general
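If jq is not available, the same lookup can be sketched in Python. This is a minimal illustration rather than part of the tool: the helper name load_general is hypothetical, and it assumes only that results.json is a JSON object with a top-level general key, as described above.

```python
import json


def load_general(path="results.json"):
    """Return the `general` facet report from a results file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["general"]
```

Calling load_general() with no argument reads results.json from the current directory; the returned dictionary should contain the top-level keys described under Outputs.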

Outputs

This facet has the following top-level keys:

Key Description
records Metrics regarding record counts, including the total number of records, unmapped records, duplicate records, the designation of records (primary, secondary, supplementary), how many paired records exist, how many read one and read two records exist, how many records are properly paired, how many singleton records exist, and how many records have a mate mapped to a different sequence (both unfiltered and high-quality).
summary Contains summary metrics for this facet, including the duplicate record percentage, the unmapped record percentage, and the percentage of records whose mate is mapped to another sequence (both unfiltered and high-quality).

Records

This section of the general metrics comprises multiple general counting metrics regarding records. Most of these counts are produced by iterating over the records and tallying those that have particular flags set. This is similar to the output of the samtools flagstat command.

The current set of record metrics collected includes:

  • Total (total). The total number of records within the file.
  • Unmapped (unmapped). The total number of records marked as unmapped (0x4) within the file.
  • Duplicate (duplicate). The number of records marked as duplicate (0x400) within the file.
  • Designation (designation). The number of primary, secondary, and supplementary records in the file respectively.
    • If a read is marked as secondary (0x100), then the read is counted as secondary.
    • If a read is marked as supplementary (0x800), then the read is counted as supplementary.
    • Else, the read is counted as primary.
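The counting rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: it takes a plain iterable of integer SAM flags, the function names are hypothetical, and it assumes the checks run in the order listed, so a record carrying both 0x100 and 0x800 is counted as secondary. The shape of the returned dictionary is also only illustrative.

```python
from collections import Counter

# SAM flag bits named in the list above.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def designation(flag: int) -> str:
    """Classify a record, checking secondary before supplementary."""
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def count_records(flags):
    """Tally total, unmapped, duplicate, and designation counts."""
    counts = {"total": 0, "unmapped": 0, "duplicate": 0, "designation": Counter()}
    for flag in flags:
        counts["total"] += 1
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        if flag & DUPLICATE:
            counts["duplicate"] += 1
        counts["designation"][designation(flag)] += 1
    return counts
```

Note that the unmapped and duplicate tallies here are taken over every record, regardless of designation.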

Primary-only metrics

Past this point, only records designated as primary are counted towards the following metrics.

  • Primary mapped (primary_mapped). The number of records that are counted as primary and are marked as mapped (!0x4).
  • Primary duplicate (primary_duplicate). The number of records that are counted as primary and are marked as duplicate (0x400).

Primary and segmented-only metrics

Past this point, only records that are designated as primary and as segmented (0x01) are counted towards the following metrics.

  • Paired (paired). The number of records that are marked as being within a pair ("segmented").
  • Read 1 (read_1). The number of records that are marked as being the first record within a segment.
  • Read 2 (read_2). The number of records that are marked as being the last record within a segment.
  • Proper pair (proper_pair). The number of records that are marked as being mapped (!0x4) and properly aligned (0x2).
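The two filtering stages (primary only, then primary and segmented only) can be sketched in Python as follows. This is a rough illustration, not the tool's code: the function name is hypothetical, the input is assumed to be an iterable of integer SAM flags, and READ_1 and READ_2 use the SAM flag bits 0x40 and 0x80, which is an assumption about the bits the tool checks.

```python
PAIRED = 0x1          # "segmented"
PROPER_PAIR = 0x2
UNMAPPED = 0x4
READ_1 = 0x40         # assumed: SAM bit for the first segment
READ_2 = 0x80         # assumed: SAM bit for the last segment
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def count_primary_metrics(flags):
    """Tally the primary-only and primary-and-segmented-only metrics."""
    keys = ["primary_mapped", "primary_duplicate",
            "paired", "read_1", "read_2", "proper_pair"]
    counts = dict.fromkeys(keys, 0)
    for flag in flags:
        # Stage 1: skip anything that is not designated as primary.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        mapped = not (flag & UNMAPPED)
        if mapped:
            counts["primary_mapped"] += 1
        if flag & DUPLICATE:
            counts["primary_duplicate"] += 1
        # Stage 2: skip primary records that are not segmented.
        if not (flag & PAIRED):
            continue
        counts["paired"] += 1
        if flag & READ_1:
            counts["read_1"] += 1
        if flag & READ_2:
            counts["read_2"] += 1
        if mapped and (flag & PROPER_PAIR):
            counts["proper_pair"] += 1
    return counts
```

Because each stage only narrows the set of records considered, paired can never exceed the number of primary records, and proper_pair can never exceed paired.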

Summary

Summary metrics are generally percentages that are of interest to users. The current set of summary metrics collected includes:

  • Duplication percentage. The percentage of records that are marked as duplicate in the file.
  • Unmapped percentage. The percentage of records that are marked as unmapped in the file. This also allows one to trivially calculate the mapped percentage.
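As a worked sketch of these percentages in Python: the function names and output key names below are hypothetical, and the denominator is assumed to be the total record count, which this page does not state explicitly. The mapped percentage is derived from the same counts.

```python
def percentage(count, total):
    """Return count as a percentage of total, or 0.0 for an empty file."""
    return 100.0 * count / total if total else 0.0


def summarize(records):
    """Derive summary percentages from the total, unmapped, and duplicate counts."""
    total = records["total"]
    return {
        "duplication_pct": percentage(records["duplicate"], total),
        "unmapped_pct": percentage(records["unmapped"], total),
        # Mapped records are simply those not marked as unmapped.
        "mapped_pct": percentage(total - records["unmapped"], total),
    }
```

For example, a file with 200 records, 10 of them unmapped and 50 marked as duplicate, would give an unmapped percentage of 5, a mapped percentage of 95, and a duplication percentage of 25 under these assumptions.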