Commit

Merge pull request #30 from wtsi-hgi/gz3/HSH-201-customizable-tresholds
HSH-201: Documented customizable thresholds for sampleQC.
wtsi-mercury authored Dec 20, 2024
2 parents 5bd3ce5 + 9c90bde commit 2a5ec44
Showing 1 changed file with 23 additions and 5 deletions.
28 changes: 23 additions & 5 deletions docs/wes-qc-hail.md
@@ -210,16 +210,34 @@ spark-submit 2-sample_qc/3-population_pca_prediction.py --pca-plot-assigned
4. Identify outliers

Now that we have the predicted populations that each sample belongs to,
we can run sample QC stratified by population and identify outliers within each population for each metric tested:
- number of SNPs
- number of deletions and insertions, insertion/deletion rate
- heterozygosity rate, heterozygous/homozygous ratio
- number of transitions and transversions, transition/transversion ratio.
we run sample QC stratified by population and identify outliers.

We test the following metrics, calculated by Hail:
* number of SNPs
* heterozygosity rate, heterozygous/homozygous ratio
* number of transitions and transversions, transition/transversion ratio
* number of deletions and insertions, insertion/deletion rate

For a description of each metric, see the documentation for the
[Hail sample_qc()](https://hail.is/docs/0.2/methods/genetics.html#hail.methods.sample_qc)
function.

```shell
spark-submit 2-sample_qc/4-find_population_outliers.py
```

The WES-QC pipeline identifies outliers using the gnomAD function
[`compute_stratified_metrics_filter()`](https://broadinstitute.github.io/gnomad_methods/api_reference/sample_qc/filtering.html#gnomad.sample_qc.filtering.compute_stratified_metrics_filter).
By default, this function flags as an outlier any sample
that deviates by more than 4 median absolute deviations (MAD)
from the median of its population for any metric.
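
The MAD rule described above can be illustrated with a minimal, pure-Python sketch for a single metric within one population. This is not the gnomAD implementation: the helper name `mad_outliers`, the sample values, and the `scale` constant (1.4826, the usual factor that makes the MAD comparable to a standard deviation) are assumptions made for illustration.

```python
from statistics import median


def mad_outliers(values, n_mads=4.0, scale=1.4826):
    """Flag samples whose metric lies more than n_mads MADs from the median.

    values: dict mapping sample ID -> metric value (one population, one metric).
    scale:  assumed normalisation constant; set to 1.0 for the raw MAD.
    """
    med = median(values.values())
    # Median of absolute deviations from the median, optionally scaled.
    mad = scale * median(abs(v - med) for v in values.values())
    lower, upper = med - n_mads * mad, med + n_mads * mad
    return {sample: not (lower <= v <= upper) for sample, v in values.items()}


# Hypothetical transition/transversion ratios for six samples.
r_ti_tv = {"s1": 2.9, "s2": 3.0, "s3": 3.1, "s4": 3.0, "s5": 2.95, "s6": 5.2}
flags = mad_outliers(r_ti_tv)
outliers = sorted(sample for sample, is_outlier in flags.items() if is_outlier)
print(outliers)
```

In the pipeline, the same kind of test is applied per population and per metric, and a sample fails if it is flagged for any metric.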

If you need to adjust this behavior,
modify the `compute_stratified_metrics_filter_args` section in the configuration file.
Any parameters added to this section are passed to the `compute_stratified_metrics_filter()` function.
For example, you can use the `metric_threshold` dictionary to set separate thresholds for individual metrics.
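
As a rough sketch of how such a configuration section might be forwarded, the snippet below unpacks a dictionary of arguments into a stand-in function. The `config` layout, the stand-in `effective_thresholds()`, and the chosen values are hypothetical; only the section name and the `metric_threshold` parameter come from the text above, and the parameter names `lower_threshold` and `upper_threshold` are assumed to match the gnomAD function's keyword arguments. `n_snp` and `r_ti_tv` are metric names produced by Hail `sample_qc()`.

```python
def effective_thresholds(qc_metrics, lower_threshold=4.0, upper_threshold=4.0,
                         metric_threshold=None):
    """Stand-in for the real function: report the (lower, upper) MAD
    thresholds that would apply to each metric."""
    metric_threshold = metric_threshold or {}
    default = (lower_threshold, upper_threshold)
    return {m: tuple(metric_threshold.get(m, default)) for m in qc_metrics}


# Hypothetical parsed configuration; the real file layout may differ.
config = {
    "compute_stratified_metrics_filter_args": {
        # Stricter (lower, upper) thresholds for the SNP count only.
        "metric_threshold": {"n_snp": [3.0, 3.0]},
    }
}

# Everything in the section is forwarded as keyword arguments.
filter_args = config.get("compute_stratified_metrics_filter_args", {})
thresholds = effective_thresholds(["n_snp", "r_ti_tv"], **filter_args)
print(thresholds)
```

Metrics without an entry in `metric_threshold` keep the default thresholds, so only the metrics you list are affected.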


5. **Filter out samples which fail QC**

The final step in sample QC is filtering the data to remove the samples that the previous script identified as failing.