add function to assist Illumina index correction #917
Merged
This adds a function, `guess_barcodes`, to `illumina.py` to help identify barcodes that are outliers by read count, along with alternative barcode pairs that may make sense for them.

The heuristic looks for a barcode pair that is not used by another sample and that 1) matches one index (assuming a laboratory swap affected only one of the indices) and 2) has a higher read count. If no single-index match works but a higher-read-count pair with two different barcodes exists, that pair is suggested instead. Where colliding pairs exist, the output is cautious and does not suggest alternatives, though outliers are still identified.

Outliers are identified based on variance from the assumption of a balanced pool with one negative control; the threshold and the number of controls can be set. As an alternative to finding outliers, the user can specify a sample name explicitly or define a read-count threshold below which barcodes will be reassessed. An error is issued if the number of assigned reads is <70% of the pool (configurable).

A call to `guess_barcodes` has been incorporated into the demultiplexing workflows, both Snakemake and WDL. A separate WDL task that calls only `guess_barcodes` is not included in this PR. Basic tests are included.
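For reviewers who want a quick mental model, the heuristic described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function names `find_outliers` and `suggest_pair`, their parameters, and the simple fraction-of-expected outlier rule are all assumptions made for the example, and collision handling and the assigned-reads check are omitted.

```python
def find_outliers(sample_reads, n_controls=1, fraction=0.2):
    """Flag samples with far fewer reads than a balanced pool would give.

    Assumed rule (hypothetical): a sample is an outlier if its read count is
    below `fraction` of the expected per-sample count, where the expected
    count divides all assigned reads among the non-control samples.
    """
    total = sum(sample_reads.values())
    n_expected = max(len(sample_reads) - n_controls, 1)
    expected = total / n_expected
    return sorted(s for s, r in sample_reads.items() if r < fraction * expected)


def suggest_pair(assigned, observed, used_pairs):
    """Suggest an alternative (bc1, bc2) pair for an outlier sample.

    assigned:   the (bc1, bc2) pair currently listed for the sample
    observed:   dict mapping (bc1, bc2) -> read count seen in the run
    used_pairs: set of pairs already assigned to other samples
    Returns a pair, or None if nothing with more reads is available.
    """
    current = observed.get(assigned, 0)
    # Only unused pairs with more reads than the current assignment qualify.
    candidates = {
        pair: reads
        for pair, reads in observed.items()
        if pair != assigned and pair not in used_pairs and reads > current
    }
    # Prefer pairs that share exactly one index with the assigned pair
    # (consistent with a swap affecting only one of the two indices).
    single = {
        pair: reads
        for pair, reads in candidates.items()
        if (pair[0] == assigned[0]) != (pair[1] == assigned[1])
    }
    pool = single or candidates  # fall back to pairs differing at both indices
    if not pool:
        return None
    return max(pool, key=pool.get)
```

In this sketch, a sample assigned `("A", "B")` with 10 reads would be pointed at an unused `("A", "C")` pair with 5000 reads in preference to an unused `("D", "E")` pair with 9000 reads, because the former shares one index with the original assignment.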