Filter whitelisted barcodes within edit distance of another whitelisted barcode #138

Closed
TomSmithCGAT opened this issue Jun 19, 2017 · 1 comment

@TomSmithCGAT
Member

Based on my analysis of droplet scRNA-Seq cell barcodes (e.g. the blog post here), I think we should add an option to remove cell barcodes from the automatically generated whitelist if they are within an edit distance threshold of another whitelisted barcode with greater frequency. I believe there is sufficient evidence that error barcodes (arising from INDELs or sequencing errors) may pass the whitelist threshold. We could merge these barcodes into the true barcode from which they derive, but this risks merging two truly different cells. On balance, removing these potential error barcodes seems like the best approach. This is compatible with the current error correction in extract, which is restricted to barcodes not in the whitelist. Thus the steps for whitelist generation and filtering would be:

  1. Parse first 50M reads, extract cell barcodes and generate a whitelist using the knee method
  2. (Optionally) identify all cell barcodes within an edit distance threshold of exactly one whitelisted barcode
  3. (Optionally) Remove whitelisted barcodes within an edit distance threshold of another whitelisted barcode with greater frequency
  4. Parse all reads, extract cell barcodes and filter reads against the whitelist (with optional correction of cell barcodes not in the whitelist)
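
To illustrate step 3, here is a minimal Python sketch of the proposed filtering. It is not the code that was added to the tool: the function names `edit_distance` and `filter_whitelist`, the `threshold` parameter, and the example counts are all hypothetical. It assumes the whitelist is available as a mapping from barcode to read count, uses Levenshtein distance so that INDELs are covered as well as substitutions, and compares each barcode against every more frequent barcode in the original whitelist.

```python
def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a character from a
                curr[j - 1] + 1,           # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def filter_whitelist(counts, threshold=1):
    """Drop barcodes within `threshold` edits of a more frequent barcode.

    `counts` maps each whitelisted barcode to its read count (assumed input).
    Ties in frequency are left alone, since neither barcode is "greater".
    """
    removed = set()
    for barcode, count in counts.items():
        for other, other_count in counts.items():
            if other_count > count and edit_distance(barcode, other) <= threshold:
                removed.add(barcode)
                break
    return {bc: n for bc, n in counts.items() if bc not in removed}


# Hypothetical whitelist: ACGTACGA is one substitution away from ACGTACGT.
whitelist_counts = {
    "ACGTACGT": 1200,
    "ACGTACGA": 40,
    "TTGCATGC": 900,
    "GGATCCAA": 35,
}
filtered = filter_whitelist(whitelist_counts, threshold=1)
print(sorted(filtered))
```

The pairwise comparison is quadratic in the number of whitelisted barcodes, which is acceptable for a sketch but would likely need indexing for whitelists with many thousands of barcodes. Low-count barcodes that are not close to any more frequent barcode (GGATCCAA here) are kept, so the filter removes only likely errors rather than all rare barcodes.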
@TomSmithCGAT
Member Author

This is now available on the master branch and will be in the next release.
