SNPUpdate not updating genotypes and base calls #3

Open
sgopalan98 opened this issue Apr 10, 2024 · 3 comments

@sgopalan98

My understanding is that the manipulateGTC option currently updates the genotype calls and the base calls (the byte arrays at TOC entries 1002 and 1003, respectively) based on the input alleles in the updates file and the allele combination in the manifest ([A/B] in the manifest). However, the logic in manipulateGTC currently fails in two scenarios:

  1. Indel updates are not performed correctly (both the genotype call and the base calls).
  2. Base calls are not updated for some SNPs.

Indel update

When the updates.txt file specifies indel updates with an AA/BB combination (II, DD), snpUpdate ignores them. I assume this is because of a conditional check in the snpUpdate function: the condition accepts only A, T, G, and C in the input line, so indels are skipped.
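To illustrate the kind of check that would let indels through, here is a minimal sketch. It is not the code in snpUpdate or in my fork; the function name `is_supported_update` and the choice to reject mixed SNP/indel pairs such as `AI` are assumptions made for the example.

```python
# Hypothetical helper, not part of GThaCk: decides whether a two-character
# allele string from the updates file should be processed.
SNP_ALLELES = {"A", "C", "G", "T"}
INDEL_ALLELES = {"I", "D"}


def is_supported_update(alleles: str) -> bool:
    """Accept SNP pairs (e.g. 'AG') and indel pairs ('II', 'ID', 'DI', 'DD')."""
    if len(alleles) != 2:
        return False
    chars = set(alleles.upper())
    # Assumption: a pair must be all-SNP or all-indel, never a mix.
    return chars <= SNP_ALLELES or chars <= INDEL_ALLELES
```

With a check like this, `II` and `DD` would be handled the same way as `AG`, instead of being dropped before the update is applied.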

Base calls

Currently, the base calls in the GTC file (the 1003 byte array) are updated from the alleles given in the input updates file. This sometimes produces the wrong base calls: GTC files use the TOP strand allele combination to generate the base-call value for a SNP, whereas the allele combination for that SNP in the BeadPoolManifest might be different.

GTC file documentation reference: https://github.com/Illumina/BeadArrayFiles/blob/develop/docs/GTC_File_Format_v5.pdf (TOC Entry table)

I've managed to find a workaround for this by updating the base calls using the TOP strand combination found in the CSV format of the BeadPoolManifest.

BeadPoolManifest file documentation for reference: https://knowledge.illumina.com/microarray/general/microarray-general-reference_material-list/000001565
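The idea behind the workaround can be sketched as follows. This is only an illustration under stated assumptions, not the code in the fork: `parse_allele_pair` and `encode_update` are hypothetical names, the TOP strand pair is assumed to have already been read from the CSV manifest as a string such as `[A/G]`, and the genotype byte values (0 = no call, 1 = AA, 2 = AB, 3 = BB) and the `--` no-call base call follow my reading of the GTC format document linked above.

```python
import re
from typing import Tuple

# Genotype byte values as I understand them from the GTC format document.
GENOTYPE_CODES = {"NC": 0, "AA": 1, "AB": 2, "BB": 3}


def parse_allele_pair(field: str) -> Tuple[str, str]:
    """Split a manifest-style '[A/G]' or '[I/D]' field into (allele_a, allele_b)."""
    match = re.fullmatch(r"\[([ACGTID])/([ACGTID])\]", field.strip())
    if match is None:
        raise ValueError(f"unrecognised allele pair: {field!r}")
    return match.group(1), match.group(2)


def encode_update(genotype: str, top_pair: str) -> Tuple[int, bytes]:
    """Return (genotype byte, two-byte base call) for one locus.

    `genotype` is 'NC', 'AA', 'AB' or 'BB'; `top_pair` is the TOP strand
    allele combination, assumed to come from the CSV manifest.
    """
    allele_a, allele_b = parse_allele_pair(top_pair)
    base_calls = {
        "NC": "--",
        "AA": allele_a + allele_a,
        "AB": allele_a + allele_b,
        "BB": allele_b + allele_b,
    }
    return GENOTYPE_CODES[genotype], base_calls[genotype].encode("ascii")
```

For example, `encode_update("AB", "[A/G]")` gives `(2, b"AG")`. Because the base call is derived from the TOP pair rather than from the alleles in the updates file, the two arrays stay consistent even when the updates file is written on a different strand.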


I've temporarily addressed both scenarios and pushed the changes to a fork of this repo: https://github.com/sgopalan98/GThaCk/tree/fixing-bug-manipulate-gtc

It would be really helpful if you could look at these bugs and let me know whether there is a better fix. Thank you!

@tbrunetti
Owner

Can you open a PR for this with your new code so I can take a look? I assume this would be the best path, but I need to see the changes.

@tbrunetti
Owner

Sorry, I meant a PR on this repo, not your fork.

@sgopalan98
Author

I have raised a PR with the changes. Please let me know if you need any clarifications on anything. Thank you for your response and offering to take a look at the changes, @tbrunetti!
