-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNPUpdate Bugs #4
base: master
Are you sure you want to change the base?
Conversation
|
||
# Getting base call byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be "getting genotype class byte"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for putting together a yaml and a new singularity def; the one I had is a bit outdated
@@ -133,6 +134,7 @@ def query(): | |||
parser = argparse.ArgumentParser(description='Functions and methods for gtc files') | |||
parser.add_argument('method', choices=['manipulateGTCs', 'getIntensities', 'sampleInformation', 'createSampleSheet', 'allCombos']) | |||
parser.add_argument('--bpm', required=True, type=str, help='Full path to bead pool manifest file (.bpm); must be same one used to generate gtc') | |||
parser.add_argument('--bpm-csv',required=True, type=str, help='Full path to bead pool manifest file in CSV form (.csv); must be same one used to generate gtc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea -- I like this esp since Illumina does provide the csv and that also allows us to modify the csv if needed w/o going through the bpm
if data[1002][loc] == 0: | ||
data[1003][loc] = '--'.encode() | ||
else: | ||
if newSnps[0] in ['I', 'D']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YES! This is the use case I never added or even thought of b/c at the time I don't think we cared about indel and were exclusively looking at SNPs so I never added the case nor did I ever have a use case that I tested it on, but this makes perfect sense.
@@ -174,6 +208,44 @@ def snpOverride(manifest, overrides): | |||
################################################################################################################### | |||
logger.debug('Preparing to read in bpm file...') | |||
manifest = BeadPoolManifest(bpm) | |||
def get_manifest_csv(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for internal doc and as a reminder to myself; this code chunk is to support the input csv as the manifest instead of the bpm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sgopalan98 I think this all looks great and this is how I would expect to approach the indel issue as well; have you tested to see if the gtc output validates ok w/ what is expected? If so, I can merge the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you also want to be added as a collab let me know. Happy to add you to this repo, although I think you all have your own version of this now but happy to add you nonetheless
This PR contains minimal changes intended to address the SNPUpdate bug mentioned in issue #3. I am unsure about the correctness of the code, i.e., whether it passes all test cases.
The basic idea is that we obtain the CSV form of the Bead Pool Manifest file and extract the top strand alleles for each variation. These top strand alleles determine the 'A' and 'B' bases, which are then used to update the 1003 array byte for that SNP. This method does not work for Indels, as top strand alleles do not include 'I' or 'D'. Thus, the ILMN strand in the BPM Binary file is used to assign 'A' and 'B' for them. (I am not fully sure if this is the correct way to update GTCs, but it has worked for the test cases that I have tried so far).
Another code change involves flipping the 'A' and 'B' assignment based on the RefStrand signs (PLUS or MINUS)
The changes to the GitHub workflow file and Singularity definition file are not related to the bug fixes; they are updates to generate Singularity containers and are hardcoded to check out the branch containing this specific change.