-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extra logging and hmmsearch fix #87
Conversation
… and put in a new HTML file to inspect the error
…error in clusters_to_clinker_globaligner(). line: alignment.add_link(query_gene, subject_gene, best_hit.identity / 100, 0)
… number of queries is lower than the given number of unique or min_hits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took a while to get around to looking at this PR. Mostly happy with everything, would just like to take out the specific CAGECAT stuff.
@@ -131,7 +134,13 @@ def write_profiles(profiles: Collection[str], output: str=None) -> str: | |||
output: name of output file | |||
""" | |||
if not output: | |||
output = datetime.now().strftime("cblaster_%Y%m%d%H%M%S.hmm") | |||
counter = 0 | |||
output = Path(cagecat_prefix, datetime.now().strftime(f"cblaster_%Y%m%d%H%M%S-{counter}.hmm")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we generalise this and take out the hardcoded CAGECAT stuff here? (e.g. add an output folder to the arguments for write_profiles
)
@@ -149,7 +158,10 @@ def run_hmmsearch(fasta, query): | |||
temp_res: List, String of result file names | |||
""" | |||
LOG.info("Performing hmmsearch") | |||
output = Path(query).with_suffix(".txt") | |||
output = Path(cagecat_prefix, query).with_suffix(".txt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well, on the CAGECAT side can you just have cagecat_prefix
as part of query
when you pass it to the function?
Hi Cam,
I'll take a look and see if try and get you a new pull request asap.
Best,
-Mohammad
|
Hi Cam,
I sent Matthias a notice about editing the cagecat specific stuff out as I think he might be able to resolve this quicker. In other related news I have unfortunately seen about 20 failed jobs on the CAGECAT site and it seems there is a small bug that I have a temporary fix for.
The problem:
When writing the _summary.txt file for some reason there is a ‘None’ value for the cluster accession in a row. This breaks the formatter in line 17 formatters.py (can’t format the None value). See attached session file if you want to reproduce (when writing session.format(’summary’ ….) )
Solution:
I have added a quick fix which seems to remedy this but may have some other consequences you might be more aware of. Right now it will show “None” in subject column for an item that has this issue.
Just ensured it is parsed as a str( )...
fmt = [f"{str(row[index]):{length}}" for index, length in enumerate(lengths)]
So far I am not getting jobs with this error.
Considering its a small fix I thought maybe not worth a whole other pull request.
Regards,
-Mohammad
|
Yeah I have noticed that error and have made pretty much the exact same fix for the next release. It might just be quicker if I edit the PR directly - I think the easiest way will be to just use temporary files here, which will also remove the need to generate the unique name from a timestamp. Do you do anything special on the CAGECAT side with the HMM files or just delete them post search? |
Hi Cam,
Great that solution sounds good to me, we just remove them after sometime so having them in temp works for CAGECAT. Maybe it can be something configureable if it's easy? (Some users might have limited space not he /tmp drive… but the files are not very large so probably not something to worry about. I liked using the “tempfile” package worked well for other projects on all operating systems, automatically seeks out an OS’s tmp location too.
Best,
-Mohammad
On Feb 16, 2023, at 2:51 AM, Cameron Gilchrist ***@***.******@***.***>> wrote:
Yeah I have noticed that error and have made pretty much the exact same fix for the next release.
It might just be quicker if I edit the PR directly - I think the easiest way will be to just use temporary files here, which will also remove the need to generate the unique name from a timestamp. Do you do anything special on the CAGECAT side with the HMM files or just delete them post search?
—
Reply to this email directly, view it on GitHub<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fgamcil%2Fcblaster%2Fpull%2F87%23issuecomment-1432355803&data=05%7C01%7Cmohammad.alanjary%40wur.nl%7C065f869c345f45e8c88808db0fc04764%7C27d137e5761f4dc1af88d26430abb18f%7C0%7C0%7C638121090727060346%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=6nCPleWNjlWjN%2Fo1MgbIPX4CQtD171GfgIlurcEDssU%3D&reserved=0>, or unsubscribe<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAUBHKM4XVA6SSGADMXMJGRDWXWBY3ANCNFSM6AAAAAAT7CWVQ4&data=05%7C01%7Cmohammad.alanjary%40wur.nl%7C065f869c345f45e8c88808db0fc04764%7C27d137e5761f4dc1af88d26430abb18f%7C0%7C0%7C638121090727060346%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=ef%2FBucMmK3Xy5Z0dJ14cBrPwmohZDKWbeTmYAlO4Q%2Bg%3D&reserved=0>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Latest updates to improve stability