Script to generate HMM directory from HMM file. Add Defense Finder Models HMM. #2244

Merged (13 commits) on Mar 18, 2024

Collaborator

@Ge0rges Ge0rges commented Mar 16, 2024

Referencing #2164, here is a script that lets a user turn an HMM file into an anvi'o HMM directory, provided they also specify the source.

In addition, I've added another script that automatically turns mdmparis/defense-finder-models into an anvi'o-compatible HMM directory. Some of the HMM files don't define an accession number, so I use the name instead in those cases.
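
The accession fallback described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names `read_hmm_headers` and `accession_or_name` are hypothetical, the accession `PF00000.1` is made up, and the sketch assumes HMMER3 text files in which each model header starts with a `HMMER3` line, carries a `NAME` line and an optional `ACC` line, and ends at the `HMM` line that opens the model body.

```python
def read_hmm_headers(lines):
    """Yield one dict of header fields for each model in a HMMER3 text stream."""
    header = None
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue
        tag = fields[0]
        if tag.startswith("HMMER3"):
            header = {}            # a new model header begins
        elif tag == "HMM" and header is not None:
            yield header           # the header is complete
            header = None          # ignore the model body until the next header
        elif header is not None and len(fields) == 2:
            header[tag] = fields[1].strip()


def accession_or_name(header):
    """Return the accession if the model defines one, otherwise its name."""
    return header.get("ACC") or header["NAME"]


# Two toy models (hypothetical content): only the first one defines ACC.
example = """HMMER3/f [3.3.2 | Nov 2020]
NAME  model_with_acc
ACC   PF00000.1
LENG  10
HMM          A        C
//
HMMER3/f [3.3.2 | Nov 2020]
NAME  model_without_acc
LENG  12
HMM          A        C
//
""".splitlines()

ids = [accession_or_name(h) for h in read_hmm_headers(example)]
print(ids)  # ['PF00000.1', 'model_without_acc']
```

Falling back to the name keeps every model identifiable, at the cost of mixing two kinds of identifiers in the same column.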

@meren
Member

meren commented Mar 17, 2024

Dear @Ge0rges, thank you very much for this PR!

Thank you also for making use of the anvio/utils functions as much as possible without reimplementing things that are already there. In that vein, I thought the function get_attribute_from_hmm_file in sandbox/anvi-script-gen-defense-finder-models-to-hmm-directory could be a good addition to utils, with some help docs in the function header.

I also think sandbox/anvi-script-hmm-to-hmm-directory will be a very useful script for automating a lot of things. I thought the --hmm-list and --hmm-source parameters could be a bit confusing (especially if there are a lot of HMMs with different sources). In such cases we usually define a new 'artifact', for instance a two-column TAB-delimited file that lists the paths of the models and their sources, and pass it to the program to avoid too much wrangling on the command line. But I think we can wait for an actual need before implementing that.
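
A two-column TAB-delimited file of the kind suggested above could be read as sketched below. The file layout (model path, then source, no header line), the example paths, and the function name `read_hmm_sources` are all assumptions for illustration. No such artifact is defined in this PR.

```python
import csv
import io


def read_hmm_sources(handle):
    """Return (hmm_path, source) pairs from a two-column TAB-delimited stream."""
    pairs = []
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, found {len(row)}")
        path, source = (cell.strip() for cell in row)
        pairs.append((path, source))
    return pairs


# Hypothetical file content: one model path and its source per line.
example = io.StringIO("models/a.hmm\tPfam\nmodels/b.hmm\tcustom\n")
pairs = read_hmm_sources(example)
print(pairs)  # [('models/a.hmm', 'Pfam'), ('models/b.hmm', 'custom')]
```

Rejecting rows that do not have exactly two columns surfaces a malformed file early instead of silently pairing a model with the wrong source.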

There are two things missing in this PR, and it would be excellent to add them if you have the time and/or energy. Otherwise I can add them later. The first is new entries under anvio/help/docs/programs for these new scripts, so that there is some online help that ties them to the rest of the software ecosystem, and people can read it, see some examples, and understand their utility. The second is a minimal running example in anvio/tests/run_component_tests_for_metagenomics.sh, so that these scripts are tested every night and we learn immediately if something breaks. If you don't have the energy for these updates, let me know and I'll merge the PR :)

Best wishes,
Meren

@Ge0rges
Collaborator Author

Ge0rges commented Mar 17, 2024

@meren Thanks for the tips. I updated the code to move get_attribute_from_hmm_file to utils, with some error handling. I also added the requested docs, including one for anvi-script-pfam-accession-to-hmm-directory, as I did not find an existing one.

I did not add the commands to the test file as I would rather leave that to you if that's OK.

@meren
Member

meren commented Mar 18, 2024

Thank you very much for these updates, @Ge0rges! I am merging your PR and will test them while adding the entries for our component tests :)

I also added your GitHub account as a collaborator on the anvi'o project, so you now have direct write access to the repository (which I hope will make it easier for you to commit changes directly to master when you see fit, or to submit PRs or branches directly as well as from your fork).

@meren meren merged commit f9bfcd0 into merenlab:master Mar 18, 2024
@meren
Member

meren commented Mar 18, 2024

By the way, I'm getting the following error from anvi-script-gen-defense-finder-models-to-hmm-directory -- I didn't look into it but I thought I'd mention since probably it will make immediate sense to you:

$ anvi-script-gen-defense-finder-models-to-hmm-directory
  File "/Users/meren/github/anvio/sandbox/anvi-script-gen-defense-finder-models-to-hmm-directory", line 77
    try:
    ^^^
SyntaxError: expected 'except' or 'finally' block
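
For context, Python raises this error at parse time whenever a `try:` block is not followed by an `except` or `finally` clause, so the script fails before any of its code runs. The snippet below is a generic illustration, not the code from line 77 of the script: `load_models` is a hypothetical stand-in, and the sources are only compiled, never executed.

```python
# A try block with no except/finally clause: rejected by the parser.
broken = """
try:
    load_models()
"""

# The same block with an except clause: parses fine.
fixed = """
try:
    load_models()
except FileNotFoundError as e:
    print(f"Could not load models: {e}")
"""


def compiles(source):
    """Return True if the source parses, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


results = (compiles(broken), compiles(fixed))
print(results)  # (False, True)
```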

meren added 5 commits that referenced this pull request on Mar 18, 2024
@Ge0rges
Collaborator Author

Ge0rges commented Mar 18, 2024

Hi @meren

Thank you very much for your trust. Sorry about those two bugs, I will push a fix within a couple of hours. I haven't actually set up anvi'o on my Mac yet, so my coding workflow is a bit crap and I clearly forgot to pull my last commit on my test server.

Just to clarify I still would plan to make PRs for any significant change for you to review, but would perhaps push directly small changes to fix minor bugs or typos for example.

@meren
Member

meren commented Mar 18, 2024

> Just to clarify I still would plan to make PRs for any significant change for you to review, but would perhaps push directly small changes to fix minor bugs or typos for example.

Sure! Everyone who contributes with direct write access does that more or less. Whenever we are uncertain, or feel like it would be better to have other sets of eyes on the code, we send in PRs and ask for reviewer input :)

Working directly with the repo makes contributing much easier nevertheless. My anvi'o setup on my system uses anvio-dev, and it makes it a whole lot easier to fix/update the code or documentation as I work through datasets and so on.

@Ge0rges
Collaborator Author

Ge0rges commented Mar 18, 2024

@meren just a heads up, I fixed the try/except block. Also, is the author list case sensitive?

@meren
Member

meren commented Mar 20, 2024

> is the author list case sensitive?

lol, yes, unfortunately, and I did the lazy thing -- rather than updating our code, I updated your username :p

And thanks for the fix, @Ge0rges. I finally was able to test anvi-script-gen-defense-finder-models-to-hmm-directory, and ran it on the Infant Gut Dataset just to get an idea of the hits in this collection of models.

Here is what each model and its hits looked like:

image

I was surprised to realize that one of the models, Paris II, was responsible for quite a remarkable number of these hits:

image

It resolves to PF13304. It seems it is in the list because "Several members are annotated as being of the abortive phage resistance system, in which case the family would be acting as the toxin for a type IV toxin-antitoxin resistance system", but in reality it also has significant similarity to your good old ATP hydrolases that are in almost every genome. Indeed, I searched a few genes from the list of hits, and they were involved in protein binding or ATP binding activity.

Well, long story short: it is working, but perhaps the models are not too specific.
