Metabolic-c test files stuck at Prodigal #202

Open
lacy111693 opened this issue Dec 18, 2024 · 6 comments

Comments

@lacy111693

I have downloaded all the dependencies and was able to run metabolic-g on the test files successfully. However, when I try to run metabolic-c on the test files, it gets stuck at the Prodigal step. I have tried several times, and it always gets stuck at the same place. I usually kill it after a couple of days and try again. This time I decided to leave it running to see whether it would ever move on, and it has now been at the Prodigal step for 5 days. These are just the test files, so I would not expect size to be an issue.

I am running METABOLIC within Anaconda Navigator, if that helps, on a Mac with macOS Sequoia 15.1.1; the platform is osx-64.

Note that the screenshot shows messages saying some directories can't be created because they already exist; that is only because I have already tried this several times.
Screenshot 2024-12-18 at 3 53 06 PM

@lacy111693
Author

@patriciatran

@patriciatran
Member

Hi @lacy111693 ,

I don't work on maintaining this software, but I was able to find the mkdir problem, so I fixed it.
I added the -p flag to the mkdir commands so that the program creates a directory only if it doesn't already exist, instead of failing when it does.
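
For readers unfamiliar with the flag, the behavior of mkdir -p can be sketched in Python, where `os.makedirs(..., exist_ok=True)` does the same thing. This is only an illustration of the idempotent pattern, not METABOLIC's actual Perl code; the `ensure_dir` helper and the directory layout below are hypothetical.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    # Hypothetical output layout, used only for the demonstration.
    out_dir = os.path.join(base, "METABOLIC_out", "intermediate_files")
    ensure_dir(out_dir)
    ensure_dir(out_dir)  # second call is a no-op instead of an error
    print(os.path.isdir(out_dir))
```

Without `exist_ok=True` (or without -p in the shell), the second call would raise an error, which matches the "already exists" messages seen on a rerun.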

As for the second question, Prodigal is not multithreaded, so it can take a long time. If I were you, I would use something like pprodigal (https://github.com/sjaenick/pprodigal) for metabolic-g. That said, metabolic-c takes only unannotated genomes, so that would not help here.

It is strange that it would take over 5 days to run the test files with only 5 genomes, though. I would check the Anaconda Navigator setup and see whether you are running into any disk, memory, or thread issues.
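
A quick way to get a first look at some of these resources from inside the same conda environment is a small Python check like the one below. It is a minimal sketch using only the standard library: it reports free disk space, the CPU count, and whether a `prodigal` executable is on the PATH. It does not measure memory, since the standard library has no portable call for that, and the `resource_snapshot` name is made up for this example.

```python
import os
import shutil


def resource_snapshot(path="."):
    """Return a few basic facts about the machine and environment."""
    usage = shutil.disk_usage(path)  # total, used, free in bytes
    return {
        "free_disk_gb": round(usage.free / 1e9, 1),
        "cpu_count": os.cpu_count(),
        # True if an executable named "prodigal" is found on the PATH.
        "prodigal_on_path": shutil.which("prodigal") is not None,
    }


if __name__ == "__main__":
    for key, value in resource_snapshot().items():
        print(f"{key}: {value}")
```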

@lacy111693
Author

Thanks @patriciatran. I discovered that the problem is actually GTDB-Tk halting at its Prodigal step. When I ran GTDB-Tk on its own with --debug, I got these errors:

[2025-02-07 13:42:50] INFO: GTDB-Tk v2.4.0
[2025-02-07 13:42:50] INFO: gtdbtk classify_wf --cpus 1 -x fasta --genome_dir /Users/lacy_barrett/METABOLIC/METABOLIC_test_files/Guaymas_Basin_genome_files --skip_ani_screen --out_dir test_gtdbtk --debug
[2025-02-07 13:42:50] INFO: Using GTDB-Tk reference data version r220: /Users/lacy_barrett/.conda/envs/metabolic/share/gtdbtk-2.4.0/db
[2025-02-07 13:42:50] INFO: Identifying markers in 5 genomes with 1 threads.
[2025-02-07 13:42:50] TASK: Running Prodigal V2.6.3 to identify genes.
==> Processed 0/5 genomes (0%) | | [?genome/s, ETA ?]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/lacy_barrett/.conda/envs/metabolic/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/lacy_barrett/.conda/envs/metabolic/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: 'Parallel' object has no attribute '__process_manager'

When I went to the GTDB-Tk GitHub page to look through the issues, other people seemed to have a similar problem, and the response was always that GTDB-Tk does not work on Mac because of pplacer. But METABOLIC says it works on Mac, even though GTDB-Tk is one of its dependencies, so I'm trying to figure out how that is supposed to work.
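
One platform difference that may be relevant here: the traceback passes through multiprocessing/spawn.py, and since Python 3.8 macOS uses the "spawn" start method by default, whereas Linux has historically defaulted to "fork". Under spawn, the child process receives its state by unpickling it, which is the step that fails above. Whether this is the actual cause of the GTDB-Tk failure is an assumption, not something the log confirms. The sketch below only shows how to inspect the start method and test whether an object survives a pickle round trip; both helper names are made up for this example.

```python
import multiprocessing as mp
import pickle
import sys


def start_method_info():
    """Report the platform and the multiprocessing start method in use."""
    return {"platform": sys.platform, "start_method": mp.get_start_method()}


def survives_pickle(obj):
    """Return True if obj can be pickled and unpickled, as spawn requires."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(start_method_info())
    print(survives_pickle({"genomes": 5}))  # plain data round-trips
    print(survives_pickle(lambda x: x))     # lambdas cannot be pickled
```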

@patriciatran
Member

Hello @lacy111693 , can you link me to where you saw that METABOLIC works on Mac? I have personally never tried it on Mac, but I can fix any documentation that says otherwise.

@lacy111693
Author

> Hello @lacy111693 , can you link me to where you saw that METABOLIC works on Mac? I have personally never tried it on Mac, but I can fix any documentation that says otherwise.

Let me see if I can find it and I'll link it in a comment! I may be misremembering: I started digging into METABOLIC over a year ago and thought I had read that it works on Mac. To be fair, I'm not a computer person, so it's entirely possible that when I looked through the dependencies and saw that some would not work on Windows, I assumed Mac was the only option. I did not know what Linux was at the time.

@lacy111693
Author

@patriciatran I was able to run Linux as a subsystem on my Windows computer and got everything downloaded, woo! When running the test files, I still ran into a few issues: a readline() on closed filehandle warning at line 1908 of METABOLIC-C.pl, and missing-file errors in the metabolic handoff diagram and energy flow chart steps. I just ran perl METABOLIC-C.pl -test true with no changes to any of the code. Any thoughts? Thanks so much for all of your help.

[2025-02-18 12:31:54] The Prodigal annotation is running...
[2025-02-18 12:32:31] The Prodigal annotation is finished
readline() on closed filehandle _IN at METABOLIC-C.pl line 1908.
readline() on closed filehandle _IN at METABOLIC-C.pl line 1908.
[2025-02-18 12:32:32] The hmmsearch is running with 20 cpu threads...
[2025-02-18 12:57:24] The hmmsearch is finished
[2025-02-18 12:57:39] Generating each hmm faa collection...
[2025-02-18 12:57:39] Each hmm faa collection has been made
[2025-02-18 12:57:39] The KEGG module result is calculating...
[2025-02-18 13:00:12] The KEGG identifier (KO id) result is calculating...
[2025-02-18 13:00:12] The KEGG identifier (KO id) seaching result is finished
[2025-02-18 13:00:12] Searching CAZymes by dbCAN2...
[2025-02-18 13:01:47] dbCAN2 searching is done
[2025-02-18 13:01:47] Searching MEROPS peptidase...
[2025-02-18 13:02:18] MEROPS peptidase searching is done
[2025-02-18 13:02:19] METABOLIC table has been generated
[2025-02-18 13:02:19] Drawing element cycling diagrams...
[2025-02-18 13:06:51] Drawing element cycling diagrams finished
[2025-02-18 13:06:51] Drawing metabolic handoff diagrams...
mv: cannot stat 'METABOLIC_out/newdir/Bar_plot/bar_plot_input_1.pdf': No such file or directory
mv: cannot stat 'METABOLIC_out/newdir/Bar_plot/bar_plot_input_1.pdf': No such file or directory
mv: cannot stat 'METABOLIC_out/newdir/Bar_plot/bar_plot_input_2.pdf': No such file or directory
mv: cannot stat 'METABOLIC_out/newdir/Bar_plot/bar_plot_input_2.pdf': No such file or directory
[2025-02-18 13:06:52] Drawing metabolic handoff diagrams finished
[2025-02-18 13:06:52] Drawing energy flow chart...
[2025-02-18 13:06:52] INFO: GTDB-Tk v2.4.0
[2025-02-18 13:06:52] INFO: gtdbtk classify_wf --cpus 20 -x fasta --genome_dir /home/lacy111693/METABOLIC/METABOLIC_test_files/Guaymas_Basin_genome_files --skip_ani_screen --out_dir METABOLIC_out/intermediate_files/gtdbtk_Genome_files
[2025-02-18 13:06:52] INFO: Using GTDB-Tk reference data version r220: /home/lacy111693/anaconda3/envs/metabolic/share/gtdbtk-2.4.0/db
[2025-02-18 13:06:52] INFO: Identifying markers in 5 genomes with 20 threads.
[2025-02-18 13:06:52] TASK: Running Prodigal V2.6.3 to identify genes.
[2025-02-18 13:07:09] INFO: Completed 5 genomes in 16.60 seconds (3.32 seconds/genome).
[2025-02-18 13:07:09] TASK: Identifying TIGRFAM protein families.
[2025-02-18 13:07:12] INFO: Completed 5 genomes in 3.21 seconds (1.56 genomes/second).
[2025-02-18 13:07:12] TASK: Identifying Pfam protein families.
[2025-02-18 13:07:12] INFO: Completed 5 genomes in 0.25 seconds (20.05 genomes/second).
[2025-02-18 13:07:12] INFO: Annotations done using HMMER 3.4 (Aug 2023).
[2025-02-18 13:07:12] TASK: Summarising identified marker genes.
[2025-02-18 13:07:12] INFO: Completed 5 genomes in 0.09 seconds (54.60 genomes/second).
[2025-02-18 13:07:12] INFO: Done.
[2025-02-18 13:07:13] INFO: Aligning markers in 5 genomes with 20 CPUs.
[2025-02-18 13:07:13] INFO: Processing 5 genomes identified as bacterial.
[2025-02-18 13:09:57] INFO: Read concatenated alignment for 107,235 GTDB genomes.
[2025-02-18 13:09:57] TASK: Generating concatenated alignment for each marker.
mv: cannot stat 'METABOLIC_out/Output_energy_flow/Energy_plot/network.plot.pdf': No such file or directory
mv: cannot stat 'METABOLIC_out/Output_energy_flow/Energy_plot/network.plot.pdf': No such file or directory
[2025-02-18 13:10:02] Drawing energy flow chart finished
[2025-02-18 13:10:02] Calculating MW-score ...
[2025-02-18 13:10:03] Calculating MW-score is done
METABOLIC-C was done, the total running time: 00:38:09 (hh:mm:ss)
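
The mv errors in this log mean that the PDFs were never written before the script tried to move them; the log itself does not show why the plotting steps produced nothing. To confirm which plots are missing after a run, a small check like the following could be run from the directory that contains METABOLIC_out. It is a minimal sketch: the three paths are copied from the mv messages above, and the `missing_files` helper is made up for this example.

```python
from pathlib import Path

# Paths copied from the "mv: cannot stat" messages in the log.
EXPECTED_PLOTS = [
    "METABOLIC_out/newdir/Bar_plot/bar_plot_input_1.pdf",
    "METABOLIC_out/newdir/Bar_plot/bar_plot_input_2.pdf",
    "METABOLIC_out/Output_energy_flow/Energy_plot/network.plot.pdf",
]


def missing_files(paths, root="."):
    """Return the subset of paths that do not exist as files under root."""
    root = Path(root)
    return [p for p in paths if not (root / p).is_file()]


if __name__ == "__main__":
    for path in missing_files(EXPECTED_PLOTS):
        print(f"missing: {path}")
```

If the files are missing in their source directories as well as in the final output folder, the plotting scripts themselves are the next place to look.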
