BUSCO (HMMER) failed to open sequence file due to no extracted proteins from Augustus #15459

Closed
Buuntu opened this issue May 18, 2019 · 6 comments


@Buuntu

Buuntu commented May 18, 2019

I can run BUSCO through a Docker container such as https://hub.docker.com/r/vera/busco/, but I have been unable to run it using conda, which is my preference.

I installed with

conda install busco

When I run locally, I get 0% BUSCO scores on files that normally get >97% when run through the Docker instance.

The output I get is

run_BUSCO.py -i ~/Data/microbe.fasta -o test -l ~/Data/busco/enterobacteriales_odb9 -m genome
INFO	****************** Start a BUSCO 3.0.2 analysis, current time: 05/18/2019 11:56:15 ******************
INFO	Configuration loaded from /Users/me/miniconda3/envs/pipeline/bin/../config/config.ini
INFO	Init tools...
INFO	Check dependencies...
INFO	Check input file...
INFO	To reproduce this run: python /Users/me/miniconda3/envs/pipeline/bin/run_BUSCO.py -i /Users/me/Data/microbe.fasta -o test -l /Users/me/Data/busco/enterobacteriales_odb9/ -m genome -c 1 -sp E_coli_K12
INFO	Mode is: genome
INFO	The lineage dataset is: enterobacteriales_odb9 (prokaryota)
INFO	Temp directory is ./tmp/
INFO	****** Phase 1 of 2, initial predictions ******
INFO	****** Step 1/3, current time: 05/18/2019 11:56:15 ******
INFO	Create blast database...
INFO	[makeblastdb]	Building a new DB, current time: 05/18/2019 11:56:15
INFO	[makeblastdb]	New DB name:   /Users/me/Code/pipeline/tmp/test_3811270253
INFO	[makeblastdb]	New DB title:  /Users/me/Data/microbe.fasta
INFO	[makeblastdb]	Sequence type: Nucleotide
INFO	[makeblastdb]	Keep Linkouts: T
INFO	[makeblastdb]	Keep MBits: T
INFO	[makeblastdb]	Maximum file size: 1000000000B
INFO	[makeblastdb]	Adding sequences from FASTA; added 40 sequences in 0.0825248 seconds.
INFO	[makeblastdb]	1 of 1 task(s) completed at 05/18/2019 11:56:16
INFO	Running tblastn, writing output to /Users/me/Code/pipeline/run_test/blast_output/tblastn_test.tsv...
INFO	[tblastn]	1 of 1 task(s) completed at 05/18/2019 11:56:29
INFO	****** Step 2/3, current time: 05/18/2019 11:56:29 ******
INFO	Maximum number of candidate contig per BUSCO limited to: 3
INFO	Getting coordinates for candidate regions...
INFO	Pre-Augustus scaffold extraction...
INFO	Running Augustus prediction using E_coli_K12 as species:
INFO	[augustus]	Please find all logs related to Augustus errors here: /Users/me/Code/pipeline/run_test/augustus_output/augustus.log
INFO	[augustus]	102 of 1015 task(s) completed at 05/18/2019 11:56:30
INFO	[augustus]	203 of 1015 task(s) completed at 05/18/2019 11:56:32
INFO	[augustus]	305 of 1015 task(s) completed at 05/18/2019 11:56:34
INFO	[augustus]	406 of 1015 task(s) completed at 05/18/2019 11:56:35
INFO	[augustus]	508 of 1015 task(s) completed at 05/18/2019 11:56:37
INFO	[augustus]	609 of 1015 task(s) completed at 05/18/2019 11:56:38
INFO	[augustus]	711 of 1015 task(s) completed at 05/18/2019 11:56:40
INFO	[augustus]	812 of 1015 task(s) completed at 05/18/2019 11:56:42
INFO	[augustus]	914 of 1015 task(s) completed at 05/18/2019 11:56:43
INFO	[augustus]	1015 of 1015 task(s) completed at 05/18/2019 11:56:45
INFO	Extracting predicted proteins...
INFO	****** Step 3/3, current time: 05/18/2019 11:56:54 ******
INFO	Running HMMER to confirm orthology of predicted proteins:
INFO	Results:
INFO	C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:781
INFO	0 Complete BUSCOs (C)
INFO	0 Complete and single-copy BUSCOs (S)
INFO	0 Complete and duplicated BUSCOs (D)
INFO	0 Fragmented BUSCOs (F)
INFO	781 Missing BUSCOs (M)
INFO	781 Total BUSCO groups searched
INFO	****** Phase 2 of 2, predictions using species specific training ******
INFO	****** Step 1/3, current time: 05/18/2019 11:56:54 ******
INFO	Extracting missing and fragmented buscos from the ancestral_variants file...
INFO	Running tblastn, writing output to /Users/me/Code/pipeline/run_test/blast_output/tblastn_test_missing_and_frag_rerun.tsv...
INFO	[tblastn]	1 of 1 task(s) completed at 05/18/2019 11:58:55
INFO	Maximum number of candidate contig per BUSCO limited to: 3
INFO	Getting coordinates for candidate regions...
INFO	****** Step 2/3, current time: 05/18/2019 11:58:56 ******
INFO	Training Augustus using Single-Copy Complete BUSCOs:
INFO	Converting predicted genes to short genbank files at 05/18/2019 11:58:56...
INFO	All files converted to short genbank files, now running the training scripts at 05/18/2019 11:58:56...
INFO	Pre-Augustus scaffold extraction...
INFO	Re-running Augustus with the new metaparameters, number of target BUSCOs: 781
INFO	[augustus]	109 of 1081 task(s) completed at 05/18/2019 11:58:58
INFO	[augustus]	217 of 1081 task(s) completed at 05/18/2019 11:58:59
INFO	[augustus]	325 of 1081 task(s) completed at 05/18/2019 11:59:01
INFO	[augustus]	433 of 1081 task(s) completed at 05/18/2019 11:59:03
INFO	[augustus]	541 of 1081 task(s) completed at 05/18/2019 11:59:05
INFO	[augustus]	649 of 1081 task(s) completed at 05/18/2019 11:59:06
INFO	[augustus]	757 of 1081 task(s) completed at 05/18/2019 11:59:08
INFO	[augustus]	865 of 1081 task(s) completed at 05/18/2019 11:59:10
INFO	[augustus]	973 of 1081 task(s) completed at 05/18/2019 11:59:12
INFO	[augustus]	1081 of 1081 task(s) completed at 05/18/2019 11:59:13
INFO	Extracting predicted proteins...
INFO	****** Step 3/3, current time: 05/18/2019 11:59:23 ******
INFO	Running HMMER to confirm orthology of predicted proteins:
INFO	[hmmsearch]	Error: Failed to open sequence file /Users/me/Code/pipeline/run_test/augustus_output/extracted_proteins/POG093P0344.faa.1 for reading
INFO	[hmmsearch]	Error: Failed to open sequence file /Users/me/Code/pipeline/run_test/augustus_output/extracted_proteins/POG093P0175.faa.1 for reading
INFO	[hmmsearch]	Error: Failed to open sequence file /Users/me/Code/pipeline/run_test/augustus_output/extracted_proteins/POG093P054J.faa.1 for reading
...

I believe this is because Augustus is not extracting any predicted proteins. I checked, and there are no extracted_proteins in the augustus_output folder.
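For anyone who wants to run the same check quickly, here is a minimal sketch that counts how many protein files exist in the output directory and how many of them are empty. The function name `summarize_protein_files` is made up for illustration, and the default path is simply the one that appears in the hmmsearch errors above; adjust it to your own run directory.

```python
from pathlib import Path
import sys


def summarize_protein_files(directory):
    """Return (total, empty) counts of protein FASTA files in `directory`.

    Returns None when the directory itself does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return None
    # The files in the log are named like POG093P0344.faa.1, so match
    # ".faa" anywhere in the name instead of relying on the last suffix.
    files = [p for p in root.iterdir() if p.is_file() and ".faa" in p.name]
    empty = [p for p in files if p.stat().st_size == 0]
    return len(files), len(empty)


if __name__ == "__main__":
    # Default path taken from the hmmsearch error messages in this issue.
    default = "run_test/augustus_output/extracted_proteins"
    target = sys.argv[1] if len(sys.argv) > 1 else default
    result = summarize_protein_files(target)
    if result is None:
        print(f"{target}: directory not found")
    else:
        total, empty = result
        print(f"{target}: {total} protein files, {empty} empty")
```

A missing directory and a directory full of zero-byte files are reported separately, since both cases appear in this thread.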

For good measure, I installed Augustus separately with

conda install Augustus

But I am still getting the same result. I am running OS X, which looks like it might actually be an issue with Augustus: nextgenusfs/funannotate#3

First few lines of augustus.log:


/Users/me/miniconda3/envs/pipeline/bin/augustus: ERROR
	PP::Profile: Error parsing pattern file"/Users/me/Data/busco/enterobacteriales_odb9/prfl/POG093P0008.prfl", line 8.


/Users/me/miniconda3/envs/pipeline/bin/augustus: ERROR
	PP::Profile: Error parsing pattern file"/Users/me/Data/busco/enterobacteriales_odb9/prfl/POG093P0009.prfl", line 8.


/Users/me/miniconda3/envs/pipeline/bin/augustus: ERROR
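
Since the log repeats the same error for many profiles, a small summary can help show whether every profile fails and whether it is always the same line. The sketch below assumes the log lines look exactly like the excerpt above; `summarize_profile_errors` is a hypothetical helper name, not part of BUSCO or Augustus.

```python
import re
from collections import Counter

# Matches lines such as:
#   PP::Profile: Error parsing pattern file"/path/POG093P0008.prfl", line 8.
PATTERN = re.compile(r'Error parsing pattern file\s*"([^"]+)", line (\d+)')


def summarize_profile_errors(log_text):
    """Return (profiles, line_counts) for profile parse errors in a log.

    profiles    -- set of distinct .prfl paths that failed to parse
    line_counts -- Counter mapping the reported line number to occurrences
    """
    profiles = set()
    line_counts = Counter()
    for match in PATTERN.finditer(log_text):
        profiles.add(match.group(1))
        line_counts[int(match.group(2))] += 1
    return profiles, line_counts
```

To use it, read augustus.log into a string and pass it in. If all errors point at the same line number across profiles, that may suggest a systematic parsing problem with the profile format rather than a few corrupt files, but that is a guess to be confirmed, not a diagnosis.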
@Buuntu
Author

Buuntu commented May 19, 2019

Just confirmed that a similar thing happens when running in a Debian-based Linux image through Docker (I installed busco with conda install busco from within the Docker instance). In this case, I get the same error, but the files are there in augustus_output/predicted_proteins; they are all just empty. I've tried multiple FASTA files and get the same result.

Running a v3 Docker image available on Docker Hub on the same files, I get a score of 99.8%:

docker run -it -v $PWD:$PWD comics/busco bash -c "cd $PWD ; run_BUSCO.py -i microbe.fasta -l enterobacteriales_odb9 -o test -m genome"

@damioresegun

Hi, I can confirm I am also having this issue. I haven't tried Docker yet, but I will do so soon to make sure I have not gone crazy. I ran BUSCO with the debug option turned on, but it gives no useful additional information. I checked the created protein files, and they are all empty, as @Buuntu mentioned. I am currently running Augustus on its own in another environment to see whether it's an issue with my data, though this is unlikely, as it creates the GFF file with no issues.

@jvolkening
Contributor

jvolkening commented Jun 27, 2019

I also experienced this as a new problem arising in the past few months with workflows that used to work. It seems to be related to the BLAST version used, possibly to the commits associated with #12639. The default version of BLAST installed with BUSCO now seems to be pinned to 2.2.31. If I add an explicit BLAST version 2.7.1 to my conda config, BUSCO starts working again.

(EDIT) I should note that I'm also using --blast_single_core in my BUSCO call to deal with the threading issues mentioned elsewhere.
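
To see which BLAST version an environment actually resolved to, something like the following sketch may help. It assumes `tblastn -version` prints a line containing the version number (for example `tblastn: 2.2.31+`); the function names are hypothetical. The 2.2.31 and 2.7.1 values are only the two versions mentioned in this comment, and nothing here establishes which versions in between work.

```python
import re
import shutil
import subprocess


def parse_blast_version(version_output):
    """Extract (major, minor, patch) from `tblastn -version` style output.

    Returns None if no x.y.z version number is found.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_tblastn_version():
    """Return the version of the tblastn on PATH, or None if unavailable."""
    if shutil.which("tblastn") is None:
        return None
    result = subprocess.run(
        ["tblastn", "-version"], capture_output=True, text=True, check=False
    )
    return parse_blast_version(result.stdout)


if __name__ == "__main__":
    version = installed_tblastn_version()
    if version is None:
        print("tblastn not found on PATH, or its version could not be parsed")
    else:
        print("tblastn version:", ".".join(str(n) for n in version))
        if version == (2, 2, 31):
            # 2.2.31 is the pinned version reported as failing in this thread.
            print("this matches the pinned version reported as problematic here")
```

Run it inside the activated conda environment so that it picks up the same tblastn that BUSCO would call.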

@maguileraf

Any luck fixing this? I am running into the same problem.

@SilasK
Member

SilasK commented Apr 9, 2020

I assume this issue is outdated or fixed in the BUSCO v4 release; if not, please comment.

SilasK closed this as completed Apr 9, 2020
@jvolkening
Contributor

Confirmed that the problem no longer exists in v4.
