nt -> nt + ti, cannot open 17434647.fasta #6

Open
bashirhamidi opened this issue Nov 8, 2018 · 5 comments

Comments

@bashirhamidi

hpc@hpc:/media/box1/tb/ncbint$ bash /home/box1/Downloads/pathoscope2/pasteTaxID/pasteTaxID.bash --multifasta nt.fasta --parallelJobs 50

  • Warning: parallelJobs limit is 40, upper values will set down to this value
  • Splitting multifasta, (if the file is a huge file (~300.000 or more sequences), you should go for a coffee while the script works
    awk: cannot open "17434647.fasta" for output (No space left on device)

Please note that there is over 4 TB of free space available on the drive so it's not a space limitation.

@Sanrrone
Contributor

Sanrrone commented Nov 14, 2018

Hi! Sorry for the delay. To test something in the script, could you split nt.fasta into two (or four) new files and run the script on one of them? I suspect the sheer number of FASTA files is causing an I/O error.
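
One related thing that may be worth ruling out, given that the error appears while awk is opening 17434647.fasta for output: a filesystem can report "No space left on device" when it has run out of inodes, even if plenty of bytes are free. A minimal sketch of that check, assuming a `df` that supports `-i` (GNU coreutils and BSD both do); the helper name `inode_report` is only illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show byte usage and inode usage for the filesystem
# that holds the given directory (defaults to the current directory).
inode_report() {
    local dir="${1:-.}"
    echo "== space (bytes) =="
    df -h "$dir"
    echo "== inodes =="
    df -i "$dir"
}

# Run it from the directory where the split files are being written.
inode_report .
```

If the second table shows no free inodes, the limit is the number of files rather than their total size.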

Could you also try adding the --debug parameter and paste the lines you get when the error appears?
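
For the split itself, here is a rough sketch of one way to do it without cutting a record in half. It is not part of pasteTaxID; the function name `split_fasta` and the output naming are assumptions made for illustration. Records are dealt out round-robin, so each part ends up with roughly the same number of sequences:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a multi-FASTA into N parts on record boundaries.
# Usage: split_fasta INPUT N PREFIX   -> PREFIX_1.fasta ... PREFIX_N.fasta
split_fasta() {
    local input="$1" parts="$2" prefix="$3"
    awk -v n="$parts" -v prefix="$prefix" '
        # A header line starts a new record: pick the next part in turn.
        /^>/ { out = sprintf("%s_%d.fasta", prefix, (count++ % n) + 1) }
        # Write the line to the current part (lines before the first
        # header are skipped because no part has been chosen yet).
        out != "" { print > out }
    ' "$input"
}

# Tiny demonstration on a throwaway file (not real nt data).
tmp=$(mktemp -d)
printf '>a\nAC\nGT\n>b\nGG\n>c\nTT\n' > "$tmp/demo.fasta"
split_fasta "$tmp/demo.fasta" 2 "$tmp/part"
# part_1.fasta now holds records a and c, part_2.fasta holds record b.
grep -c '^>' "$tmp/part_1.fasta" "$tmp/part_2.fasta"
```

With the real file this would be something like `split_fasta nt.fasta 4 nt_part`. Only a handful of output files are open at once, so awk's open-file limit should not be an issue here.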

Best
Sandro

@bashirhamidi
Author

I split it into 6 files and it crashes the server, perhaps due to the I/O error?
I'm now splitting it further into 5 GB files. Is there a way to stop the script from printing the individual tasks (the fetching and so on)?

@Sanrrone
Contributor

The messages are mandatory in that step; suppressing them could be a future improvement.
When you mention a server, are you logged in to a cluster? Could you try running the script locally and, if the problem continues, give me the nt.fasta link so I can reproduce the error and look more deeply into what happens?
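
In the meantime, a shell-level workaround that does not depend on any script option is to redirect the output to a log file. A minimal sketch; `noisy_step` is a stand-in for the real pasteTaxID.bash invocation, and the log location is arbitrary:

```shell
#!/usr/bin/env bash
# Stand-in for the real command: it prints progress on stdout and a
# warning on stderr, as a chatty script might.
noisy_step() {
    echo "fetching 1.fasta"
    echo "fetching 2.fasta"
    echo "warning: something minor" >&2
}

log=$(mktemp)

# Send both stdout and stderr to the log so nothing reaches the terminal.
noisy_step > "$log" 2>&1

# Look at the end of the log only when needed.
tail -n 1 "$log"
```

With the real script this would look like `bash pasteTaxID.bash --multifasta nt.fasta --parallelJobs 40 > pasteTaxID.log 2>&1`.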

Best
Sandro

@bashirhamidi
Author

bashirhamidi commented Nov 16, 2018

Thanks for the response. The database is downloaded directly from NCBI's ftp server.
Edit: For some reason the Markdown is not handling the link properly. Here's the ftp link ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz

In response to your other question, yes, I am on a cluster. Per your recommendation, I ran it locally (first splitting the large file into smaller multifastas).

As part of the process, the script occasionally does not find parsefasta.awk (output below). Any idea why that might be?

++ awk '{print $2}'
++ echo 6283.fasta '>XR_003236166.1' PREDICTED: Vulpes vulpes uncharacterized LOC112925609 '(LOC112925609),' transcript variant X1, ncRNA
++ awk -v ID=emb -f parsefasta.awk
+ ti=9627
+ '[' 9627 == '' ']'
+ '[' 9627 '!=' '' ']'
+ echo '5824.fasta 9627'
+ fastaheader='>XR_003236166.1'
++ echo '>XR_003235922.1'
++ awk -v ID=ti -f parsefasta.awk
++ awk -v ID=ref -f parsefasta.awk
++ echo '>XM_026005165.1'
++ echo '>XM_026005517.1'
awk: cannot open parsefasta.awk (No such file or directory)
fetch.bash: line 105: newheader.txt: No such file or directory
++ awk -v ID=emb -f parsefasta.awk
+ ti=
awk: cannot open parsefasta.awk (No such file or directory)
++ echo '>XM_026003518.1'
++ awk -v ID=gi -f parsefasta.awk
fetch.bash: line 105: newheader.txt: No such file or directory
++ echo '>XR_003235453.1'
* Done :D
awk: cannot open parsefasta.awk (No such file or directory)
+ ref=
awk: cannot open parsefasta.awk (No such file or directory)
awk: cannot open parsefasta.awk (No such file or directory)
+ emb=
++ awk -v ID=ref -f parsefasta.awk
+ gi=
+ ti=
++ echo '>XR_003236166.1'
awk: cannot open parsefasta.awk (No such file or directory)
awk: cannot open parsefasta.awk (No such file or directory)

@Sanrrone
Contributor

Hi! It sounds like the script and the FASTA file are not in the same directory. Do you have them in the same directory?
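
The reason this matters: the trace shows `awk -f parsefasta.awk` with a bare file name, and by default awk looks that file up relative to the current working directory. The sketch below reproduces the effect with throwaway directories; `helper.awk` is a stand-in, not the real parsefasta.awk:

```shell
#!/usr/bin/env bash
# Throwaway layout: the awk program lives in one directory and the
# command is run from another.
script_dir=$(mktemp -d)
work_dir=$(mktemp -d)

# Stand-in awk program (NOT the real parsefasta.awk): print the first field.
printf '{ print $1 }\n' > "$script_dir/helper.awk"

cd "$work_dir" || exit 1

# A bare file name is looked up from the working directory, so this fails.
if ! echo '>XR_003236166.1 PREDICTED' | awk -f helper.awk 2>/dev/null; then
    echo "bare name: not found"
fi

# An absolute path works from any working directory.
echo '>XR_003236166.1 PREDICTED' | awk -f "$script_dir/helper.awk"
```

So if the working directory is the cause, running the script from the directory that holds both the scripts and the FASTA files should be enough.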
