Error when running mmseqs createsubdb: sh: 1: Syntax error: ")" unexpected #10

Open
marcasriv opened this issue Oct 14, 2021 · 7 comments


@marcasriv

Hi,

I'm interested in running PhyloCSF++ with annotate-with-mmseqs on Chinese hamster, but I am getting an error when it reaches the mmseqs createsubdb step:

./phylocsf++ annotate-with-mmseqs --threads 35 --output conservation species.txt 58mammals criGri1.refGene.gtf

Checking whether MMseqs2 is installed ...
Processing GFF /mnt/HDD2/conservation/criGri1.refGene.gtf
Created the genomesDB directory.
Created the cds directory.
Reading reference genome of GFF file /mnt/HDD2/conservation/fastas/criGri1.fa ...
Reading GFF file and extracting CDS coordinates ...
MMseqs2: Indexing genomes ...
MMseqs Version: 42bf6438fec1e1b987f46d8f6d4b09926ecfc019
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3

Converting sequences
[410465] 1m 2s 307ms
Time for merging to genbankseqs_h: 0h 0m 0s 74ms
Time for merging to genbankseqs: 0h 0m 43s 532ms
Database type: Nucleotide
Time for processing: 0h 1m 46s 799ms
bash -c $'mmseqs createsubdb <(awk '$3 == 0' /mnt/HDD2/conservation//genomesDB/genbankseqs.lookup) conservation//genomesDB/genbankseqs /mnt/HDD2/conservation//genomesDB/genbankseqs_0'
sh: 1: Syntax error: ")" unexpected

This is what the input species.txt file looks like:

chinese_hamster conservation/fastas/criGri1.fa
mouse conservation/fastas/Mus_musculus.GRCm39.dna.primary_assembly.fa
rat conservation/fastas/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa
human conservation/fastas/Homo_sapiens.GRCh38.dna.primary_assembly.fa
naked_mole_rat conservation/fastas/Heterocephalus_glaber_female.HetGla_female_1.0.dna.toplevel.fa
guinea_pig conservation/fastas/Cavia_porcellus.Cavpor3.0.dna.toplevel.fa
squirrel conservation/fastas/Ictidomys_tridecemlineatus.SpeTri2.0.dna.toplevel.fa
rabbit conservation/fastas/Oryctolagus_cuniculus.OryCun2.0.dna.toplevel.fa
pika conservation/fastas/Ochotona_princeps.OchPri2.0-Ens.dna.toplevel.fa

I downloaded the reference GTF file from https://hgdownload.soe.ucsc.edu/goldenPath/criGri1/bigZips/genes/criGri1.refGene.gtf.gz and the reference FASTA file from https://hgdownload.soe.ucsc.edu/goldenPath/criGri1/bigZips/criGri1.fa.gz

Thanks so much,

Marina

@cpockrandt
Owner

Hi Marina,

Thank you for trying out PhyloCSF++ and opening an issue! I made a fix and pushed it to the master branch. Can you try running it again with the latest commit? Let me know if you need help building PhyloCSF++ from source; I can also upload a statically linked binary here.

If the fix works for you, we will make a new release, update it on bioconda and distribute new binaries.

Christopher
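
The `sh: 1:` prefix in the error above is the format that dash (the default `/bin/sh` on Debian and Ubuntu) uses for parse errors, which suggests the command was handed to a POSIX shell. Neither process substitution (`<(...)`) nor `$'...'` quoting is part of POSIX sh. The thread does not say what the fix changed, so the following is only a sketch of one way to run the same filtering step without shell-specific syntax. `filter_lookup` is a hypothetical helper, and the three-column layout of the `.lookup` file is inferred from the `awk '$3 == 0'` filter in the log.

```shell
#!/bin/sh
# Sketch (assumption, not the actual PhyloCSF++ fix): write the matching
# lookup entries to a regular file instead of using <(...), so the command
# also works when it is executed by dash.

# filter_lookup is a hypothetical helper name.
#   $1 = path to the .lookup file
#   $2 = value the third column must equal
#   $3 = output file that receives the matching lines
filter_lookup() {
    awk -v n="$2" '$3 == n' "$1" > "$3"
}

# Possible usage with the database path from the log (left commented out
# because it needs MMseqs2 and an existing database):
#   db=conservation/genomesDB/genbankseqs
#   ids=$(mktemp)
#   filter_lookup "$db.lookup" 0 "$ids"
#   mmseqs createsubdb "$ids" "$db" "${db}_0"
#   rm -f "$ids"
```

Writing to a temporary file costs one extra file on disk but keeps the command independent of which shell ends up interpreting it.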

@marcasriv
Author

Hi Christopher,

Thanks so much for your help and the fix! I rebuilt PhyloCSF++ with the latest commit and it now runs smoothly past the error. Unfortunately, I've bumped into a new problem. The program is now crashing at (I believe) line 422 of phylocsf++annotate_with_mmseqs.hpp (same parameters/files as in my previous post):

mmseqs result2dnamsa conservation//cds/cds.index conservation//genomesDB/genbankseqs /conservation//aln/aln_all_tophit conservation//aln/msa --threads _40

MMseqs Version: 42bf6438fec1e1b987f46d8f6d4b09926ecfc019
Skip query false
Threads 40
Compressed 0
Verbosity 3
Query database size: 99405 type: Nucleotide
Target database size: 410501 type: Nucleotide
[=================================================================] 100.00% 99.40K 7m 13s 889ms
Time for merging to msa: 0h 0m 0s 216ms
Time for processing: 0h 7m 15s 116ms
MMseqs2: Score aligned CDS ...

terminate called after throwing an instance of 'std::length_error'
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
what(): terminate called recursively
terminate called recursively
terminate called recursively
Aborted (core dumped)

Thanks again,

Marina

@cpockrandt
Owner

Can you give me the list of assemblies you used, so that we can try to reproduce this error?

@cpockrandt
Owner

Hi Marina,

Thank you. We were able to reproduce the error and added a fix to the master branch. Before you run it again, please make sure to delete any temporary files in the output directory from the previous runs.

Christopher
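
A cautious way to follow the clean-up advice above is to move the old intermediate directories aside rather than delete them outright. This is a minimal sketch: `stash_old_run` is a hypothetical helper, and the sub-directory names `genomesDB`, `cds`, and `aln` are taken from the paths in the logs earlier in this thread. Whether these are all the temporary files PhyloCSF++ writes is an assumption.

```shell
#!/bin/sh
# Sketch (assumption): move intermediate directories from a previous run
# out of the output directory so the next run starts from a clean state.

# stash_old_run is a hypothetical helper name.
#   $1 = output directory that was passed to --output
stash_old_run() {
    out=${1%/}
    # The backup directory is created next to the output directory.
    backup="$out.previous_$(date +%Y%m%d_%H%M%S)"
    for d in genomesDB cds aln; do
        if [ -d "$out/$d" ]; then
            mkdir -p "$backup"
            mv "$out/$d" "$backup/"
        fi
    done
}

# Possible usage with the output directory from the original command:
#   stash_old_run conservation
```

Once the new run has finished successfully, the backup directory can be removed by hand.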

@marcasriv
Author

Hi Christopher,

Thanks so much for your reply. I've removed the previous installation of PhyloCSF++, cloned the latest version, re-installed it, and removed any previous files, but I'm still getting the same error at the same line of code. I've also tried changing the location of the output directory, but no luck so far. Could anything on my system be overriding the new install?

Marina

@cpockrandt
Owner

Hi Marina,

I tried it on another system, and it works for me with the latest commit and the data set that you listed above. You don't have to "install" PhyloCSF++ on your system. After running make, you can call the binary directly in the build directory with ./phylocsf++ to make sure that you are really using the latest build and not an outdated binary that might still be in the PATH.
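
To check whether an older binary could be shadowing the fresh build, one option is to ask the shell which file a bare `phylocsf++` resolves to. This is a minimal sketch: `resolve_binary` is a hypothetical helper, and the build directory path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch: report which executable the shell would run for a given name.

# resolve_binary is a hypothetical helper name.
#   $1 = command name to look up in PATH
resolve_binary() {
    command -v "$1" 2>/dev/null || echo "(not found in PATH)"
}

# Possible usage:
#   echo "PATH resolves phylocsf++ to: $(resolve_binary phylocsf++)"
#   cd path/to/build   # placeholder for the directory where make was run
#   ./phylocsf++ annotate-with-mmseqs --threads 35 --output conservation species.txt 58mammals criGri1.refGene.gtf
```

If the first command prints a path outside the build directory, a plain `phylocsf++` call would run that copy, while the explicit `./phylocsf++` always runs the file in the current directory.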
