Incorrect number of sequence in mashclust.py #14

Closed
Nilad opened this issue Sep 16, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@Nilad
Nilad commented Sep 16, 2020

First I got this message:

.
.
.
SCREENING READS WITH KMERS (Wed Sep 16 14:01:18 CEST 2020)
 Reads will be screened against database supplied for further filtering and mapping,
 this will reduce the input sequences to map against 1351

CLUSTERING SEQUENCES BY KMER DISTANCE (Wed Sep 16 14:01:29 CEST 2020)
 Sequences obtained after screen will be clustered to reduce redundancy,
 one representative, the largest, will be considered for further analysis 1351

---------------------------------------

ERROR in Script plasmidID on or near line 754; exiting with status 1
MESSAGE:

See /myhome/logs/plasmidID.log for more information.
 command: mashclust.py -i /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta -d 0.5

---------------------------------------

In the log file, I got this:

.
.
.
#Executing /usr/local/plasmidID/bin/filter_fasta.sh 

Output directory is /myhome/NO_GROUP/1351/kmer
Wed Sep 16 14:01:28 CEST 2020
Filtering terms on file 2020-09-03_plasmids.fasta
Wed Sep 16 14:01:29 CEST 2020
DONE Filtering terms on file 2020-09-03_plasmids.fasta
File with filtered sequences can be found in /myhome/NO_GROUP/1351/kmer/database.filtered_0.95_term.fasta
Previous number of sequences= 686
Post number of sequences= 687


Namespace(distance=0.5, input_file='/myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta', output=False, output_grouped=False)
Obtaining mash distance
Command mash FAILED
WITH PARAMETERS: dist -i -p 10 /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta
EXIT-CODE: -11
ERROR:
Sketching /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta (provide sketch file made with "mash sketch" to skip)...
Obtaining cluster from distance
list index out of range
Traceback (most recent call last):
  File "/usr/local/plasmidID/bin/mashclust.py", line 396, in <module>
    main()
  File "/usr/local/plasmidID/bin/mashclust.py", line 386, in main
    cluster_df = big_pairwise_to_cluster(mash_file, threshold=args.distance)
  File "/usr/local/plasmidID/bin/mashclust.py", line 244, in big_pairwise_to_cluster
    cluster_df_return = cluster_df.stack().droplevel(1).reset_index().rename(columns={'index': 'group', 0: 'id'})
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 6251, in stack
    return stack(self, level, dropna=dropna)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 543, in stack
    dtype = dtypes[0]
IndexError: list index out of range
Traceback (most recent call last):
  File "/usr/local/plasmidID/bin/mashclust.py", line 396, in <module>
    main()
  File "/usr/local/plasmidID/bin/mashclust.py", line 386, in main
    cluster_df = big_pairwise_to_cluster(mash_file, threshold=args.distance)
  File "/usr/local/plasmidID/bin/mashclust.py", line 244, in big_pairwise_to_cluster
    cluster_df_return = cluster_df.stack().droplevel(1).reset_index().rename(columns={'index': 'group', 0: 'id'})
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 6251, in stack
    return stack(self, level, dropna=dropna)
  File "/myhome/.local/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 543, in stack
    dtype = dtypes[0]
IndexError: list index out of range
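
A note on the `EXIT-CODE: -11` line in the log: if the wrapper launches mash through Python's `subprocess` module (an assumption, the log does not say), a negative return code -N on POSIX means the child was terminated by signal N. Signal 11 is SIGSEGV on Linux and macOS, so mash most likely crashed rather than exiting with an error of its own. A minimal sketch for decoding such codes; `describe_returncode` is a hypothetical helper, not part of plasmidID:

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess return code in human-readable form."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        # On POSIX, subprocess reports termination by signal N as -N.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not map to a known signal on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


print(describe_returncode(-11))
```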

In the /myhome/NO_GROUP/mystrain/kmer/database.filtered_0.95_term.fasta file, I found 687 sequences, one of which was empty (only a ">" header line). After I removed the empty sequence, the pipeline continued without error.
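
For anyone hitting the same problem, the manual workaround described above (removing records that consist only of a ">" line) could be scripted roughly as follows. This is a sketch under assumptions, not the fix applied in the repository: `read_fasta` and `drop_empty_records` are hypothetical names, and the output writes each sequence on a single line rather than preserving the original line wrapping.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, even an empty one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_empty_records(in_path: str, out_path: str) -> Tuple[int, int]:
    """Copy records that have sequence data; return (kept, dropped)."""
    kept = dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            if seq:
                dst.write(f">{header}\n{seq}\n")
                kept += 1
            else:
                # Record with no sequence, e.g. a bare ">" line.
                dropped += 1
    return kept, dropped
```

Calling `drop_empty_records` on the filtered FASTA and passing the cleaned copy to `mashclust.py -i` would mirror the manual edit; the returned counts make it easy to see whether anything was actually removed.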

@saramonzon
Member

Thanks Nilad, and sorry for the late response. You are right, we have to check the filter_fasta.sh script. We'll probably do it next week!

saramonzon added the bug label Feb 26, 2021
@saramonzon
Member

The same issue can be reproduced when no sequence reaches 0.95 sequence similarity.
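
In that situation the filtered FASTA contains no usable sequence, so a check before clustering could stop early with a clear message instead of letting mash crash and pandas raise an IndexError. The following is only a sketch of that idea, not the change made in the repository; `count_nonempty_records` and `require_sequences` are hypothetical names.

```python
def count_nonempty_records(path: str) -> int:
    """Count FASTA records that have a header and at least one sequence line."""
    count = 0
    has_header = False
    has_sequence = False
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith(">"):
                # Close the previous record before starting a new one.
                if has_header and has_sequence:
                    count += 1
                has_header, has_sequence = True, False
            elif line and has_header:
                has_sequence = True
    # Account for the last record in the file.
    if has_header and has_sequence:
        count += 1
    return count


def require_sequences(path: str, minimum: int = 1) -> int:
    """Return the record count, or raise ValueError if there are too few."""
    found = count_nonempty_records(path)
    if found < minimum:
        raise ValueError(
            f"{path}: found {found} non-empty sequence(s), "
            f"need at least {minimum}; nothing to cluster"
        )
    return found
```

A caller could catch the `ValueError` and report that no database sequence passed the filter, which is easier to interpret than the traceback in the original report.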

@saramonzon
Member

@Nilad I know it has been a long time, but could you share the files that triggered the error (database.filtered_0.95_term and 2020-09-03_plasmids.fasta) so we can reproduce it?

@saramonzon
Member

Partially fixed in 2775af0.

@saramonzon
Member

I think this should be fixed! If not please reopen!

saramonzon added a commit that referenced this issue Mar 20, 2021
- Migrated tests to github actions
- Updated environment.yml for conda.
- Fixed issues #12,#14,#15,#17. Cases with no plasmids or too many. Relative paths in html images.
saramonzon added a commit that referenced this issue Mar 20, 2021
- Updated Dockerfile
- Migrated tests to github actions
- Updated environment.yml for conda.
- Fixed issues #12,#14,#15,#17. Cases with no plasmids or too many. Relative paths in html images.
@JoseEspinosa

Hi @saramonzon, I am trying to implement the plasmidid nf-core module and I am running into a similar issue; see the mashclust log below:

2021-04-07 07:31:34,703:Namespace(distance=0.5, input_file='./NO_GROUP/test/kmer/database.filtered_0.95_term.fasta', output=False, output_grouped=False)
2021-04-07 07:31:34,705:Obtaining mash distance
2021-04-07 07:31:34,748:Command mash FAILED
WITH PARAMETERS: dist -i -p 10 /Users/jaespinosa/nxf_scratch/24/e975ea2ca481afb35591d23f5946d8/NO_GROUP/test/kmer/database.filtered_0.95_term.fasta /Users/jaespinosa/nxf_scratch/24/e975ea2ca481afb35591d23f5946d8/NO_GROUP/test/kmer/database.filtered_0.95_term.fasta
EXIT-CODE: -11
ERROR:
Sketching /Users/jaespinosa/nxf_scratch/24/e975ea2ca481afb35591d23f5946d8/NO_GROUP/test/kmer/database.filtered_0.95_term.fasta (provide sketch file made with "mash sketch" to skip)...
2021-04-07 07:31:34,750:Obtaining cluster from distance
2021-04-07 07:31:34,762:list index out of range
Traceback (most recent call last):
  File "/usr/local/bin/mashclust.py", line 396, in <module>
    main()
  File "/usr/local/bin/mashclust.py", line 386, in main
    cluster_df = big_pairwise_to_cluster(mash_file, threshold=args.distance)
  File "/usr/local/bin/mashclust.py", line 244, in big_pairwise_to_cluster
    cluster_df_return = cluster_df.stack().droplevel(1).reset_index().rename(columns={'index': 'group', 0: 'id'})
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 7005, in stack
    return stack(self, level, dropna=dropna)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 522, in stack
    dtype = dtypes[0]
IndexError: list index out of range

I am using the genome.fasta that is available in the nf-core/modules test dataset. I generated the contigs file with minia, also using the test dataset available in nf-core/modules. I attached the file to the issue as test.contigs.fa.txt (note that I changed the extension to txt, since otherwise it is not possible to attach it to the issue).

This is the command used to run plasmidid as reported by nextflow:

plasmidID \
    -d genome.fasta \
    -s test \
    -c test.contigs.fa \
     \
    -o .

Thanks a lot!

@JoseEspinosa

Sorry, I just realized that I was not using the latest version of the biocontainer, which is why I was hitting this already solved problem.
