ERROR: Invalid char while reading (during assembly stage) #740

Open
mjuilk opened this issue Nov 27, 2024 · 3 comments

Comments

@mjuilk

mjuilk commented Nov 27, 2024

Hello, I'm trying to run Flye for polishing on human Nanopore long-read data, but I've encountered an error during the assembly stage.

[2024-11-27 10:43:15] INFO: Starting Flye 2.9.5-b1801
[2024-11-27 10:43:15] INFO: >>>STAGE: configure
[2024-11-27 10:43:15] INFO: Configuring run
[2024-11-27 10:58:17] INFO: Total read length: 71296209607
[2024-11-27 10:58:18] INFO: Reads N50/N90: 13490 / 7135
[2024-11-27 10:58:18] INFO: Minimum overlap set to 7000
[2024-11-27 10:58:18] INFO: >>>STAGE: assembly
[2024-11-27 10:58:18] INFO: Assembling disjointigs
[2024-11-27 10:58:18] INFO: Reading sequences
[2024-11-27 11:11:57] INFO: Building minimizer index
[2024-11-27 11:11:57] INFO: Pre-calculating index storage
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2024-11-27 11:13:25] INFO: Filling index
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2024-11-27 11:15:03] INFO: Extending reads
[2024-11-27 11:16:05] INFO: Overlap-based coverage: 17
[2024-11-27 11:16:05] INFO: Median overlap divergence: 0.0472166
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2024-11-27 11:54:25] INFO: Assembled 17688 disjointigs
[2024-11-27 11:54:27] INFO: Generating sequence
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2024-11-27 11:55:31] INFO: Filtering contained disjointigs
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2024-11-27 11:55:59] INFO: Contained seqs: 3966
[2024-11-27 12:01:49] ERROR: Invalid char while reading /fs/dss/home/doad5844/results/hapmap/NA12877/flye/00-assembly/draft_assembly.fasta
[2024-11-27 12:01:49] ERROR: Pipeline aborted

I tried troubleshooting by using grep to check draft_assembly.fasta for unexpected characters, but couldn't find any.
The file contains only the >disjointig headers, each followed by its sequence.
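A byte-level scan can catch things a grep pattern may miss, such as NUL bytes, carriage returns, or a truncated final record. The sketch below is only an illustration: the thread does not say which characters Flye rejects, so the allowed alphabet (`ALLOWED`) is an assumption, and `scan_fasta` is a hypothetical helper, not part of Flye.

```python
# Assumption: plain nucleotide letters (upper or lower case) are acceptable
# in sequence lines. Flye's actual validation rules are not given in this thread.
ALLOWED = frozenset(b"ACGTNacgtn")


def scan_fasta(path, max_reports=20):
    """Return up to max_reports tuples (line_number, column, byte_value)
    for bytes that look out of place in a FASTA file."""
    problems = []
    # Open in binary mode so NUL bytes and invalid encodings are not hidden.
    with open(path, "rb") as handle:
        for line_no, raw in enumerate(handle, start=1):
            line = raw.rstrip(b"\n")  # keep "\r" so Windows line endings are reported
            if line.startswith(b">"):
                # Header lines: flag only non-printable bytes.
                bad = [(i, b) for i, b in enumerate(line) if b < 0x20 or b > 0x7E]
            else:
                # Sequence lines: flag anything outside the assumed alphabet.
                bad = [(i, b) for i, b in enumerate(line) if b not in ALLOWED]
            for col, byte in bad:
                problems.append((line_no, col, byte))
                if len(problems) >= max_reports:
                    return problems
    return problems


# Example usage (the path is a placeholder):
# for line_no, col, byte in scan_fasta("draft_assembly.fasta"):
#     print(f"line {line_no}, column {col}: byte 0x{byte:02x}")
```

An empty result would suggest the file content itself is clean under these assumptions, which would point toward something else, such as the file being modified or truncated while it was being read.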

This is the command I used to run Flye:

python ext/Flye/bin/flye --nano-hq scripts/NA12877_fastq_pass.fastq \
--out-dir results/hapmap/NA12877/flye --threads 64

I tried looking through issues and googling, but couldn't find a post with this exact error, so I'm seeking advice here.
Thank you,
Jon

@mikolmogorov
Owner

Hi Jon,

That looks very strange, and the running time for a human dataset seems very unusual. Could you please share the whole log? I would try to validate the draft_assembly file with something like seqtk. Is the output size as expected (e.g. ~3G)?

@mjuilk
Author

mjuilk commented Jan 2, 2025

Here is a link to the whole log: https://limewire.com/d/3737b1e4-f4a1-4e27-ae6f-6beb04de5947#GrmylCkYJXkdRahgtDnMRcLDEWMhd-bWTIc2dFioeMg

What is the expected running time for a human dataset with flye? How far off is it?

I checked the output size of the draft assembly file with seqtk and it is 281,690,453.
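For anyone wanting to cross-check that figure without seqtk, here is a minimal sketch that counts sequences and bases in a FASTA file and compares the total against the ~3G mentioned above. It assumes the reported number is a base count; `fasta_stats`, `fraction_of_expected`, and the `EXPECTED_GENOME_SIZE` constant are illustrative names, not part of Flye or seqtk.

```python
# Rough expected size taken from the "~3G" figure in this thread (an approximation).
EXPECTED_GENOME_SIZE = 3_000_000_000


def fasta_stats(path):
    """Return (number_of_sequences, total_bases) for a FASTA file."""
    n_seqs = 0
    total_bases = 0
    with open(path, "rb") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(b">"):
                n_seqs += 1  # each header starts a new record
            else:
                total_bases += len(line)
    return n_seqs, total_bases


def fraction_of_expected(total_bases, expected=EXPECTED_GENOME_SIZE):
    """Return the assembly size as a fraction of the expected genome size."""
    return total_bases / expected


# Example usage (the path is a placeholder):
# n_seqs, total = fasta_stats("draft_assembly.fasta")
# print(n_seqs, total, f"{fraction_of_expected(total):.1%}")
```

If 281,690,453 is indeed the number of bases, that is roughly 9% of a 3 Gb genome.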

@mikolmogorov
Owner

Hi Jon,

The link you sent is not working for me. Do you mind uploading the gzipped log directly to GitHub?

The assembly size is definitely not right. Could you please tell me more about the data, e.g. the pore version, basecaller, sample prep, etc.? Did other assemblers (e.g. Shasta) work with this data? Typically it takes a couple of days for a human assembly from R10 Q20 data.
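In case it helps with the gzipped log requested above, a minimal sketch using only the Python standard library follows. The helper name `gzip_file` is hypothetical, and the log location in the usage comment is an assumption based on the `--out-dir` given in the original command.

```python
import gzip
import shutil


def gzip_file(src, dst=None):
    """Compress src into dst (default: src + '.gz') and return the output path."""
    if dst is None:
        dst = src + ".gz"
    # Stream the file in chunks so large logs do not need to fit in memory.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


# Example usage (assumed log location inside the output directory):
# gzip_file("results/hapmap/NA12877/flye/flye.log")
```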
