Assembler benchmarking #72
@JustinChu / @taltman, this will be up your alley: measuring how well we can assemble new CoV.
Hi @rcedgar, the evaluation protocol I was thinking of is as follows:
I'll need a pangenome with the strain being tested removed (is up to 80% what we have tested?), the reference sequence of that strain, and possibly libraries positive for the strain (we may be able to simply simulate data instead). Could you clarify what your folders on the s3 bucket contain so I can perhaps reuse them? For instance, what do the fasta files in the
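The held-out setup described above (a pangenome with the tested strain removed, plus simulated reads from that strain's reference) could be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual pipeline: the function names `read_fasta`, `leave_one_out`, and `simulate_reads` are hypothetical, the held-out strain is assumed to be identified by the first word of its FASTA header, and the read simulator assumes uniform single-strand sampling with substitution errors only (no indels, no quality model).

```python
import random

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def leave_one_out(records, held_out_id):
    """Split records into (pangenome without the held-out strain, held-out reference).

    Assumption: the strain ID is the first whitespace-delimited word of the header.
    Only the exact strain is removed; close relatives are not filtered here.
    """
    pangenome = [r for r in records if r[0].split()[0] != held_out_id]
    reference = [r for r in records if r[0].split()[0] == held_out_id]
    return pangenome, reference

def simulate_reads(seq, read_len=100, n_reads=10, error_rate=0.0, seed=0):
    """Sample reads uniformly from the forward strand, adding substitution errors."""
    if read_len > len(seq):
        raise ValueError("read_len is longer than the reference sequence")
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(0, len(seq) - read_len + 1)
        read = list(seq[start:start + read_len])
        for i in range(read_len):
            if rng.random() < error_rate:
                # Substitute with one of the three other bases.
                read[i] = rng.choice([b for b in "ACGT" if b != read[i]])
        reads.append("".join(read))
    return reads
```

For example, `leave_one_out(read_fasta(text), "MN908947.3")` (accession chosen only for illustration) would return the remaining genomes as the pangenome and the held-out genome as the truth reference, from which `simulate_reads` draws test reads. A real benchmark would likely swap in an established read simulator rather than this toy sampler.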
"I'll need a pangenome with the strain being tested removed (is up to 80% what we have tested?)" -- yes, exactly! See the benchmark notes here, which explain the s3 files:
Ah, that is what I was looking for, thanks!
I think a lot of what we want to do here can be done using MetaQUAST. One thing it handles poorly is aligning the short reads back to the assemblies, which takes forever; that is my recollection from a large metagenomic assembly of a human gut sample. Perhaps that is an optimization @rcedgar would be best positioned to tackle? We'll generate some data on how it runs with our filtered reads, and we may need to address performance if it is still an issue.
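Since the slow step mentioned above is alignment, one cheap, alignment-free proxy for "how much of the held-out genome did the assembler recover" is the fraction of reference positions covered by exact k-mers that also occur in the contigs (on either strand). This is a rough sketch under stated assumptions, not MetaQUAST's alignment-based genome fraction: the function `approx_genome_fraction` and the default `k=31` are hypothetical choices, exact k-mer matching undercounts regions with sequencing or assembly errors, and repeats can inflate coverage.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def revcomp(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def approx_genome_fraction(reference, contigs, k=31):
    """Fraction of reference bases covered by k-mers shared with any contig.

    Both contig strands are indexed so that reverse-complemented contigs still match.
    """
    if not reference:
        return 0.0
    contig_kmers = set()
    for c in contigs:
        contig_kmers |= kmers(c, k)
        contig_kmers |= kmers(revcomp(c), k)
    covered = 0
    covered_until = 0  # exclusive end of the covered region counted so far
    for i in range(len(reference) - k + 1):
        if reference[i:i + k] in contig_kmers:
            # Count only positions not already counted by an earlier overlapping hit.
            start = max(i, covered_until)
            covered += i + k - start
            covered_until = i + k
    return covered / len(reference)
```

A perfect single-contig assembly scores 1.0, and a contig spanning the first quarter of the reference scores at least 0.25. Because it only needs set lookups, this runs in roughly linear time in the total sequence length, which may be useful as a quick sanity check before running the full MetaQUAST evaluation.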
This is my best attempt at writing a very fast read mapper:
Closed by #130
See notes on how to benchmark assemblers here:
200503_rce_assembler_benchmark_notes.pdf
Anyone up for taking on this task?