Assembler benchmarking #72

rcedgar · 2020-05-03T18:18:10Z

See notes on how to benchmark assemblers here:

200503_rce_assembler_benchmark_notes.pdf

Anyone up for taking on this task?

ababaian · 2020-05-03T23:01:35Z

@JustinChu / @taltman, this will be up your alley: measuring how well we can assemble new CoV.

JustinChu · 2020-05-06T03:08:32Z

Hi @rcedgar,
Can you point me to the pangenomes/datasets that you created for the alignment experiments?

The evaluation protocol I was thinking of is as follows:

Input: a library known to contain SARS or hCoV-19 (as in the PDF).
Pangenome: built without the specific strain being tested (e.g., SARS or hCoV-19).
Align reads from a CoV+ dataset of the removed strain, using the same protocol as the standard pipeline. Keep only the reads that map, then generate contigs (testing multiple assembly methods).
Evaluate the contigs (completeness, contiguity, misassemblies, etc.) against the SARS or hCoV-19 reference.
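The read-filtering and evaluation steps above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the pipeline's actual code and not how MetaQUAST computes its metrics: the function names are made up, the read filter assumes alignments are available as SAM text (where FLAG bit 0x4 marks an unmapped read), and the completeness metric assumes contig alignments to the held-out reference have already been reduced to 0-based, half-open (start, end) intervals.

```python
"""Hypothetical sketch of two steps from the proposed benchmark protocol:
keeping only mapped reads, and scoring contigs for contiguity/completeness."""
from typing import Iterable, List, Set, Tuple


def mapped_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Return names of reads that have at least one mapped SAM record.

    Assumes plain SAM text; FLAG bit 0x4 marks an unmapped segment.
    """
    names: Set[str] = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # keep the read only if this record is mapped
            names.add(qname)
    return names


def n50(contig_lengths: List[int]) -> int:
    """Contiguity: largest length L such that contigs >= L cover half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def genome_fraction(intervals: List[Tuple[int, int]], ref_length: int) -> float:
    """Completeness: fraction of reference bases covered by aligned contig intervals.

    Overlapping or touching intervals are merged so no base is counted twice.
    """
    covered = 0
    cur_start, cur_end = -1, -1
    for start, end in sorted(intervals):
        if start > cur_end:  # gap: close the current merged run
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:  # overlap: extend the current merged run
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start  # close the final run (adds 0 if none)
    return covered / ref_length
```

For example, contigs of lengths 10, 8, 5, 3, and 2 give an N50 of 8, and intervals (0, 100), (50, 150), and (200, 250) on a 1,000 bp reference give a genome fraction of 0.2. Misassembly detection is not covered by this sketch, since it needs full alignment information.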

I'll need a pangenome with the strain being tested removed (is up to 80% what we have tested?), the reference sequence of that strain, and possibly libraries positive for the strain (though I may be able to simply simulate data instead).

Could you clarify what your folders in the S3 bucket contain, so I can perhaps reuse them? For instance, what do the FASTA files in the /r and /q directories contain?

rcedgar · 2020-05-06T03:15:21Z

"I'll need a pangenome with the strain being tested removed (is up to 80% what we have tested?)" -- yes, exactly! See the benchmark notes here, which explain the S3 files:

200430_covx_benchmark_howto.pdf

JustinChu · 2020-05-06T03:42:07Z

Ah, that is what I was looking for, thanks!

taltman · 2020-05-15T18:04:09Z

I think a lot of what we want to do here can be done using MetaQUAST:
http://quast.sourceforge.net/metaquast

One thing it does poorly is aligning the short reads back to the assemblies: in my recollection of running it on a large metagenomic assembly from a human gut sample, that step took forever. Perhaps that is an optimization @rcedgar would be best positioned to tackle? We'll generate some data on how it runs with our filtered reads, and may need to address performance if it is still an issue.

rcedgar · 2020-05-15T18:47:13Z

This is my best attempt at writing a very fast read mapper:

https://drive5.com/urmap/manual/downloads.html

ababaian · 2020-06-26T02:03:35Z

Closed by #130

rcedgar added the Bioinformatics, enhancement, and good first issue labels May 3, 2020

rcedgar mentioned this issue in Assembly protocol of COV sequences #65 (closed) May 3, 2020

rcedgar assigned JustinChu May 4, 2020

rcedgar changed the title from "Assembler benchmark testing -- volunteer needed" to "Assembler benchmark testing" May 4, 2020

ababaian mentioned this issue in State of Assembly #86 (closed) May 14, 2020

taltman changed the title from "Assembler benchmark testing" to "Assembler benchmarking" May 15, 2020

taltman added this to the Assembly: Validation milestone May 15, 2020

ababaian closed this as completed Jun 26, 2020
