I polish several similar assemblies and would like to aggregate their polishing stats easily. For now, the stats are only available in a human-readable, well-described log file, but parsing it is complicated. Would it be possible to produce a machine-readable TSV or JSON file (which could be parsed by a future version of MultiQC...)?
Here is the code I use now for the ideal case where each assembly has a single contig (written as a Snakemake shell block):
(
    # Header row, split over two printf calls for readability
    printf "SampleID\tPercConfirmedBases\tCoverage\tCorrectedSNPs\tCorrectedAmbiguousBases"
    printf "\tCorrectedSmallInsertions\tCorrectedSmallDeletions\tFixedLocalBreaks\tFixedGaps\n"
    # One column per statistic; parallel -k keeps the rows in input order
    paste <( echo {SAMPLES_STR} | tr ' ' '\n' ) \
          <( parallel -k "grep -oP 'Confirmed.*\(\K([0-9.]+)' {{}}" ::: {input} ) \
          <( parallel -k "grep -oP 'Mean total coverage: \K([0-9.]+)' {{}} | sed 's/$/x/g'" ::: {input} ) \
          <( parallel -k "grep -oP '([0-9]+)(?= snps)' {{}}" ::: {input} ) \
          <( parallel -k "grep -oP '([0-9]+)(?= ambiguous bases)' {{}}" ::: {input} ) \
          <( parallel -k "grep -oP '([0-9]+)(?= small insertions)' {{}}" ::: {input} ) \
          <( parallel -k "grep -oP '([0-9]+)(?= small deletions)' {{}}" ::: {input} ) \
          <( parallel -k "grep '^fix break' {{}} | wc -l" ::: {input} ) \
          <( parallel -k "grep '^fix gap' {{}} | wc -l" ::: {input} )
) > {output}
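For comparison, the same extraction can be sketched as a plain bash function that emits one TSV row per log, without GNU parallel. This is only a rough sketch: the function names `first_match` and `polish_log_to_tsv_row` are hypothetical, the patterns are copied from the snippet above, and it still assumes GNU grep (for `-P`) and a single contig per log. It is written for a standalone script, so the braces are not doubled as they would be inside a Snakemake rule.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of any tool: they only wrap the grep
# patterns from the snippet above. Assumes GNU grep and one contig per log.

# Print the first match of a PCRE pattern in a file (empty if no match).
first_match() {
    grep -oP "$1" "$2" | head -n 1 || true
}

# Print one tab-separated row of polishing stats for a single log file.
polish_log_to_tsv_row() {
    local sample="$1" log="$2"
    local confirmed coverage snps ambiguous insertions deletions breaks gaps

    confirmed=$(first_match 'Confirmed.*\(\K[0-9.]+' "$log")
    coverage=$(first_match 'Mean total coverage: \K[0-9.]+' "$log")
    snps=$(first_match '[0-9]+(?= snps)' "$log")
    ambiguous=$(first_match '[0-9]+(?= ambiguous bases)' "$log")
    insertions=$(first_match '[0-9]+(?= small insertions)' "$log")
    deletions=$(first_match '[0-9]+(?= small deletions)' "$log")
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    breaks=$(grep -c '^fix break' "$log" || true)
    gaps=$(grep -c '^fix gap' "$log" || true)

    # Coverage gets the same "x" suffix as in the original snippet
    printf '%s\t%s\t%sx\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        "$sample" "$confirmed" "$coverage" "$snps" "$ambiguous" \
        "$insertions" "$deletions" "$breaks" "$gaps"
}

# Example use (sample name and log path are placeholders):
#   polish_log_to_tsv_row sampleA sampleA.polish.log >> stats.tsv
```

Calling the function once per sample after printing the header row would give the same table as the `paste` approach, at the cost of running the greps sequentially. A log with several contigs would still need per-contig handling, which is the part a native TSV or JSON output would solve properly.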