If you want to quickly test yacrd and fpa, you can run:
./script/small_test.sh
This script downloads an E. coli Nanopore dataset, subsamples it, and runs yacrd, fpa, and a combination of both tools on the subsample.
The following tools need to be available in your PATH (a quick availability check is sketched after this list):
- seqtk 1.3-r106
- fpa 0.5
- yacrd 0.6
- dascrubber commit 0e90524 (you can follow the dascrubber-wrapper instructions to install all dascrubber requirements)
- snakemake 5.4.3
- wtdbg2 2.3
- miniasm 0.3-r179
- quast v5.0.2
- nucmer 4.0.0beta2
- bwa mem 0.7.17
- samtools 1.9
- ReferenceSeeker 1.2
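Before launching the pipelines, you can verify that everything is reachable. The sketch below is only a convenience; the executable names (e.g. quast.py, referenceseeker) are assumptions and may differ depending on how you installed the tools:

```
# Report which of the required executables are reachable in PATH.
# Executable names are assumptions and may differ with your installation method.
for tool in seqtk fpa yacrd snakemake wtdbg2 miniasm quast.py nucmer bwa samtools referenceseeker; do
    if command -v "$tool" > /dev/null 2>&1; then
        echo "OK      $tool -> $(command -v "$tool")"
    else
        echo "MISSING $tool"
    fi
done
```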
You also need to update the paths of these tools in the Snakemake pipeline files:
- miniscrub: pipeline/scrubbing.snakefile, line 136
- porechop: pipeline/analysis.snakefile, line 68
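If you prefer to script these edits, a sed one-liner per file works; the paths below are hypothetical placeholders, so check what actually sits on the referenced lines before running it:

```
# The old/new paths are hypothetical placeholders: replace them with the values
# found on line 136 of pipeline/scrubbing.snakefile and line 68 of pipeline/analysis.snakefile.
sed -i '136s|/old/path/to/miniscrub|/new/path/to/miniscrub|' pipeline/scrubbing.snakefile
sed -i '68s|/old/path/to/porechop|/new/path/to/porechop|' pipeline/analysis.snakefile
```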
If you run conda env create -f conda_env.yml, conda creates an environment named yacrd_fpa with all dependencies except dascrubber, miniscrub, ra, porechop and shasta.
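A typical session then looks like this (assuming, as stated above, that the environment defined in conda_env.yml is named yacrd_fpa):

```
conda env create -f conda_env.yml   # create the yacrd_fpa environment
conda activate yacrd_fpa            # activate it before running the pipelines
```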
Reference genomes:
- E. coli CFT073 5.231428 Mb
- D. melanogaster 143.726002 Mb
- C. elegans 100.2 Mb
- H. sapiens chr1 248.9 Mb
Reads:
- E. coli CFT073:
- Oxford Nanopore D. melanogaster
- Oxford Nanopore H. sapiens chr1
- PacBio RS P6-C4 C. elegans
- NCTC Sequel dataset
- PacBio RSII and Nanopore data from https://doi.org/10.1099/mgen.0.000294
To download all datasets, run script/dl_data.sh. Warning: this script can take a long time, as it downloads more than 65 datasets.
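Because the download is long, you may want to run the script detached from your terminal; the nohup pattern below is a generic shell idiom, not something required by the script:

```
# Run the download in the background and keep a log of its progress.
nohup ./script/dl_data.sh > dl_data.log 2>&1 &
tail -f dl_data.log
```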
- Run scrubbing + assembly + analysis:
snakemake --snakefile pipeline/uncorrected.snakefile all
- Run fpa + assembly + analysis:
snakemake --snakefile pipeline/fpa.snakefile all
- Run the comparison against the minimap + miniasm and yacrd + minimap + fpa + miniasm pipelines:
snakemake --snakefile pipeline/combo.snakefile all
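The commands above use Snakemake's defaults; the flags below (-n for a dry run, --cores for parallel jobs) are standard Snakemake options you may want to add, not requirements of these pipelines:

```
# Preview the jobs without running anything.
snakemake --snakefile pipeline/uncorrected.snakefile all -n
# Run with 8 parallel cores (adjust to your machine).
snakemake --snakefile pipeline/uncorrected.snakefile all --cores 8
```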
Get information about the reads (this can take a very long time):
./script/read_info.py
Get information about assembly:
./script/asm_info.py
Get information about the running time and memory usage of scrubbing and assembly:
./script/timming.py
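If you want to keep the reports, plain shell redirection is enough (the output file names below are arbitrary):

```
./script/read_info.py > read_info.txt
./script/asm_info.py  > asm_info.txt
./script/timming.py   > timming.txt
```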