Commit
Change assembly filtering step: first try filtering contigs based on SPAdes k-mer coverage, contig length and GC content; if that fails, filter based only on contig length and GC content (to try to rescue the sample from failing).

Change PASS, FAIL, WARNING rules (FAIL overrides WARNING):
- FAIL
  - Low estimated coverage calculated in EstimatedCoverage module (number of sequenced nucleotides / expected genome size) (default 15x). Stops the sample run.
  - Lower sample coverage, higher number of absent genes or higher number of genes with multiple alleles than specified in the TrueCoverage module config file. Stops the sample run.
  - Fail FastQC “Per base sequence quality”, “Overrepresented sequences”, “Per sequence GC content” or “Sequence length distribution”. Do not pass FastQC “Per base N content” or “Adapter Content”. Stops the sample run if the sample fails FastQC after Trimmomatic read cleaning.
  - No read pairs survive Trimmomatic cleaning.
  - AssemblyMapping module does not run successfully.
  - Assembly coverage (calculated in AssemblyMapping module) of filtered contigs does not reach the minimum required (30x).
  - MLST scheme found does not match the provided species (mlst module) (with the exception of the Yersinia genus, which might raise a warning).
- WARNING
  - Fail FastQC “Per base sequence content”. Do not pass FastQC “Per base sequence quality” or “Overrepresented sequences”.
  - Higher number of contigs than allowed or odd number of assembled nucleotides.
  - Less than 95% of the reads mapped back to the assembly (in AssemblyMapping module).
  - mlst module did not run.
  - Found MLST scheme for a species with unknown scheme. In the case of the Yersinia genus, only raises a warning if the specific scheme found does not match the scheme for the provided species name (but matches the genus).

Make SPAdes QC assessment depend only on SPAdes run information (no longer relying on AssemblyMapping).
Remove Trimmomatic and Pear information from QC assessment.
Update E. coli maximum number of multiple alleles in TrueCoverage module.
Include ReMatCh (https://github.com/B-UMMI/ReMatCh) as a dependency for running the TrueCoverage module.
Make INNUca compatible with new MLST version.
Add option --fastQCproceed to force INNUca to continue even if a sample fails FastQC.
Add option --maxNumberContigs to set the maximum number of contigs per 1.5 Mb of expected genome size (useful for species that intrinsically produce a more fragmented genome assembly).
Change --spadesUse_3_9 to --spadesVersion to specify a SPAdes version (default: 3.11.0).
Add option --noLog to tell INNUca not to create a log file (useful in a Slurm environment, since stdout and stderr are usually saved in a file).
Add option --noGitInfo to tell INNUca not to retrieve GitHub repository information (useful when running INNUca in parallel independent jobs for many samples, since it might save some time).
Change final general report (PASS, FAIL, WARNING samples).
Write failing and warning reports for each sample.
Change dependencies checking.
Fix minor errors.
Update README.
Add Dockerfile with the Docker image build recipe.
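
A rough sketch of three of the rules above may help when reading the reports: the estimated coverage check (sequenced nucleotides / expected genome size, default 15x), the contig limit scaled per 1.5 Mb of expected genome size, and FAIL overriding WARNING. This is not INNUca's actual code: every function and variable name is hypothetical, and because the default value of --maxNumberContigs is not stated here it is a required argument.

```python
# Illustrative sketch only: all names below are hypothetical, not INNUca's API.

# Higher number = more severe, so FAIL overrides WARNING, which overrides PASS.
SEVERITY = {"PASS": 0, "WARNING": 1, "FAIL": 2}


def estimated_coverage(sequenced_nucleotides, expected_genome_size_bp):
    """Estimated coverage = number of sequenced nucleotides / expected genome size."""
    return sequenced_nucleotides / expected_genome_size_bp


def coverage_status(sequenced_nucleotides, expected_genome_size_bp, min_coverage=15):
    """FAIL when the estimated coverage is below the minimum (default 15x)."""
    coverage = estimated_coverage(sequenced_nucleotides, expected_genome_size_bp)
    return "FAIL" if coverage < min_coverage else "PASS"


def max_contigs_allowed(expected_genome_size_bp, max_contigs_per_1_5_mb):
    """Scale the per-1.5 Mb contig limit to the expected genome size."""
    return expected_genome_size_bp / 1_500_000 * max_contigs_per_1_5_mb


def contigs_status(n_contigs, expected_genome_size_bp, max_contigs_per_1_5_mb):
    """WARNING when the assembly has more contigs than allowed."""
    limit = max_contigs_allowed(expected_genome_size_bp, max_contigs_per_1_5_mb)
    return "WARNING" if n_contigs > limit else "PASS"


def overall_status(module_statuses):
    """Combine per-module statuses; the most severe one wins."""
    return max(module_statuses, key=SEVERITY.__getitem__, default="PASS")
```

For example, with a 3 Mb expected genome size and a made-up limit of 100 contigs per 1.5 Mb, a sample with 60 Mb of sequenced nucleotides and 250 contigs would pass the coverage check but raise a contig warning, so `overall_status` reports it as WARNING.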