From 6eb6d2cdf15d0212b20f76ae915bb3a3ec639a51 Mon Sep 17 00:00:00 2001
From: "Brian P. Walenz"
Date: Thu, 18 Jul 2024 00:24:31 -0400
Subject: [PATCH] Fail early if more than 4095 input files are supplied. Issue #1910.

---
 documentation/source/quick-start.rst |  5 +++--
 src/pipelines/canu.pl                | 11 +++++++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/documentation/source/quick-start.rst b/documentation/source/quick-start.rst
index d6d039cb4..16c6e6dc1 100644
--- a/documentation/source/quick-start.rst
+++ b/documentation/source/quick-start.rst
@@ -15,7 +15,8 @@ however, between 30x and 60x coverage is the recommended minimum. More coverage
 longer reads for assembly, which will result in better assemblies.
 
 Input sequences can be FASTA or FASTQ format, uncompressed or compressed with gzip (.gz), bzip2
-(.bz2) or xz (.xz). Note that zip files (.zip) are not supported.
+(.bz2) or xz (.xz). Note that zip files (.zip) are not supported. Up to 4,095 input files are
+allowed.
 
 Canu can resume incomplete assemblies, allowing for recovery from system outages or other abnormal
 terminations. On each restart of Canu, it will examine the files in the assembly directory to
@@ -152,7 +153,7 @@ Trio binning does not yet support inputting PacBio HiFi reads for binning as the
 Assembling With Multiple Technologies and Multiple Files
 -------------------------------------------
 
-Canu can use reads from any number of input files, which can be a mix of formats and technologies. Note that current combining PacBio HiFi data with other datatypes it not supported. We'll assemble a mix of 10X PacBio CLR reads in two FASTQ files and 10X of Nanopore reads in one FASTA
+Canu can use reads from any number of input files (up to 4,095 in total), which can be a mix of formats and technologies. Note that current combining PacBio HiFi data with other datatypes it not supported. We'll assemble a mix of 10X PacBio CLR reads in two FASTQ files and 10X of Nanopore reads in one FASTA
 file::
 
   curl -L -o mix.tar.gz http://gembox.cbcb.umd.edu/mhap/raw/ecoliP6Oxford.tar.gz
diff --git a/src/pipelines/canu.pl b/src/pipelines/canu.pl
index dda71ff42..3922d4f06 100644
--- a/src/pipelines/canu.pl
+++ b/src/pipelines/canu.pl
@@ -584,6 +584,17 @@
 
         print STDERR "--\n";
         print STDERR "-- Found $ct $rt reads in the input files.\n";
+
+        #  If there are more than 4095 input files, sqStore fails as it needs to
+        #  encode the (possibly unused anymore) read library id (equivalent to
+        #  the input file number) in 12 bits.  The best we can do is fail now -
+        #  increasing the limit will result in an on-disk metadata change (see
+        #  _libraryID in sqRead.H).  Issue #1910.
+        #
+        if (scalar(@inputFiles >= 4096)) {
+            my $nf = scalar(@inputFiles);
+            caExit("ERROR: Too many input read files ($nf). Must be fewer than 4096", undef);
+        }
     }
 
     #  Otherwise, no reads found in a store, and no input files.