Nextflow implementation template (#118)
* 🎨 Add entrypoints for taxon assignment workflow
  🎨 Add Autometa nextflow implementation template.
  🎨 Update majority_vote.py parameters to more easily construct taxon assignment workflow.
* 🎨 Comment out container directives
  🎨🔥 Make optional arguments mandatory (metagenome, interim, processed)
  🎨 Prefix output files with metagenome.simpleName in their respective output directories.
  🎨 Name main workflow AUTOMETA and call it with a channel
  🔥 Remove handling of coverage outside of SPAdes assembly (TODO: Incorporate separate COVERAGE workflow to pass into AUTOMETA)
  🐛 Fix AUTOMETA:UNCLUSTERED_RECRUITMENT input where BINNING.out was emitting two outputs (the binning results and the embedded kmers) instead of only the binning results
* 🎨 Add nextflow config with slurm executor configuration and nextflow project details
* 🐛 Add end-of-file newline
* 🐛 Add missing line continuation in MARKERS command.
* 🐛 Fix incorrect keyword argument in lca.py main call
  🐛 Fix incorrect flag in entrypoint (MARKERS process)
* 🎨 Keep hmmscan output file in MARKERS
* Update gitignore with paths to ignore nextflow-generated files
* 🐛 Fix broken paths in SPLIT_KINGDOMS
  🎨 Add parameter '--outdir' to autometa-taxonomy entrypoint.
* 🐛 Fix missing line continuation in BINNING
* 🎨 Update output paths so only binning results are in processed directory
  🎨 Add completeness and purity parameters to autometa.nf
* 🎨 Add completeness and purity parameters to log at beginning of run
* 🐛 Handle the case where archaea are not recovered from metagenome
* 🎨 Add config file for autometa input parameters
  🔥 Remove copy mode from publishDir settings for all processes in autometa workflow
  🎨 Update autometa.taxonomy.vote entrypoint parameters
  💚 Update mocked args to be compatible with new autometa.taxonomy.vote parameters
  🎨 Add type hints to ncbi.py
  🔥 Remove most of the redundant logic from vote.py so that the entrypoint is now only responsible for adding canonical ranks to voted taxids and writing out ranks split by the provided rank
  🎨🔥 Remove hardcoded parameters and add parameters that give the user finer control of the entire autometa workflow
  🎨 Add HTCondor executor profile with comments
* 💚🐛🔥 Remove keyword argument 'out' from vote.add_ranks(...) func
* 🎨 Add params.cpus to initial info log
* 🔥🐛🎨 Remove unnecessary autometa prodigal wrapper.
  🔥 Remove GNU parallel functionality from ORFs process. The number of ORF sequences recovered using GNU parallel was non-deterministic, so this trades some performance for determinism.
* 🎨 Update nextflow scripts to use jason-c-kwan/autometa:dev docker image
  🎨 Add dockerignore to prevent unnecessary build-context and image bloat.
  🔥 Remove Makeflow autometa template
  🎨 Move autometa.nf containing AUTOMETA workflow to nextflow directory
  ⬆️ Set minimum pandas version to 1.1.
  📝 Update link to references in normalize(...) func in kmers.py
  🎨 Update parameters.config to reflect updated nextflow parameters
  🎨 Add entrypoint checks for autometa-taxonomy-lca and autometa-taxonomy-majority-vote to Dockerfile
  🎨 Add main.nf for use with the manifest, a prerequisite for sharing the nextflow pipeline through GitHub.
  🎨 Update manifest in nextflow.config to reflect change in mainScript
  🎨 Add fixOwnership to docker scope in nextflow.config
* 🎨 Update manifest with 'doi' and 'defaultBranch'
* 🎨 Update arguments for entrypoints autometa-binning and autometa-unclustered-recruitment
  🎨 Propagate these argument changes to nextflow processes
  💚 Update tests to accommodate updated arguments
* 🔥 Remove unused/unnecessary configuration scripts
  🎨 Move code in config/__init__.py to config/utilities.py and update respective imports to point to this file
  🎨 Split autometa-configure entrypoint into two entrypoints: autometa-config and autometa-update-databases
  🐛 Change default markers directory to be read from default.config instead of the source directory
  🔥 Remove __main__.py and the autometa.py wrapper around __main__.py in favor of the nextflow files.
  ⬆️ Add diamond to requirements.txt
  🐛 Modify config to point to autometa/databases after installation in Docker build
  🎨📝 Add type hints across config scripts
* 🎨 Apply black formatting
* ✅🎨 Update call to parse_args from config.parse_args(...) to config.utilities.parse_args(...)
* ✅🐛 Update config.parse_args(...) to autometa.config.utilities.parse_args(...)
* ✅ Alias config.utilities imports to configutils. This provides access to the parse_args attribute while avoiding confusion with autometa.common.utilities functions
* 🎨 Update default databases retrieval logic
  🐛 Fix redundant executable versions being written to default.config
  🐛 Fix automatic update of autometa home_dir configuration in default.config
  🎨 Add exception handling in parse_argparse.py to provide more debugging information
* ✅📝 Fix error when parsing databases argparse.
  🎨 Remove any indentation in written argparse blocks for retrieving argparse usage
* 🎨 Add EOF line in dockerignore
* 🐛 Fix default path to markers database in MARKERS process
* 🐛 Fix incorrect option when attempting to download missing ncbi files
* 🐛 Fix clean command in Makefile so it actually removes provided directories
* 🎨 Replace only the first 'ftp' in ncbi ftp filepaths
* 🎨 Remove orfs filepath dependency in LCA and majority vote
  🎨 Change entrypoint arguments for autometa-taxonomy-lca and autometa-taxonomy-majority-vote
* 🎨 Change entrypoint parameters for autometa-length-filter.
  🔥 Remove unused methods in metagenome.py
  🎨✅ Remove unused tests in test_metagenome. Update MockedParser to reflect new entrypoint args
  🎨 Update nextflow LENGTH_FILTER process to accommodate new parameters. It now uses named emits (fasta, stats, gc_content)
  🎨📝 Add new binning metrics to parameters.config (gc_stddev_limit, cov_stddev_limit)
  📝🎨 Add type hints to metagenome.py
* 📝 Update log with added parameters
* 🐛 Fix incorrect path to default markers database in nf pipeline (location in docker image is currently hardcoded in MARKERS process).
  🎨 Next step: point the default to an absolute path in the docker image instead of a relative path
* 🔥 Remove hardcoded --dbdir parameter in MARKERS process.
  This is now configured in the docker image used by nextflow
  🐛 Add conda channels conda-forge and bioconda to create_environment command
  🎨 Update Dockerfile to configure autometa databases with the DB_DIR environment variable as an absolute path (a relative path may cause bugs)
* Update autometa/common/metagenome.py
* 🐛 Replace 'orfs' tags with the respective single input path tag
* 🐛🔥 Remove --multiprocess flag from autometa-kmers command in KMERS process
* 🔥 Remove duplicate dependencies
* 🐛 Fix cryptic bug where imports do not work when an explicit python interpreter is used in Makefile commands
  🎨 Add handling of gzipped orfs to autometa-markers entrypoint
* 🔥 Remove Makefile from .dockerignore
  🎨 Use make commands from Makefile for autometa directory cleanup and install
  🐛⬆️ Set samtools minimum version in requirements.txt; otherwise the samtools command does not work properly
* 🎨 Change --output parameter to --output-binning in recursive_dbscan.py
  🎨 Add '--output-master' parameter to autometa-binning entrypoint
  ✅ Update MockArgs to account for updated entrypoint parameters
  ✅🎨 Add args check to autometa-binning entrypoint for embed_dimensions and embed_pca_dimensions inputs
  🎨 Fix typo in kmers embed docstring
  🎨 Standardize output columns from kmers.embed(...) to 1-indexed 'x_1' to 'x_{embed_dimensions}' instead of x, y, z, ...
  🐛 Add coverage and gc_content std. dev. limits to the columns dropped in run_hdbscan(...)
  🎨 Column drops in run_hdbscan(...) and run_dbscan(...) are now performed on one line, and if the df does not contain any of the columns in dropcols, the error is ignored
* 🔥 Remove conda install using py2.7
  🔥🎨 Rename references from master to main throughout nf and autometa binning scripts
  📝 Format notes in parameters.config
* ⬆️ Add minimum version of diamond 2.*
* 💚 Add output_main to MockedArgs
* 📝🎨 Add copyright and short script description to all unit test files
* 🎨 Add autometa-parse-bed entrypoint
  🎨 Add READ_COVERAGE workflow in common-tasks to compute coverage from read alignments instead of SPAdes headers
* 📝 Replace 2020 copyright with 2021 copyright
  📝🔥 Remove note on ORF calling warning and replace with contig cutoff warning
  📝 Update help text for --binning argument in unclustered_recruitment
* 🔥 Remove --do-pca argument from kmers.py
  📝 Fix help string in --norm-method in kmers.py
  🎨 Change --normalized to --norm-output in kmers.py
  🎨 Change --embedded to --embedding-output in kmers.py
  🎨 Change --embed-dimensions to --embedding-dimensions in kmers.py
  🎨 Change --embed-method to --embedding-method in kmers.py
  🎨 Update KMERS in common-tasks.nf to account for updated parameters
  💚 Update test_kmers.py MockedArgs to account for updated arguments
* 🔥💚 Remove references to removed do_pca parameter
  🐛 Update marker databases checksums so they correspond to md5sum
  🎨 Sort main file output columns in autometa-binning entrypoint
* 🔥🎨 Remove 'string' metavar for clustering-method arg
* 🔥 Remove kmer embedding args from autometa-binning entrypoint
  🎨 Change binning input from KMERS.out.normalized to KMERS.out.embedded
  💚 Update test_recursive_dbscan kmers fixture and mocked args to account for removed kmer parameters
  🎨 Add convert_dtypes method call to load(...) func for markers dataframe
  🔥🎨 Remove parameters for kmers in binning-tasks and update parameters to correspond to kmers args
  🎨 Unclustered recruitment now writes output-binning with contig, cluster and recruited_cluster columns
* 🎨 Add autometa-binning-summary entrypoint
  🎨 Unclustered recruitment now writes out binning with columns 'cluster' and 'recruited_cluster'
  🐛💚 Fix duplicate mocks in test_recursive_dbscan(...)
  🎨 Add BINNING_SUMMARY process in autometa.nf workflow
  🎨 Define BINNING_SUMMARY process in binning-tasks.nf
* 💚🐛 Change broken variable main to main_df
* 💚🔥 Remove kmer embedding dimensions test
* 🐛🔥 Remove assembly argument in get_metabin_stats(...)
  💚🔥 Remove unused mocked dependencies in test_kmers.py
  🔥💚 Remove tests corresponding to old summary.py functionality
* 💚 Add gc_content column to bin_df fixture in test_summary
* 📝 Add docstrings and explanation within vote.py
  🎨 Change vote.py argument from --input to --votes and add metavars to parser args
  💚 Change make_test_data.py summary data to create gc_content column instead of GC column
  💚 Update MockedArgs in vote.py to correspond to updated --votes parameter
  🎨 Replace --input argument with --votes in autometa-taxonomy for SPLIT_KINGDOMS process
* 🐛 Fix arg passed to pd.read_csv(...) for autometa.taxonomy.vote
* 🐎 Add autometa/databases to dockerignore
* 🎨 Update autometa-orfs entrypoint arguments
  📝 Add type hints to autometa.common.external.prodigal funcs
  🔥🎨 Remove --parallel parameter from autometa-orfs. Parallelism is now inferred from the --cpus arg
* 🐎 Ignore the ignore for autometa/databases/markers
  Add test of autometa-binning-summary entrypoint
* 🐛 Replace incorrect variable (orfs) in BINNING_SUMMARY tag
* 📝 Replace old kmer parameters in log info with new parameters
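Two of the binning changes listed above are small pandas idioms: dropping columns in run_hdbscan(...) and run_dbscan(...) while ignoring any that are absent, and naming embedding columns 'x_1' to 'x_{embed_dimensions}'. The sketch below illustrates both under stated assumptions; the helper names `embedding_columns` and `prepare_features` and the example values are hypothetical, and this is not Autometa's actual implementation.

```python
import numpy as np
import pandas as pd


def embedding_columns(n_dims: int) -> list:
    """Hypothetical helper: 1-indexed embedding column names x_1 ... x_{n_dims}."""
    return [f"x_{i}" for i in range(1, n_dims + 1)]


def prepare_features(df: pd.DataFrame, dropcols: list) -> pd.DataFrame:
    """Hypothetical helper: drop non-feature columns in a single call.

    errors="ignore" skips any name in dropcols that is not a column,
    so a missing column (e.g. a std. dev. limit) does not raise KeyError.
    """
    return df.drop(columns=dropcols, errors="ignore")


if __name__ == "__main__":
    # Hypothetical 2-D embedding for three contigs
    coords = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
    df = pd.DataFrame(coords, columns=embedding_columns(2), index=["c1", "c2", "c3"])
    df["coverage"] = [10.0, 12.5, 9.8]
    df["gc_content"] = [51.2, 49.8, 50.5]
    # "cov_stddev_limit" is not a column here; errors="ignore" skips it
    features = prepare_features(df, ["coverage", "gc_content", "cov_stddev_limit"])
    print(list(features.columns))  # ['x_1', 'x_2']
```

With the default `errors="raise"`, the same `drop` call would raise a `KeyError` for the absent column, which is the failure the one-line drop avoids.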
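Two other items listed above, replacing only the first 'ftp' in ncbi ftp filepaths and handling gzipped orfs in the autometa-markers entrypoint, rest on Python standard-library behavior. A minimal sketch follows, assuming for illustration that the URL scheme is swapped to `rsync` (the target protocol is an assumption) and using hypothetical helper names `to_rsync_url`, `open_maybe_gzipped`, and `count_fasta_records`; it is not Autometa's actual code.

```python
import gzip
from typing import TextIO


def to_rsync_url(ftp_path: str) -> str:
    """Hypothetical helper: swap only the first 'ftp' (the URL scheme).

    str.replace with count=1 leaves the 'ftp' in the hostname
    (e.g. ftp.ncbi.nlm.nih.gov) untouched.
    """
    return ftp_path.replace("ftp", "rsync", 1)


def open_maybe_gzipped(path: str) -> TextIO:
    """Hypothetical helper: open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_fasta_records(path: str) -> int:
    """Hypothetical helper: count FASTA headers in a (possibly gzipped) orfs file."""
    with open_maybe_gzipped(path) as fh:
        return sum(1 for line in fh if line.startswith(">"))
```

For example, `to_rsync_url("ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz")` returns `"rsync://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"`; a plain `replace` without the count would also rewrite the hostname to `rsync.ncbi.nlm.nih.gov`.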