This should probably be broken up into a subworkflow where `ANALYZE_KMERS` would be #187

chasemc · 2021-09-02T00:04:16Z

This should probably be broken up into a subworkflow where `ANALYZE_KMERS` would be:

workflow ANALYZE_KMERS {
  take:
    fasta
  main:
    COUNT_KMERS(fasta)
    NORMALIZE_KMERS(COUNT_KMERS.out)
    EMBED_KMERS(NORMALIZE_KMERS.out)
  emit:
    counts = COUNT_KMERS.out
    normalized = NORMALIZE_KMERS.out
    embedded = EMBED_KMERS.out
}

Originally posted by @WiscEvan in #157 (comment)

chasemc · 2021-10-18T16:20:05Z

relevant current code:
https://github.com/KwanLab/Autometa/blob/dev/modules/local/analyze_kmers.nf

chasemc · 2021-10-18T16:34:38Z

@WiscEvan I can't see that comment anymore; I just wanted to confirm this is still what we want to do.
-> Basically, split the single Python endpoint into 3 endpoints, make a Nextflow process for each of them, and wrap those 3 processes in a subworkflow.

@Sidduppal You may be comfortable enough with NF now to tackle this one, if you want to.

Relevant code:

Autometa/autometa/common/kmers.py

Lines 718 to 743 in f372b82

    
    if os.path.exists(args.kmers) and not args.force:
        df = pd.read_csv(args.kmers, sep="\t", index_col="contig")
    else:
        df = count(
            assembly=args.fasta,
            size=args.size,
            out=args.kmers,
            force=args.force,
            cpus=args.cpus,
        )
    if args.norm_output:
        df = normalize(
            df=df, method=args.norm_method, out=args.norm_output, force=args.force
        )
    if args.embedding_output:
        embedded_df = embed(
            kmers=df,
            out=args.embedding_output,
            force=args.force,
            method=args.embedding_method,
            embed_dimensions=args.embedding_dimensions,
            pca_dimensions=args.pca_dimensions,
            seed=args.seed,
        )

https://github.com/KwanLab/Autometa/blob/dev/modules/local/analyze_kmers.nf
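
The existence check at the top of the `main` excerpt above is what would let one entry point be called repeatedly across stages: a table that is already on disk gets loaded instead of recomputed. A minimal sketch of that load-or-compute pattern follows, using only the standard library. `count_kmers`, `write_table`, `read_table`, and `load_or_count` are hypothetical stand-ins for illustration, not Autometa's functions, and the toy counter ignores details such as reverse complements.

```python
import csv
import os
import tempfile
from collections import Counter


def count_kmers(records, size):
    """Toy stand-in for a k-mer counter: one Counter per contig."""
    return {
        contig: Counter(seq[i : i + size] for i in range(len(seq) - size + 1))
        for contig, seq in records.items()
    }


def write_table(counts, path):
    """Write a tab-separated table with one row per contig."""
    kmers = sorted({kmer for c in counts.values() for kmer in c})
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["contig", *kmers])
        for contig, c in counts.items():
            writer.writerow([contig, *(c[kmer] for kmer in kmers)])


def read_table(path):
    """Read the table written by write_table back into Counters."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {
            row["contig"]: Counter(
                {kmer: int(n) for kmer, n in row.items() if kmer != "contig"}
            )
            for row in reader
        }


def load_or_count(records, size, out, force=False):
    """Reuse `out` when it exists and `force` is false; otherwise recompute."""
    if os.path.exists(out) and not force:
        return read_table(out), "loaded"
    counts = count_kmers(records, size)
    write_table(counts, out)
    return counts, "computed"


# Hypothetical two-contig input, written to a temporary directory.
records = {"contig_1": "ACGTAC", "contig_2": "GGGG"}
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "kmers.tsv")
    _, first = load_or_count(records, size=2, out=out)
    table, second = load_or_count(records, size=2, out=out)
    _, third = load_or_count(records, size=2, out=out, force=True)

print(first, second, third)
```

The second call finds the table on disk and loads it, while the third recomputes because `force=True`. A later stage can therefore pass the same counts path without redoing the counting.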

evanroyrees · 2021-11-11T19:40:03Z

We should be able to implement this in Nextflow without changing the autometa-kmers entrypoint.

e.g.

# process: count-kmers
autometa-kmers --fasta $fasta --kmers $counts --size $size --cpus $cpus

# process: normalize-kmers
autometa-kmers --fasta $fasta --kmers $counts --size $size --norm-method $norm_method --norm-output $normalized

# process: embed-kmers
autometa-kmers \
    --fasta $fasta \
    --kmers $counts \
    --norm-output $normalized \
    --embedding-method $embed_method \
    --seed $embed_seed \
    --embedding-dimensions $embed_dims \
    --embedding-output $embedded
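
As a rough illustration of the staged invocations above, the sketch below assembles the three argument lists in Python. It uses only the flags that appear in this thread; the function name, its parameters, the file names, and the method values passed in the usage example are placeholders (check `autometa-kmers --help` for the accepted choices). Nothing is executed; the commands are only printed.

```python
import shlex


def kmer_stage_commands(fasta, counts, normalized, embedded, *, size, cpus,
                        norm_method, embed_method, seed, embed_dims):
    """Build one argument list per stage; later stages point at earlier outputs."""
    base = ["autometa-kmers", "--fasta", fasta, "--kmers", counts]
    count_cmd = base + ["--size", str(size), "--cpus", str(cpus)]
    normalize_cmd = base + [
        "--size", str(size),
        "--norm-method", norm_method,
        "--norm-output", normalized,
    ]
    embed_cmd = base + [
        "--norm-output", normalized,
        "--embedding-method", embed_method,
        "--seed", str(seed),
        "--embedding-dimensions", str(embed_dims),
        "--embedding-output", embedded,
    ]
    return {"count": count_cmd, "normalize": normalize_cmd, "embed": embed_cmd}


# Placeholder inputs for illustration only.
commands = kmer_stage_commands(
    "assembly.fna", "kmers.tsv", "kmers.norm.tsv", "kmers.embed.tsv",
    size=5, cpus=2, norm_method="am_clr", embed_method="bhsne",
    seed=42, embed_dims=2,
)
for stage, argv in commands.items():
    print(f"{stage}: {shlex.join(argv)}")
```

Each stage gets its own output path, so the normalized and embedded tables cannot overwrite each other, and every stage passes the same `--kmers` path so an existing counts table can be picked up.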

…ntainer (#229)
* 🎨🍏 Add optional output and logic to handle archaea input
* 🎨🔥🍏 Remove unnecessary subworkflow for binning summary and binning.
* 🎨🍏 Add working version of using either bacteria or archaea based on params.kingdom
* 🎨🐛🍏 Add meta.cov_from_assembly = spades for mock data spades coverage channel
* 🎨🍏🔥 Remove redundant code in autometa.nf channels and unused params line in binning_summary.nf
* 🎨🍏 fixes #187 KMER sub-workflow
* 🎨✅ Fix test for unclustered_recruitment.py
* 🎨🐛🍏🐍 Re-write unclustered recruitment features table to output-features and main to output-main
* 🎨🐍🐛 Incorrect cluster col usage in get_metabin_stats for binnin/summary.py
* 🎨🍏 Rename bin_contigs.nf to binning.nf
* 🎨🍏 Update tags for kmers, binning and recruitment
* 🎨🐍 Update kmers main behavior so kmer pipeline may be run at multiple different stages
* 🎨🍏 Update modules.config for new local kmer and binning processes
* 🔥🍏 Fixes #163
* 🎨🐛🐍 Fix main logic for handling missing files for kmers
* 🔥🐛 Remove added args in main logic of kmers.py
* 🔥✅ Remove unnecessary import
* 💚🔥🐛 Remove duplicat norm_df fixture in test_kmers.py
* 🔥 Remove unused import in test_recursive_dbscan.py
* ✅🎨🐍🍏 Add behavior to raise/handle 204 exit code for autometa-binning/binning.nf
* ✅ Add test for raising a TableFormatError
* 🍏 Add errorStrategy to binning.nf to ignore the 204 error
* 🐍🎨 Add sys.exit(204) exit code when raising a BinninError or TableFormatError for recursive_dbscan.py
* 🎨🍏🐍 Replace 0 exit code with 204 and add handling 204 exitcode in RECRUIT

evanroyrees added the nextflow (Nextflow related issues/code) and enhancement (New feature or request) labels Sep 28, 2021

evanroyrees linked a pull request Jan 28, 2022 that will close this issue

🐛 🎨 🍏 Fix kingdom-handling and mounting NCBI databases into docker container #229

Merged

evanroyrees closed this as completed Jan 30, 2022
