This should probably be broken up into a subworkflow where ANALYZE_KMERS would be #187

Closed
chasemc opened this issue Sep 2, 2021 · 3 comments · Fixed by #229
Labels
enhancement (New feature or request), nextflow (Nextflow related issues/code)

Comments

@chasemc (Member)

chasemc commented Sep 2, 2021

This should probably be broken up into a subworkflow, where ANALYZE_KMERS would be:

workflow ANALYZE_KMERS {
  take:
    fasta
  main:
    COUNT_KMERS(fasta)
    NORMALIZE_KMERS(COUNT_KMERS.out)
    EMBED_KMERS(NORMALIZE_KMERS.out)
  emit:
    counts = COUNT_KMERS.out
    normalized = NORMALIZE_KMERS.out
    embedded = EMBED_KMERS.out
}

Originally posted by @WiscEvan in #157 (comment)

evanroyrees added the nextflow (Nextflow related issues/code) and enhancement (New feature or request) labels Sep 28, 2021
@chasemc (Member, Author)

chasemc commented Oct 18, 2021

@WiscEvan I can't see that comment anymore; I just wanted to confirm this is still what we want to do:
-> Split the single Python entrypoint into three entrypoints, make a Nextflow process for each, and wrap those three processes in a subworkflow.

@Sidduppal You may be comfortable enough with Nextflow now to tackle this one if you want.

Relevant code:

if os.path.exists(args.kmers) and not args.force:
    df = pd.read_csv(args.kmers, sep="\t", index_col="contig")
else:
    df = count(
        assembly=args.fasta,
        size=args.size,
        out=args.kmers,
        force=args.force,
        cpus=args.cpus,
    )
if args.norm_output:
    df = normalize(
        df=df, method=args.norm_method, out=args.norm_output, force=args.force
    )
if args.embedding_output:
    embedded_df = embed(
        kmers=df,
        out=args.embedding_output,
        force=args.force,
        method=args.embedding_method,
        embed_dimensions=args.embedding_dimensions,
        pca_dimensions=args.pca_dimensions,
        seed=args.seed,
    )
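The control flow above can be sketched in isolation to show why the counts file acts as a checkpoint. This is a minimal, stdlib-only sketch under stated assumptions: run_kmer_stages, count_fn, normalize_fn and embed_fn are hypothetical names, JSON files stand in for the TSV tables and pandas DataFrames the real code uses, and the force/out handling inside count, normalize and embed is omitted. A second call with the same kmers path skips counting and runs only the requested later stage, which is what would let separate Nextflow processes share one entrypoint.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a pandas DataFrame: contig name -> k-mer feature vector.
Table = Dict[str, List[float]]


def _load(path: str) -> Table:
    with open(path) as fh:
        return json.load(fh)


def _save(table: Table, path: str) -> None:
    with open(path, "w") as fh:
        json.dump(table, fh)


def run_kmer_stages(
    kmers: str,
    count_fn: Callable[[], Table],
    norm_output: Optional[str] = None,
    normalize_fn: Optional[Callable[[Table], Table]] = None,
    embedding_output: Optional[str] = None,
    embed_fn: Optional[Callable[[Table], Table]] = None,
    force: bool = False,
) -> Tuple[Table, List[str]]:
    """Mirror the count -> normalize -> embed flow; return the last table and the stages that ran."""
    ran: List[str] = []
    # Stage 1: reuse an existing counts file unless force is set.
    if os.path.exists(kmers) and not force:
        df = _load(kmers)
    else:
        df = count_fn()
        _save(df, kmers)
        ran.append("count")
    # Stage 2: normalize only when an output path is requested.
    if norm_output:
        assert normalize_fn is not None, "norm_output requires normalize_fn"
        df = normalize_fn(df)
        _save(df, norm_output)
        ran.append("normalize")
    # Stage 3: embed only when an output path is requested.
    if embedding_output:
        assert embed_fn is not None, "embedding_output requires embed_fn"
        df = embed_fn(df)
        _save(df, embedding_output)
        ran.append("embed")
    return df, ran


def demo() -> Tuple[List[str], List[str], Table]:
    """Call twice on one temp dir: the second call reuses the counts file."""
    counts: Table = {"contig_1": [2.0, 2.0], "contig_2": [1.0, 3.0]}

    def toy_normalize(table: Table) -> Table:
        # Toy row-wise frequency scaling, NOT Autometa's normalization method.
        return {k: [x / sum(v) for x in v] for k, v in table.items()}

    with tempfile.TemporaryDirectory() as tmp:
        kmers_path = os.path.join(tmp, "kmers.json")
        _, first = run_kmer_stages(kmers_path, lambda: counts)
        normed, second = run_kmer_stages(
            kmers_path,
            lambda: counts,
            norm_output=os.path.join(tmp, "norm.json"),
            normalize_fn=toy_normalize,
        )
    return first, second, normed
```

Calling demo() reports ["count"] for the first call and ["normalize"] for the second, because the counts file written by the first call satisfies the exists check.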

https://github.com/KwanLab/Autometa/blob/dev/modules/local/analyze_kmers.nf

@evanroyrees (Collaborator)

We should be able to implement this in Nextflow without changing the autometa-kmers entrypoint.

For example:

# process: count-kmers
autometa-kmers --fasta $fasta --kmers $counts --size $size --cpus $cpus

# process: normalize-kmers
autometa-kmers --fasta $fasta --kmers $counts --size $size --norm-method $norm_method --norm-output $normalized

# process: embed-kmers
autometa-kmers \
    --fasta $fasta \
    --kmers $counts \
    --norm-output $normalized \
    --embedding-method $embed_method \
    --seed $embed_seed \
    --embedding-dimensions $embed_dims \
    --embedding-output $embedded
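The three invocations above can be generated from one set of parameters, which makes it easy to check that every stage points at the same counts file. This is a sketch only: the flag names are copied from the comment above, while kmer_stage_commands, the EXAMPLE dict and all of its values (file names, sizes, method placeholders) are hypothetical.

```python
from typing import Dict, List

# Hypothetical placeholder values; only the flag names come from the comment above.
EXAMPLE = {
    "fasta": "metagenome.fna",
    "counts": "kmers.tsv",
    "size": "5",
    "cpus": "4",
    "norm_method": "NORM_METHOD",
    "normalized": "kmers.norm.tsv",
    "embed_method": "EMBED_METHOD",
    "embed_seed": "42",
    "embed_dims": "2",
    "embedded": "kmers.embed.tsv",
}


def kmer_stage_commands(p: Dict[str, str]) -> Dict[str, List[str]]:
    """Return one argv list per proposed Nextflow process."""
    # Every stage passes the same counts path via --kmers.
    base = ["autometa-kmers", "--fasta", p["fasta"], "--kmers", p["counts"]]
    return {
        "count-kmers": base + ["--size", p["size"], "--cpus", p["cpus"]],
        "normalize-kmers": base
        + ["--size", p["size"], "--norm-method", p["norm_method"], "--norm-output", p["normalized"]],
        "embed-kmers": base
        + [
            "--norm-output", p["normalized"],
            "--embedding-method", p["embed_method"],
            "--seed", p["embed_seed"],
            "--embedding-dimensions", p["embed_dims"],
            "--embedding-output", p["embedded"],
        ],
    }


if __name__ == "__main__":
    import shlex

    # Print each stage as a shell command line (shlex.join needs Python 3.8+).
    for name, argv in kmer_stage_commands(EXAMPLE).items():
        print(f"# process: {name}\n{shlex.join(argv)}")
```

Because each list reuses the same --kmers path, the normalize and embed invocations would take the os.path.exists(args.kmers) branch of the main logic quoted earlier and reload the counts rather than recount, provided the counts file is staged into each process's work directory.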

evanroyrees added a commit that referenced this issue Jan 28, 2022
🎨✅ Fix test for unclustered_recruitment.py
🎨🐛🍏🐍 Re-write unclustered recruitment features table to output-features and main to output-main
🎨🐍🐛 Incorrect cluster col usage in get_metabin_stats for binnin/summary.py
🎨🍏 Rename bin_contigs.nf to binning.nf
🎨🍏 Update tags for kmers, binning and recruitment
🎨🐍 Update kmers main behavior so kmer pipeline may be run at multiple different stages
🎨🍏 Update modules.config for new local kmer and binning processes
evanroyrees added a commit that referenced this issue Jan 30, 2022
…ntainer (#229)

* 🎨🍏 Add optional output and logic to handle archaea input
* 🎨🔥🍏 Remove unnecessary subworkflow for binning summary and binning.
* 🎨🍏 Add working version of using either bacteria or archaea based on params.kingdom
* 🎨🐛🍏 Add meta.cov_from_assembly = spades for mock data spades coverage channel
* 🎨🍏🔥 Remove redundant code in autometa.nf channels and unused params line in binning_summary.nf
* 🎨🍏 fixes #187 KMER sub-workflow
* 🎨✅ Fix test for unclustered_recruitment.py
* 🎨🐛🍏🐍 Re-write unclustered recruitment features table to output-features and main to output-main
* 🎨🐍🐛 Incorrect cluster col usage in get_metabin_stats for binnin/summary.py
* 🎨🍏 Rename bin_contigs.nf to binning.nf
* 🎨🍏 Update tags for kmers, binning and recruitment
* 🎨🐍 Update kmers main behavior so kmer pipeline may be run at multiple different stages
* 🎨🍏 Update modules.config for new local kmer and binning processes
* 🔥🍏 Fixes #163
* 🎨🐛🐍 Fix main logic for handling missing files for kmers
* 🔥🐛 Remove added args in main logic of kmers.py
* 🔥✅ Remove unnecessary import
* 💚🔥🐛 Remove duplicat norm_df fixture in test_kmers.py
* 🔥 Remove unused import in test_recursive_dbscan.py
* ✅🎨🐍🍏 Add behavior to raise/handle 204 exit code for autometa-binning/binning.nf
* ✅ Add test for raising a TableFormatError
* 🍏 Add errorStrategy to binning.nf to ignore the 204 error
* 🐍🎨 Add sys.exit(204) exit code when raising a BinninError or TableFormatError for recursive_dbscan.py
* 🎨🍏🐍 Replace 0 exit code with 204 and add handling 204 exitcode in RECRUIT
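The 204 exit-code handling described in the last few commit messages follows a common pattern: catch the expected failures and exit with a dedicated code that the workflow can tolerate. The sketch below is illustrative only; BinningError and TableFormatError here are hypothetical stand-ins for the exceptions named in the commits, and run_binning is a hypothetical wrapper, not the code in recursive_dbscan.py.

```python
import sys
from typing import Callable


# Hypothetical stand-ins for the exceptions named in the commit messages.
class BinningError(Exception):
    pass


class TableFormatError(Exception):
    pass


# Exit code the Nextflow process is configured to tolerate (per the commit messages).
NO_BINS_EXIT_CODE = 204


def run_binning(step: Callable[[], None]) -> None:
    """Run a binning step; map the two expected failures to exit code 204."""
    try:
        step()
    except (BinningError, TableFormatError) as err:
        # Report the reason, then exit with the agreed code instead of a traceback.
        # Any other exception propagates unchanged.
        print(f"Binning step skipped: {err}", file=sys.stderr)
        sys.exit(NO_BINS_EXIT_CODE)
```

On the Nextflow side, a dynamic directive such as errorStrategy { task.exitStatus == 204 ? 'ignore' : 'terminate' } is one way to let only that code through; the directive actually added to binning.nf may be written differently.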
2 participants