Legacy Host Filter initial commit #224
Merged
rzlim08 (Collaborator) commented Apr 20, 2023 • edited
- Creates the initial legacy-host-filter workflow
rzlim08 changed the title from "legacy-host-filter-inital-commit" to "Legacy Host Filter initial commit" Apr 21, 2023
rzlim08 added a commit that referenced this pull request Apr 25, 2023
* fastp
* fastp single
* bowtie2 run
* hisat2 run
* dedup run
* run subsample
* run kallisto
* adjust index tar filenames
* polishing
* polishing
* count reads in each step
* Create host_filter_indexing.wdl
* boost fastp complexity threshold
* output fastp report
* build fastp from our fork with SDUST complexity filtering
* use fastp --sdust_complexity_filter
* bump
* bump
* tune
* stub the remaining step descriptions
* wire to tests
* and auto_benchmark
* fixup tests
* fixup tests
* fixup tests
* fixup tests
* fixup tests
* fixup tests
* add back in picard CollectInsertSizeMetrics
* picard step description
* host_filter_2022.wdl => host_filter.wdl
* polish
* restore fastqs_0 and fastqs_1 to minimize collateral changes
* add minimap2 index build
* picard_insert_metrics.txt
* amr/run.wdl workaround
* index multiple transcripts_fasta_gz
* make gtf optional
* allow uncompressed genome fasta
* allow uncompressed genome fasta
* allow uncompressed genome fasta
* bump minimap2 memory
* bump minimap2 memory
* step descriptions -- first draft
* add indexing driver & draft readme
* include invocations in step descriptions
* rebase amr fix
* load card_json
* run kallisto every time
* fix amr wdl
* fix short-read-mngs rebase weirdness
* add final things
* [modernized host filter] add ERCC and gene-level outputs to kallisto (#175)

  The kallisto step gains two new derivative output files:

  * `ERCC_counts.tsv`: Estimated read counts for the ERCC sequences only (two-column TSV: ERCC_id, est_counts)
  * `gene_abundance.tsv`: gene-level est_counts and tpm, computed by summing over all transcripts for each gene
  * (and `abundance.tsv` is renamed to `transcript_abundance.tsv`)

  To get the `gene_abundance.tsv` we need a new input `gtf_gz`, the Ensembl GTF file for the host species that will tell it how to map the transcript IDs in `transcript_abundance.tsv` onto gene IDs for the roll-up. The input is optional and if absent then the `gene_abundance.tsv` output is omitted too.

  Note: docker image update needed to install & upgrade some dependencies.
* load card_json explicitly
* add ~
* fix host_filter unit tests
* fix host_filter unit tests
* bowtie2: sort by read name for better reproducibility
* update minimap2 indexing invocation
* add chelonia_mydas, drosophila_melanogaster, gray_whale, pea-aphid
* copy-paste {bowtie2,hisat2}_human_filter to support pipeline viz
* allow kallisto nonzero exit
* rename modern host filtering inputs/outputs and create a 1-1 mapping between inputs/outputs
* fix lint issue
* rename reads_in_count to input_read_count
* auto_benchmark updates
* fix test_RunCZIDDedup_safe_csv
* rename kallisto output files
* update mosquitos with several Culicidae
* add files to wdl output for pipeline viz compatibility
* convert headers in descriptions to bolded text
* delete host_filter_indexing since it's subsumed in #182
* fix glob patterns in read counting
* Revert "fix glob patterns in read counting"

  This reverts commit aeb234f.
* [Bug] fix count expansion for single file short-read-mngs (#216)
  * fix bowtie2 counts for single file
  * fix extra expansions
  * relieve hisat2 dependency
  * single sample hisat2
  * fix hisat2
  * fix dockerfile for hisat2

  ---------

  Co-authored-by: Omar Valenzuela <51972068+ovalenzuela19@users.noreply.github.com>
* Remove AMR changes that are a WIP from modern host filtering branch (#219)
  * Revert "output gene id in primary output file (#209)"

    This reverts commit 2d9ff56.
  * Revert "Output non host reads and non host contigs for AMR (#205)"

    This reverts commit 9de3fc2.
* tune hisat2 memory usage (#223)
* Legacy Host Filter initial commit (#224)
  * legacy-host-filter-inital-commit
  * linting
  * add stage io map
  * remove stage io map swp file
* Revert "Remove AMR changes that are a WIP from modern host filtering branch (#219)" (#226)

  This reverts commit 227a489.

---------

Co-authored-by: Mike Lin <mlin@Mikes-MacBook-Pro.local>
Co-authored-by: Omar Valenzuela <ovalenzuela@chanzuckerberg.com>
Co-authored-by: Omar Valenzuela <51972068+ovalenzuela19@users.noreply.github.com>
Co-authored-by: rzlim08 <37033997+rzlim08@users.noreply.github.com>
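
For readers unfamiliar with the gene-level roll-up mentioned in the #175 note above (summing transcript-level `est_counts` and `tpm` per gene using a transcript-to-gene mapping from the GTF), here is a minimal Python sketch. It is not the code from this repository: the function names `transcript_to_gene` and `gene_abundance`, the demo IDs, and the assumption that kallisto `target_id` values match the GTF `transcript_id` values exactly (no version suffix handling) are all illustrative. The kallisto column names (`target_id`, `est_counts`, `tpm`) follow kallisto's standard `abundance.tsv` layout.

```python
import csv
import io
import re
from collections import defaultdict

# Assumption: GTF attributes are written as `key "value";` pairs (Ensembl style).
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def transcript_to_gene(gtf_lines):
    """Build a transcript_id -> gene_id map from GTF rows carrying both attributes."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GTF row
        attrs = dict(_ATTR.findall(fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


def gene_abundance(abundance_tsv, tx2gene):
    """Sum est_counts and tpm over all transcripts of each gene.

    Returns {gene_id: (est_counts, tpm)}. Targets without a gene mapping
    (for example ERCC spike-ins) are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in csv.DictReader(abundance_tsv, delimiter="\t"):
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            continue
        totals[gene][0] += float(row["est_counts"])
        totals[gene][1] += float(row["tpm"])
    return {gene: tuple(values) for gene, values in totals.items()}


# Hypothetical demo data; the IDs are made up for illustration.
DEMO_GTF = [
    "#!genome-build demo\n",
    '1\tdemo\tgene\t1\t500\t.\t+\t.\tgene_id "G1";\n',
    '1\tdemo\ttranscript\t1\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\tdemo\ttranscript\t1\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
    '2\tdemo\ttranscript\t1\t200\t.\t-\t.\tgene_id "G2"; transcript_id "T3";\n',
]
DEMO_ABUNDANCE = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "T1\t300\t250\t10\t1.5\n"
    "T2\t500\t450\t5\t0.5\n"
    "T3\t200\t150\t7\t3.0\n"
    "ERCC-00002\t1061\t1011\t4\t0.25\n"
)

if __name__ == "__main__":
    tx2gene = transcript_to_gene(DEMO_GTF)
    for gene, (counts, tpm) in sorted(gene_abundance(io.StringIO(DEMO_ABUNDANCE), tx2gene).items()):
        print(f"{gene}\t{counts}\t{tpm}")
```

Skipping unmapped targets is one possible design choice; it fits the note's description that ERCC counts are reported separately in `ERCC_counts.tsv`, but the actual workflow may handle them differently.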