Create tool for producing genomic regions (as a BED file) #7159
Labels: learn GATK (Suitable for GATK beginners)
Comments
sanashah007 added a commit that referenced this issue on Aug 29, 2024
#8942)
* Initial commit and basic code to read gtf
* add: code to write to bed & integration test
* fix: make getAllFeatures public and use the nesting of features to get to transcripts
* add: filtering transcripts by basic tag
* add: sorts by contig and start (need to fix - sorting lexicographically)
* fix: now sorts by contig then start & output is correct
* fix: make dictionary an arg
* add: comments + simplified CompareGtfInfo
* refactor: apply method
* test: add separate tests for gene and transcript
* refactor: onTraversalSuccess and writeToBed
* add: more tests
* fix: test files in correct dir pt1. (files are too large)
* fix: test files in correct dir pt2.
* add: compareFiles and ground truth bed files
* fix: runGtfToBed assert
* add: comments to GtfToBed
* fix: error handling for different versions of gtf and dictionary
* fix: edited some bad conventions
* fix: remove spaces from input file fullName
* add: gtf file with MYT1L and MAPK1
* add: many transcripts unit test and refactoring
* add: tiebreaker sorting by id
* add: make sort by basic optional
* add: html doc comment
* fix: dictionary arg
* fix: add "Gencode" to description
* add: sample mouse gencode testing
* fix: Remove arg shortnames
* fix: rename and move CompareGtfInfo
* fix: kebab-case args
* fix: update html doc
* fix: use IntegrationTestSpec.assertEqualTextFiles()
* fix: remove unnecessary test of pik3ca
* fix: remove set functions in GtfInfo
* fix: style of comparator
* fix: style of comparator
* fix: use Files.newOutputStream() to write and logger for errors
* fix: use getBestAvailableSequenceDictionary()
* fix: use dataProvider for integration tests
* fix: better encapsulation
* fix: move mapk1.gtf to large dir
* fix: arg names
* fix: rename reference dict.
* fix: sequence-dictionary arg javadoc
* add: javadoc to GtfInfo
* add: dictionary exception and corresponding test
* add: test with fasta file as reference arg
* add: javadoc for fasta file
* fix: javadoc and onTraversalStart exception
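The ordering described in the commit log (by contig, then start, with a tiebreaker by id, replacing an earlier lexicographic contig sort) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GtfToBed code from #8942: BedRow and BedRowSorting are hypothetical names, and the contig order is assumed to come from a sequence dictionary supplied as a contig-to-index map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical BED row; field names are illustrative, not taken from GATK.
record BedRow(String contig, int start, int end, String id) {}

final class BedRowSorting {
    // Orders rows by the contig's index in a sequence dictionary (not alphabetically),
    // then by start position, then by id as a tiebreaker.
    static Comparator<BedRow> byDictionaryOrder(Map<String, Integer> contigIndex) {
        return Comparator
                .comparingInt((BedRow r) -> indexOf(contigIndex, r.contig()))
                .thenComparingInt(BedRow::start)
                .thenComparing(BedRow::id);
    }

    // Fails loudly if a row names a contig that the dictionary does not contain.
    private static int indexOf(Map<String, Integer> contigIndex, String contig) {
        Integer idx = contigIndex.get(contig);
        if (idx == null) {
            throw new IllegalArgumentException("Contig not in dictionary: " + contig);
        }
        return idx;
    }

    // Returns a sorted copy, leaving the input list untouched.
    static List<BedRow> sorted(List<BedRow> rows, Map<String, Integer> contigIndex) {
        List<BedRow> copy = new ArrayList<>(rows);
        copy.sort(byDictionaryOrder(contigIndex));
        return copy;
    }
}
```

With a dictionary that orders chr1, chr2, chr10, rows sort in that contig order; a plain string comparison would instead place chr10 before chr2, which is the lexicographic problem the commit log mentions.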
Resolved by #8942. Unless @LeeTL1220 or @droazen have any objections, I will close this with the note that all P0 requirements are met by the relevant PR. For the P2 requirements, the following are not included as of now, but if they are deemed important I can keep this open:
Thanks to @sanashah007 for all the work!
Feature request
Tool(s) or class(es) involved
This is a request for a new tool: GencodeRegionsAsBED
Description
Given a GENCODE GTF, create a BED file with the regions of the genes. Each row is a gene.
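A minimal sketch of the gene-to-BED conversion, written in plain Java rather than against GATK's GencodeGtfFeature API. It assumes GENCODE-style attributes (gene_name, falling back to gene_id) and reads only rows whose feature type is gene; GtfGenesToBed is a hypothetical name. The detail that matters most is coordinates: GTF is 1-based and inclusive, while BED is 0-based and half-open, so only the start shifts by one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GtfGenesToBed {
    // Matches key "value" pairs such as gene_name "MAPK1"; unquoted values (e.g. level 2) are skipped.
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\S+)\\s+\"([^\"]*)\"");

    // Returns the value of the first attribute named key, or null if absent.
    static String attribute(String attributes, String key) {
        Matcher m = ATTRIBUTE.matcher(attributes);
        while (m.find()) {
            if (m.group(1).equals(key)) {
                return m.group(2);
            }
        }
        return null;
    }

    // Converts the "gene" rows of a GTF into BED rows: contig, start, end, name.
    static List<String> toBed(List<String> gtfLines) {
        List<String> bed = new ArrayList<>();
        for (String line : gtfLines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip header and comment lines
            }
            String[] cols = line.split("\t");
            if (cols.length < 9 || !cols[2].equals("gene")) {
                continue; // keep only gene records (column 3 is the feature type)
            }
            long gtfStart = Long.parseLong(cols[3]); // 1-based, inclusive
            long gtfEnd = Long.parseLong(cols[4]);   // 1-based, inclusive
            String name = attribute(cols[8], "gene_name");
            if (name == null) {
                name = attribute(cols[8], "gene_id");
            }
            if (name == null) {
                name = "."; // no usable identifier
            }
            // BED is 0-based, half-open: subtract 1 from the start, keep the end as-is.
            bed.add(String.join("\t", cols[0], Long.toString(gtfStart - 1), Long.toString(gtfEnd), name));
        }
        return bed;
    }
}
```

For example, a GTF gene row spanning 11869 to 14409 becomes a BED row with start 11868 and end 14409, and the output columns are tab-delimited.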
Suggestion: This can be implemented as a FeatureWalker<GencodeGtfFeature>.
Requirements
[P0] = "Must have. Cannot close this issue without this feature or without filing another issue. This tool is not considered complete without this feature."
[P2] = "Not required. This tool can be considered complete without this feature. No need to ask permission to drop it. If it is NOT delivered, please mention which P2s were not delivered in the closing comment of this issue."
Example output
BED is tab-delimited...
With transcript option:
Note: The union of the transcript regions is reported when the transcript option is not present.
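When the transcript option is absent, the note asks for the union of a gene's transcript regions. A minimal sketch of that computation, assuming the transcripts of one gene on one contig are given as 1-based closed intervals (as in GTF); Interval and TranscriptUnion are hypothetical names, not GATK classes. Overlapping or abutting intervals are merged, so a gene whose transcripts overlap yields a single region, consistent with one row per gene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical closed interval [start, end] on a single contig, 1-based as in GTF.
record Interval(long start, long end) {}

final class TranscriptUnion {
    // Merges overlapping or abutting intervals; the result is their union, sorted by start.
    static List<Interval> union(List<Interval> transcripts) {
        List<Interval> sorted = new ArrayList<>(transcripts);
        sorted.sort(Comparator.comparingLong(Interval::start));
        List<Interval> merged = new ArrayList<>();
        for (Interval t : sorted) {
            int lastIdx = merged.size() - 1;
            // With closed intervals, [a, b] and [b + 1, c] abut and are merged as well.
            if (lastIdx >= 0 && t.start() <= merged.get(lastIdx).end() + 1) {
                Interval last = merged.get(lastIdx);
                merged.set(lastIdx, new Interval(last.start(), Math.max(last.end(), t.end())));
            } else {
                merged.add(t);
            }
        }
        return merged;
    }
}
```

Sorting first lets the merge run as a single linear pass, so the overall cost is dominated by the O(n log n) sort.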