Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: remove pybedtools as a dependency. This slightly changes the interpretation of BED files. From now on, the end position of each region will not be included. #184

Merged
merged 15 commits into from
Feb 7, 2024

Conversation

aryarm
Copy link
Member

@aryarm aryarm commented Jul 16, 2023

TRTools previously used pybedtools to read tabix-indexed BED files, but it turns out that pysam can do that already

since this was the only use for pybedtools in the codebase, this PR also removes it from the list of dependencies

this PR doesn't add any tests because the relevant code is already tested in test_RegionFilter()

Once this PR gets merged, it might be easier to package TRTools with haptools somehow. I was previously unable to do it because (I think) pybedtools is one of those packages on PyPI that hasn't published it's dependency metadata and thus is impossible to install with poetry? (see here for another example and more info)

> curl -s https://pypi.org/pypi/pybedtools/json | jq ".info.requires_dist"
null

@gymreklab, as the original author of the region handling code, can I request your review? Feel free to let me know if there are more tests I should include

@aryarm aryarm requested review from LiterallyUniqueLogin and gymreklab and removed request for LiterallyUniqueLogin October 30, 2023 19:44
@aryarm aryarm changed the base branch from develop to master October 30, 2023 21:26
@aryarm
Copy link
Member Author

aryarm commented Nov 9, 2023

ok, two of the dumpSTR tests did in fact fail 😦

It appears as if the failing tests are those where files output by dumpSTR are compared to those that have been already generated.

In the tests, there is a bed file provided to dumpSTR that contains only one region in it: chr21:9487190-14405799. But in the input and output, there is a TR record that is at exactly position 14405799 (at the end of the region). Perhaps pysam treats the regions as exclusive of the end position while pybedtools includes them? In that case, we could just add 1 to every end position before giving it to pysam.

For now, I've just changed the already generated files to match the way that pysam works. See 9784fec for the changes.

@LiterallyUniqueLogin
Copy link
Contributor

That's fine. IIRC, those tests were a quick way to prevent future bugs from going unnoticed without exhaustively testing the module which is more desirable but very time intensive. If the change in output is acceptable, just swap the old file for the new. The test could also be removed if it's causing more issues than helping, but hopefully not.

@aryarm aryarm marked this pull request as draft November 9, 2023 16:47
@LiterallyUniqueLogin
Copy link
Contributor

This change looks good, I don't see anything else you need to review (though feel free to if you'd like). If you want to push, I'll approve. Just mention in our README that bed files are end point exclusive, and mention this in the change log.

@aryarm aryarm marked this pull request as ready for review November 9, 2023 23:35
@aryarm aryarm changed the title refactor: use pysam instead of pybedtools refactor: remove pybedtools as a dependency. This slightly changes the interpretation of BED files. From now on, the end position of each region will not be included. Jan 17, 2024
@LiterallyUniqueLogin
Copy link
Contributor

We need to retitle this PR to make it clear to the version-handling software that this is a breaking change. Will approve after that!

@aryarm aryarm changed the title refactor: remove pybedtools as a dependency. This slightly changes the interpretation of BED files. From now on, the end position of each region will not be included. refactor!: remove pybedtools as a dependency. This slightly changes the interpretation of BED files. From now on, the end position of each region will not be included. Feb 7, 2024
@aryarm
Copy link
Member Author

aryarm commented Feb 7, 2024

done! that's a good catch - I added an exclamation mark to the conventional commits prefix

@LiterallyUniqueLogin LiterallyUniqueLogin merged commit 89c1dd3 into gymrek-lab:master Feb 7, 2024
8 checks passed
@aryarm aryarm deleted the ref/pybedtools-pysam branch February 7, 2024 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants