BED files are 1-based instead of 0-based, half-closed #98

Open
leonorpalmeira opened this issue Nov 28, 2022 · 2 comments

Comments

@leonorpalmeira

Dear WisecondorX team,

we are using your tool in one of our pipelines and are very happy with it. When double-checking some results, we realized that the coordinates in your BED output do not follow the standard BED convention, which is 0-based and half-closed (start inclusive, end exclusive).

This is the head of the _bins.bed file of one of our samples:

chr	start	end	id	ratio	zscore
1	1	15000	1:1-15000	nan	nan
1	15001	30000	1:15001-30000	nan	nan
1	30001	45000	1:30001-45000	nan	nan
1	45001	60000	1:45001-60000	nan	nan
1	60001	75000	1:60001-75000	nan	nan
1	75001	90000	1:75001-90000	nan	nan
1	90001	105000	1:90001-105000	nan	nan
1	105001	120000	1:105001-120000	nan	nan
1	120001	135000	1:120001-135000	nan	nan

Is this something that could be modified in the future?
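
To make the requested change concrete, here is a minimal sketch of converting the rows above to standard BED coordinates. It assumes the current start and end columns are 1-based and both inclusive, which is what the `id` column suggests. The helper names `to_bed_interval` and `convert_row` are hypothetical and are not part of WisecondorX.

```python
from typing import Tuple


def to_bed_interval(start_1based: int, end_1based: int) -> Tuple[int, int]:
    """Convert a 1-based, fully inclusive interval to BED (0-based, end-exclusive)."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts: the inclusive 1-based end is numerically
    # equal to the exclusive 0-based end.
    return start_1based - 1, end_1based


def convert_row(line: str) -> str:
    """Rewrite the start/end columns of one tab-separated row; other columns are kept."""
    fields = line.rstrip("\n").split("\t")
    start, end = to_bed_interval(int(fields[1]), int(fields[2]))
    fields[1], fields[2] = str(start), str(end)
    return "\t".join(fields)


# Two data rows copied from the sample output above (header line excluded).
rows = [
    "1\t1\t15000\t1:1-15000\tnan\tnan",
    "1\t15001\t30000\t1:15001-30000\tnan\tnan",
]
converted = [convert_row(r) for r in rows]
for row in converted:
    print(row)
# 1	0	15000	1:1-15000	nan	nan
# 1	15000	30000	1:15001-30000	nan	nan
```

The `id` column is deliberately left as `1:1-15000`, since region strings of the form `chr:start-end` are conventionally 1-based and inclusive. The bin length is unchanged by the conversion (15000 - 0 = 15000 bases).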

@matthdsm

matthdsm commented Nov 29, 2022

Hi Leonor,

We're planning to resume (some) development on WisecondorX in Q1 of next year. In the meantime, we welcome any and all PRs with improvements and suggestions!

Thanks
Matthias

@leonorpalmeira
Author

Hi Matthias,

unfortunately, we will not have time to dive into the WisecondorX code to try to solve this issue. However, I can give my two cents on how to investigate it. I see two possibilities:

  • If the bins are written as 1 1 15000 in the BED file but are counted correctly in the Python code (from position 0 to position 14999), then all we need is to change the BED output to 1 0 15000 to follow the BED convention (0-based, half-closed).
  • If the bins are actually counted as 1 1 15000 in the Python code, then the code itself has to be corrected so that positions on the bin borders are assigned to the right bin. This probably means modifying several parts of the code.
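
The difference between the two possibilities can be sketched as follows. This is only an illustration of the boundary arithmetic, not WisecondorX code: I have not checked how reads are actually assigned to bins, and the function names and the fixed `BIN_SIZE` are assumptions taken from the 15 kb bins in the sample output.

```python
BIN_SIZE = 15000  # bin width seen in the sample output (1:1-15000, ...)


def bin_of_0based(pos0: int, bin_size: int = BIN_SIZE) -> int:
    """Bin index for a 0-based position: bin 0 covers positions 0..bin_size-1."""
    return pos0 // bin_size


def bin_of_1based(pos1: int, bin_size: int = BIN_SIZE) -> int:
    """Bin index for a 1-based position: shift to 0-based before dividing."""
    return (pos1 - 1) // bin_size


def bin_of_1based_unshifted(pos1: int, bin_size: int = BIN_SIZE) -> int:
    """Off-by-one variant: a 1-based position divided without shifting first."""
    return pos1 // bin_size


# First possibility: counting is 0-based and only the written labels are off.
# Positions 0..14999 fall in bin 0, position 15000 opens bin 1.
print(bin_of_0based(0), bin_of_0based(14999), bin_of_0based(15000))
# 0 0 1

# Second possibility: 1-based positions are used in the arithmetic.
# The last base of the first bin (1-based position 15000) stays in bin 0
# only if the shift is applied; without it, it leaks into bin 1.
last_base_of_first_bin = 15000
print(bin_of_1based(last_base_of_first_bin),
      bin_of_1based_unshifted(last_base_of_first_bin))
# 0 1
```

In the unshifted variant exactly one position per bin border ends up in the neighbouring bin, which is consistent with the effect on read counts being very small.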

Currently, this is not a high priority for us, given that the impact on our CNV inference is clearly negligible.
