This issue was moved to a discussion.
Improve distance binning for FF, RR, FR, RF pairs ("scalings") in stats output #81
Also, should we expose
So the problem is matching bins between cooler and pairtools. It's a little tricky. There are a few options.
Current proposal for nice cooler bins: 1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 32, 40, 50, 63, 79, 100. At 1 kb those bins become 1000, 2000, 3000, 4000, 5000, 6000, 8000, 10000, 13000, 16000, 20000, 25000, 32000, 40000, 50000, 63000, 79000... The two are clearly different.
Pairtools bins not matched to cooler:
Bins matched to cooler at 1 kb:
Pairtools bins matched to cooler at 1 kb and at 100 bp/200 bp, if 100 bp/200 bp coolers use modified bins:
And bins matched to cooler at 100, 200 and 1000 bp resolutions, with extra bins for pairs.
I couldn't think of a more general solution. Powers of two obviously have one, but not here...
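The proposed cooler bins happen to coincide with rounding 10^(k/10) to the nearest integer for k = 0..20 and dropping duplicates. A minimal sketch that reproduces the list and its 1 kb version; `nice_bins` is a hypothetical helper, and this is an observation about the listed numbers, not cooler's or cooltools' actual code:

```python
import numpy as np

def nice_bins(n_decades=2):
    """Round 10**(k/10) (10 log-steps per decade) to integers and deduplicate.

    Hypothetical helper: reproduces the 'nice' list proposed in this thread.
    """
    k = np.arange(10 * n_decades + 1)
    return np.unique(np.round(10 ** (k / 10)).astype(int))

bins = nice_bins()
print(bins.tolist())
# [1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 32, 40, 50, 63, 79, 100]

# The same bins at 1 kb resolution, in bp.
print((bins * 1000).tolist())
```

In the first decade the rounding collapses several steps (1.26 and 1.58 both round near 1 or 2), which is why that decade has fewer, unevenly spaced values.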
@golobor @sergpolly @agalitsyna, what do you guys think? We should probably decide on this before we merge in cooltools logbin_expected. Should we aim at matching at one resolution, or at two, or should we match at all?
Alternatively, we can let users decide between two options: (a) keep bins nice within all orders of magnitude, or (b) do not use nice bins at all.
IMHO, ~100 bp is needed, at least for pair-level stuff, because of DNase/MNase-based methods like Micro-C, Omni-C, and whatever-C that might appear "tomorrow". 100 bp coolers for Micro-C aren't a crazy thing to do, so perhaps it makes sense to match at that resolution, as @mimakaev suggested:
But this would only work for high-resolution coolers and wouldn't apply to sparse data (<50-100M usable pairs in a cooler). So, as @golobor suggests, this matching between cooler and pair bins could be optional. Another IMHO: I don't think it is THAT crucial to match bins for
Yeah, that would probably be ideal. I will engineer that set a little better and make sure it is actually matched.
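One way to make "actually matched" concrete: a pair-bin set is matched to a cooler at resolution r if every cooler bin edge (cooler log-bin × r, in bp) that falls inside the pair-bin range is itself a pair-bin edge. A minimal sketch of that check, assuming the cooler side uses the nice bins proposed above and the pair side uses the current pair bins quoted in this thread; `unmatched_edges` is a hypothetical helper:

```python
import numpy as np

def unmatched_edges(pair_bins, cooler_bins, resolution):
    """Return cooler bin edges (bp) inside the pair-bin range that pair_bins lacks.

    Hypothetical helper. cooler_bins are in units of cooler bins (1, 2, 3, ...);
    pair_bins are distance-bin edges in bp.
    """
    pair_bins = np.asarray(pair_bins)
    edges_bp = np.asarray(cooler_bins) * resolution
    # Only compare edges that the pair bins could possibly cover.
    in_range = edges_bp[(edges_bp >= pair_bins.min()) & (edges_bp <= pair_bins.max())]
    return np.setdiff1d(in_range, pair_bins).tolist()

cooler_bins = [1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 32, 40, 50, 63, 79, 100]
pair_bins = [10, 13, 16, 20, 25, 32, 40, 50, 63, 79, 100, 126, 159, 200,
             240, 300, 400, 490, 600, 800, 1000, 1200, 1600, 2000, 2400,
             3000, 4000, 5000, 6000, 8000, 10000, 13000, 16000, 20000,
             25000, 32000, 40000, 50000, 63000]

print(unmatched_edges(pair_bins, cooler_bins, 1000))  # []
print(unmatched_edges(pair_bins, cooler_bins, 100))   # [500, 1300, 2500, 3200, 6300, 7900]
```

With these lists the pair bins cover every 1 kb cooler edge up to 63 kb, while a 100 bp cooler using the unmodified nice bins would need six extra edges.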
These are ratios of neighboring pair bins in the current version of bins.
bins = [10,13,16,20,25,32,40,50,63,79,100,126,159,200,240,300,400,490,600,800,1000,1200,1600,2000,2400,3000,4000,5000,6000,8000,10000,13000,16000,20000,25000,32000,40000,50000,63000]
Bins for 100 bp and 200 bp (just without 100 and 300)
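The neighboring-bin ratios for the `bins` list above can be recomputed directly. A minimal NumPy sketch:

```python
import numpy as np

# Current pair bins quoted in this thread (bp).
bins = np.array([10, 13, 16, 20, 25, 32, 40, 50, 63, 79, 100, 126, 159, 200,
                 240, 300, 400, 490, 600, 800, 1000, 1200, 1600, 2000, 2400,
                 3000, 4000, 5000, 6000, 8000, 10000, 13000, 16000, 20000,
                 25000, 32000, 40000, 50000, 63000])

# Ratio of each edge to the previous one; > 1 everywhere means strictly increasing.
ratios = bins[1:] / bins[:-1]
print(np.round(ratios, 3))
print("min:", ratios.min(), "max:", ratios.max())  # min 1.2, max ~1.333
```

A grid with exactly 10 log-steps per decade would have a constant ratio of 10**0.1 ≈ 1.259; the rounded values trade that uniformity for readable edges, so the ratios wander between 1.2 and 4/3.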
I'll repeat this, but why not make the bins nice in all orders of magnitude?
(
[1,2,3,4,5,6,8,10]
+ [1,2,3,4,5,6,8,10] * 10
+ [1,2,3,4,5,6,8,10] * 100
)
What negative consequences would this have?
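Taken literally in Python, `[1,2,3,4,5,6,8,10] * 10` repeats the list ten times instead of scaling it. A runnable NumPy version of the same idea, scaling element-wise and dropping the duplicated decade boundaries (10, 100); `decade_bins` is a hypothetical name:

```python
import numpy as np

def decade_bins(mantissas, n_decades):
    """Repeat the same 'nice' mantissas in every order of magnitude (hypothetical helper)."""
    mantissas = np.asarray(mantissas)
    # Element-wise scaling; a Python list * 10 would repeat the list instead.
    decades = [mantissas * 10**d for d in range(n_decades)]
    # np.unique sorts and removes the shared boundaries (10, 100, ...).
    return np.unique(np.concatenate(decades))

bins = decade_bins([1, 2, 3, 4, 5, 6, 8, 10], n_decades=3)
print(bins.tolist())
# [1, 2, 3, 4, 5, 6, 8, 10, 20, 30, 40, 50, 60, 80, 100, 200, 300, 400, 500, 600, 800, 1000]
```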
On Fri, 6 Mar 2020 at 01:55, Maksim Imakaev wrote:
These are ratios of neighboring pair bins in the current version of bins.
bins = [10,13,16,20,25,32,40,50,63,79,100,126,159,200,240,300,400,490,600,800,1000,1200,1600,2000,2400,3000,4000,5000,6000,8000,10000,13000,16000,20000,25000,32000,40000,50000,63000]
Bins for 100bp and 200bp (just without 100 and 300)
100,200,300,400,600,800,1000,1200,1600,2000,2400,3000,4000,5000,6000,8000,10000,13000,16000,20000,25000,32000,40000,50000,63000
OK, now I get it. A large negative consequence is the two-fold jump from 1 to 2. We could have used 1, 2, 5, 10 instead; that's at least even. A partial remedy is to use these bins and drop #2, 3, 5 in the first order of magnitude.
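The evenness argument can be quantified by comparing neighbor ratios within one decade for the two mantissa sets. A minimal sketch; `ratio_spread` is a hypothetical helper, and "spread" here means largest ratio divided by smallest (1.0 would be perfectly even):

```python
import numpy as np

def ratio_spread(mantissas):
    """Neighbor ratios within one decade and their max/min spread (hypothetical helper)."""
    m = np.asarray(mantissas, dtype=float)
    ratios = m[1:] / m[:-1]
    return ratios, ratios.max() / ratios.min()

for mantissas in ([1, 2, 3, 4, 5, 6, 8, 10], [1, 2, 5, 10]):
    ratios, spread = ratio_spread(mantissas)
    print(mantissas, np.round(ratios, 2).tolist(), round(spread, 2))
```

The 1-2-3-4-5-6-8-10 set puts a 2x step next to a 1.2x step (spread of about 1.67), while 1-2-5-10 stays between 2x and 2.5x (spread 1.25), coarser but more uniform.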
I will convert this to a discussion for now, but feel free to comment or open an issue if binning improvements are needed!
https://github.com/mirnylab/pairtools/blob/d1ddf9c39a336662f7fc725fa5a70ec68df9ba95/pairtools/pairtools_stats.py#L147
Consider replacing it with something more readable and usable, e.g. @mimakaev's robust bins:
Currently we have:
These are also non-decreasing, but they are too sparsely spaced... and the code is hard to read.
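For "more readable and usable", one option is to keep the bin edges as a plain sorted array passed in by the caller and let `np.searchsorted` do the lookup. A minimal sketch, assuming separations are already computed as non-negative integers; `count_by_distance` is a hypothetical helper, not the pairtools API, and the edges are illustrative:

```python
import numpy as np

def count_by_distance(distances, bin_edges):
    """Histogram pair separations into user-supplied distance bins.

    Hypothetical helper (not the pairtools API). Bin i covers
    [bin_edges[i], bin_edges[i + 1]); the last bin is open-ended, and
    distances below bin_edges[0] are ignored.
    """
    bin_edges = np.asarray(bin_edges)
    if np.any(np.diff(bin_edges) <= 0):
        raise ValueError("bin_edges must be strictly increasing")
    # Index of the last edge <= each distance.
    idx = np.searchsorted(bin_edges, np.asarray(distances), side="right") - 1
    return np.bincount(idx[idx >= 0], minlength=len(bin_edges))

# Illustrative edges: 0 followed by the 'nice' 1..10 bins from this thread.
edges = [0, 1, 2, 3, 4, 5, 6, 8, 10]
counts = count_by_distance([0, 1, 3, 7, 9, 15], edges)
print(counts.tolist())  # [1, 1, 0, 1, 0, 0, 1, 1, 1]
```

Because the edges are just an argument, exposing them as an option (or switching between nice and plain log-spaced edges) would not require touching the counting code.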