-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
get_cis_expected API (smoothing, aggregation, etc) #280
Comments
thank you for these thoughtful comments!
not sure if I agree with this one! By definition, smoothing assumes that the "true" value of contact frequency for a given diagonal can be inferred from those from adjacent diagonals. If if some diagonal misses good pixels entirely, I would say, we should still be able to estimate its contact frequency, no?..
Done!
Yes, that is a good point - I just pushed new API that does not have this argument and does not drop columns that were present in the cvd table:
https://github.com/open2c/cooltools/blob/master/cooltools/sandbox/expected_smoothing_example.ipynb |
going through the notebooks, noticed that the docstring for def get_cis_expected(
...
intra_only=True,
...
):
says intra_only=False returns all combinations, but it only returns non-symmetric combinations.
Had we discussed a desired behavior?
https://github.com/open2c/cooltools/blob/ad018d6005b5df852cf39f6ec5321bdfab95106f/cooltools/expected.py#L984 |
yes, - we've discussed ... |
Update: looks like the issue is with the sorting check-- I think it actually doesn't make sense, b/c there could be more regions in the view than in the cooler, but it could still be compatible |
@gfudenberg we need viewframe to be sorted (by coordinate and according to cooler's order of chromosomes) because we generate "inter"-regions (inter arms, inter - whatever) in a pairwise combination fashion - i.e. first with second, first with third etc - and if viewframe isn't sorted - some of these inter regions would end up in the lower left side of the heatmap enforcing sorted viewframe is the easiest way to deal with it |
I'm not sure if this would be a different check if it's only used
sometimes-- one could potentially want a view that is in a different order
of a cooler for many analyses while still wanting this intervals contained
in the cooler.
Also,
current sort implementation doesn't work for a set of arms that are sorted.
So it might need to use an interval-wise sort rather than a name-wise sort.
…On Thu, Nov 11, 2021, 6:38 AM Sergey Venev ***@***.***> wrote:
@gfudenberg <https://github.com/gfudenberg> we need viewframe to be
sorted (by coordinate and according to cooler's order of chromosomes)
because we generate "inter"-regions (inter arms, inter - whatever) in a
pairwise combination fashion - i.e. first with second, first with third etc
- and if viewframe isn't sorted - some of these inter regions would end up
in the lower left side of the heatmap
enforcing sorted viewframe is the easiest way to deal with it
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#280 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEV7GZLRFUADXW25A4HR5ITULPINTANCNFSM5GAP72MA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
I've just checked is_sorted on arms and on the subset of arms and everything seems to work ok - i.e. in the same open2c example notebook with the microC dataset reduced to 2 chroms . The only thing I had to do is to reduce hg38_arms down to chromosomes available in the cooler itself - i.e. chr2 and chr17 |
the problem with |
Ah, had though I dropped chrms but maybe I didn't. Should definitely add a
note about sortedness to the docstring.
…On Thu, Nov 11, 2021, 7:29 AM Sergey Venev ***@***.***> wrote:
the problem with intra_only=False was indeed real - it didn't do what
we've agreed on - it was only returning intra (i.e. asymmetric expecteds)
I'm about to push some changes to correct for it
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#280 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEV7GZJOLQXB32CDOWYSUOTULPOMXANCNFSM5GAP72MA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
discussion 11.12: proposed changes for expected_cis: some name changes for arguments def expected_cis(
clr,
view_df=None,
intra_only=True,
smooth=True
aggregate_smoothed=True,
smooth_sigma=0.1,
clr_weight_name="weight",
ignore_diags=2, # should default to cooler info
chunksize=10_000_000,
nproc=1,
): Some simplification for the returned dataframe:
Always return: Return if smooth=True and aggregate_smoothed=True Return if smooth=True and aggregate_smoothed=True and clr_weight_name is not None: This can be implemented by:
For release we will not return the count smoothing. |
bubbling up the idea of propagating NaNs-- |
this PR adds np.nan propagation for smoothed outputs cooltools/cooltools/api/expected.py Line 1047 in 1e953e3
is that o.k. ? @golobor @sergpolly |
the newest API proposal - expected_full(
clr,
view_df=None, # same view e.g. arms for cis and trans
smooth_cis=False, # smooth cis True|False
aggregate_cis=False, # aggregate cis expected False | "chrom" | "genome"
# we have to allow for aggregate_trans, e.g. when calculating trans expected by arm
aggregate_trans=False, # aggregate trans expected False | "chrom" | "genome"
expected_column_name="expected", # store final result in a single column
drop_intermediate_columns=True, # drop count.sum, balanced.sum, n_valid balanced.sum.smooth etc ...
# usual options
smooth_sigma=0.1,
ignore_diags=2,
clr_weight_name='weight',
chunksize=10_000_000,
nproc=4,
) sample output without intermediate columns:
note, how trans-expected has a special value for distance Here is the same output with the intermediate columns:
|
in the name of biology we converged on the following API: expected_full_fast(
clr,
view_df=hg38_arms, # same view for cis and trans
smooth_cis=True, # applies yto both inter and intra (-cis)
smooth_log10=0.03
combine_cis = False|"intra_genomewide"|function !
combine_trans= False(None)|"chrom"|"genomewide"|function !
expected_column_name="expected"
store_intermediate_columns=True
)
# region1, region2, dist, expected_column - default
# if everything ios False and choose to store intermediates ...
# region1 region2 dist n_valid balanced.sum balanced.avg expected |
API discussion superseded by #501 the issue of propagating NaNs from count.avg & balanced.avg to the smoothed columns remains |
propagating NaNs through smoothing e.g. balanced.sum.smoothed should be NaN if balanced.sum is NaN.
consider renaming balanced & areas here to something more interpretable (smoothed_values, smoothed_counts?)
cooltools/cooltools/sandbox/expected_smoothing.py
Line 195 in f478c1c
its not currently clear what the agg argument is supposed to do-- it seems to deletes the count column (which also has slightly unexpected values)
cc @golobor
The text was updated successfully, but these errors were encountered: