get_cis_expected API (smoothing, aggregation, etc) #280

gfudenberg · 2021-10-14T20:32:13Z

propagating NaNs through smoothing e.g. balanced.sum.smoothed should be NaN if balanced.sum is NaN.
consider renaming balanced & areas here to something more interpretable (smoothed_values, smoothed_counts?)

cooltools/cooltools/sandbox/expected_smoothing.py

Line 195 in f478c1c

balanced, areas = log_smooth(
it's not currently clear what the agg argument is supposed to do; it seems to delete the count column (which also has slightly unexpected values)

golobor · 2021-10-18T19:51:55Z

thank you for these thoughtful comments!

propagating NaNs through smoothing e.g. balanced.sum.smoothed should be NaN if balanced.sum is NaN.

not sure if I agree with this one! By definition, smoothing assumes that the "true" value of contact frequency for a given diagonal can be inferred from the values at adjacent diagonals. If some diagonal lacks good pixels entirely, I would say we should still be able to estimate its contact frequency, no?

consider renaming balanced & areas here to something more interpretable (smoothed_values, smoothed_counts?)

Done!

it's not currently clear what the agg argument is supposed to do; it seems to delete the count column (which also has slightly unexpected values)

Yes, that is a good point - I just pushed a new API that does not have this argument and does not drop columns that were present in the cvd table:

cooltools/cooltools/sandbox/expected_smoothing.py

Line 232 in 704f259

def agg_smooth_cvd(

https://github.com/open2c/cooltools/blob/master/cooltools/sandbox/expected_smoothing_example.ipynb

gfudenberg · 2021-11-11T00:59:09Z

going through the notebooks, noticed that the docstring for

def get_cis_expected(
    ...
    intra_only=True,
    ...
):
says intra_only=False returns all combinations, but it only returns non-symmetric combinations.

Had we discussed a desired behavior?


https://github.com/open2c/cooltools/blob/ad018d6005b5df852cf39f6ec5321bdfab95106f/cooltools/expected.py#L984

sergpolly · 2021-11-11T01:54:35Z

yes, - we've discussed ...
conclusion was to make intra_only=False return EVERYTHING - intra and inter together
I'll check after everyone goes to sleep

gfudenberg · 2021-11-11T02:09:08Z

actually, there now seems to be an issue on the main branch that emerges with intra_only=False for get_cis_expected (or one of the checks that gets called)

gfudenberg · 2021-11-11T02:27:54Z

Update: looks like the issue is with the sorting check-- I think it actually doesn't make sense, b/c there could be more regions in the view than in the cooler, but it could still be compatible

sergpolly · 2021-11-11T14:38:07Z

@gfudenberg we need viewframe to be sorted (by coordinate and according to cooler's order of chromosomes) because we generate "inter"-regions (inter arms, inter - whatever) in a pairwise combination fashion - i.e. first with second, first with third etc - and if viewframe isn't sorted - some of these inter regions would end up in the lower left side of the heatmap

enforcing sorted viewframe is the easiest way to deal with it
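The ordering argument can be illustrated with a small sketch. It assumes region pairs are generated with `itertools.combinations` over the view's region names, which is an assumption made for illustration and not necessarily how cooltools builds them. The region names, `genome_order`, and the helper `pairs_in_upper_triangle` are all hypothetical.

```python
from itertools import combinations

# Hypothetical order of regions along the heatmap axes (cooler order).
genome_order = ["chr2_p", "chr2_q", "chr17_p", "chr17_q"]
rank = {name: i for i, name in enumerate(genome_order)}


def pairs_in_upper_triangle(view_names):
    """Pair regions first-with-second, first-with-third, etc., and flag
    whether each pair lies in the upper triangle (region1 before region2)."""
    return [
        (r1, r2, rank[r1] < rank[r2])
        for r1, r2 in combinations(view_names, 2)
    ]


sorted_view = ["chr2_p", "chr2_q", "chr17_p", "chr17_q"]
shuffled_view = ["chr17_p", "chr2_p", "chr2_q", "chr17_q"]

# With a sorted view, every generated pair is in the upper triangle.
all_upper = all(ok for _, _, ok in pairs_in_upper_triangle(sorted_view))

# With an unsorted view, some pairs land below the diagonal.
misplaced = [(r1, r2) for r1, r2, ok in pairs_in_upper_triangle(shuffled_view) if not ok]

print(all_upper)
print(misplaced)
```

In this toy case the shuffled view produces two pairs whose first region comes after the second in genome order, which is the situation the sortedness requirement is meant to rule out.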

gfudenberg · 2021-11-11T15:18:45Z

I'm not sure if this would be a different check if it's only used sometimes; one could potentially want a view that is in a different order from the cooler for many analyses while still wanting its intervals contained in the cooler. Also, the current sort implementation doesn't work for a set of arms that are sorted. So it might need to use an interval-wise sort rather than a name-wise sort.
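A minimal sketch of the difference between a name-wise and an interval-wise sortedness check follows. The helper `is_view_sorted`, the chromosome order, and the arm coordinates are made up for illustration and are not the cooltools or bioframe implementation.

```python
def is_view_sorted(view, chrom_order):
    """Interval-wise check: regions must be ordered by chromosome rank
    (taken from chrom_order), then by start coordinate."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    keys = [(rank[chrom], start) for chrom, start, _end, _name in view]
    return keys == sorted(keys)


# Hypothetical cooler chromosome order and toy arm coordinates.
chrom_order = ["chr2", "chr10"]
arms = [
    ("chr2", 0, 100, "chr2_p"),
    ("chr2", 100, 250, "chr2_q"),
    ("chr10", 0, 40, "chr10_p"),
    ("chr10", 40, 130, "chr10_q"),
]

# A lexicographic sort on names puts "chr10_*" before "chr2_*",
# so a name-wise check rejects a view that is in fact in genome order.
names = [name for _, _, _, name in arms]
name_wise_sorted = names == sorted(names)

print(is_view_sorted(arms, chrom_order))
print(name_wise_sorted)
```

The interval-wise check depends only on chromosome order and coordinates, so it is unaffected by how the regions happen to be named.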


sergpolly · 2021-11-11T15:22:48Z

is_compatible_viewframe does not check order by default - only when requested
we request it for pairwise expected calculations, because we need it there to ensure we're in the upper right triangle of the heatmap

I've just checked is_sorted on arms and on the subset of arms and everything seems to work ok - i.e. in the same open2c example notebook with the microC dataset reduced to 2 chroms. The only thing I had to do is to reduce hg38_arms down to chromosomes available in the cooler itself - i.e. chr2 and chr17

sergpolly · 2021-11-11T15:29:04Z

the problem with intra_only=False was indeed real - it didn't do what we've agreed on - it was only returning intra (i.e. asymmetric expecteds)
I'm about to push some changes to correct for it

gfudenberg · 2021-11-11T15:42:57Z

Ah, had thought I dropped chrms but maybe I didn't. Should definitely add a note about sortedness to the docstring.


gfudenberg · 2021-11-12T20:56:01Z

discussion 11.12:

proposed changes for expected_cis:

some name changes for arguments

def expected_cis(
    clr,
    view_df=None,
    intra_only=True,
    smooth=True,
    aggregate_smoothed=True,
    smooth_sigma=0.1,
    clr_weight_name="weight",
    ignore_diags=2,  # should default to cooler info
    chunksize=10_000_000,
    nproc=1,
):

Some simplification for the returned dataframe:

return balanced columns if clr_weight_name is not None
return .smoothed column if smooth=True
return .smoothed.agg column if aggregate_smoothed=True

Always return:
region1, region2,
dist, n_valid,
count.sum, count.avg,

Return if smooth=True and aggregate_smoothed=True
count.avg.smoothed, count.avg.smoothed.agg

Return if smooth=True and aggregate_smoothed=True and clr_weight_name is not None:
balanced.sum, balanced.avg, (if cooler weight name not none)
balanced.avg.smoothed, balanced.avg.smoothed.agg (if aggregate_smoothed=True)

This can be implemented by:

moving the merge into expected_cis
changing how the dictionary of DEFAULT_CVD_COLS is passed to the smoothing functions.

For release we will not return the count smoothing.
@golobor any suggestions for best way to pass the DEFAULT_CVD_COLS dictionary?
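One reading of the column rules in this proposal can be written down as a small helper. This is a sketch, not the cooltools implementation: `expected_cis_columns` and its `smooth_counts` switch are hypothetical, and the column names use the `.smoothed` and `.smoothed.agg` suffixes from the simplification list. Count smoothing is off by default, in line with the plan not to return it for the release.

```python
def expected_cis_columns(
    smooth=True,
    aggregate_smoothed=True,
    clr_weight_name="weight",
    smooth_counts=False,  # hypothetical switch, not part of the proposal
):
    """Return the column names the proposed expected_cis would produce."""
    # Columns that are always returned.
    columns = ["region1", "region2", "dist", "n_valid", "count.sum", "count.avg"]

    # Averages that are candidates for smoothing.
    to_smooth = ["count.avg"] if smooth_counts else []

    # Balanced columns only exist when a weight column is used.
    if clr_weight_name is not None:
        columns += ["balanced.sum", "balanced.avg"]
        to_smooth.append("balanced.avg")

    # Smoothed and aggregated-smoothed columns.
    if smooth:
        for col in to_smooth:
            columns.append(f"{col}.smoothed")
            if aggregate_smoothed:
                columns.append(f"{col}.smoothed.agg")
    return columns


print(expected_cis_columns())
print(expected_cis_columns(smooth=False, clr_weight_name=None))
```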

gfudenberg · 2021-11-13T01:07:36Z

bubbling up the idea of propagating NaNs--
the current smoothing does not interpolate to dist=1 correctly, that's all I was suggesting, not NaNs later in the curve. Does that make more sense @golobor ?
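A sketch of this narrower form of NaN propagation, under the assumption that it means masking smoothed values only for the ignored diagonals and leaving gaps later in the curve filled by smoothing. The helper `mask_ignored_diags` is hypothetical, the numbers are illustrative, and this is not necessarily what any cooltools PR implements.

```python
import math


def mask_ignored_diags(dist, smoothed, ignore_diags=2):
    """Set smoothed values to NaN for diagonals below ignore_diags;
    keep smoothed estimates everywhere else."""
    return [
        math.nan if d < ignore_diags else s
        for d, s in zip(dist, smoothed)
    ]


# Illustrative values: dist 0 and 1 are ignored diagonals, and
# dist 3 has no valid pixels but still gets a smoothed estimate.
dist = [0, 1, 2, 3, 4]
raw = [math.nan, math.nan, 0.075, math.nan, 0.030]
smoothed = [math.nan, 0.0008, 0.069, 0.045, 0.031]

masked = mask_ignored_diags(dist, smoothed)
print(masked)
```

Here the extrapolated value at dist=1 is dropped, while the estimate at dist=3 is kept, which matches the point that a diagonal without good pixels can still be inferred from its neighbours.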

gfudenberg · 2022-09-01T18:37:06Z

this PR adds np.nan propagation for smoothed outputs

cooltools/cooltools/api/expected.py

Line 1047 in 1e953e3

# propagate nan

#380

is that o.k. ? @golobor @sergpolly

sergpolly · 2022-09-21T20:53:37Z

the newest API proposal - expected_full that would combine cis and trans expected-s together:

expected_full(
        clr,
        view_df=None,  # same view e.g. arms for cis and trans
        smooth_cis=False,  # smooth cis True|False
        aggregate_cis=False,  # aggregate cis expected False | "chrom" | "genome"
        # we have to allow for aggregate_trans, e.g. when calculating trans expected by arm
        aggregate_trans=False,  # aggregate trans expected False | "chrom" | "genome"
        expected_column_name="expected",  # store final result in a single column
        drop_intermediate_columns=True,  # drop count.sum, balanced.sum, n_valid, balanced.sum.smooth etc ...

        # usual options
        smooth_sigma=0.1,
        ignore_diags=2,
        clr_weight_name='weight',
        chunksize=10_000_000,
        nproc=4,
)

sample output without intermediate columns:

region1	region2	dist	expected
chr2_p	chr2_p	0	NaN
chr2_p	chr2_p	1	NaN
chr2_p	chr2_p	2	0.068918
chr2_p	chr2_p	3	0.045381
...	...	...	...
chr2_p	chr2_q	2421	0.000050
chr2_p	chr17_p	-1	0.000022
chr2_p	chr17_q	-1	0.000022
chr2_q	chr17_p	-1	0.000022
chr2_q	chr17_q	-1	0.000022

note how trans-expected uses a special distance value of -1, allowing for easy filtering and aggregation.
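As a sketch of what that filtering could look like, the rows below copy a few entries from the sample output into plain dictionaries (the real output would be a dataframe). The name `TRANS_DIST` is a hypothetical constant introduced for the example.

```python
# A few rows from the sample output, as plain dictionaries.
rows = [
    {"region1": "chr2_p", "region2": "chr2_p", "dist": 2, "expected": 0.068918},
    {"region1": "chr2_p", "region2": "chr2_p", "dist": 3, "expected": 0.045381},
    {"region1": "chr2_p", "region2": "chr2_q", "dist": 2421, "expected": 0.000050},
    {"region1": "chr2_p", "region2": "chr17_p", "dist": -1, "expected": 0.000022},
    {"region1": "chr2_q", "region2": "chr17_q", "dist": -1, "expected": 0.000022},
]

TRANS_DIST = -1  # hypothetical name for the sentinel distance

# Split the combined table into trans and cis parts using the sentinel.
trans = [r for r in rows if r["dist"] == TRANS_DIST]
cis = [r for r in rows if r["dist"] != TRANS_DIST]

print(len(cis), len(trans))
```

With a dataframe the same split would be a boolean mask on the `dist` column.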

Here is the same output with the intermediate columns:

region1	region2	dist	n_valid	balanced.sum	balanced.avg	balanced.avg.smooth	expected
chr2_p	chr2_p	0	878.0	NaN	NaN	NaN	NaN
chr2_p	chr2_p	1	876.0	NaN	NaN	0.000795	0.000795
chr2_p	chr2_p	2	874.0	65.287351	0.074699	0.068918	0.068918
chr2_p	chr2_p	3	872.0	41.011675	0.047032	0.045381	0.045381
...	...	...	...	...	...	...	...
chr2_p	chr2_q	2421	0.0	0.000000	NaN	0.000050	0.000050
chr2_p	chr17_p	-1	174722.0	3.940185	0.000023	NaN	0.000022
chr2_p	chr17_q	-1	477632.0	10.879623	0.000023	NaN	0.000022
chr2_q	chr17_p	-1	284769.0	5.839495	0.000021	NaN	0.000022
chr2_q	chr17_q	-1	778464.0	16.283992	0.000021	NaN	0.000022
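The shared trans value in the `expected` column is consistent with pooling the four trans blocks, i.e. dividing the summed `balanced.sum` by the summed `n_valid`. That this is how the aggregate was formed is an assumption; the check below only uses the numbers from the table above.

```python
# (n_valid, balanced.sum) for the four trans rows in the table above.
trans_blocks = {
    ("chr2_p", "chr17_p"): (174722.0, 3.940185),
    ("chr2_p", "chr17_q"): (477632.0, 10.879623),
    ("chr2_q", "chr17_p"): (284769.0, 5.839495),
    ("chr2_q", "chr17_q"): (778464.0, 16.283992),
}

# Per-block averages reproduce the balanced.avg column.
per_block = {pair: b / n for pair, (n, b) in trans_blocks.items()}

# Pooled estimate: ratio of sums, not the mean of per-block averages.
total_valid = sum(n for n, _ in trans_blocks.values())
total_balanced = sum(b for _, b in trans_blocks.values())
pooled = total_balanced / total_valid

print(f"{pooled:.6f}")  # 0.000022
```

A ratio of sums weights each block by its number of valid pixels, so large blocks contribute more than small ones.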

sergpolly · 2022-09-26T20:34:51Z

in the name of biology we converged on the following API:

expected_full_fast(
    clr,
    view_df=hg38_arms,  # same view for cis and trans
    smooth_cis=True,  # applies to both inter and intra (-cis)
    smooth_log10=0.03,
    combine_cis=False,  # False | "intra_genomewide" | function
    combine_trans=False,  # False (None) | "chrom" | "genomewide" | function
    expected_column_name="expected",
    store_intermediate_columns=True,
)
# region1, region2, dist, expected_column - default

# if everything is False and we choose to store intermediates ...
# region1 region2 dist n_valid balanced.sum balanced.avg expected

gfudenberg · 2024-03-11T08:52:35Z

API discussion superseded by #501

the issue of propagating NaNs from count.avg & balanced.avg to the smoothed columns remains

gfudenberg changed the title ~~suggestions for cvd smoothing~~ get_cis_expected API (smoothing, aggregation, etc) Nov 11, 2021

gfudenberg assigned golobor Nov 11, 2021

gfudenberg mentioned this issue Nov 11, 2021

notebooks update open2c/open2c_examples#16

Merged

Phlya mentioned this issue Nov 12, 2021

enable smooth and aggregate in cis-expected using Antons smoothing #305

Merged

gfudenberg mentioned this issue Nov 17, 2021

v0.6 roadmap #312

Open

49 tasks

sergpolly mentioned this issue Sep 21, 2022

OE update in sandbox #391

Merged

gfudenberg closed this as completed Mar 11, 2024
