Strip empty first dimension for default windows. #313

jeromekelleher · 2019-08-20T16:19:26Z

This implements a subset of #200. The idea is that, when we don't specify windows, it's a pain to have to write stat[0] to get what you actually want. This seems like a good usability feature to add, to me --- this is what you'd actually want the library to do, right?

More concretely, take the following example:

afs = ts.allele_frequency_spectrum(mode="branch")
print("afs: ", afs)
print("diversity", ts.diversity([ts.samples()], mode="branch"))

afs = ts.allele_frequency_spectrum(mode="branch", windows=[0, ts.sequence_length])
print("afs:", afs)
print("diversity",
    ts.diversity([ts.samples()], mode="branch", windows=[0, ts.sequence_length]))

gives

afs:  [0.  2.5 0.  0. ]
diversity [3.33333333]
afs: [[0.  2.5 0.  0. ]]
diversity [[3.33333333]]

In the first case, we don't specify any windows so we're only interested in a single window and we remove the empty first dimension. In the second case, we explicitly specify the windows and so we keep the dimension in place.
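
The rule can be sketched in plain Python. This is only an illustration, not the tskit implementation: the helper name `strip_default_windows` is hypothetical, and nested lists stand in for the NumPy arrays that the stats methods return.

```python
def strip_default_windows(result, windows=None):
    """Drop the leading window dimension only when no windows were given.

    `result` is assumed to hold one row per window (hypothetical helper).
    """
    if windows is None:
        # Default: a single window spanning the whole sequence, so there
        # must be exactly one row; return that row on its own.
        if len(result) != 1:
            raise ValueError("expected exactly one window by default")
        return result[0]
    # Windows were given explicitly: keep one row per window.
    return result


one_window = [[0.0, 2.5, 0.0, 0.0]]  # AFS values from the example above
print(strip_default_windows(one_window))
print(strip_default_windows(one_window, windows=[0, 10.0]))
```

The first call returns the bare row and the second keeps the enclosing list, mirroring the two outputs shown above.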

I agree this is going to be a bit confusing to explain, but it seems worth it. The mistakes that are made will be less annoying, in the long run, than having to add an extra [0] to the end of each call when you just want one window. If, as @petrelharp says, people are rarely interested in a single window, then they won't be affected by this default behaviour.

The reason I'm bringing this up now is because it is a long-term decision. Because we've specified the default value for windows, we can't change the default behaviour after we ship 0.2.0. So we're making the decision here that we'll never want to do this, which to me seems a real shame as it's quite neat and elegant (IMO).

Because we don't currently have a default value for, e.g., diversity, we don't have to worry until later about the other half of #200 (stripping off the empty dimensions when we're only looking at one sample set in, e.g., diversity).

FWIW, the implementation is easy and it doesn't break many of the tests.

jeromekelleher · 2019-08-20T16:20:42Z

docs/stats.rst

@@ -70,6 +70,12 @@ e.g., the sites of the SNPs.
 Windowing
 *********

+By default, statistics


I started writing this, and then decided to see how easy it would be to implement before getting into it.

petrelharp · 2019-08-20T17:21:17Z

You are probably right. I do worry that it's strange that specifying a non-default argument changes the output dimensions, but at least None is a different argument. Ok, go for it.

Re: the other half: I don't think we need to drop dimensions for a default indexes value (when there is one), since np.array([2.0]) acts a lot like 2.0. But, do we want to set up default indexes for other stats?

jeromekelleher · 2019-08-20T19:24:33Z

> You are probably right. I do worry that it's strange that specifying a non-default argument changes the output dimensions, but at least None is a different argument. Ok, go for it.

Hooray! OK, I'll flesh this out and ping you.

> Re: the other half: I don't think we need to drop dimensions for a default indexes value (when there is one), since np.array([2.0]) acts a lot like 2.0. But, do we want to set up default indexes for other stats?

Hmm, that's a good question. I'm not sure --- maybe this is an argument for dropping the default indexes value for Fst until after 0.2.0? The way I'm thinking about it at the moment is that post 0.2.0 we can possibly refine the semantics a bit such that the dimensions of your arguments determine the dimensions of your output. So, for example, we'd have

ts.diversity([ts.samples()])   ->   [x]
ts.diversity(ts.samples())     ->   x

Here, the user is giving clear information that they're only interested in one sample set in the second example, so we strip off the first dimension and just return the value. Possibly, the right behaviour then is for ts.diversity() to return x also. I'm not sure how this interacts with the indexes argument, but I'm not sure I want to tie us down to something without thinking it through properly either.
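
A minimal sketch of that idea, assuming plain Python lists of integer node IDs rather than the arrays returned by ts.samples(). The names `is_single_sample_set` and `one_way_stat` are hypothetical, and `len` stands in for a real statistic.

```python
def is_single_sample_set(sample_sets):
    """True if `sample_sets` is a flat sequence of node IDs, not a list of sets."""
    return all(isinstance(u, int) for u in sample_sets)


def one_way_stat(sample_sets, summary):
    """Apply `summary` to each sample set, mirroring the shape of the argument."""
    if is_single_sample_set(sample_sets):
        # A bare list of samples: return a bare value.
        return summary(sample_sets)
    # A list of sample sets: return one value per set.
    return [summary(s) for s in sample_sets]


samples = [0, 1, 2, 3]
print(one_way_stat([samples], len))  # list of sample sets -> list of values
print(one_way_stat(samples, len))    # single sample set   -> single value
```

Note that an empty argument is ambiguous under this check (it is treated as a single empty sample set), which is the kind of corner case that would need thinking through before committing to the behaviour.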

I know you have reservations about playing fast and loose with the dimensions like this, so I'm not saying this is something I'd definitely want to do. But, I don't definitely not want to do it either, so keeping the door open by not specifying default values for things we're not sure about might be good.

petrelharp · 2019-08-20T19:26:12Z

Re: indexes - sounds good; also see #315 in the meantime.

jeromekelleher · 2019-08-21T13:16:39Z

OK, turns out we couldn't really punt this one down the road and need to deal with it now. Here's my proposal for how we strip off empty dimensions @petrelharp --- the new tests should explain what it's doing. If we think this is a good idea I'll tidy up the rest of the tests and fix up the documentation.

I haven't looked at the derived stats like Fst, Tajima's D and the covariance stats etc. These may need to be treated differently.

python/tskit/trees.py

python/tests/test_tree_stats.py

petrelharp · 2019-08-21T15:06:31Z

Ok, so in words, we drop a dimension if

- it is a one-way stat and sample_sets is actually just a list of samples
- it is a k>1-way stat and indexes is actually just a k-tuple of indexes
- it is a k>1-way stat and indexes=None and there are exactly k sample sets
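
The three conditions above can be sketched as a single predicate. This is an illustration under assumptions, not the code in python/tskit/trees.py: the function name `drops_dimension` is hypothetical, sample sets are plain lists of integer node IDs, and the case where indexes=None but the number of sample sets is not k (not covered by the summary) simply returns False here.

```python
def drops_dimension(k, sample_sets, indexes=None):
    """Return True if the sample-set/index dimension of the output is dropped."""
    if k == 1:
        # One-way stat: drop when sample_sets is a flat list of sample IDs.
        return all(isinstance(u, int) for u in sample_sets)
    if indexes is None:
        # k-way stat with default indexes: drop when there are exactly k sets.
        return len(sample_sets) == k
    # k-way stat: drop when indexes is a single k-tuple, not a list of tuples.
    return len(indexes) == k and all(isinstance(i, int) for i in indexes)


A, B = [0, 1], [2, 3]
print(drops_dimension(1, A))                         # True
print(drops_dimension(1, [A, B]))                    # False
print(drops_dimension(2, [A, B]))                    # True
print(drops_dimension(2, [A, B], indexes=(0, 1)))    # True
print(drops_dimension(2, [A, B], indexes=[(0, 1)]))  # False
```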

This is nice and elegant. I like it. It also makes the interface a bit more confusing, but I say we go for it.

One wrinkle is that in the future we will probably have statistics that can take variable numbers of indexes, so that e.g. indexes = [(0, 1, 2), (0, 1)] would be valid. (For instance, a multi-pop Fst.) I think we can just adjust things at that point, but we should make sure we don't make this impossible.

Do you have an opinion about dropping dimensions when windows=None?

python/tests/test_tree_stats.py

petrelharp · 2019-08-21T15:09:15Z

> I haven't looked at the derived stats like Fst, Tajima's D and the covariance stats etc. These may need to be treated differently.

They should work the same, but we might need different code to make that happen...

jeromekelleher · 2019-08-21T15:18:38Z

> This is nice and elegant. I like it. It also makes the interface a bit more confusing, but I say we go for it.

Excellent!

> Do you have an opinion about dropping dimensions when windows=None?

Yes, this is already implemented --- we drop the first dimension when windows is None. See the _default_windows tests for examples.

petrelharp · 2019-08-21T15:25:56Z

> Yes, this is already implemented --- we drop the first dimension when windows is None. See the _default_windows tests for examples.

Oh, duh. =) Yes, looks great!

codecov · 2019-08-21T20:16:39Z

Codecov Report

Merging #313 into master will decrease coverage by 1.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #313      +/-   ##
==========================================
- Coverage   87.47%   86.44%   -1.03%     
==========================================
  Files          19       20       +1     
  Lines       10282    14122    +3840     
  Branches     1902     2777     +875     
==========================================
+ Hits         8994    12208    +3214     
- Misses        764      987     +223     
- Partials      524      927     +403

Flag	Coverage Δ
#c_tests	`87.5% <100%> (+0.02%)`	⬆️
#python_c_tests	`90.3% <100%> (?)`
#python_tests	`99.24% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
python/tskit/trees.py	`98.67% <100%> (+0.02%)`	⬆️
python/_tskitmodule.c	`83.6% <0%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23fa1a8...e86a72b. Read the comment docs.

codecov · 2019-08-21T20:16:45Z

Codecov Report

Merging #313 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #313      +/-   ##
==========================================
+ Coverage   86.45%   86.47%   +0.01%     
==========================================
  Files          20       20              
  Lines       14015    14035      +20     
  Branches     2748     2751       +3     
==========================================
+ Hits        12117    12137      +20     
  Misses        979      979              
  Partials      919      919

Flag	Coverage Δ
#c_tests	`87.55% <100%> (+0.02%)`	⬆️
#python_c_tests	`90.31% <100%> (+0.02%)`	⬆️
#python_tests	`99.23% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
python/tskit/trees.py	`98.66% <100%> (+0.02%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 28a56cb...74b1742. Read the comment docs.

jeromekelleher · 2019-08-22T15:12:44Z

I think we're nearly there with this @petrelharp. I've tidied up the loose ends, and written some high-level docs. I think I just need to make another pass through the methods, probably linking back to some general description of the behaviour or parameters of one-way and k-way statistics to document the behaviour in various cases.

What do you think?

jeromekelleher · 2019-08-22T15:25:14Z

Note to self: need to update the AFS values in the tutorial, as these are now wrong (dimension stripping).

Also, does this close #229? They are examples --- if somewhat boring ones.

docs/stats.rst

python/tskit/trees.py

petrelharp · 2019-08-22T17:58:33Z

This looks great, especially the tutorial. I think this closes #229, also.

jeromekelleher · 2019-08-22T20:07:48Z

TODO:

- Fix the general stats interface as in #328 (Replace mean_descendants with a node stat). Also move code around as done there; it's good to have the stats functions together.
- Update documentation. Need to describe the behaviour of, e.g., windows, concisely and precisely somewhere, and put links to this in each function.

petrelharp · 2019-08-23T07:36:30Z

Most (all?) of those TODOs are now done over in #330; maybe you want to merge that one in here before continuing.

jeromekelleher mentioned this pull request Aug 20, 2019

Fst gives a NoneType error when no indices are given (the default) #250

Closed

petrelharp mentioned this pull request Aug 20, 2019

check for error when indexes=None #315

Merged

jeromekelleher force-pushed the default-windows-strip-dims branch from 9dbbd41 to cc260d3 Compare August 21, 2019 13:09

petrelharp reviewed Aug 21, 2019

python/tskit/trees.py (outdated)

petrelharp reviewed Aug 21, 2019

python/tests/test_tree_stats.py (outdated)

petrelharp reviewed Aug 21, 2019

python/tests/test_tree_stats.py (outdated)

jeromekelleher force-pushed the default-windows-strip-dims branch from e86a72b to 59ceea2 Compare August 22, 2019 10:37

jeromekelleher force-pushed the default-windows-strip-dims branch from e86a72b to 59ceea2 Compare August 22, 2019 11:08

jeromekelleher force-pushed the default-windows-strip-dims branch from 0dbd172 to c8cb6d6 Compare August 22, 2019 15:13

jeromekelleher mentioned this pull request Aug 22, 2019

Add windows option to genealogical_nearest_neighbours #193

Open

petrelharp reviewed Aug 22, 2019

docs/stats.rst

petrelharp reviewed Aug 22, 2019

python/tskit/trees.py

petrelharp reviewed Aug 22, 2019

python/tskit/trees.py

jeromekelleher mentioned this pull request Aug 22, 2019

Replace mean_descendants with a node stat #328

Closed

jeromekelleher force-pushed the default-windows-strip-dims branch from c8cb6d6 to 855e827 Compare August 22, 2019 20:06

This was referenced Aug 23, 2019

stats tests fixups jeromekelleher/tskit#5

Closed

Fix testing #330

Merged

jeromekelleher force-pushed the default-windows-strip-dims branch from 855e827 to 74b1742 Compare August 23, 2019 08:05

jeromekelleher added 5 commits August 23, 2019 09:07

Strip empty first dimension for default windows.

f7c311d

Optionally drop empty dimensions from stats.

29dc7ee

Fixed up tests and improved sample set parsing.

f40be31

Add dimension stripping to derived stats.

8e695cc

Documentation for dimension stripping.

74b1742

jeromekelleher merged commit 47ec75f into tskit-dev:master Aug 23, 2019

jeromekelleher deleted the default-windows-strip-dims branch August 23, 2019 08:24

This was referenced Aug 23, 2019

Default behaviour for stats dimension #200

Closed

write examples of stats computations for tutorial #229

Closed
