Strip empty first dimension for default windows. #313
docs/stats.rst
@@ -70,6 +70,12 @@ e.g., the sites of the SNPs.
Windowing
*********

By default, statistics
I started writing this, and then decided to see how easy it would be to implement before getting into it.
You are probably right. I do worry that it's strange that specifying a non-default argument changes the output dimensions, but at least Re: the other half; I don't think we need to drop dimensions for a default
Hooray! OK, I'll flesh this out and ping you.
Hmm, that's a good question. I'm not sure --- maybe this is an argument for dropping the default indexes value for Fst until after 0.2.0? The way I'm thinking about it at the moment is that post 0.2.0 we can possibly refine the semantics a bit such that the dimensions of your arguments determine the dimensions of your output. So, for example, we'd have
Here, the user is giving clear information that they're only interested in one sample set in the second example, so we strip off the first dimension and just return the value. Possibly, the right behaviour then is for I know you have reservations about playing fast and loose with the dimensions like this, so I'm not saying this is something I'd definitely want to do. But, I don't definitely not want to do it either, so keeping the door open by not specifying default values for things we're not sure about might be good.
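To make the "dimensions of your arguments determine the dimensions of your output" idea concrete, here is a minimal sketch. The function `mean_per_sample_set` and its arguments are hypothetical and are not the library's API; it only illustrates the proposed semantics: a list of sample sets gives one value per set, while a single flat sample set gives a bare value.

```python
def mean_per_sample_set(values, sample_sets):
    """Hypothetical statistic: the mean of `values` over each sample set.

    Not the library's API; only an illustration of the proposed semantics.
    """
    # A flat list of sample IDs means the caller wants exactly one sample set.
    single = len(sample_sets) > 0 and not isinstance(sample_sets[0], (list, tuple))
    if single:
        sample_sets = [sample_sets]
    result = [sum(values[u] for u in s) / len(s) for s in sample_sets]
    # Strip the length-one dimension only when a single flat set was passed.
    return result[0] if single else result


values = [1.0, 3.0, 5.0, 7.0]

print(mean_per_sample_set(values, [[0, 1], [2, 3]]))  # [2.0, 6.0]
print(mean_per_sample_set(values, [[0, 1]]))          # [2.0]
print(mean_per_sample_set(values, [0, 1]))            # 2.0
```

Note that `[[0, 1]]` and `[0, 1]` select the same samples but return different shapes, which is exactly the trade-off being discussed.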
Re: indexes - sounds good; also see #315 in the meantime.
9dbbd41
to
cc260d3
Compare
OK, turns out we couldn't really punt this one down the road and need to deal with it now. Here's my proposal for how we strip off empty dimensions @petrelharp --- the new tests should explain what it's doing. If we think this is a good idea I'll tidy up the rest of the tests and fix up the documentation. I haven't looked at the derived stats like Fst, Tajima's D and the covariance stats etc. These may need to be treated differently.
Ok, so in words, we drop a dimension if
This is nice and elegant. I like it. It also makes the interface a bit more confusing, but I say we go for it. One wrinkle is that in the future we will probably have statistics that can take variable numbers of indexes, so that e.g. Do you have an opinion about dropping dimensions when
They should work the same, but we might need different code to make that happen...
Excellent!
Yes, this is already implemented --- we drop the first dimension when windows is None. See the
Oh, duh. =) Yes, looks great!
Codecov Report
@@ Coverage Diff @@
## master #313 +/- ##
==========================================
- Coverage 87.47% 86.44% -1.03%
==========================================
Files 19 20 +1
Lines 10282 14122 +3840
Branches 1902 2777 +875
==========================================
+ Hits 8994 12208 +3214
- Misses 764 987 +223
- Partials 524 927 +403
Codecov Report
@@ Coverage Diff @@
## master #313 +/- ##
==========================================
+ Coverage 86.45% 86.47% +0.01%
==========================================
Files 20 20
Lines 14015 14035 +20
Branches 2748 2751 +3
==========================================
+ Hits 12117 12137 +20
Misses 979 979
Partials 919 919
e86a72b
to
59ceea2
Compare
I think we're nearly there with this @petrelharp. I've tidied up the loose ends, and written some high-level docs. I think I just need to make another pass through the methods, probably linking back to some general description of the behaviour or parameters of one-way and k-way statistics to document the behaviour in various cases. What do you think?
0dbd172
to
c8cb6d6
Compare
Note to self: need to update the AFS values in the tutorial, as these are now wrong (dimension stripping). Also, does this close #229? They are examples --- if somewhat boring ones.
This looks great, especially the tutorial. I think this closes #229, also.
c8cb6d6
to
855e827
Compare
TODO:
Most (all?) of those TODOs are now done over in #330; maybe you want to merge that one in here before continuing.
855e827
to
74b1742
Compare
This implements a subset of #200. The idea is that, when we don't specify windows then it's a pain to have to write `stat[0]` to get what you actually want. This seems like a good usability feature to add, to me --- this is what you'd actually want the library to do, right?
More concretely, take the following example:
gives
In the first case, we don't specify any windows so we're only interested in a single window and we remove the empty first dimension. In the second case, we explicitly specify the windows and so we keep the dimension in place.
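A minimal sketch of this behaviour, using a hypothetical `windowed_mean` function rather than the library's actual API: when `windows` is left as `None`, a single whole-sequence window is used and the length-one first dimension is stripped; when windows are passed explicitly, the dimension stays, even for one window.

```python
def windowed_mean(positions, values, sequence_length, windows=None):
    """Hypothetical windowed statistic (not the library's API):
    the mean of `values` whose positions fall in each window."""
    drop_dimension = windows is None
    if drop_dimension:
        # Default: a single window covering the whole sequence.
        windows = [0, sequence_length]
    result = []
    for left, right in zip(windows[:-1], windows[1:]):
        in_window = [v for x, v in zip(positions, values) if left <= x < right]
        result.append(sum(in_window) / len(in_window) if in_window else 0.0)
    # Strip the first dimension only when the caller relied on the default.
    return result[0] if drop_dimension else result


positions = [1.0, 4.0, 6.0, 9.0]
values = [2.0, 4.0, 6.0, 8.0]

single = windowed_mean(positions, values, 10.0)
explicit = windowed_mean(positions, values, 10.0, windows=[0, 10.0])
two = windowed_mean(positions, values, 10.0, windows=[0, 5.0, 10.0])

print(single)    # 5.0
print(explicit)  # [5.0]
print(two)       # [3.0, 7.0]
```

With this rule, code that loops over windows always gets a list back, and code that never mentions windows never needs a trailing `[0]`.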
I agree this is going to be a bit confusing to explain, but it seems worth it. The mistakes that are made will be less annoying, in the long run, than having to add an extra `[0]` to the end of each call when you just want one window. If, as @petrelharp says, people are rarely interested in a single window, then they won't be affected by this default behaviour.
The reason I'm bringing this up now is because it is a long-term decision. Because we've specified the default value for windows, we can't change the default behaviour after we ship 0.2.0. So we're making the decision here that we'll never want to do this, which to me seems a real shame as it's quite neat and elegant (IMO).
Because we don't have a default value for, e.g., `diversity` now we don't have to worry about the other half of #200 (stripping off the empty dimensions when we're only looking at one sample set in, e.g., `diversity` until later).
FWIW, the implementation is easy and it doesn't break many of the tests.