Addition of variance function in stdlib_experimental_stats #144
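@fiolj Could you check the implementation for complex numbers? Currently there are no tests implemented, to limit the number of tests (but I can implement some tests in a later commit/PR).
I'll do it, but I can't before tonight or tomorrow morning.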
On Thu, Feb 6, 2020 at 06:47, Jeremie Vandenplas <notifications@github.com> wrote:
@fiolj <https://github.com/fiolj> Could you check the implementation for complex numbers? Currently there are no tests implemented, to limit the number of tests (but I can implement some tests in a later commit/PR).
This PR is fine. Thanks.
In the long term, I wonder if fypp can be improved so that there is not as much repetition. There are still 8 different blocks to declare the signature of the function (real/integer, `(x, mask)`/`(x, dim, mask)`, and `mask` = scalar/array; all combinations 2x2x2 = 8). And even more long term, the Fortran language itself should be improved so that fypp is not needed.
I found it really good overall, but I think the variance for complex arrays is always a real number
and
and in rname("var",rank, t1, k1):
and later:
Also, if
Coincidentally, I was thinking that my solution to the ieee for complex was unnecessarily complex. Finally, I can look into it in more detail tomorrow, but I was thinking that maybe, even for real numbers, we don't have to match the kind of the input. Indeed, if the input is a large quantity of real(sp), it may be desirable that the variance (and mean) be calculated in double precision. I don't know if I am missing something about the availability of double precision on some machines, but I would think that nowadays that should not be an issue.
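To make the precision point concrete, here is a minimal standalone sketch (not the PR's code) that computes the variance of a `real(sp)` array twice: once entirely in single precision and once after promoting the data to double precision. The kind names `sp` and `dp` are mapped to `real32` and `real64` here as an assumption, and the test data are invented for illustration. The two printed values may differ noticeably, depending on the compiler and on how it implements `sum`.

```fortran
program var_sp_in_dp
    use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
    implicit none
    integer, parameter :: n = 1000000
    real(sp), allocatable :: x(:)
    real(sp) :: mean_sp, var_sp
    real(dp) :: mean_dp, var_dp
    integer :: i

    ! Illustrative data: a large offset plus a small alternating deviation.
    allocate(x(n))
    do i = 1, n
        x(i) = 10000.0_sp + real(mod(i, 2), sp)
    end do

    ! Two-pass variance with everything kept in single precision.
    mean_sp = sum(x) / real(n, sp)
    var_sp = sum((x - mean_sp)**2) / real(n - 1, sp)

    ! Same two passes, but with the data promoted to double precision first.
    mean_dp = sum(real(x, dp)) / real(n, dp)
    var_dp = sum((real(x, dp) - mean_dp)**2) / real(n - 1, dp)

    print *, 'variance accumulated in sp:', var_sp
    print *, 'variance accumulated in dp:', var_dp
end program var_sp_in_dp
```

The trade-off is that the result kind no longer matches the input kind, which is the API question raised in this comment.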
Thanks @fiolj for your review.
This is indeed a good idea, much simpler than the previous implementation. I will implement it here and for
The idea was to get the same API as
I think it is the responsibility of the user to check that (as it is already the case with
Following @fiolj's comments, I pushed commits for:
Regarding the implementation, I did, e.g.,
#:if t1[0] == 'r'
res = sum((x - mean)**2) / (n - 1._${k1}$)
#:else
res = sum(abs(x - mean)**2) / (n - 1._${k1}$)
#:endif
to avoid the
@fiolj could you confirm it was what you suggested, please?
Thanks @jvdp1, that was exactly what I suggested.
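For readers following along, the complex branch of the snippet above can be tried in isolation. This is a small sketch with made-up input values, using `dp => real64` as an assumed kind. It shows that the mean of a complex array is complex, while `abs(z - mean)**2` is real, so the variance is declared as `real`.

```fortran
program complex_variance
    use, intrinsic :: iso_fortran_env, only: dp => real64
    implicit none
    complex(dp) :: z(4), mean
    real(dp) :: var
    integer :: n

    ! Illustrative data only.
    z(1) = cmplx(1.0_dp,  1.0_dp, kind=dp)
    z(2) = cmplx(2.0_dp, -1.0_dp, kind=dp)
    z(3) = cmplx(3.0_dp,  0.0_dp, kind=dp)
    z(4) = cmplx(4.0_dp,  2.0_dp, kind=dp)
    n = size(z)

    ! Pass 1: the mean of a complex array is itself complex.
    mean = sum(z) / n

    ! Pass 2: abs() returns a real value, so the sum of squared
    ! deviations, and therefore the variance, is real.
    var = sum(abs(z - mean)**2) / (n - 1._dp)

    print *, 'mean     =', mean
    print *, 'variance =', var
    if (var < 0.0_dp) error stop 'variance must be non-negative'
end program complex_variance
```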
I agree with @certik here. I was thinking along the same lines. A possible solution for real/int interfaces does not seem too difficult. Something like (for instance, for the function mean):
which is valid for all types (real, complex, int). For some simple functions, we could write the implementation so that it would be mostly the same as well. For function
This would cut the code in half, without solving the remaining factor of four.
Thank you for your review.
That will reduce the number of blocks, but it might give a
For now, I would suggest that we merge this PR with the master. Then we can open a PR to (try to) reduce the number of blocks in `var` and `mean` (but unfortunately, it will not reduce the number of generated functions).
On 9/2/20 at 10:47, Jeremie Vandenplas wrote:
For now, I would suggest that we merge this PR with the master. Then we can open a PR to (try to) reduce the number of blocks in `var` and `mean` (but unfortunately, it will not reduce the number of generated functions).
Yes, agreed. It is not a completely satisfactory solution yet, and we should try to come up with something better. I put it forward mainly to start thinking about it.
src/common.fypp (outdated)
#! E.g., (:, :, :, i, :, :)
#!

#:def rankindice(varname, varname1, origrank, dim)
We should probably rename this macro to have a more descriptive name. If it is only used to select subarrays by reducing the dimension, we could have:
#:def select_subarray(origrank, selectors)
#:assert origrank > 0
#:set seldict = dict(selectors)
#:call join_lines(joinstr=", ", prefix="(", suffix=")")
#:for i in range(1, origrank + 1)
$:seldict.get(i, ":")
#:endfor
#:endcall
#:enddef
and use it as
#! -> x(:, i, :)
x${select_subarray(3, [(2, 'i')])}$
It could also be used, if we need to reduce more than one rank, e.g.
#! -> x(:, :, i, j)
x${select_subarray(4, [(3, 'i'), (4, 'j')])}$
Also the description should be clarified a bit.
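As an illustration of what a generated selector such as `x(:, i, :)` is for, the following plain Fortran sketch (hand-written, not fypp output) reduces a rank-3 array along its second dimension by looping over `i` and accumulating the subarray `x(:, i, :)`. The array shape and the tolerance are arbitrary choices for the example.

```fortran
program reduce_along_dim
    implicit none
    real :: x(2, 3, 4), res(2, 4)
    integer :: i

    call random_number(x)

    ! Reduce along dim = 2: every other dimension keeps ":" and only
    ! dimension 2 is indexed, which is the selector (:, i, :).
    res = 0.0
    do i = 1, size(x, 2)
        res = res + x(:, i, :)
    end do
    res = res / size(x, 2)

    ! Cross-check against the intrinsic reduction along the same dimension.
    if (any(abs(res - sum(x, dim=2) / size(x, 2)) > 1.0e-5)) then
        error stop 'mismatch with sum(x, dim=2)'
    end if
    print *, 'mean along dim 2 matches sum(x, dim=2) / size(x, 2)'
end program reduce_along_dim
```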
Implemented as suggested. The proposed macro is more general and a better fit for its purpose.
Could you review it again, please?
Looks good to me, too. I have only minor comments. It's a serviceable baseline implementation.
### Return value

If `array` is of type `real` or `complex`, the result is of the same type as `array`.
If `array` is of type `integer`, the result is of type `double precision`.
I will raise this issue elsewhere, but I do not agree with this API for the return type when the input is integer data. I only bring it up here because it is not quite correct to say that the return type is `double precision`, when in fact the type is `real(real64)`. I'm not suggesting any changes now.
@nshaffer Thank you for your review.
I used `double precision` because they are declared as `dp`. But I agree they are actually `real(real64)`. The issue with using `real64` in the spec is that if the definition of `dp` in `stdlib_experimental_kinds` changes (there have already been discussions on that), then we will need to modify the spec too.
Would it be better to write "..... the result is of type `dp`."?
real(${k1}$) :: n
${t1}$ :: mean

if (.not.optval(mask, .true.)) then
This is a weird idiom to me. Here, I'd prefer the more obvious
if (present(mask)) then
if (mask .eqv. .false.) then
But this is a matter of style rather than substance.
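The two spellings can be compared side by side in a small standalone sketch. `optval_ll` below is a hypothetical local stand-in written for this example (it is not the stdlib routine itself), and `is_masked_out_a` / `is_masked_out_b` are illustrative names. Under these assumptions, both forms give the same answer for an absent, a `.true.`, and a `.false.` mask.

```fortran
module optval_sketch
    implicit none
contains
    ! Hypothetical stand-in for an optval-style helper: return x if it
    ! is present, otherwise return the supplied default.
    pure function optval_ll(x, default) result(y)
        logical, intent(in), optional :: x
        logical, intent(in) :: default
        logical :: y
        if (present(x)) then
            y = x
        else
            y = default
        end if
    end function optval_ll

    ! Idiom used in the PR: an absent mask counts as .true.
    function is_masked_out_a(mask) result(res)
        logical, intent(in), optional :: mask
        logical :: res
        res = .not. optval_ll(mask, .true.)
    end function is_masked_out_a

    ! Explicit form suggested in the review comment.
    function is_masked_out_b(mask) result(res)
        logical, intent(in), optional :: mask
        logical :: res
        res = .false.
        if (present(mask)) then
            if (mask .eqv. .false.) res = .true.
        end if
    end function is_masked_out_b
end module optval_sketch

program optval_demo
    use optval_sketch
    implicit none
    ! The two forms should agree in all three cases.
    if (is_masked_out_a() .neqv. is_masked_out_b()) error stop 'absent'
    if (is_masked_out_a(.true.) .neqv. is_masked_out_b(.true.)) error stop 'true'
    if (is_masked_out_a(.false.) .neqv. is_masked_out_b(.false.)) error stop 'false'
    print *, 'both idioms agree'
end program optval_demo
```

The nested `if` in the explicit form is needed because Fortran does not guarantee short-circuit evaluation, so an absent `mask` must not be referenced in the same condition as `present(mask)`.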
Hopefully neither option will be needed in a future standard.
I just revisited this PR. It looks like everybody approved it. We can revisit any outstanding minor issues in a later PR. Merging, thanks @jvdp1!
Based on #137
With this PR, I propose to add a function for computing the variance of elements in arrays using the same API as `stdlib::mean`. The algorithm used is a two-pass algorithm (as discussed in #3). Based on #3 and #114, I avoided using the function `mean` (and creating a new function `center` for computing `x - mean`), to avoid a loss in performance.
`select case` statement:
and
I am probably missing something that should be obvious!
Another issue is the compilation time needed with the Makefiles in the CI!
Note: Each new statistical function in `stdlib_stats` will potentially include 600 additional functions. It really illustrates the issue of having no templates.