-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Probability of Direction (pd) #428
[Feature] Probability of Direction (pd) #428
Conversation
Thank you for the PR! I'd suggest, however, opening a PR to MCMCDiagnosticTools (first) - most (all?) diagnostics specialized in MCMCChains for |
Codecov ReportPatch coverage has no change and project coverage change:
Additional details and impacted files@@ Coverage Diff @@
## master #428 +/- ##
==========================================
- Coverage 84.62% 84.07% -0.56%
==========================================
Files 20 21 +1
Lines 1067 1074 +7
==========================================
Hits 903 903
- Misses 164 171 +7
☔ View full report in Codecov by Sentry. |
Should I do that even though it's not technically a "diagnostic" index? It's more a summary index, similar to HPD. If so, could you point me to an example of a function implemented there that I can follow? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok to keep this here for reasons that @DominiqueMakowski mentioned.
I think MCMCDiagnosticTools would be preferable because it can be re-used easily by ArviZ, MCMCChains, and other array-based chain structures. The implementation would be even simpler than in this PR as you don't have to deal with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few comments that I think would be good to address, regardless of where the functionality is added.
Moreover, this would need a few tests.
@@ -0,0 +1,19 @@ | |||
function p_direction(x::Vector{Float64}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be problematic that this is only defined for Vector
- that means e.g. that it won't work for slices of rows or columns.
@@ -0,0 +1,19 @@ | |||
function p_direction(x::Vector{Float64}) | |||
return maximum([sum(x .> 0) ./ length(x), sum(x .< 0) ./ length(x)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this creates a few unnecessary allocations (the array inside of maximum
, x .> 0
, and x .< 0
). This could be fixed e.g. by
return maximum([sum(x .> 0) ./ length(x), sum(x .< 0) ./ length(x)]) | |
ntotal = 0 | |
npositive = 0 | |
nnegative = 0 | |
for xi in x | |
if xi > 0 | |
npositive += 1 | |
elseif xi < 0 | |
nnegative += 1 | |
end | |
ntotal += 1 | |
end | |
return max(npositive, nnegative) / ntotal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in disbelief that this code is more efficient than the above one-liner, but I checked with @btime
and indeed... 🤯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one-liner creates quite a few allocations (zero allocations in the alternative) and it iterates over the array at least twice (one loop in the suggestion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in disbelief that this code is more efficient than the above one-liner, but I checked with
@btime
and indeed... 🤯
If I had a nickle for each time I thought I had written the best code and then David has a better solution -- I'd be so rich
# Store everything. | ||
funs = [p_direction] | ||
func_names = [:p_direction] | ||
|
||
# Summarize. | ||
summary_df = summarize( | ||
chains, funs...; | ||
func_names=func_names, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid alloctions:
# Store everything. | |
funs = [p_direction] | |
func_names = [:p_direction] | |
# Summarize. | |
summary_df = summarize( | |
chains, funs...; | |
func_names=func_names, | |
# Summarize. | |
summary_df = summarize( | |
chains, p_direction; | |
func_names=[:p_direction], |
Closed in favour of arviz-devs/PosteriorStats.jl#10 |
Indices of effect significance/existence computed from posterior distributions are very common in some fields. This feature adds the probability of direction,
p_direction()
, a Bayesian equivalent of the p-value.I followed the implementation of
mean()
by using thesummarize()
workflow.