MLE best-fit parameter disagreeing with pyhf
#345
@Moelf can you try with a flat prior for theta?
Changing the prior to

```julia
prior = BAT.NamedTupleDist(
    μ = Uniform(0, 4),
    θ = Uniform(-3, 3)
)
```

fixed it: μ = 1.3064153351104848, θ = -0.06050344150637746. That's pretty unexpected; shouldn't the MLE() fit mostly not care about the prior (since we're not sampling)?
For comparison, if we use Turing:

```julia
using Turing, Optim  # Optim supplies the optimizer behind MLE()

v_data = [34, 22, 13, 11]  # observed data

# Despite the name, this returns the log-likelihood (not its negative),
# including the Gaussian constraint term on θ.
function nll(μ, θ)
    variations = [1, 2, 3, 3]
    v_data = [34, 22, 13, 11]  # observed data
    v_sig = [2, 3, 4, 5]       # signal
    v_bg = [30, 19, 9, 4]      # background
    bg = @. v_bg * (1 + θ * variations / v_bg)
    k = μ * v_sig + bg
    n_logfac = map(x -> sum(log, 1:x), v_data)  # log(n!) per bin
    NM = Normal(0, 1)
    sum(@. v_data * log(k) - n_logfac - k) + logpdf(NM, θ)
end

@model function binned_f(bincounts)
    μ ~ Uniform(0, 6)
    θ ~ Normal(0, 1)
    Turing.@addlogprob! nll(μ, θ)
end

chain_f = optimize(binned_f(v_data), MLE())
```

which prints:

```
ModeResult with maximized lp of -10.51
2-element Named Vector{Float64}
A  │
───┼───────────
:μ │    1.30648
:θ │ -0.0605151
```
MLE and posterior mode are only equivalent for flat priors.
I understand the posterior would deviate from the MLE if the prior is not flat, but I thought the MLE itself shouldn't care whether the prior is flat or not. Turing gives the same result regardless of whether the model uses

```julia
θ ~ Normal()
# or
θ ~ Uniform(-1, 1)
```
AFAIK BAT gives you the posterior mode in any case; I don't know if there is a pure MLE API in BAT.
I.e., bat_findmode gives you the maximum a posteriori estimate (MAP), not the MLE.
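For reference, the distinction in formulas (standard definitions, nothing BAT-specific):

$$\hat\theta_{\mathrm{MLE}} = \arg\max_{\theta} \log L(x \mid \theta), \qquad \hat\theta_{\mathrm{MAP}} = \arg\max_{\theta} \left[\, \log L(x \mid \theta) + \log \pi(\theta) \,\right]$$

The two coincide exactly when $\log \pi(\theta)$ is constant on the support, i.e. for a flat prior, which is why the `Uniform` prior above reproduced the expected result.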
That would make sense then, so I guess it works as advertised. Then, for doing the frequentist procedure, I would use Turing.jl + pyhf or (PyCall +) pyhf for now!
It doesn't sample, but yes, it finds the global maximum of the posterior density (or at least tries to). You can use BAT and ValueShapes tools to do an MLE:

```julia
using BAT, DensityInterface, InverseFunctions, ValueShapes, Optim

posterior = PosteriorDensity(likelihood, prior)
vshp = varshape(posterior.prior)
# draw an initial point from the prior and strip its shape for the optimizer
x_init = inverse(vshp)(rand(posterior.prior))
# negate the shaped log-likelihood so Optim can minimize it
neg_unshaped_likelihood = BAT.negative(logdensityof(posterior.likelihood) ∘ vshp)
r = Optim.optimize(neg_unshaped_likelihood, x_init)
shaped_result = vshp(Optim.minimizer(r))
```

It's not really in line with BAT's philosophy as a Bayesian package, but we could add an MLE function to BAT, e.g. to enable comparison with frequentist results. The prior would then be used to inform the optimizer about the shape of the space, without influencing the location of the MLE.
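For concreteness, a minimal sketch (my assembly, not from the thread) of what `likelihood` and `prior` could be in that recipe, reusing `nll` from the Turing comparison above. For a pure MLE the Gaussian constraint stays inside the likelihood, and the flat prior only delimits the search space:

```julia
using BAT, Distributions

# `nll` is the log-likelihood (constraint term included) defined above;
# wrap it to accept a named tuple, returning LogDVal as BAT expects
# (same pattern as the bat_findmode example further down).
likelihood = v -> LogDVal(nll(v.μ, v.θ))

# Flat prior: tells the optimizer the extent of the space
# without moving the location of the MLE.
prior = BAT.NamedTupleDist(μ = Uniform(0, 4), θ = Uniform(-3, 3))
```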
I think it's fine to require folks to use flat priors if they want to compare to the MLE.
For posterity (moral of the story: learn more stats, kids...), the real cause of the disagreement is that the frequentist likelihood and the Bayesian likelihood should not be equal to begin with:

```julia
julia> function baye_nll(v)
           variations = [1, 2, 3, 3]
           v_data = [34, 22, 13, 11]  # observed data
           v_sig = [2, 3, 4, 5]       # signal
           v_bg = [30, 19, 9, 4]      # background
           (; μ, θ) = v
           bg = @. v_bg * (1 + θ * variations / v_bg)
           k = μ * v_sig + bg
           n_logfac = map(x -> sum(log, 1:x), v_data)
           LogDVal(sum(@. v_data * log(k) - n_logfac - k))
       end

julia> posterior_BAT = PosteriorDensity(baye_nll, prior);

julia> best_fit_BAT = bat_findmode(posterior_BAT).result
[ Info: Using transform algorithm DensityIdentityTransform()
ShapedAsNT((μ = 1.306479238048563, θ = -0.06063208083677547))
```

Notice the deliberate omission of the `logpdf(NM, θ)` constraint term from the likelihood: with a `Normal(0, 1)` prior on θ, the constraint enters exactly once, through the prior, and the posterior mode reproduces the frequentist MLE.
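In formulas (notation mine; $b_i(\theta) = b_i + \theta\,\Delta_i$ matches the `bg` line in the code, with $\Delta_i$ the `variations`):

$$L_{\mathrm{freq}}(\mu, \theta) = \prod_i \operatorname{Pois}\!\left(n_i \mid \mu s_i + b_i(\theta)\right) \cdot \mathcal{N}(\theta \mid 0, 1)$$

$$p(\mu, \theta \mid n) \propto \prod_i \operatorname{Pois}\!\left(n_i \mid \mu s_i + b_i(\theta)\right) \cdot \pi(\mu)\,\pi(\theta), \qquad \pi(\theta) = \mathcal{N}(0, 1)$$

The Gaussian constraint should appear exactly once: inside the frequentist likelihood as an auxiliary-measurement term, or as the prior in the Bayesian product. The first `nll` above had it in both places simultaneously, which is what shifted the mode.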
It depends a bit on which part you model with data and what you leave to the prior. The measurements in the constraint terms often represent actual measurements done by, e.g., the collaboration, which should be taken into account and not overridden by the prior. One option is a first Bayesian step that derives a posterior on the nuisance parameter (NP) given this auxiliary measurement; the alternative is to model it directly in the likelihood, as sketched below.
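A sketch of those two routes in formulas (notation mine; $a$ is the auxiliary measurement and $\pi_0$ a base prior on the nuisance parameter $\theta$):

$$\text{two-step:}\quad \pi(\theta \mid a) \propto \mathcal{N}(a \mid \theta, 1)\,\pi_0(\theta), \qquad p(\mu, \theta \mid n, a) \propto L_{\mathrm{main}}(n \mid \mu, \theta)\,\pi(\theta \mid a)\,\pi(\mu)$$

$$\text{joint:}\quad p(\mu, \theta \mid n, a) \propto L_{\mathrm{main}}(n \mid \mu, \theta)\,\mathcal{N}(a \mid \theta, 1)\,\pi_0(\theta)\,\pi(\mu)$$

Both factorizations yield the same posterior; they differ only in where the auxiliary term is filed.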
I guess it really depends on what you consider your "prior knowledge" to be - this can, of course, be a philosophical question. :-)
The prior in LHC experiments is what the Combined Performance (CP) groups measured. The CP tools give you the expected bin counts at +/- 1 sigma (of a nuisance parameter), which is why all the parameters have a `Normal(0, 1)` prior (the `variations` array in the code above plays this role). I think now I feel really weird that we do this, namely adding constraint terms to the likelihood to mimic a Bayesian prior (in the Bayesian formulation this is naturally the result).
We can discuss offline, but I would posit it's natural either way. At the core, the CP groups provide measurements of actual data, not beliefs. You can either use them to update your prior on the nuisance parameter, which you then use for your main measurement, or you skip that step and model the joint measurement of the CP measurement and your main analysis. I wouldn't say one is more natural than the other.