Correctly recompute PU weights in case of an upper bound #87

Open
wants to merge 1 commit into master
Conversation

@ktht (Contributor) commented May 11, 2018

The current code has a severe bug that fills the PU weights with NaNs (not-a-number values) whenever any PU weight needs to be cropped:

// caller: refills 'cropped' on every pass of the enclosing while loop
for(int i=0; i<(int)weights.size(); ++i) cropped.push_back(std::min(maxw,weights[i]));
float shift = checkIntegral(cropped,weights);

// inside checkIntegral(wgt1, wgt2): i runs over wgt1 (= cropped) but also indexes refvals_
for(int i=0; i<(int)wgt1.size(); ++i) {
  myint += wgt1[i]*refvals_[i];
  refint += wgt2[i]*refvals_[i];

The problem is that whenever the while loop goes beyond its first iteration, N more values (N = the number of bins in the MC histogram) are appended to the cropped vector, because it is never cleared. checkIntegral then loops over the size of cropped and uses the running index i to also retrieve the event counts (refvals_), which obviously runs past the vector boundaries. The computed integrals and final weights become pure garbage. Resetting cropped in every iteration didn't work either.
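A minimal standalone reproduction of the mechanism (toy numbers and a hypothetical driver; the inner loop only imitates the indexing inside checkIntegral):

#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  const std::vector<float> weights = {5.f, 1.f};  // N = 2 bins
  const std::vector<float> refvals = {1.f, 2.f};  // per-bin event counts
  const float maxw = 3.f;
  std::vector<float> cropped;                     // never cleared in the buggy loop
  for(int pass = 1; pass <= 2; ++pass) {
    // each pass appends N more entries instead of replacing the old ones
    for(std::size_t i = 0; i < weights.size(); ++i)
      cropped.push_back(std::min(maxw, weights[i]));
    // checkIntegral-style loop: i runs over cropped but also indexes refvals
    for(std::size_t i = 0; i < cropped.size(); ++i)
      if(i >= refvals.size())
        std::printf("pass %d: refvals[%zu] is out of bounds\n", pass, i);
  }
  return 0;
}

On the second pass the index reaches 2 and 3 while refvals holds only 2 entries; in the real code those reads return garbage that then poisons the integrals.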

For these reasons @veelken and I decided to redo this logic by scaling the largest weight and adjusting the remaining weights until the largest weight approaches hardmax (defaults to 3.), or, equivalently, until the new integral is close enough to the original one:

  1. limit the weights to hardmax;
  2. compute the ratio (or scale factor) of the integrals before and after cropping;
  3. multiply the weights by the scale factor;
  4. repeat 1.-3. until the scale factor approaches 1 within maxshift (defaults to 0.0025);
  5. return the latest adjusted values as the new weights (see the sketch after this list).
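A minimal self-contained sketch of steps 1.-5., assuming hypothetical names (rescaleWeights, refvals); the PR itself reworks checkIntegral and its caller, so treat this only as an illustration of the algorithm:

#include <algorithm>
#include <cmath>
#include <vector>

// Sketch only: 'refvals' holds the per-bin event counts.
std::vector<float> rescaleWeights(std::vector<float> weights,
                                  const std::vector<float>& refvals,
                                  const float hardmax = 3.f,
                                  const float maxshift = 0.0025f)
{
  // reference integral of the original, uncropped weights
  float refint = 0.f;
  for(std::size_t i = 0; i < weights.size(); ++i)
    refint += weights[i] * refvals[i];

  while(true)
  {
    // 1. limit the weights to hardmax
    std::vector<float> cropped(weights.size());
    for(std::size_t i = 0; i < weights.size(); ++i)
      cropped[i] = std::min(hardmax, weights[i]);

    // 2. scale factor = ratio of the integrals before and after cropping
    float cropint = 0.f;
    for(std::size_t i = 0; i < cropped.size(); ++i)
      cropint += cropped[i] * refvals[i];
    const float sf = refint / cropint;

    // 3. multiply the cropped weights by the scale factor
    for(std::size_t i = 0; i < weights.size(); ++i)
      weights[i] = cropped[i] * sf;

    // 4./5. once sf is within maxshift of 1, the adjusted values are the new weights
    if(std::fabs(sf - 1.f) < maxshift)
      return weights;
  }
}

Feeding this sketch weights = {5.f, 1.f} and refvals = {1.f, 2.f} reproduces the eight iterations and the final weights [3.0045, 1.9977] shown below.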

Below is an extreme illustration of how the algorithm works:

hardmax = 3.0000 ; maxshift = 0.0025
initial weights = [ 5.000, 1.000 ]
event count     = [ 1.000, 2.000 ]
reference integral = 7.0000
#1 cropped weights = [ 3.0000, 1.0000 ] ; cropped integral = 5.0000 ; sf = 1.4000 > 1.0025; current weights = [ 4.2000, 1.4000 ]
#2 cropped weights = [ 3.0000, 1.4000 ] ; cropped integral = 5.8000 ; sf = 1.2069 > 1.0025; current weights = [ 3.6207, 1.6897 ]
#3 cropped weights = [ 3.0000, 1.6897 ] ; cropped integral = 6.3793 ; sf = 1.0973 > 1.0025; current weights = [ 3.2919, 1.8541 ]
#4 cropped weights = [ 3.0000, 1.8541 ] ; cropped integral = 6.7081 ; sf = 1.0435 > 1.0025; current weights = [ 3.1305, 1.9347 ]
#5 cropped weights = [ 3.0000, 1.9347 ] ; cropped integral = 6.8695 ; sf = 1.0190 > 1.0025; current weights = [ 3.0570, 1.9715 ]
#6 cropped weights = [ 3.0000, 1.9715 ] ; cropped integral = 6.9430 ; sf = 1.0082 > 1.0025; current weights = [ 3.0246, 1.9877 ]
#7 cropped weights = [ 3.0000, 1.9877 ] ; cropped integral = 6.9754 ; sf = 1.0035 > 1.0025; current weights = [ 3.0106, 1.9947 ]
#8 cropped weights = [ 3.0000, 1.9947 ] ; cropped integral = 6.9894 ; sf = 1.0015 < 1.0025; current weights = [ 3.0045, 1.9977 ]
=> final weights = [ 3.0045, 1.9977 ]

In practice, large PU weights are assigned to only a handful of events, so the above reweighting procedure barely affects the other weights, provided the input Ntuple contains a reasonable number of events (which it should if we want an accurate PU profile).

@ktht changed the title from "Correctly recompute PU weights if case of an upper bound" to "Correctly recompute PU weights in case of an upper bound" on May 14, 2018
@arizzi (Contributor) commented Sep 12, 2019

@fgolf @peruzzim can we decide on this PR? A fix for the problem is needed in any case, and people (including me) keep rediscovering the NaN issue.

I'm not sure whether the logic should be changed as proposed here, or whether the cropped vector should simply be reset at each iteration of the while loop.

@AndrewLevin (Contributor) commented:
> @fgolf @peruzzim can we decide on this PR? A fix for the problem is needed in any case, and people (including me) keep rediscovering the NaN issue.
>
> I'm not sure whether the logic should be changed as proposed here, or whether the cropped vector should simply be reset at each iteration of the while loop.

I did exactly that in #197 before I saw this pull request. I am not sure why it is stated in #87 (comment) that "Resetting cropped in every iteration didn't work either." That seemed to work for me.
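For reference, the reset amounts to a one-line change to the excerpt quoted at the top of this thread (a sketch of the idea, not the literal #197 diff):

cropped.clear();  // reset before refilling, so cropped stays at N entries per pass
for(int i=0; i<(int)weights.size(); ++i) cropped.push_back(std::min(maxw,weights[i]));
float shift = checkIntegral(cropped,weights);

With the vector emptied on every pass of the while loop, the index inside checkIntegral can no longer run past the end of refvals_.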

@ktht (Contributor, Author) commented Oct 16, 2019

Too much time has passed since I submitted the PR, so I cannot remember what I tried at the time. It does look like resetting the cropped variable fixes the NaN problem, but the overall logic still seems a bit wonky to me, because it's not guaranteed that all weights are cropped to hardmax. At least this PR does what is actually intended. Plugging the example from my original post into the solution currently implemented in the master branch (plus #197) gives:

hardmax = 3.0000 ; maxshift = 0.0025
initial weights = [ 5.0000, 1.0000 ]
event count     = [ 1.0000, 2.0000 ]
reference integral = 7.000000
#1 cropped weights = [ 5.0000, 1.0000 ] ; cropped integral = 7.0000 ; sf = 0.0000 <= 0.0025; current weights = [ 5.0000, 1.0000 ]
#2 cropped weights = [ 4.7500, 1.0000 ] ; cropped integral = 6.7500 ; sf = 0.0357 >  0.0025; current weights = [ 4.5804, 0.9643 ]
=> final weights = [ 5.0000, 1.0000 ]

So, the final weights are not actually cropped to 3.0 as one would expect.

@AndrewLevin (Contributor) commented:
Well, maxshift could be viewed as a mechanism to "not touch" pathological cases like the one you found, so in that sense the current behavior could be considered correct.

bainbrid pushed a commit to bainbrid/nanoAOD-tools that referenced this pull request on Sep 19, 2022: "fixing typo and adding electron cases"