Correctly recompute PU weights in case of an upper bound #87

Open
wants to merge 1 commit into master
Conversation

@ktht (Contributor) commented May 11, 2018

The current code has a severe bug that fills the PU weights with NaNs (not-a-number values) whenever any PU weight needs to be cropped:

// caller: refills 'cropped' on every pass of the enclosing while loop
for(int i=0; i<(int)weights.size(); ++i) cropped.push_back(std::min(maxw,weights[i]));
float shift = checkIntegral(cropped,weights);

// inside checkIntegral(wgt1, wgt2): i runs over wgt1 (= cropped) but also indexes refvals_
for(int i=0; i<(int)wgt1.size(); ++i) {
  myint += wgt1[i]*refvals_[i];
  refint += wgt2[i]*refvals_[i];

The problem is that whenever the while loop goes beyond its first iteration, N more values (N = the number of bins in the MC histogram) are appended to the cropped vector, because it is never cleared. checkIntegral then loops over the size of cropped and uses the running index i to also retrieve the event counts (refvals_), which obviously runs past the vector boundaries. The computed integrals and final weights become pure garbage. Resetting cropped in every iteration didn't work either.
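A minimal standalone reproduction of the mechanism (toy numbers and a hypothetical driver; the inner loop only imitates the indexing inside checkIntegral):

#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  const std::vector<float> weights = {5.f, 1.f};  // N = 2 bins
  const std::vector<float> refvals = {1.f, 2.f};  // per-bin event counts
  const float maxw = 3.f;
  std::vector<float> cropped;                     // never cleared in the buggy loop
  for(int pass = 1; pass <= 2; ++pass) {
    // each pass appends N more entries instead of replacing the old ones
    for(std::size_t i = 0; i < weights.size(); ++i)
      cropped.push_back(std::min(maxw, weights[i]));
    // checkIntegral-style loop: i runs over cropped but also indexes refvals
    for(std::size_t i = 0; i < cropped.size(); ++i)
      if(i >= refvals.size())
        std::printf("pass %d: refvals[%zu] is out of bounds\n", pass, i);
  }
  return 0;
}

On the second pass the index reaches 2 and 3 while refvals holds only 2 entries; in the real code those reads return garbage that then poisons the integrals.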

For these reasons @veelken and I decided to redo this logic by scaling the largest weight and adjusting the remaining weights until the largest weight approaches hardmax (defaults to 3.), or, equivalently, until the new integral is close enough to the original one:

  1. limit the weights to hardmax;
  2. compute the ratio (or scale factor) of the integrals before and after cropping;
  3. multiply the weights by the scale factor;
  4. repeat 1.-3. until the scale factor approaches 1 within maxshift (defaults to 0.0025);
  5. return the latest adjusted values as the new weights (see the sketch after this list).
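A minimal self-contained sketch of steps 1.-5., assuming hypothetical names (rescaleWeights, refvals); the PR itself reworks checkIntegral and its caller, so treat this only as an illustration of the algorithm:

#include <algorithm>
#include <cmath>
#include <vector>

// Sketch only: 'refvals' holds the per-bin event counts.
std::vector<float> rescaleWeights(std::vector<float> weights,
                                  const std::vector<float>& refvals,
                                  const float hardmax = 3.f,
                                  const float maxshift = 0.0025f)
{
  // reference integral of the original, uncropped weights
  float refint = 0.f;
  for(std::size_t i = 0; i < weights.size(); ++i)
    refint += weights[i] * refvals[i];

  while(true)
  {
    // 1. limit the weights to hardmax
    std::vector<float> cropped(weights.size());
    for(std::size_t i = 0; i < weights.size(); ++i)
      cropped[i] = std::min(hardmax, weights[i]);

    // 2. scale factor = ratio of the integrals before and after cropping
    float cropint = 0.f;
    for(std::size_t i = 0; i < cropped.size(); ++i)
      cropint += cropped[i] * refvals[i];
    const float sf = refint / cropint;

    // 3. multiply the cropped weights by the scale factor
    for(std::size_t i = 0; i < weights.size(); ++i)
      weights[i] = cropped[i] * sf;

    // 4./5. once sf is within maxshift of 1, the adjusted values are the new weights
    if(std::fabs(sf - 1.f) < maxshift)
      return weights;
  }
}

Feeding this sketch weights = {5.f, 1.f} and refvals = {1.f, 2.f} reproduces the eight iterations and the final weights [3.0045, 1.9977] shown below.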

Below is an extreme illustration of how the algorithm works:

hardmax = 3.0000 ; maxshift = 0.0025
initial weights = [ 5.000, 1.000 ]
event count     = [ 1.000, 2.000 ]
reference integral = 7.0000
#1 cropped weights = [ 3.0000, 1.0000 ] ; cropped integral = 5.0000 ; sf = 1.4000 > 1.0025; current weights = [ 4.2000, 1.4000 ]
#2 cropped weights = [ 3.0000, 1.4000 ] ; cropped integral = 5.8000 ; sf = 1.2069 > 1.0025; current weights = [ 3.6207, 1.6897 ]
#3 cropped weights = [ 3.0000, 1.6897 ] ; cropped integral = 6.3793 ; sf = 1.0973 > 1.0025; current weights = [ 3.2919, 1.8541 ]
#4 cropped weights = [ 3.0000, 1.8541 ] ; cropped integral = 6.7081 ; sf = 1.0435 > 1.0025; current weights = [ 3.1305, 1.9347 ]
#5 cropped weights = [ 3.0000, 1.9347 ] ; cropped integral = 6.8695 ; sf = 1.0190 > 1.0025; current weights = [ 3.0570, 1.9715 ]
#6 cropped weights = [ 3.0000, 1.9715 ] ; cropped integral = 6.9430 ; sf = 1.0082 > 1.0025; current weights = [ 3.0246, 1.9877 ]
#7 cropped weights = [ 3.0000, 1.9877 ] ; cropped integral = 6.9754 ; sf = 1.0035 > 1.0025; current weights = [ 3.0106, 1.9947 ]
#8 cropped weights = [ 3.0000, 1.9947 ] ; cropped integral = 6.9894 ; sf = 1.0015 < 1.0025; current weights = [ 3.0045, 1.9977 ]
=> final weights = [ 3.0045, 1.9977 ]

In practice, large PU weights are assigned to only a handful of events, so the above reweighting procedure barely affects the other weights, provided the input Ntuple contains a reasonable number of events (which it should if we want an accurate PU profile).

@ktht changed the title from "Correctly recompute PU weights if case of an upper bound" to "Correctly recompute PU weights in case of an upper bound" on May 14, 2018
@arizzi (Contributor) commented Sep 12, 2019

@fgolf @peruzzim can we decide on this PR? A fix for the problem is needed in any case, and people (including me) keep rediscovering the NaN issue.

I'm not sure whether the logic should be changed as proposed here, or whether the cropped vector should simply be reset at each iteration of the while loop.

@AndrewLevin (Contributor) commented:
> @fgolf @peruzzim can we decide on this PR? A fix for the problem is needed in any case, and people (including me) keep rediscovering the NaN issue.
>
> I'm not sure whether the logic should be changed as proposed here, or whether the cropped vector should simply be reset at each iteration of the while loop.

I did exactly that in #197 before I saw this pull request. I am not sure why it is stated in #87 (comment) that "Resetting cropped in every iteration didn't work either." That seemed to work for me.
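For reference, the reset amounts to a one-line change to the excerpt quoted at the top of this thread (a sketch of the idea, not the literal #197 diff):

cropped.clear();  // reset before refilling, so cropped stays at N entries per pass
for(int i=0; i<(int)weights.size(); ++i) cropped.push_back(std::min(maxw,weights[i]));
float shift = checkIntegral(cropped,weights);

With the vector emptied on every pass of the while loop, the index inside checkIntegral can no longer run past the end of refvals_.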

@ktht (Contributor, Author) commented Oct 16, 2019

Too much time has passed since I submitted the PR, so I cannot remember what I tried at the time. It does look like resetting the cropped variable fixes the NaN problem, but the overall logic still seems a bit wonky to me, because it's not guaranteed that all weights are cropped to hardmax. At least this PR does what is actually intended. Plugging the example from my original post into the solution currently implemented in the master branch (plus #197) gives:

hardmax = 3.0000 ; maxshift = 0.0025
initial weights = [ 5.0000, 1.0000 ]
event count     = [ 1.0000, 2.0000 ]
reference integral = 7.000000
#1 cropped weights = [ 5.0000, 1.0000 ] ; cropped integral = 7.0000 ; sf = 0.0000 <= 0.0025; current weights = [ 5.0000, 1.0000 ]
#2 cropped weights = [ 4.7500, 1.0000 ] ; cropped integral = 6.7500 ; sf = 0.0357 >  0.0025; current weights = [ 4.5804, 0.9643 ]
=> final weights = [ 5.0000, 1.0000 ]

So, the final weights are not actually cropped to 3.0 as one would expect.

@AndrewLevin (Contributor) commented:
Well, maxshift could be viewed as a mechanism to "not touch" pathological cases like the one you found, so in that sense the current behavior could be considered correct.

bainbrid pushed a commit to bainbrid/nanoAOD-tools that referenced this pull request on Sep 19, 2022: "fixing typo and adding electron cases"