-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RF] Make NLL creation in new BatchMode feature complete #11622
Conversation
f53f6f2
to
d6043c3
Compare
86750d0
to
6a5d2f1
Compare
The `RooAbsData::split()` methods had two overloads. One for general categories, and one for splitting according to the components of a RooSimultaneous. The code in these two functions was largely duplicated. This commit factors out common parts to anonymous functions, such that no code is duplicated. This improvement is done now, as I plan to use the same `split()` function also in the new BatchMode.
The `BatchMode()` so far did not support fitting to datasets that used the `RooCompositeDataStore`, which is commonly used for simultaneous datasets. To support this, the new BatchMode now also used `RooAbsData::split()` to split the data accorting to the category in simultaneous fits, because the datasets that are resulting from the split are using the `RooVectorDataStore` again.
There is one unit test that covers a problem that is specific to the old `RooNLLVar`, so when creating the NLL in that test the BatchMode should be explicitely switched off. Like this, the test would still work if the default backend would be changed to the BatchMode.
For some reason, the `nCPU` configuration argument for the test statistics was propagated differently to the component test statistics for ranged and non-ranged fits. In the range case, there was a sign flip if `_mpinterl` was `true`, but there is not reason to have this different multiprocessing logic for ranged fits. It only adds a further code branching that is not convenient to consider in code refactoring. Therefore, this commit suggests to treat the `nCPU` argument the same for ranged and non-ranged fits.
When `SplitRange()` is used, each subrange in a ranged fit needs to be suffixed with the category label. For example, `r1,r2` becomes `r1_cat0,r1_cat0`. Since this has also to be done in the new test statistics and the new BatchMode, the logic to do so is factored out into a helper function that will also be used by the BatchMode in the next commit.
The RooAbsPdf::fitTo() function has a `SplitRange()` command argument that was so far ignored by the new BatchMode. The effect is that for each component, the fit range is suffixed by the category label. Testing this functionality is covered by the `MultiRangeFitWithSplitRange` test in `testRooSimultaneous`, which is changed in this commit to also validate the new BatchMode.
In simultaneous fits, the combined PDF should be correctly normalized over the union of the components, weighted by the component data weight. This was not implemented in the new BatchMode so far and is added in this commit.
It is very useful for debugging to inspect the state of the `RooFitDriver`. For this, a new function is added that prints a nice table like this: ``` --- RooFit BatchMode evaluation --- -----------------------------------------------------------------------------| | Idx | Name | Class | Size | From Data | 1st value | -----------------------------------------------------------------------------| | 0 | _weight_sumW2 | RooRealVar | 1 | 1 | 1 | | 1 | _weight | RooRealVar | 1 | 1 | 1 | | 2 | b | RooRealVar | 1 | 0 | -0.5 | | 3 | x | RooRealVar | 1000 | 1 | 0.437143 | | 4 | c2 | RooGenericPdf | 1000 | 0 | 0.803666 | | 5 | c2_Int[x] | RooRealIntegral | 1 | 0 | 1.83583 | | 6 | c2_over_c2_Int[x] | RooAbsPdf | 1000 | 0 | 0.437767 | | 7 | a | RooRealVar | 1 | 0 | -0.2 | | 8 | c1 | RooGenericPdf | 1000 | 0 | 0.916284 | | 9 | c1_Int[x] | RooRealIntegral | 1 | 0 | 3.1606 | | 10 | c1_over_c1_Int[x] | RooAbsPdf | 1000 | 0 | 0.289908 | | 11 | mypdf | RooProdPdf | 1000 | 0 | 0.126912 | | 12 | RooNLLVarNew | RooAbsReal | 1 | 0 | 2633.6 | | 13 | nll_mypdf_mypdfData | RooAddition | 1 | 0 | 2633.6 | -----------------------------------------------------------------------------| ``` For any likelihood that is created with the new BatchMode, just call `Print("v")` on it and you will get this. This is meant for debugging and doesn't change any functionality.
The new BatchMode always had some trouble in simultaneous fits when the same argument was reused in different components, because intermediate values are stored by argument name and there are different results for each component. To solve this problem, the names of all arguments in a given component are now prefixed by the category label surrounded by underscores. Furthermore, the `getParameters` function for objects created with the new BatchMode is improved such that observables renamed with this prefixing are not confused with parameters. Closes root-project#10991.
To be consistent with the old test statistics classes, empty bins should not be skipped if the binned likelihood optimization is enabled.
The old test statistics did not apply the offset subtraction if `RooAbsReal::hideOffset()` was `true`, and the new BatchMode should do the same.
f879f48
to
aa51f65
Compare
Starting build on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thank you Jonas for implementing support for all the features in the new BatchMode!
Build failed on mac1015/cxx17. Failing tests: |
This PR applies many fixes to the new RooFit BatchMode to make sure that the NLL is created correctly, respecting all the command arguments of
RooAbsPdf::fitTo()
that were so far not considered.More detail in the individual commit descriptions.