[RF] Make NLL creation in new BatchMode feature complete #11622

guitargeek · 2022-10-21T12:56:01Z

This PR applies many fixes to the new RooFit BatchMode to make sure that the NLL is created correctly, respecting all the command arguments of RooAbsPdf::fitTo() that were so far not considered.

More detail in the individual commit descriptions.

The `RooAbsData::split()` methods had two overloads. One for general categories, and one for splitting according to the components of a RooSimultaneous. The code in these two functions was largely duplicated. This commit factors out common parts to anonymous functions, such that no code is duplicated. This improvement is done now, as I plan to use the same `split()` function also in the new BatchMode.

The `BatchMode()` so far did not support fitting to datasets that used the `RooCompositeDataStore`, which is commonly used for simultaneous datasets. To support this, the new BatchMode now also used `RooAbsData::split()` to split the data accorting to the category in simultaneous fits, because the datasets that are resulting from the split are using the `RooVectorDataStore` again.

There is one unit test that covers a problem that is specific to the old `RooNLLVar`, so when creating the NLL in that test the BatchMode should be explicitely switched off. Like this, the test would still work if the default backend would be changed to the BatchMode.

For some reason, the `nCPU` configuration argument for the test statistics was propagated differently to the component test statistics for ranged and non-ranged fits. In the range case, there was a sign flip if `_mpinterl` was `true`, but there is not reason to have this different multiprocessing logic for ranged fits. It only adds a further code branching that is not convenient to consider in code refactoring. Therefore, this commit suggests to treat the `nCPU` argument the same for ranged and non-ranged fits.

When `SplitRange()` is used, each subrange in a ranged fit needs to be suffixed with the category label. For example, `r1,r2` becomes `r1_cat0,r1_cat0`. Since this has also to be done in the new test statistics and the new BatchMode, the logic to do so is factored out into a helper function that will also be used by the BatchMode in the next commit.

The RooAbsPdf::fitTo() function has a `SplitRange()` command argument that was so far ignored by the new BatchMode. The effect is that for each component, the fit range is suffixed by the category label. Testing this functionality is covered by the `MultiRangeFitWithSplitRange` test in `testRooSimultaneous`, which is changed in this commit to also validate the new BatchMode.

In simultaneous fits, the combined PDF should be correctly normalized over the union of the components, weighted by the component data weight. This was not implemented in the new BatchMode so far and is added in this commit.

It is very useful for debugging to inspect the state of the `RooFitDriver`. For this, a new function is added that prints a nice table like this: ``` --- RooFit BatchMode evaluation --- -----------------------------------------------------------------------------| | Idx | Name | Class | Size | From Data | 1st value | -----------------------------------------------------------------------------| | 0 | _weight_sumW2 | RooRealVar | 1 | 1 | 1 | | 1 | _weight | RooRealVar | 1 | 1 | 1 | | 2 | b | RooRealVar | 1 | 0 | -0.5 | | 3 | x | RooRealVar | 1000 | 1 | 0.437143 | | 4 | c2 | RooGenericPdf | 1000 | 0 | 0.803666 | | 5 | c2_Int[x] | RooRealIntegral | 1 | 0 | 1.83583 | | 6 | c2_over_c2_Int[x] | RooAbsPdf | 1000 | 0 | 0.437767 | | 7 | a | RooRealVar | 1 | 0 | -0.2 | | 8 | c1 | RooGenericPdf | 1000 | 0 | 0.916284 | | 9 | c1_Int[x] | RooRealIntegral | 1 | 0 | 3.1606 | | 10 | c1_over_c1_Int[x] | RooAbsPdf | 1000 | 0 | 0.289908 | | 11 | mypdf | RooProdPdf | 1000 | 0 | 0.126912 | | 12 | RooNLLVarNew | RooAbsReal | 1 | 0 | 2633.6 | | 13 | nll_mypdf_mypdfData | RooAddition | 1 | 0 | 2633.6 | -----------------------------------------------------------------------------| ``` For any likelihood that is created with the new BatchMode, just call `Print("v")` on it and you will get this. This is meant for debugging and doesn't change any functionality.

The new BatchMode always had some trouble in simultaneous fits when the same argument was reused in different components, because intermediate values are stored by argument name and there are different results for each component. To solve this problem, the names of all arguments in a given component are now prefixed by the category label surrounded by underscores. Furthermore, the `getParameters` function for objects created with the new BatchMode is improved such that observables renamed with this prefixing are not confused with parameters. Closes root-project#10991.

To be consistent with the old test statistics classes, empty bins should not be skipped if the binned likelihood optimization is enabled.

The old test statistics did not apply the offset subtraction if `RooAbsReal::hideOffset()` was `true`, and the new BatchMode should do the same.

phsft-bot · 2022-10-26T09:06:46Z

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

lmoneta

LGTM!
Thank you Jonas for implementing support for all the features in the new BatchMode!

phsft-bot · 2022-10-26T11:16:17Z

Build failed on mac1015/cxx17.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

projectroot.net.netxng.test.gtest_net_netxng_test_RRawFileNetXNG

guitargeek added improvement in:RooFit/RooStats labels Oct 21, 2022

guitargeek self-assigned this Oct 21, 2022

guitargeek force-pushed the datasplitting branch 4 times, most recently from f53f6f2 to d6043c3 Compare October 22, 2022 00:08

root-project deleted a comment from phsft-bot Oct 22, 2022

guitargeek changed the title ~~[RF] Reduce code duplication in RooAbsData::split()~~ [RF] More support for simultaneous fits in new BatchMode Oct 22, 2022

guitargeek force-pushed the datasplitting branch 2 times, most recently from 86750d0 to 6a5d2f1 Compare October 25, 2022 10:50

root-project deleted a comment from phsft-bot Oct 25, 2022

root-project deleted a comment from phsft-bot Oct 26, 2022

guitargeek added 11 commits October 26, 2022 10:55

[RF] Correctly normalize simultaneous PDFs in BatchMode

372e4f3

In simultaneous fits, the combined PDF should be correctly normalized over the union of the components, weighted by the component data weight. This was not implemented in the new BatchMode so far and is added in this commit.

[RF] Don't ignore empty bins for binned PDF in BatchMode

be2dffe

To be consistent with the old test statistics classes, empty bins should not be skipped if the binned likelihood optimization is enabled.

[RF] Only subtract NLL offset when not hidden in new BatchMode

aa51f65

The old test statistics did not apply the offset subtraction if `RooAbsReal::hideOffset()` was `true`, and the new BatchMode should do the same.

guitargeek force-pushed the datasplitting branch from f879f48 to aa51f65 Compare October 26, 2022 09:06

guitargeek marked this pull request as ready for review October 26, 2022 09:06

guitargeek requested review from lmoneta and hageboeck as code owners October 26, 2022 09:06

lmoneta approved these changes Oct 26, 2022

View reviewed changes

guitargeek linked an issue Oct 26, 2022 that may be closed by this pull request

[RF] Batch mode with RooSimultaneous introduces spurious parameters #10991

Closed

1 task

guitargeek mentioned this pull request Oct 26, 2022

[RF] RooFit open issues and backports for 6.26.12 release #11534

Closed

13 tasks

guitargeek merged commit 36d6a97 into root-project:master Oct 26, 2022

guitargeek deleted the datasplitting branch October 26, 2022 10:16

guitargeek mentioned this pull request Nov 19, 2022

[v626][RF] Backports of RooFit PRs to v6-26-00-patches: Part 24 #11748

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RF] Make NLL creation in new BatchMode feature complete #11622

[RF] Make NLL creation in new BatchMode feature complete #11622

guitargeek commented Oct 21, 2022 •

edited

Loading

phsft-bot commented Oct 26, 2022

lmoneta left a comment

phsft-bot commented Oct 26, 2022

[RF] Make NLL creation in new BatchMode feature complete #11622

[RF] Make NLL creation in new BatchMode feature complete #11622

Conversation

guitargeek commented Oct 21, 2022 • edited Loading

phsft-bot commented Oct 26, 2022

lmoneta left a comment

Choose a reason for hiding this comment

phsft-bot commented Oct 26, 2022

Failing tests:

guitargeek commented Oct 21, 2022 •

edited

Loading