Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RF] Make NLL creation in new BatchMode feature complete #11622

Merged
merged 11 commits into from
Oct 26, 2022

Conversation

guitargeek
Copy link
Contributor

@guitargeek guitargeek commented Oct 21, 2022

This PR applies many fixes to the new RooFit BatchMode to make sure that the NLL is created correctly, respecting all the command arguments of RooAbsPdf::fitTo() that were so far not considered.

More detail in the individual commit descriptions.

@guitargeek guitargeek self-assigned this Oct 21, 2022
@guitargeek guitargeek force-pushed the datasplitting branch 4 times, most recently from f53f6f2 to d6043c3 Compare October 22, 2022 00:08
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@root-project root-project deleted a comment from phsft-bot Oct 22, 2022
@guitargeek guitargeek changed the title [RF] Reduce code duplication in RooAbsData::split() [RF] More support for simultaneous fits in new BatchMode Oct 22, 2022
@guitargeek guitargeek force-pushed the datasplitting branch 2 times, most recently from 86750d0 to 6a5d2f1 Compare October 25, 2022 10:50
@root-project root-project deleted a comment from phsft-bot Oct 25, 2022
@root-project root-project deleted a comment from phsft-bot Oct 25, 2022
@root-project root-project deleted a comment from phsft-bot Oct 25, 2022
@root-project root-project deleted a comment from phsft-bot Oct 25, 2022
@root-project root-project deleted a comment from phsft-bot Oct 25, 2022
@root-project root-project deleted a comment from phsft-bot Oct 25, 2022
@root-project root-project deleted a comment from phsft-bot Oct 25, 2022
@root-project root-project deleted a comment from phsft-bot Oct 25, 2022
@root-project root-project deleted a comment from phsft-bot Oct 26, 2022
@root-project root-project deleted a comment from phsft-bot Oct 26, 2022
@root-project root-project deleted a comment from phsft-bot Oct 26, 2022
@root-project root-project deleted a comment from phsft-bot Oct 26, 2022
@root-project root-project deleted a comment from phsft-bot Oct 26, 2022
@root-project root-project deleted a comment from phsft-bot Oct 26, 2022
@root-project root-project deleted a comment from phsft-bot Oct 26, 2022
The `RooAbsData::split()` methods had two overloads. One for general
categories, and one for splitting according to the components of a
RooSimultaneous.

The code in these two functions was largely duplicated. This commit
factors out common parts to anonymous functions, such that no code is
duplicated.

This improvement is done now, as I plan to use the same `split()`
function also in the new BatchMode.
The `BatchMode()` so far did not support fitting to datasets that used
the `RooCompositeDataStore`, which is commonly used for simultaneous
datasets.

To support this, the new BatchMode now also used `RooAbsData::split()`
to split the data accorting to the category in simultaneous fits,
because the datasets that are resulting from the split are using the
`RooVectorDataStore` again.
There is one unit test that covers a problem that is specific to the old
`RooNLLVar`, so when creating the NLL in that test the BatchMode should
be explicitely switched off. Like this, the test would still work if the
default backend would be changed to the BatchMode.
For some reason, the `nCPU` configuration argument for the test
statistics was propagated differently to the component test statistics
for ranged and non-ranged fits.

In the range case, there was a sign flip if `_mpinterl` was `true`, but
there is not reason to have this different multiprocessing logic for
ranged fits. It only adds a further code branching that is not
convenient to consider in code refactoring.

Therefore, this commit suggests to treat the `nCPU` argument the same
for ranged and non-ranged fits.
When `SplitRange()` is used, each subrange in a ranged fit needs to be
suffixed with the category label.

For example, `r1,r2` becomes `r1_cat0,r1_cat0`.

Since this has also to be done in the new test statistics and the new
BatchMode, the logic to do so is factored out into a helper function
that will also be used by the BatchMode in the next commit.
The RooAbsPdf::fitTo() function has a `SplitRange()` command argument
that was so far ignored by the new BatchMode. The effect is that for
each component, the fit range is suffixed by the category label.

Testing this functionality is covered by the
`MultiRangeFitWithSplitRange` test in `testRooSimultaneous`, which is
changed in this commit to also validate the new BatchMode.
In simultaneous fits, the combined PDF should be correctly normalized
over the union of the components, weighted by the component data weight.

This was not implemented in the new BatchMode so far and is added in
this commit.
It is very useful for debugging to inspect the state of the
`RooFitDriver`. For this, a new function is added that prints a nice
table like this:

```
--- RooFit BatchMode evaluation ---
-----------------------------------------------------------------------------|
| Idx | Name                | Class           | Size | From Data | 1st value |
-----------------------------------------------------------------------------|
| 0   | _weight_sumW2       | RooRealVar      | 1    | 1         | 1         |
| 1   | _weight             | RooRealVar      | 1    | 1         | 1         |
| 2   | b                   | RooRealVar      | 1    | 0         | -0.5      |
| 3   | x                   | RooRealVar      | 1000 | 1         | 0.437143  |
| 4   | c2                  | RooGenericPdf   | 1000 | 0         | 0.803666  |
| 5   | c2_Int[x]           | RooRealIntegral | 1    | 0         | 1.83583   |
| 6   | c2_over_c2_Int[x]   | RooAbsPdf       | 1000 | 0         | 0.437767  |
| 7   | a                   | RooRealVar      | 1    | 0         | -0.2      |
| 8   | c1                  | RooGenericPdf   | 1000 | 0         | 0.916284  |
| 9   | c1_Int[x]           | RooRealIntegral | 1    | 0         | 3.1606    |
| 10  | c1_over_c1_Int[x]   | RooAbsPdf       | 1000 | 0         | 0.289908  |
| 11  | mypdf               | RooProdPdf      | 1000 | 0         | 0.126912  |
| 12  | RooNLLVarNew        | RooAbsReal      | 1    | 0         | 2633.6    |
| 13  | nll_mypdf_mypdfData | RooAddition     | 1    | 0         | 2633.6    |
-----------------------------------------------------------------------------|
```

For any likelihood that is created with the new BatchMode, just call
`Print("v")` on it and you will get this.

This is meant for debugging and doesn't change any functionality.
The new BatchMode always had some trouble in simultaneous fits when the
same argument was reused in different components, because intermediate
values are stored by argument name and there are different results for
each component.

To solve this problem, the names of all arguments in a given component
are now prefixed by the category label surrounded by underscores.

Furthermore, the `getParameters` function for objects created with the
new BatchMode is improved such that observables renamed with this
prefixing are not confused with parameters.

Closes root-project#10991.
To be consistent with the old test statistics classes, empty bins should
not be skipped if the binned likelihood optimization is enabled.
The old test statistics did not apply the offset subtraction if
`RooAbsReal::hideOffset()` was `true`, and the new BatchMode should do
the same.
@phsft-bot
Copy link
Collaborator

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac1015/cxx17, mac11/cxx14, windows10/cxx14
How to customize builds

@guitargeek guitargeek marked this pull request as ready for review October 26, 2022 09:06
Copy link
Member

@lmoneta lmoneta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thank you Jonas for implementing support for all the features in the new BatchMode!

@guitargeek guitargeek linked an issue Oct 26, 2022 that may be closed by this pull request
1 task
@guitargeek guitargeek merged commit 36d6a97 into root-project:master Oct 26, 2022
@guitargeek guitargeek deleted the datasplitting branch October 26, 2022 10:16
@phsft-bot
Copy link
Collaborator

Build failed on mac1015/cxx17.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RF] Batch mode with RooSimultaneous introduces spurious parameters
3 participants