Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RF] Batch mode with RooSimultaneous introduces spurious parameters #10991

Closed
1 task done
elusian opened this issue Jul 19, 2022 · 10 comments · Fixed by #11622
Closed
1 task done

[RF] Batch mode with RooSimultaneous introduces spurious parameters #10991

elusian opened this issue Jul 19, 2022 · 10 comments · Fixed by #11622

Comments

@elusian
Copy link
Contributor

elusian commented Jul 19, 2022

  • Checked for duplicates

Describe the bug

When fitting a RooSimultaneous pdf using BatchMode(true) (which should be cpu) the fit contains additional parameters, one for each observable, called _<first category label>_<obs name>, where the category is the one used in the RooSimultaneous.
The fit converges, but fails at the HESSE step, leading to an "approximation only" covariance matrix.
The label used is, from what I understood, based on label ordering and not the indices.
The mapping from category label to index however influences the fit result.

Expected behavior

No extra parameters, successful HESSE

To Reproduce

void test_batchModeCategory() {
    RooRealVar x("x", "", 0, 1);
    RooRealVar rnd("rnd", "", 0, 1);
    // change this mapping from labels to indices to change the fit result (slightly)
    RooThresholdCategory catThr("cat", "", rnd, "a", 2);
    catThr.addThreshold(1./3, "b", 0);
    catThr.addThreshold(2./3, "c", 1);
    
    RooRealVar m("m", "", 0.5, 0, 1);
    RooGaussian g("g", "", x, m, RooFit::RooConst(0.1));
    RooUniform rndPdf("rndPdf", "", rnd);
    RooProdPdf pdf("pdf", "", RooArgSet(g, rndPdf));
    
    auto ds = pdf.generate(RooArgSet(x, rnd), RooFit::Name("ds"), RooFit::NumEvents(100));
    auto cat = dynamic_cast<RooCategory*>(ds->addColumn(catThr));
    
    RooSimultaneous sim("sim", "", *cat);
    sim.addPdf(g, "a");
    sim.addPdf(g, "b");
    sim.addPdf(g, "c");
    
    sim.fitTo(*ds, RooFit::Save(true)
        , RooFit::BatchMode(true) // commenting this solves the issue
    )->Print("V");
    // _a_x parameter in results, cov matrix is "Approximation only"
    
    // works
    //g.fitTo(*ds, RooFit::Save(true), RooFit::BatchMode(true))->Print("V");
}

Setup

Centos7
ROOT 6.26.04 from LCG dev4

Additional context

@guitargeek
Copy link
Contributor

Hi, thanks a lot for opening this issue! I was aware of this, and in the next patch release these spurious free parameters will be gone (6.26.06 is expected in about 2 weeks).

However, it is still good that you opened this issue because now that you used Print("V"), I see that the spurious parameters are still there in the fit result with ROOT master, they are just constant!


  RooFitResult: minimized FCN value: 46.0037, estimated distance to minimum: 1.51615e-08
                covariance matrix quality: Full, accurate covariance matrix
                Status : MINIMIZE=0 HESSE=0 

    Constant Parameter    Value     
  --------------------  ------------
                  _a_x    5.0000e-01

    Floating Parameter  InitialValue    FinalValue +/-  Error     GblCorr.
  --------------------  ------------  --------------------------  --------
                     m    5.0000e-01    4.8765e-01 +/-  1.02e-02  <none>

So to close this issue for good, we also have to get rid of them in the "Constant Parameter" list. These spurious parameters should not be seen anywhere, as they are pure implementation details for the BatchMode.

@guitargeek
Copy link
Contributor

As it is important that the BatchMode has no possible disadvantage compared to the old RooFit, I put this issue into the list of issues that need to be resolved before the next patch release: #10758

@elusian
Copy link
Contributor Author

elusian commented Jul 19, 2022

Hi Jonas
thank you for your answer!

I switched to LCG dev3 to test on master and noticed that the fitted parameter value is equal to what it was in 6.26.04+batchmode, so still different from no batchmode or batchmode in 6.24.06
It is also still dependent on label mapping (the likelihood too). Is this expected?

@guitargeek
Copy link
Contributor

No, this is not expected at all. I will investigate this!

@guitargeek
Copy link
Contributor

guitargeek commented Jul 19, 2022

Ah I see now what is going on. In the BatchMode implementation, each RooAbsArg in your model must be unique (this was always preferred in RooFit, but the BatchMode is more strict with that). You are using the same pdf for each component of the RooSimultaneous, and it can't deal with that. This code would work:

void test_batchModeCategory() {
    RooRealVar x("x", "", 0, 1);
    RooRealVar rnd("rnd", "", 0, 1);
    // change this mapping from labels to indices to change the fit result (slightly)
    RooThresholdCategory catThr("cat", "", rnd, "a", 2);
    catThr.addThreshold(1./3, "b", 0);
    catThr.addThreshold(2./3, "c", 1);
    
    RooRealVar m("m", "", 0.5, 0, 1);
    RooGaussian g1("g1", "", x, m, RooFit::RooConst(0.1));
    RooGaussian g2("g2", "", x, m, RooFit::RooConst(0.1));
    RooGaussian g3("g3", "", x, m, RooFit::RooConst(0.1));
    RooUniform rndPdf("rndPdf", "", rnd);
    RooProdPdf pdf("pdf", "", RooArgSet(g1, rndPdf));
    
    auto ds = pdf.generate(RooArgSet(x, rnd), RooFit::Name("ds"), RooFit::NumEvents(100));
    auto cat = dynamic_cast<RooCategory*>(ds->addColumn(catThr));
    
    RooSimultaneous sim("sim", "", *cat);
    sim.addPdf(g1, "a");
    sim.addPdf(g2, "b");
    sim.addPdf(g3, "c");
    
    for(bool batchMode : {false, true}) {
        sim.fitTo(*ds
            , RooFit::Save(true)
            , RooFit::PrintLevel(-1)
            , RooFit::BatchMode(batchMode) // commenting this solves the issue

        )->Print("V");
        m.setVal(0.5);
        m.setError(0.0);
    }
}

But your example shows that repeated pdfs in different RooSimultaneous components should better be supported to not cause confusion like this. And in any other case, there should be a strong warning or thrown exception if that happens.

@elusian
Copy link
Contributor Author

elusian commented Jul 19, 2022

Could you expand a bit on what exactly the limitation is? I'd like to understand whether my actual model is affected or not.

@guitargeek
Copy link
Contributor

guitargeek commented Jul 19, 2022

The limitation right now is: RooAbsArgs that are in different channels of the RooSimultaneous and depend on data are not allowed to have the same name (by the RooAbsArgs I don't only mean the top-level channel pdf but also anything that is serving that pdf).

In your initial example, there was a Gaussian "g" in each channel. But I hope to eliminate this limitation by the next patch release.

@elusian
Copy link
Contributor Author

elusian commented Jul 19, 2022

Oh, ok, it's linked to the RooSimultaneous. That makes sense, thank you.
If I understood correctly my model is affected. By the next patch release you mean 6.26.06 (meaning in the next 2 weeks)?

@guitargeek
Copy link
Contributor

Yes exactly!

guitargeek added a commit to guitargeek/root that referenced this issue Oct 26, 2022
The new BatchMode always had some trouble in simultaneous fits when the
same argument was reused in different components, because intermediate
values are stored by argument name and there are different results for
each component.

To solve this problem, the names of all arguments in a given component
are now prefixed by the category label surrounded by underscores.

Furthermore, the `getParameters` function for objects created with the
new BatchMode is improved such that observables renamed with this
prefixing are not confused with parameters.

Closes root-project#10991.
guitargeek added a commit to guitargeek/root that referenced this issue Oct 26, 2022
The new BatchMode always had some trouble in simultaneous fits when the
same argument was reused in different components, because intermediate
values are stored by argument name and there are different results for
each component.

To solve this problem, the names of all arguments in a given component
are now prefixed by the category label surrounded by underscores.

Furthermore, the `getParameters` function for objects created with the
new BatchMode is improved such that observables renamed with this
prefixing are not confused with parameters.

Closes root-project#10991.
@guitargeek guitargeek linked a pull request Oct 26, 2022 that will close this issue
@guitargeek
Copy link
Contributor

guitargeek commented Oct 26, 2022

Hi! This problem is now fixed in master. There should be no spurious parameters anymore, and naming collisions are avoided by renaming all RooAbsArgs for each component with a prefix.

The fix will make it to the next patch release, as tracked in this issue:
#11534

guitargeek added a commit that referenced this issue Oct 26, 2022
The new BatchMode always had some trouble in simultaneous fits when the
same argument was reused in different components, because intermediate
values are stored by argument name and there are different results for
each component.

To solve this problem, the names of all arguments in a given component
are now prefixed by the category label surrounded by underscores.

Furthermore, the `getParameters` function for objects created with the
new BatchMode is improved such that observables renamed with this
prefixing are not confused with parameters.

Closes #10991.
@root-project root-project deleted a comment from github-actions bot Oct 31, 2022
@root-project root-project deleted a comment from github-actions bot Oct 31, 2022
@root-project root-project deleted a comment from github-actions bot Oct 31, 2022
guitargeek added a commit to guitargeek/root that referenced this issue Nov 19, 2022
The new BatchMode always had some trouble in simultaneous fits when the
same argument was reused in different components, because intermediate
values are stored by argument name and there are different results for
each component.

To solve this problem, the names of all arguments in a given component
are now prefixed by the category label surrounded by underscores.

Furthermore, the `getParameters` function for objects created with the
new BatchMode is improved such that observables renamed with this
prefixing are not confused with parameters.

Closes root-project#10991.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants