Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RF] HistFactory PreprocessFunction::PrintXML() needs to escape special characters to produce valid XML #10840

Closed
1 task done
Tracked by #10758
TomasDado opened this issue Jun 28, 2022 · 5 comments · Fixed by #10884
Closed
1 task done
Tracked by #10758

Comments

@TomasDado
Copy link

  • Checked for duplicates

Describe the bug

Calling RooStats::HistFactory::MakeModelAndMeasurementFast(meas) leads to a seg violation in root 6.26.04 with gcc11 on CentOS7 machine (lxplus), while the same code worked with 6.24.06 with gcc8 on CentOS7 machine.
This is the relevant stack trace:

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f43c411c60c in waitpid () from /lib64/libc.so.6
#1  0x00007f43c4099f62 in do_system () from /lib64/libc.so.6
#2  0x00007f43c6fe5a5c in TUnixSystem::StackTrace() () from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/lib/libCore.so
#3  0x00007f43c6fe3155 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/lib/libCore.so
#4  <signal handler called>
#5  0x00007f43c481adef in __dynamic_cast () from /cvmfs/sft.cern.ch/lcg/releases/gcc/11.2.0-8a51a/x86_64-centos7/lib64/libstdc++.so.6
#6  0x00007f43c51b2ff1 in RooAbsCollection::insert(RooAbsArg*) () from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/lib/libRooFitCore.so
#7  0x00007f43c51b83db in RooAbsCollection::RooAbsCollection(RooAbsCollection const&, char const*) () from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/lib/libRooFitCore.so
#8  0x00007f43c52480f9 in RooArgList::RooArgList(RooArgList const&, char const*) () from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/lib/libRooFitCore.so
#9  0x00007f43c573c6f5 in RooStats::HistFactory::HistoToWorkspaceFactoryFast::createStatConstraintTerms(RooWorkspace*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, ParamHistFunc&, TH1 const*, RooStats::HistFactory::Constraint::Type, double) () from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/lib/libHistFactory.so
#10 0x00007f43c57409d7 in RooStats::HistFactory::HistoToWorkspaceFactoryFast::MakeSingleChannelWorkspace(RooStats::HistFactory::Measurement&, RooStats::HistFactory::Channel&) () from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/lib/libHistFactory.so
#11 0x00007f43c5746984 in RooStats::HistFactory::HistoToWorkspaceFactoryFast::MakeSingleChannelModel(RooStats::HistFactory::Measurement&, RooStats::HistFactory::Channel&) () from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/lib/libHistFactory.so
#12 0x00007f43c574f3d9 in RooStats::HistFactory::MakeModelAndMeasurementFast(RooStats::HistFactory::Measurement&) () from /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos7-gcc11-opt/lib/libHistFactory.so

Note that this happens only for some setups, so I tried to debug it a bit and see what is special about the setups that crash. I will briefly describe what I found out:

We use AddPreprocessFunction() method to reparametrise our likelihood, this generally seems to work, but for some special cases this seems to cause problems. I created the XML files for the problematic setups and tried to use hist2workspace to try to identify the issues. I found out that the executable cannot read the XML files produced if it contains & and < or > as these are not valid XML characters and need to be escaped. So e.g. if the top level XML contains function like this:

<Function Name="morph_topWidth_0.700000" Expression="(0.+(((topWidth-0.700000)>=0)&&(fabs(topWidth-0.700000)<0.300000))*(1.-(fabs(topWidth-0.700000))/0.300000))" Dependents="topWidth[0.700000,0.700000,2.000000]" />

The hist2workspace executable will complain that it cannot parse the XML. I tried replacing the problematic characters with the XML replacements like &gt; etc. This makes the hist2workspace code run and it generates the workspace. However, if I try to replace the characters in the same way for the string passed to AddPrepropcessFunction, it doesnt parse the string and complains:

input_line_48:2:89: error: use of undeclared identifier 'gt'
Double_t TFormula____id12625155560414669645(Double_t *x){ return (0.+(((x[0]-0.700000)&{gt;}=0)&#38;&#38;(fabs(x[0]-0.700000)&{lt;0.300000}))*(1.-(fabs(x[0]-0.700000))/0.300000)) ; }
                                                                                        ^

I am not 100% sure these two issue - crash in the MakeModelAndMeasurementFast code and the XML parsing are related, but I think they could be.

Expected behavior

The code should not crash when calling MakeModelAndMeasurementFast as was the case in ROOT 6.24.06. The XML generated should be a valid XML that can be processed with histo2workspace.

To Reproduce

Adding any Expression to the top level XML file that contains characters like &, < breaks histo2workpace and it seems that using formuale containing these characters in AddPrepropcessFunction crashes MakeModelAndMeasurementFast.

Setup

  1. ROOT 6.26.04, but the histo2workspace issue not being able to read ther XML file is there also (at least) in 6.24.06
  2. CentOS7 (lxplus), tested with gcc8 and gcc11
  3. ROOT setup from LCG release, 101 (6.24.06) and 102 (6.26.04)
@TomasDado TomasDado added the bug label Jun 28, 2022
@guitargeek guitargeek changed the title Some Roofit Workspaces cannot be created in 6.26.04 [RF] Some Roofit Workspaces cannot be created in 6.26.04 Jun 28, 2022
guitargeek added a commit to guitargeek/root that referenced this issue Jul 2, 2022
Some improvements are made to the `PreprocessFunction` class:

  * add `const` to all the relevant member functions
  * remove the `fCommand` member, because it can be inferred from the
    other 3 members and it should not be set independently
  * use `std::string` by const-reference when possible
  * follow the RooFit coding style of using lower-case vor function
    argument names

Furthermore, a bugfix is also done:

  * in `PreprocessFunction::PrintXML`, replace the XML special
    characters which almost always appear in any formula with the XML
    escape codes

The bugfix addresses a problem where it was not possible to read an XML
generated by `Measurement::PrintXML` because the special characters in
the formula expression were not properly escaped.

With all these changes applied, the source files for this class changed
almost completely, and this opportunity was taken to reformat the code
with the ROOT `clang-format` style.

Closes root-project#10840.
@guitargeek guitargeek changed the title [RF] Some Roofit Workspaces cannot be created in 6.26.04 [RF] HistFactory PreprocessFunction::PrintXML() can produce invalid XML code with non-escaped special characters Jul 2, 2022
@guitargeek guitargeek changed the title [RF] HistFactory PreprocessFunction::PrintXML() can produce invalid XML code with non-escaped special characters [RF] HistFactory PreprocessFunction::PrintXML() needs to escape special characters to produce valid XML Jul 2, 2022
@guitargeek
Copy link
Contributor

Hi @TomasDado, thanks for opening this issue!

Indeed, it's not good that you get broken XML code when you print the XML for a measurement with a preprocessing function.

I have opened a PR to fix this, and also changed the title to describe the issue more accurately (also removing the v6.26.04 from the title because the issue was around forever and has nothing to do with the last release).

The patch should make it in the next 6.26.06 patch release, which I hope will help you with your workflows!

Then the right way to proceed for you will be to use the escape characters only when you write XML by hand, and in AddPrepropcessFunction you should just use the unescaped formula.

@TomasDado
Copy link
Author

Hi @guitargeek,

Thank you very much for updating the code! Just for my understanding, do the changes also fix the seg fault inMakeModelAndMeasurementFast? The seg fault seem to happen only in 6.26 and not in 6.24.

@guitargeek
Copy link
Contributor

Hi! I could not reproduce the segfault. I think it already got reported by somebody else and fixed by this PR:
#10740

To confirm this, it would be helpful if you'd source one of the ROOT nightlies on lxplus and check if the segfault is gone.

@TomasDado
Copy link
Author

Hi, I tested the code with the current master branch and I can confirm the crash is fixed. Thanks a lot!

@guitargeek
Copy link
Contributor

Great, thanks for checking!

guitargeek added a commit that referenced this issue Jul 4, 2022
Some improvements are made to the `PreprocessFunction` class:

  * add `const` to all the relevant member functions
  * remove the `fCommand` member, because it can be inferred from the
    other 3 members and it should not be set independently
  * use `std::string` by const-reference when possible
  * follow the RooFit coding style of using lower-case vor function
    argument names

Furthermore, a bugfix is also done:

  * in `PreprocessFunction::PrintXML`, replace the XML special
    characters which almost always appear in any formula with the XML
    escape codes

The bugfix addresses a problem where it was not possible to read an XML
generated by `Measurement::PrintXML` because the special characters in
the formula expression were not properly escaped.

With all these changes applied, the source files for this class changed
almost completely, and this opportunity was taken to reformat the code
with the ROOT `clang-format` style.

Closes #10840.
guitargeek added a commit to guitargeek/root that referenced this issue Jul 18, 2022
Some improvements are made to the `PreprocessFunction` class:

  * add `const` to all the relevant member functions
  * remove the `fCommand` member, because it can be inferred from the
    other 3 members and it should not be set independently
  * use `std::string` by const-reference when possible
  * follow the RooFit coding style of using lower-case vor function
    argument names

Furthermore, a bugfix is also done:

  * in `PreprocessFunction::PrintXML`, replace the XML special
    characters which almost always appear in any formula with the XML
    escape codes

The bugfix addresses a problem where it was not possible to read an XML
generated by `Measurement::PrintXML` because the special characters in
the formula expression were not properly escaped.

With all these changes applied, the source files for this class changed
almost completely, and this opportunity was taken to reformat the code
with the ROOT `clang-format` style.

Closes root-project#10840.
guitargeek added a commit that referenced this issue Jul 19, 2022
Some improvements are made to the `PreprocessFunction` class:

  * add `const` to all the relevant member functions
  * remove the `fCommand` member, because it can be inferred from the
    other 3 members and it should not be set independently
  * use `std::string` by const-reference when possible
  * follow the RooFit coding style of using lower-case vor function
    argument names

Furthermore, a bugfix is also done:

  * in `PreprocessFunction::PrintXML`, replace the XML special
    characters which almost always appear in any formula with the XML
    escape codes

The bugfix addresses a problem where it was not possible to read an XML
generated by `Measurement::PrintXML` because the special characters in
the formula expression were not properly escaped.

With all these changes applied, the source files for this class changed
almost completely, and this opportunity was taken to reformat the code
with the ROOT `clang-format` style.

Closes #10840.
vepadulano pushed a commit to vepadulano/root that referenced this issue Jul 20, 2022
Some improvements are made to the `PreprocessFunction` class:

  * add `const` to all the relevant member functions
  * remove the `fCommand` member, because it can be inferred from the
    other 3 members and it should not be set independently
  * use `std::string` by const-reference when possible
  * follow the RooFit coding style of using lower-case vor function
    argument names

Furthermore, a bugfix is also done:

  * in `PreprocessFunction::PrintXML`, replace the XML special
    characters which almost always appear in any formula with the XML
    escape codes

The bugfix addresses a problem where it was not possible to read an XML
generated by `Measurement::PrintXML` because the special characters in
the formula expression were not properly escaped.

With all these changes applied, the source files for this class changed
almost completely, and this opportunity was taken to reformat the code
with the ROOT `clang-format` style.

Closes root-project#10840.
j-mathe pushed a commit to j-mathe/root that referenced this issue Jul 26, 2022
Some improvements are made to the `PreprocessFunction` class:

  * add `const` to all the relevant member functions
  * remove the `fCommand` member, because it can be inferred from the
    other 3 members and it should not be set independently
  * use `std::string` by const-reference when possible
  * follow the RooFit coding style of using lower-case vor function
    argument names

Furthermore, a bugfix is also done:

  * in `PreprocessFunction::PrintXML`, replace the XML special
    characters which almost always appear in any formula with the XML
    escape codes

The bugfix addresses a problem where it was not possible to read an XML
generated by `Measurement::PrintXML` because the special characters in
the formula expression were not properly escaped.

With all these changes applied, the source files for this class changed
almost completely, and this opportunity was taken to reformat the code
with the ROOT `clang-format` style.

Closes root-project#10840.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants