
Updated root to tip of branch master #9034

Conversation

iarspider
Contributor

No description provided.

@iarspider
Contributor Author

please test for CMSSW_14_1_ROOT6_X

@cmsbuild
Contributor

A new Pull Request was created by @iarspider for branch IB/CMSSW_14_1_X/rootmaster.

@iarspider, @smuzaffar, @aandvalenzuela can you please review it and eventually sign? Thanks.
@sextonkennedy, @antoniovilela, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Feb 27, 2024

cms-bot internal usage

@cmsbuild
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-030311/37708/summary.html
COMMIT: 9a07e7f
CMSSW: CMSSW_14_1_ROOT6_X_2024-02-26-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37708/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found a compilation error when building:

Entering library rule at src/PhysicsTools/NanoAOD/plugins
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTupleOutputModule.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/EventStringOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/SummaryTableOutputFields.cc
src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTupleOutputModule.cc:18:10: fatal error: ROOT/RNTupleOptions.hxx: No such file or directory
   18 | #include <ROOT/RNTupleOptions.hxx>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TriggerOutputFields.cc
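The missing header suggests that the RNTuple options header was renamed or split in ROOT master. One common way to keep code building against both the old and the new header layout is the `__has_include` check (standard since C++17). This is a minimal sketch, assuming the replacement header is `ROOT/RNTupleWriteOptions.hxx`; that name is an assumption to verify against the ROOT version in the IB. Without ROOT installed, the sketch simply reports that neither header was found.

```cpp
#include <cassert>
#include <string>

// Pick whichever RNTuple options header this ROOT installation provides.
// ASSUMPTION: newer ROOT ships ROOT/RNTupleWriteOptions.hxx in place of
// ROOT/RNTupleOptions.hxx; check the installed headers before relying on it.
#if __has_include(<ROOT/RNTupleWriteOptions.hxx>)
#include <ROOT/RNTupleWriteOptions.hxx>
#define RNTUPLE_OPTIONS_HEADER "ROOT/RNTupleWriteOptions.hxx"
#elif __has_include(<ROOT/RNTupleOptions.hxx>)
#include <ROOT/RNTupleOptions.hxx>
#define RNTUPLE_OPTIONS_HEADER "ROOT/RNTupleOptions.hxx"
#else
#define RNTUPLE_OPTIONS_HEADER "none"
#endif

// Report which header was selected at compile time (handy in build logs).
inline std::string rntupleOptionsHeader() { return RNTUPLE_OPTIONS_HEADER; }
```

The check happens at preprocessing time, so the same source compiles unchanged against either ROOT layout.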


@makortel
Contributor

Did they change the RNTuple API again?

@guitargeek

Yes, but to quote Jakob from the RNTuple mattermost channel from last Friday:

The larger interface changes are now merged, so things are supposed to be relatively stable again.

@iarspider
Contributor Author

test parameters:

@iarspider
Copy link
Contributor Author

please test

@iarspider
Copy link
Contributor Author

please abort

@iarspider
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-030311/37789/summary.html
COMMIT: 9a07e7f
CMSSW: CMSSW_14_1_ROOT6_X_2024-02-26-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37789/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found a compilation error when building:

>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/SummaryTableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TriggerOutputFields.cc
In file included from src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.cc:1:
src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.h:14:27: error: 'RCollectionNTupleWriter' has not been declared in 'ROOT::Experimental'
   14 | using ROOT::Experimental::RCollectionNTupleWriter;
      |                           ^~~~~~~~~~~~~~~~~~~~~~~
src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.h:113:19: error: 'RCollectionNTupleWriter' was not declared in this scope
  113 |   std::shared_ptr<RCollectionNTupleWriter> m_collection;
      |                   ^~~~~~~~~~~~~~~~~~~~~~~


@iarspider
Contributor Author

please test

@cmsbuild
Copy link
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-030311/37799/summary.html
COMMIT: 9a07e7f
CMSSW: CMSSW_14_1_ROOT6_X_2024-02-26-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37799/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-030311/37799/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-030311/37799/git-merge-result

Build

I found a compilation error when building:

>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/SummaryTableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TriggerOutputFields.cc
In file included from src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.cc:1:
src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.h:15:27: error: 'RCollectionNTupleWriter' has not been declared in 'ROOT::Experimental'
   15 | using ROOT::Experimental::RCollectionNTupleWriter;
      |                           ^~~~~~~~~~~~~~~~~~~~~~~
src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.h:16:27: error: 'RNTupleWriter' has not been declared in 'ROOT::Experimental'
   16 | using ROOT::Experimental::RNTupleWriter;
      |                           ^~~~~~~~~~~~~
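`RCollectionNTupleWriter` and `RNTupleWriter` not being found means that these classes were renamed or moved to other headers. If both ROOT versions have to be supported for a while, one option is to gate on ROOT's `ROOT_VERSION_CODE` and `ROOT_VERSION` macros from `RVersion.h`. In this sketch the cutoff `6.31.01` is a hypothetical placeholder, not the actual version in which the API changed. The fallback macro definitions exist only so that the snippet compiles without ROOT.

```cpp
#include <cassert>

#if __has_include(<RVersion.h>)
#include <RVersion.h>
#endif

// Fallbacks so this sketch compiles without ROOT. They mirror the encoding
// (major << 16) + (minor << 8) + patch that RVersion.h uses.
#ifndef ROOT_VERSION
#define ROOT_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif
#ifndef ROOT_VERSION_CODE
#define ROOT_VERSION_CODE ROOT_VERSION(6, 30, 4)  // hypothetical "old" ROOT
#endif

// HYPOTHETICAL cutoff: replace with the first ROOT version that actually
// provides the new RNTuple writer API.
#define RNTUPLE_API_CUTOFF ROOT_VERSION(6, 31, 1)

#if ROOT_VERSION_CODE >= RNTUPLE_API_CUTOFF
// New API. The old using-declarations would not compile here, so the gate
// has to be a preprocessor #if rather than `if constexpr`.
constexpr bool kNewRNTupleApi = true;
#else
// Old API, e.g. `using ROOT::Experimental::RCollectionNTupleWriter;`
constexpr bool kNewRNTupleApi = false;
#endif
```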


@iarspider
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Mar 1, 2024

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-030311/37844/summary.html
COMMIT: 9a07e7f
CMSSW: CMSSW_14_1_ROOT6_X_2024-02-29-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

  • 9.0: A fatal system signal has occurred: floating point exception
  • 140.53: A fatal system signal has occurred: floating point exception
  • 25.0: A fatal system signal has occurred: floating point exception

@makortel
Contributor

makortel commented Mar 1, 2024

Umm

A fatal system signal has occurred: floating point exception

Thread 1 (Thread 0x14b2ced87640 (LWP 1369933) "cmsRun"):
#3  0x000014b2c7bc5730 in sig_dostack_then_abort () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x000014b2439f30f2 in RooBatchCompute::AVX2::RooBatchComputeClass::compute(RooBatchCompute::Config const&, RooBatchCompute::Computer, double*, unsigned long, std::vector<std::__ROOT::span<double const>, std::allocator<std::__ROOT::span<double const> > > const&, std::vector<double, std::allocator<double> >&) () from /cvmfs/cms-ci.cern.ch/week0/PR_b13b48ff/el8_amd64_gcc12/lcg/root/6.31.01-fe529b8a38f69e54a8bff6884e030efa/lib/libRooBatchCompute_AVX2.so
#6  0x000014b27be12f54 in RooBreitWigner::computeBatch(double*, unsigned long, RooFit::Detail::DataMap const&) const () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libRooFit.so
#7  0x000014b27b904201 in RooFit::Evaluator::run() () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libRooFitCore.so
#8  0x000014b27b8de635 in RooEvaluatorWrapper::getValV(RooArgSet const*) const () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libRooFitCore.so
#9  0x000014b27b98ae3a in RooMinimizerFcn::operator()(double const*) const () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libRooFitCore.so
#10 0x000014b2d1b7b20f in ROOT::Math::Functor::DoEval(double const*) const () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libMathCore.so
#11 0x000014b27ee2c40e in ROOT::Minuit2::MnUserFcn::operator()(ROOT::Minuit2::LAVector const&) const () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libMinuit2.so
#12 0x000014b27ee28883 in ROOT::Minuit2::MnSeedGenerator::operator()(ROOT::Minuit2::MnFcn const&, ROOT::Minuit2::GradientCalculator const&, ROOT::Minuit2::MnUserParameterState const&, ROOT::Minuit2::MnStrategy const&) const () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libMinuit2.so
#13 0x000014b27ee33d40 in ROOT::Minuit2::ModularFunctionMinimizer::Minimize(ROOT::Minuit2::FCNBase const&, ROOT::Minuit2::MnUserParameterState const&, ROOT::Minuit2::MnStrategy const&, unsigned int, double) const () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libMinuit2.so
#14 0x000014b27edf40f5 in ROOT::Minuit2::Minuit2Minimizer::Minimize() () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libMinuit2.so
#15 0x000014b2d1bd5f4e in ROOT::Fit::Fitter::DoMinimization(ROOT::Math::IBaseFunctionMultiDimTempl<double> const*) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libMathCore.so
#16 0x000014b27b9867c6 in RooMinimizer::minimize(char const*, char const*) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libRooFitCore.so
#17 0x000014b27b7a60d6 in RooFit::FitHelpers::minimize(RooAbsReal&, RooAbsReal&, RooAbsData const&, RooCmdConfig const&) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libRooFitCore.so
#18 0x000014b27b7add53 in RooFit::FitHelpers::fitTo(RooAbsReal&, RooAbsData&, RooLinkedList const&, bool) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libRooFitCore.so
#19 0x000014b27b81515c in RooAbsPdf::fitToImpl(RooAbsData&, RooLinkedList const&) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/external/el8_amd64_gcc12/lib/libRooFitCore.so
#20 0x000014b27c001255 in PhotonDataCertification::invMassZtest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, TString, dqm::implementation::IGetter&) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/lib/el8_amd64_gcc12/pluginDQMOfflineEGammaPlugins.so
#21 0x000014b27c001712 in PhotonDataCertification::dqmEndJob(dqm::implementation::IBooker&, dqm::implementation::IGetter&) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/lib/el8_amd64_gcc12/pluginDQMOfflineEGammaPlugins.so
#22 0x000014b27bfcf6f9 in non-virtual thunk to DQMEDHarvester::endProcessBlockProduce(edm::ProcessBlock&) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/lib/el8_amd64_gcc12/pluginDQMOfflineEGammaPlugins.so
#23 0x000014b2d2386b7b in edm::one::EDProducerBase::doEndProcessBlock(edm::ProcessBlockPrincipal const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9034/37844/CMSSW_14_1_ROOT6_X_2024-02-29-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so

First, why are floating point exceptions enabled? Second, why is (presumably) AVX2 code being run?
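Some background on the first question. An IEEE 754 operation such as `0.0/0.0` normally only sets a status flag and returns NaN. It raises SIGFPE only when trapping has been switched on for that flag, for example through the glibc extension `feenableexcept`. The sketch below uses only the standard `<cfenv>` API to show the flag side of this. It deliberately does not enable trapping, because that would terminate the process just as in the stack trace above.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Divide at run time and report whether the IEEE "invalid" flag was raised.
// Without trapping, 0/0 just yields NaN and sets FE_INVALID. With trapping
// enabled (e.g. glibc's feenableexcept(FE_INVALID)), the same division would
// deliver SIGFPE instead.
inline bool divisionRaisesInvalid(double num, double den, double& result) {
  volatile double n = num;  // volatile prevents compile-time folding
  volatile double d = den;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double r = n / d;
  result = r;
  return std::fetestexcept(FE_INVALID) != 0;
}
```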

The node had an Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz (Broadwell), which supports AVX2, but the build setup should target SSE3(?).
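On the second question: the instruction sets a binary can use at run time are not limited by the `-m` flags it was compiled with, provided the code probes the CPU before it dispatches. With GCC or Clang on x86, `__builtin_cpu_supports` is the usual probe. This is a minimal sketch; it returns an empty list on other architectures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// List the SIMD extensions this CPU reports, using the GCC/Clang builtin.
// The build-time target (e.g. -msse3) does not restrict what this returns.
inline std::vector<std::string> detectedSimdLevels() {
  std::vector<std::string> levels;
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("sse3")) levels.push_back("sse3");
  if (__builtin_cpu_supports("sse4.2")) levels.push_back("sse4.2");
  if (__builtin_cpu_supports("avx")) levels.push_back("avx");
  if (__builtin_cpu_supports("avx2")) levels.push_back("avx2");
  if (__builtin_cpu_supports("avx512f")) levels.push_back("avx512f");
#endif
  return levels;
}
```

On a Broadwell node such as the one above, this list would be expected to include `avx2` even for a binary built with `-msse3`.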

guitargeek added a commit to guitargeek/root that referenced this pull request Mar 4, 2024
In the RooBatchCompute CPU library, all scalar inputs have to be copied
n times into a buffer that is as long as the SIMD registers, to allow
for vectorization in all cases.

To avoid frequent memory allocations, this buffer was made a `static`
variable in the original implementation of the batchcompute library,
which of course made it not thread-safe.

This is now hitting us, because RooFit needs to be able to do multiple
fits concurrently. This is a requirement for CMSSW, and a blocker for
ROOT master adoption in CMSSW since the new CPU backend is the default:
cms-sw/cmsdist#9034

This commit fixes the concurrency problem by doing the buffering in the
DataMaps that are used in the `RooFit::Evaluator`. This way, multiple
computation graphs can be evaluated concurrently.

Tests with the ATLAS benchmarks in `rootbench` showed that the fitting
performance remains the same.
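The race described in this commit message comes from a function-local `static` scratch buffer. Every caller, on every thread, writes into the same memory. The following simplified sketch shows the pattern and the fix, which moves the buffer into the object that owns one evaluation (roughly analogous to the DataMaps used by `RooFit::Evaluator`). `kLanes`, `broadcastShared` and `ScalarBroadcaster` are hypothetical names, not RooBatchCompute's API. One buffer per context is enough for this illustration; a real implementation needs one slot per scalar input. Even in a single thread, two results from the shared version alias each other, which is the same overlap that concurrent fits run into.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;  // hypothetical SIMD width in doubles

// BUGGY pattern: one static buffer shared by all callers and all threads.
inline const double* broadcastShared(double scalar) {
  static std::array<double, kLanes> buffer;
  buffer.fill(scalar);  // copy the scalar into every lane
  return buffer.data();
}

// Fixed pattern: each evaluation context owns its scratch buffer, so
// independent computations (e.g. concurrent fits) never share memory, and
// a context still reuses its buffer instead of allocating per call.
class ScalarBroadcaster {
 public:
  const double* broadcast(double scalar) {
    buffer_.fill(scalar);
    return buffer_.data();
  }

 private:
  std::array<double, kLanes> buffer_{};
};
```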
@guitargeek

Hi @iarspider and @makortel, sorry for causing you so much trouble with ROOT recently!

Two big changes were merged into master at about the same time: the RNTuple interface changes, and making the new vectorizing CPU evaluation backend the default in RooFit.

The latter will have a big impact on users, speeding up RooFit likelihood minimizations by up to a factor of 10. The new evaluation backend was carefully validated over the last years, and I have fixed all problems I was aware of.

But thanks to your syncing efforts with ROOT master, I see that there are still some problems left that need to be fixed. If I remember correctly, CMSSW does multiple RooFit fits concurrently, e.g. because they are done in different producers. So I think the current crash must be caused by missing thread safety in our vectorized pdf evaluation library, which I suggest fixing in:
root-project/root#14877

The reason AVX2 code is executed is that RooFit ships with the evaluation library compiled multiple times for different SIMD instruction sets. At runtime, RooFit then dynamically loads the fastest version of the library that the CPU supports:
https://github.com/root-project/root/blob/master/roofit/batchcompute/src/Initialisation.cxx#L68

In that logic, AVX is preferred over SSE. Is that a problem for CMSSW?
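The dispatch described above amounts to walking a preference list from the widest to the narrowest instruction set and taking the first one the CPU supports, with a generic build as the fallback. This hedged sketch shows that selection logic, kept separate from the actual CPU probe so that it can be tested. The backend names, their order and the optional pinned override are illustrative assumptions, not a copy of `Initialisation.cxx`. A pin like this is one way a user could keep the choice under their own control.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Backends in order of preference (widest SIMD first); "generic" is the
// portable fallback. Names and order are illustrative assumptions.
inline const std::vector<std::string>& backendPreference() {
  static const std::vector<std::string> order{"avx512", "avx2", "avx", "sse4", "generic"};
  return order;
}

// Pick the first backend the CPU supports. A pinned backend wins if the CPU
// supports it; an unsupported pin falls back to automatic selection.
inline std::string selectBackend(const std::function<bool(const std::string&)>& cpuSupports,
                                 const std::optional<std::string>& pinned = std::nullopt) {
  if (pinned && (*pinned == "generic" || cpuSupports(*pinned))) return *pinned;
  for (const auto& name : backendPreference()) {
    if (name == "generic" || cpuSupports(name)) return name;
  }
  return "generic";
}
```

Taking the probe as a parameter lets the same logic be driven by `__builtin_cpu_supports` in production and by a fake CPU description in tests.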

@makortel
Contributor

makortel commented Mar 4, 2024

The reason AVX2 code is executed is that RooFit ships with the evaluation library compiled multiple times for different SIMD instruction sets. At runtime, RooFit then dynamically loads the fastest version of the library that the CPU supports: https://github.com/root-project/root/blob/master/roofit/batchcompute/src/Initialisation.cxx#L68

In that logic, AVX is preferred over SSE. Is that a problem for CMSSW?

We would generally want to be in full control of the vectorization target (or as close to it as we can get). Our baseline is still SSE3, but there is ongoing work towards deploying a "multi-architecture" build of CMSSW (plus some select externals); more information is in cms-sw/cmssw#43652.

We have some exceptions to this general approach:

  • Tensorflow (and I believe also ONNX) are allowed to use their more dynamic mechanisms for wider-than-sse3 vectorization targets
  • We don't try to prevent any dynamic behavior of glibc

With Tensorflow we have had quite some trouble, mostly but not only in special cases (some of the story is recorded in cms-sw/cmssw#42444 and other issues linked there). On a somewhat related note, cms-sw/cmssw#44188 shows some "fun" we are currently having with Eigen (which I hope is not very relevant for our use of ROOT).

I see there is already a way for a user to select the target binary, so minimally we could use that. Do I understand correctly that SSE3 would correspond to generic?

I'm quite sure CMS would e.g. want to skip the original AVX implementation because of the frequency scaling behavior of that era of CPUs.

Anyway, I think in CMS we need to discuss more how we want to deal with the by-default dynamic behavior of RooFit. What kind of guarantees for reproducibility of the fit results does RooFit give between different vectorization targets?
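A concrete illustration of why the vectorization target can matter for reproducibility: a vectorized reduction that keeps one partial sum per SIMD lane adds the terms in a different order than a scalar loop, and floating-point addition is not associative. This is a minimal sketch assuming a simple lane-wise reduction. It is not RooFit's actual summation. With it, the same four values sum to different doubles for 1 and 2 lanes (the exact sum is 2).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum `values` the way a SIMD loop with `lanes` accumulators would:
// element i goes into accumulator i % lanes, then the accumulators are
// combined left to right. lanes == 1 is the plain scalar loop.
inline double laneSum(const std::vector<double>& values, std::size_t lanes) {
  std::vector<double> partial(lanes, 0.0);
  for (std::size_t i = 0; i < values.size(); ++i) partial[i % lanes] += values[i];
  double total = 0.0;
  for (double p : partial) total += p;
  return total;
}
```

For `{1e16, 1.0, -1e16, 1.0}`, the scalar loop loses the first `1.0` when adding it to `1e16` (the spacing between doubles there is 2) and returns 1. The two-lane version cancels the large terms within one lane first and returns 2.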

@makortel
Contributor

makortel commented Mar 4, 2024

I opened a separate issue, cms-sw/cmssw#44308, for the use of the vectorized backend. I'd suggest we move that part of the discussion there, keep this PR for the more technical side (crashes etc.), and leave the default behavior of RooFit as it is for now.

guitargeek added a commit to root-project/root that referenced this pull request Mar 5, 2024
@iarspider iarspider closed this Mar 6, 2024
kristupaspranc pushed a commit to kristupaspranc/root that referenced this pull request Apr 10, 2024
lobis pushed a commit to lobis/root that referenced this pull request Apr 10, 2024
kristupaspranc pushed a commit to kristupaspranc/root that referenced this pull request May 21, 2024
silverweed pushed a commit to silverweed/root that referenced this pull request Aug 19, 2024
4 participants