Updated root to tip of branch master #9034
Conversation
please test for CMSSW_14_1_ROOT6_X

A new Pull Request was created by @iarspider for branch IB/CMSSW_14_1_X/rootmaster. @iarspider, @smuzaffar, @aandvalenzuela can you please review it and eventually sign? Thanks.

cms-bot internal usage
-1 Failed Tests: Build

Build
I found compilation error when building:

Entering library rule at src/PhysicsTools/NanoAOD/plugins
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTupleOutputModule.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/EventStringOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/SummaryTableOutputFields.cc
src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTupleOutputModule.cc:18:10: fatal error: ROOT/RNTupleOptions.hxx: No such file or directory
   18 | #include
      | ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TriggerOutputFields.cc
Did they change the RNTuple API again?
Yes, but to quote Jakob on the RNTuple Mattermost channel from last Friday:
test parameters:
please test

please abort

please test
-1 Failed Tests: Build

Build
I found compilation error when building:

>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/SummaryTableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TriggerOutputFields.cc
In file included from src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.cc:1:
src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.h:14:27: error: 'RCollectionNTupleWriter' has not been declared in 'ROOT::Experimental'
   14 | using ROOT::Experimental::RCollectionNTupleWriter;
      |                           ^~~~~~~~~~~~~~~~~~~~~~~
src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.h:113:19: error: 'RCollectionNTupleWriter' was not declared in this scope
  113 | std::shared_ptr m_collection;
      |                 ^~~~~~~~~~~~~~~~~~~~~~~
please test
-1 Failed Tests: Build

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:

Build
I found compilation error when building:

>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/SummaryTableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TableOutputFields.cc
>> Compiling edm plugin src/PhysicsTools/NanoAOD/plugins/rntuple/TriggerOutputFields.cc
In file included from src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.cc:1:
src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.h:15:27: error: 'RCollectionNTupleWriter' has not been declared in 'ROOT::Experimental'
   15 | using ROOT::Experimental::RCollectionNTupleWriter;
      |                           ^~~~~~~~~~~~~~~~~~~~~~~
src/PhysicsTools/NanoAOD/plugins/rntuple/NanoAODRNTuples.h:16:27: error: 'RNTupleWriter' has not been declared in 'ROOT::Experimental'
   16 | using ROOT::Experimental::RNTupleWriter;
      |                           ^~~~~~~~~~~~~
please test
-1 Failed Tests: RelVals
Umm
First, why are floating point exceptions enabled? Second, why is (presumably) AVX2 code being run? The node had
In the RooBatchCompute CPU library, all scalar inputs have to be copied n times into a buffer that is as long as the SIMD registers, to allow for vectorization in all cases. To avoid frequent memory allocations, this buffer was made a `static` variable in the original implementation of the batchcompute library, which of course made it non-threadsafe. This is now hitting us, because RooFit needs to be able to do multiple fits concurrently. This is a requirement for CMSSW, and a blocker for ROOT master adoption in CMSSW since the new CPU backend is the default: cms-sw/cmsdist#9034 This commit fixes the concurrency problem by doing the buffering in the DataMaps that are used in the `RooFit::Evaluator`. Like this, multiple computation graphs can be concurrently evaluated. It was tested with the ATLAS benchmarks in `rootbench` that the fitting performance remains the same.
Hi @iarspider and @makortel, sorry for causing you so much trouble with ROOT recently! There were two big changes in

The latter will have a big impact on the users, speeding up RooFit likelihood minimizations by up to a factor of 10. The new evaluation backend was carefully validated over the last few years, and I have fixed all problems I was aware of. But thanks to your syncing efforts with ROOT master, I see that there are still some problems left that need to be fixed.

If I remember correctly, CMSSW does multiple RooFit fits concurrently, e.g. because they are done in different producers. So I think the current crash must be because of missing thread safety in our vectorized pdf evaluation library, which I suggest fixing:

The reason AVX2 code is executed is that RooFit ships with the evaluation library compiled multiple times for different SIMD instruction sets. At runtime, RooFit dynamically loads the fastest version of the library that is supported by the CPU: in that logic, AVX is preferred over SSE. Is that a problem for CMSSW?
We would generally want to be in full control of the vectorization target (or as close as we can get). Our baseline is still SSE3, but there is ongoing work towards deploying a "multi-architecture" build of CMSSW (plus some select externals); more information in cms-sw/cmssw#43652. We have some exceptions to this general approach
With TensorFlow we have had quite some trouble, mostly but not only in special cases (some of the story is recorded in cms-sw/cmssw#42444 and other issues linked there). On a somewhat related note, cms-sw/cmssw#44188 shows some "fun" we are currently dealing with in Eigen (which I hope is not very relevant for our use of ROOT).

I see there is already a way for a user to select the target binary, so minimally we could use that. Do I understand correctly that SSE3 would correspond to

I'm quite sure CMS would e.g. want to skip the original AVX implementation because of the frequency scaling behavior of that era of CPUs. Anyway, I think in CMS we need to discuss more how we want to deal with the by-default dynamic behavior of RooFit. What kind of guarantees for reproducibility of the fit results does RooFit give between different vectorization targets?
I opened a separate issue, cms-sw/cmssw#44308, for the use of the vectorized backend. I'd suggest we move that part of the discussion there, leave this PR for the more technical side (crashes etc.), and leave the default behavior of RooFit as it is for now.