Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid read in TChain #17259

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

vepadulano
Copy link
Member

@vepadulano vepadulano commented Dec 12, 2024

After merging #17301 , there is no need to create the logic to suppress the TChainIndex build error presently. The invalid read found by valgrind is still there, so this PR includes a test for the global join case and fixes the memory error.

Copy link

github-actions bot commented Dec 12, 2024

Test Results

    18 files      18 suites   4d 0h 58m 2s ⏱️
 2 681 tests  2 681 ✅ 0 💤 0 ❌
46 536 runs  46 536 ✅ 0 💤 0 ❌

Results for commit d1bc656.

♻️ This comment has been updated with latest results.

@vepadulano vepadulano force-pushed the rdf-suppress-tchainindex-error branch from 260dce9 to 60f2317 Compare December 17, 2024 16:11
@vepadulano
Copy link
Member Author

@pcanal investigating the test error led me to finding some faulty behaviour in the case of a chain moving to a new tree but not properly updating the information of the underlying TTreeIndex (specifically regarding the attached TTreeFormulas). You can find some description in the first commit message, and here is the full valgrind stacktrace

valgrind_tchain_friend_updateformulaleaves.txt

Let's see if this was all or just part of the reason for the CI failures.

To avoid following valgrind error leading to crashes in multithreaded RDF runs:

```
==735217== Invalid read of size 8
==735217==    at 0x6A4517F: TTreeFormula::UpdateFormulaLeaves() (TTreeFormula.cxx:5121)
==735217==    by 0x6A5B9AF: TTreeIndex::UpdateFormulaLeaves(TTree const*) (TTreeIndex.cxx:625)
==735217==    by 0x67A5447: TChain::LoadTree(long long) (TChain.cxx:1670)
==735217==    by 0x6A94DBD: TTreeReader::SetEntryBase(long long, bool) (TTreeReader.cxx:669)
==735217==    by 0x6A96027: TTreeReader::SetEntry(long long) (TTreeReader.h:225)
==735217==    by 0x7576BAA: TTreeReader::Next() (TTreeReader.h:217)
==735217==    by 0x756D591: (anonymous namespace)::validTTreeReaderRead(TTreeReader&) (RLoopManager.cxx:538)
==735217==    by 0x756D9A9: ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT()::{lambda(TTreeReader&)#1}::operator()(TTreeReader&) const (RLoopManager.cxx:575)
==735217==    by 0x7574F7E: void std::__invoke_impl<void, ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT()::{lambda(TTreeReader&)#1}&, TTreeReader&>(std::__invoke_other, ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT()::{lambda(TTreeReader&)#1}&, TTreeReader&) (invoke.h:61)
==735217==    by 0x757480E: std::enable_if<is_invocable_r_v<void, ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT()::{lambda(TTreeReader&)#1}&, TTreeReader&>, void>::type std::__invoke_r<void, ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT()::{lambda(TTreeReader&)#1}&, TTreeReader&>(ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT()::{lambda(TTreeReader&)#1}&, TTreeReader&) (invoke.h:111)
==735217==    by 0x757433A: std::_Function_handler<void (TTreeReader&), ROOT::Detail::RDF::RLoopManager::RunTreeProcessorMT()::{lambda(TTreeReader&)#1}>::_M_invoke(std::_Any_data const&, TTreeReader&) (std_function.h:290)
==735217==    by 0x6ABCE3E: std::function<void (TTreeReader&)>::operator()(TTreeReader&) const (std_function.h:591)
==735217==  Address 0x15b4e000 is 0 bytes inside a block of size 712 free'd
==735217==    at 0x48456C6: operator delete(void*) (vg_replace_malloc.c:1131)
==735217==    by 0x4E9494B: TStorage::ObjectDealloc(void*) (TStorage.cxx:325)
==735217==    by 0x4E6B181: TObject::operator delete(void*) (TObject.cxx:1113)
==735217==    by 0x67F47E9: TTree::~TTree() (TTree.cxx:1023)
==735217==    by 0x4EED8F1: TCollection::GarbageCollect(TObject*) (TCollection.cxx:736)
==735217==    by 0x4EF5D4C: TList::Delete(char const*) (TList.cxx:535)
==735217==    by 0x4EF0428: THashList::Delete(char const*) (THashList.cxx:215)
==735217==    by 0x5507848: TDirectoryFile::Close(char const*) (TDirectoryFile.cxx:585)
==735217==    by 0x5523835: TFile::Close(char const*) (TFile.cxx:986)
==735217==    by 0x5521226: TFile::~TFile() (TFile.cxx:563)
==735217==    by 0x55215FF: TFile::~TFile() (TFile.cxx:600)
==735217==    by 0x67A4990: TChain::LoadTree(long long) (TChain.cxx:1490)
==735217==  Block was alloc'd at
==735217==    at 0x4841FEC: operator new(unsigned long) (vg_replace_malloc.c:487)
==735217==    by 0x4E948DF: TStorage::ObjectAlloc(unsigned long) (TStorage.cxx:293)
==735217==    by 0x48C648E: TObject::operator new(unsigned long) (TObject.h:181)
==735217==    by 0x672E04D: ROOT::new_TTree(void*) (G__Tree.cxx:4363)
==735217==    by 0x4F4AA25: TClass::NewObject(TClass::ENewType, bool) const (TClass.cxx:5084)
==735217==    by 0x4F4A932: TClass::New(TClass::ENewType, bool) const (TClass.cxx:5061)
==735217==    by 0x556146A: TKey::ReadObj() (TKey.cxx:809)
==735217==    by 0x55091D3: TDirectoryFile::Get(char const*) (TDirectoryFile.cxx:988)
==735217==    by 0x67A4CA4: TChain::LoadTree(long long) (TChain.cxx:1542)
==735217==    by 0x6A94DBD: TTreeReader::SetEntryBase(long long, bool) (TTreeReader.cxx:669)
==735217==    by 0x6A96027: TTreeReader::SetEntry(long long) (TTreeReader.h:225)
==735217==    by 0x6A94855: TTreeReader::SetEntriesRange(long long, long long) (TTreeReader.cxx:560)
```
@vepadulano vepadulano force-pushed the rdf-suppress-tchainindex-error branch from 60f2317 to 14b8763 Compare December 19, 2024 08:37
@vepadulano vepadulano changed the title [df] Suppress duplicate errors from TChainIndex Fix invalid read in TChain Dec 19, 2024
@vepadulano vepadulano requested a review from pcanal December 19, 2024 08:39
@vepadulano vepadulano force-pushed the rdf-suppress-tchainindex-error branch from 14b8763 to 237629b Compare December 19, 2024 14:17
@vepadulano vepadulano closed this Dec 19, 2024
@vepadulano vepadulano reopened this Dec 19, 2024
A new battery of four tests is added:

1. Try to build TChainIndex, fail, fallback to building TTreeIndex, check
correct matching. For both the same cardinality and different cardinality cases
(auxiliary smaller than primary).

2. Try to build TChainIndex and succeed. For both the same cardinality and
different cardinality cases (auxiliary smaller than primary).
@vepadulano vepadulano force-pushed the rdf-suppress-tchainindex-error branch from 237629b to d1bc656 Compare December 20, 2024 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants