Add a tool to merge several podio files into a single one #681

jmcarcell · 2024-09-19T13:38:37Z

BEGINRELEASENOTES

Add a tool to merge several podio files into a single one:
- Metadata for every event can be saved or not
- The same format as the first input file will be used (TTree or RNTuple)
- Metadata about the input file names will be saved

ENDRELEASENOTES

Useful not to have to deal with many small files, even though the readers can read them fine.
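The behaviour listed in the release notes can be sketched in a few lines. This is a minimal illustration with plain dictionaries standing in for podio files, not the code of `tools/podio-merge-files`: the function name `merge_inputs`, the `save_metadata` flag, the `"metadata"` category name and the dictionary layout are all assumptions made for the example. Only the parameter name `MergeInputFiles` comes from this PR.

```python
def merge_inputs(inputs, save_metadata=True, metadata_category="metadata"):
    """Merge stand-in 'files' (plain dicts) into one output dict.

    Each input is assumed to look like
    {"name": str, "format": str, "categories": {category: [frames]}}.
    This layout is hypothetical and only mimics the idea of the tool.
    """
    if not inputs:
        raise ValueError("need at least one input file")

    merged = {
        # The output format follows the first input (TTree or RNTuple)
        "format": inputs[0]["format"],
        "categories": {},
        # Input names are recorded exactly as they were passed in
        "parameters": {"MergeInputFiles": [entry["name"] for entry in inputs]},
    }

    for entry in inputs:
        for category, frames in entry["categories"].items():
            # Optionally drop the (assumed) metadata category entirely
            if category == metadata_category and not save_metadata:
                continue
            merged["categories"].setdefault(category, []).extend(frames)
    return merged


inputs = [
    {"name": "example_frame.root", "format": "TTree",
     "categories": {"events": ["e0", "e1"], "metadata": ["m0"]}},
    {"name": "/data/example_frame2.root", "format": "TTree",
     "categories": {"events": ["e2"], "metadata": ["m1"]}},
]
merged = merge_inputs(inputs, save_metadata=False)
```

In the real tool the frames would be read and written through podio readers and writers; the sketch only shows the order of operations and where the input file names end up.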

tmadlener

This does a lot of heavy lifting that might not be strictly necessary, as it unpacks and then repacks every collection in every frame, whereas a simple hadd would almost be enough for merging TTree based files. Additionally, it will do schema evolution (if applicable), so there could be subtle changes in the produced output file. On the other hand, this will correctly handle input files with different schema versions, so that might not be too bad.

How much slower is this than a c++ implementation? There is #620 after all to solve a quite significant performance issue with podio-dump, and the difference is not only the long startup time in that case.
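For comparison, the hadd route mentioned above could be driven from Python roughly as follows. This is only a sketch: the helper names `build_hadd_command` and `run_hadd` are made up for the example, and it relies solely on the basic `hadd target source1 source2 ...` invocation of ROOT's hadd. Unlike the merge tool, this does no schema evolution and adds no metadata about the input files.

```python
import shutil
import subprocess


def build_hadd_command(output, inputs):
    """Build the argument list for `hadd output input1 input2 ...`."""
    if not inputs:
        raise ValueError("need at least one input file")
    return ["hadd", output, *inputs]


def run_hadd(output, inputs):
    """Run hadd if it is available on PATH (hypothetical helper)."""
    if shutil.which("hadd") is None:
        raise RuntimeError("hadd not found on PATH")
    # check=True raises CalledProcessError if hadd exits with an error
    return subprocess.run(build_hadd_command(output, inputs), check=True)


command = build_hadd_command(
    "merged.root", ["example_frame.root", "example_frame2.root"]
)
```

Wrapping `run_hadd` and the merge tool in the same timing loop would be one way to answer the performance question with numbers.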

tools/podio-merge-files

jmcarcell · 2024-09-19T18:54:11Z

Ah, I didn't think about hadd. This is a tool I have used a couple of times (well, only the for loop), so I thought it would be nice to have. On small files (1 MB TTree files) it's much faster than in Python, but I would imagine that is the case with the worst relative overhead. For RNTuples it crashes with the ones I tried.

tmadlener · 2024-09-20T07:08:24Z

> For RNTuples it crashes with the ones I tried.

Can you specify what crashes in this case? hadd or the c++ version of this script?

jmcarcell · 2024-09-20T10:59:16Z

hadd is crashing

tmadlener · 2024-09-24T07:12:53Z

Maybe add a bit of metadata that keeps track of which files the merged file comes from.

jmcarcell · 2024-11-18T14:46:35Z

This should be ready. It now writes some metadata with the names of the files as passed in the arguments (full paths are saved as full paths):

Key                           Value
--------------------------------------------------------------------------------
MergeInputFiles               [example_frame.root, example_frame2.root]
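A small sketch of how such a listing could be produced from the file names as given on the command line. The function `format_parameters` and the column widths are assumptions chosen to reproduce the layout shown above; this is not the code used by podio-dump. Note that the names are stored untouched, so a relative name stays relative and a full path stays a full path.

```python
def format_parameters(parameters, key_width=30, total_width=80):
    """Render string-list parameters as a Key/Value listing."""
    lines = [f"{'Key':<{key_width}}Value", "-" * total_width]
    for key, values in parameters.items():
        # Values are joined without quotes, as in the listing above
        lines.append(f"{key:<{key_width}}[{', '.join(values)}]")
    return "\n".join(lines)


# File names exactly as passed in the arguments, no path normalization
argv_inputs = ["example_frame.root", "example_frame2.root"]
table = format_parameters({"MergeInputFiles": argv_inputs})
print(table)
```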

tools/podio-merge-files

tmadlener reviewed Sep 19, 2024

tools/podio-merge-files

jmcarcell force-pushed the merge-tool branch from 4118849 to 59d9dff Compare September 20, 2024 19:54

jmcarcell force-pushed the merge-tool branch from 8868a1e to f11cd49 Compare November 18, 2024 14:33

tmadlener reviewed Nov 18, 2024

tools/podio-merge-files

tmadlener reviewed Nov 19, 2024

tools/podio-merge-files

jmcarcell added 12 commits November 19, 2024 09:50

Add a tool to merge several podio files into a single one

0e32711

Fix format

1037351

Fix format

6af93d1

Improve message

0e0fe5c

Generate a metadata frame if it doesn't exist

61fca90

Format with black

7efcdee

Add configuration for the metadata parameter name

02c0ebb

Fix pre-commit

aec7f7f

Fix pre-commit

926812f

Disable pylint check

74e55d6

Hardcode the metadata parameters

5bd511b

Add a comment

5fddc2a

jmcarcell force-pushed the merge-tool branch from 049efef to 5fddc2a Compare November 19, 2024 08:50

Fix f-string

2ffa8c8

andresailer reviewed Nov 19, 2024

tools/podio-merge-files

Fix pre-commit

977c75b

tmadlener approved these changes Nov 20, 2024

tmadlener merged commit 960795b into AIDASoft:master Nov 20, 2024
18 checks passed
