
[ntuple] Merger: fix handling of compression-related options #16949

Open · wants to merge 6 commits into master
Conversation

@silverweed (Contributor) commented Nov 14, 2024

Currently the RNTupleMerger has a rather confusing and unintuitive handling of hadd's compression options.
To simplify the situation, this PR implements these simpler rules:

  1. the output RNTuple and its container file always have the same compression;
  2. if the user explicitly passes a compression setting (-f[0-9]), use that compression;
  3. if the user doesn't pass any compression flag, use 505 as the output compression;
  4. if the user passes -ff or -fk, open the first source file, grab the RNTuple inside it, peek at the first column range we can find and use its compression setting as the output compression. This differs from the current behavior of using the same compression as the first source file, which is probably not what the user wants.

This requires passing more information from hadd to the RNTupleMerger. I added a couple of TString merge options to do so, which won't impact the existing merging code as they only get interpreted by the RNTupleMerger.
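The rules above can be sketched as a small, self-contained decision function. This is only an illustration of the described semantics: the function name, signature, and `std::optional` plumbing are hypothetical and not the PR's actual TString-based merge-option API; only the 505 default and the -f[0-9]/-ff/-fk behavior come from the description.

```cpp
#include <optional>

// Illustrative sketch of the compression-resolution rules described above.
// 505 is the default output compression named in rule 3; everything else
// (names, signature) is hypothetical.
constexpr int kDefaultCompression = 505;

int ResolveOutputCompression(std::optional<int> explicitSetting,      // from -f[0-9]
                             bool matchFirstSource,                   // -ff or -fk given
                             std::optional<int> firstSourceSetting)   // peeked from the first
                                                                      // column range of the
                                                                      // first source RNTuple
{
   if (explicitSetting) // rule 2: an explicit -f[0-9] flag wins
      return *explicitSetting;
   if (matchFirstSource && firstSourceSetting) // rule 4: mirror the first source's columns
      return *firstSourceSetting;
   return kDefaultCompression; // rule 3: no compression flag given
}
```

Per rule 1, whatever value this resolves to would then be applied to both the output RNTuple and its container file.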

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

@jblomer (Contributor) left a comment

In principle looks good but we should add tests.

mergeInfo->fOptions.Contains("fast") ? kUnknownCompressionSettings : outFile->GetCompressionSettings();

RNTupleWriteOptions writeOpts;
writeOpts.SetUseBufferedWrite(false);
Contributor:

This option got lost, didn't it?

Contributor Author (@silverweed):

Yes, a victim of the rebase 😅


github-actions bot commented Nov 14, 2024

Test Results

18 files, 18 suites — 4d 9h 48m 48s ⏱️
 2 680 tests:  2 677 ✅, 0 💤,  3 ❌
46 378 runs:  46 358 ✅, 0 💤, 20 ❌

For more details on these failures, see this check.

Results for commit 3aaf05a.

♻️ This comment has been updated with latest results.

@silverweed (Contributor, Author) commented:

@jblomer added a couple of tests for the new rules: root-project/roottest#1220

Member:

The commit message seems to have two title lines, is this intentional?

Contributor Author (@silverweed):

It's a squash of two commits; I could remove one of the lines, as it's not really that relevant.

Comment on lines +97 to +99
// user passed no compression-related options: use default
compression = RCompressionSetting::EDefaults::kUseGeneralPurpose;
Info("RNTuple::Merge", "Using the default compression: %d", compression);
Member:

(cross-posting from #16944 (comment))
Can you remind me what the default is if I just do hadd out.root in1.root in2.root? From a user's perspective, I would not expect this to change the compression / recompress, but the code seems to suggest that I have to pass -ff or -fk to get "fast" merging?

Comment on lines 147 to 148
// Always write the RNTuple and the file with the same compression.
outFile->SetCompressionSettings(compression);
Member:

Should we "always" do this, or only when there is exactly one RNTuple? What about a file that has one RNTuple (505) and one histogram (101)?

Contributor:

I think I agree. At this point we have simply diverged, and the default TFile compression is different from the default RNTuple compression. We can discuss if/how we want to address that, but I don't think we need extra code in the merger. We have the same situation (different compression algorithms) when you write a new RNTuple.

RNTupleWriteOptions writeOpts;
assert(compression != kUnknownCompressionSettings);
writeOpts.SetUseBufferedWrite(false);
Member:

This should go into the previous commit... On the other hand, it probably doesn't do anything, since we are not using RPageSinkBuf in the first place but construct an RPageSinkFile manually, so we might as well just drop it (in a separate commit).

This requires passing more information from hadd to the RNTupleMerger.
I added a couple of TString merge options to do so, which won't impact
the existing merging code as they only get interpreted by the
RNTupleMerger.
@silverweed silverweed force-pushed the ntuple_merge_compression_fix branch 2 times, most recently from c46dd28 to 2bddca6 Compare November 15, 2024 08:59
the option got lost in a rebase
If a compression setting different from the one used by the sink is
given to RNTupleMerger::Merge, the resulting RNTuple would currently
be wrong, as the merger cannot handle this situation correctly right now.
Therefore, for now, we refuse to do the merging if the compression passed
via the merge options differs from the one used by the sink.
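The refusal described in this commit message might look roughly like the following sketch. The function name and plain-int parameters are hypothetical; the real check would live inside RNTupleMerger::Merge and use ROOT's types.

```cpp
#include <stdexcept>

// Sketch of the guard described above: until the merger can re-compress,
// bail out when the compression requested via the merge options differs
// from the one the sink was configured with. Names are illustrative.
void CheckRequestedCompression(int requestedSettings, int sinkSettings)
{
   if (requestedSettings != sinkSettings)
      throw std::runtime_error("output compression requested via merge options "
                               "differs from the sink's; re-compression is not "
                               "supported yet");
}
```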