-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ntuple] Merger: fix handling of compression-related options #16949
base: master
Are you sure you want to change the base?
[ntuple] Merger: fix handling of compression-related options #16949
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle looks good but we should add tests.
mergeInfo->fOptions.Contains("fast") ? kUnknownCompressionSettings : outFile->GetCompressionSettings(); | ||
|
||
RNTupleWriteOptions writeOpts; | ||
writeOpts.SetUseBufferedWrite(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option got lost, didn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a victim of the rebase 😅
Test Results 18 files 18 suites 4d 9h 48m 48s ⏱️ For more details on these failures, see this check. Results for commit 3aaf05a. ♻️ This comment has been updated with latest results. |
@jblomer added a couple of tests for the new rules: root-project/roottest#1220 |
main/src/hadd.cxx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit message seems to have two title lines, is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a squash of two commits; I could remove one of the lines as it's not really that relevant
// user passed no compression-related options: use default | ||
compression = RCompressionSetting::EDefaults::kUseGeneralPurpose; | ||
Info("RNTuple::Merge", "Using the default compression: %d", compression); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(cross-posting from #16944 (comment))
Can you remind me what is the default if I just do hadd out.root in1.root in2.root
? From a users perspective, I would not expect this to change the compression / recompress, but the code seems to suggest that I have to pass -ff
or -fk
to get "fast" merging?
tree/ntuple/v7/src/RNTupleMerger.cxx
Outdated
// Always write the RNTuple and the file with the same compression. | ||
outFile->SetCompressionSettings(compression); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we "always" do this, or only when there is exactly one RNTuple? What about a file that has one RNTuple (505) and one histogram (101)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree. At this point, we simply diverged and the default TFile compression is different from the default RNTuple compression. We can discuss if/how we want to address it but I don't think we need extra code in the merger. We have the same situation (different compression algorithms) if you write a new RNTuple.
RNTupleWriteOptions writeOpts; | ||
assert(compression != kUnknownCompressionSettings); | ||
writeOpts.SetUseBufferedWrite(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go to the previous commit... On the other hand it probably doesn't do anything since we are not using RPageSinkBuf
in the first place, but construct a RPageSinkFile
manually, so might as well just drop it (in a separate commit)
This requires passing more information from hadd to the RNTupleMerger. I added a couple of TString merge options to do so, which won't impact the existing merging code as they only get interpreted by the RNTupleMerger.
c46dd28
to
2bddca6
Compare
the option got lost in a rebase
If a compression settings different from the one used by the sink is given to RNTupleMerger::Merge, the resulting RNTuple would currently be wrong as the merger cannot handle this situation correctly right now. Therefore for now we refuse to do the merging if the compression passed via the merge opts differs from the one used by the sink.
Currently the RNTupleMerger has a pretty confusing and unintuitive handling of
hadd
's compression options.To simplify the situation, this PR implements these simpler rules:
-f[0-9]
), use that compression-ff
or-fk
, open the first source file, grab the RNTuple inside it, peek the first column range we can find and use its compression settings as the output compression. This is different from the current behavior of using the same compression as the first source file, as that behavior is probably not what the user wants.This requires passing more information from hadd to the RNTupleMerger. I added a couple of TString merge options to do so, which won't impact the existing merging code as they only get interpreted by the RNTupleMerger.
Checklist: