[ntuple] update write option defaults #16877

Open · wants to merge 1 commit into master
4 changes: 2 additions & 2 deletions tree/ntuple/v7/doc/BinaryFormatSpecification.md
@@ -1049,8 +1049,8 @@ This section summarizes default settings of `RNTupleWriteOptions`.

| Default | Value |
|----------------------------------------|------------------------------|
-| Approximate Zipped Cluster | 100 MB |
-| Max Unzipped Cluster | 1 GiB |
+| Approximate Zipped Cluster | 128 MiB |
+| Max Unzipped Cluster | 1280 MiB |
| Max Unzipped Page | 1 MiB |

## Glossary
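To make the table above concrete, here is a minimal sketch (not part of this diff) that reads the same defaults back from a default-constructed `RNTupleWriteOptions`. It assumes the usual getter names and the `ROOT::Experimental` namespace of the RNTuple v7 headers at the time of this PR; both may differ in other ROOT versions.

```cpp
#include <ROOT/RNTupleWriteOptions.hxx>

#include <cstdio>

int main()
{
   // Default-constructed options should reflect the defaults from the table above.
   ROOT::Experimental::RNTupleWriteOptions options;
   std::printf("approx. zipped cluster size: %zu bytes\n", options.GetApproxZippedClusterSize()); // 128 MiB
   std::printf("max unzipped cluster size:   %zu bytes\n", options.GetMaxUnzippedClusterSize());  // 1280 MiB
   std::printf("max unzipped page size:      %zu bytes\n", options.GetMaxUnzippedPageSize());     // 1 MiB
   return 0;
}
```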
4 changes: 2 additions & 2 deletions tree/ntuple/v7/doc/tuning.md
@@ -5,15 +5,15 @@ A cluster contains all the data of a given event range.
As clusters are usually compressed and tied to event boundaries, an exact size cannot be enforced.
Instead, RNTuple uses a *target size* for the compressed data as a guideline for when to flush a cluster.

-The default cluster target size is 100 MB of compressed data.
+The default cluster target size is 128 MiB of compressed data.
The default can be changed by the `RNTupleWriteOptions`.
The default should work well in the majority of cases.
In general, larger clusters provide room for more and larger pages and should improve compression ratio and speed.
However, clusters also need to be buffered during write and (partially) during read,
so larger clusters increase the memory footprint.

A second option in `RNTupleWriteOptions` specifies the maximum uncompressed cluster size.
-The default is 1 GiB.
+The default is 10x the default cluster target size, i.e. ~1.2 GiB.

Member:

Why not the following?

Suggested change:
-The default is 10x the default cluster target size, i.e. ~1.2 GiB.
+The default is 8x the default cluster target size, i.e. 1.0 GiB.

Or why the factor 10?

Contributor Author:

Because it was previously a factor of ~10 (100 MB --> 1 GiB).

Member:

Was the reason for choosing 10 something other than getting to a round number? If it was just to get to a round number, we should switch to 8; if it was something else, we should explain it here.

Contributor Author:

Well, the largest compression factor we have seen in an experiment EDM so far was in MiniAOD, with a compression factor >9.

This setting acts as an "emergency brake" and should prevent very compressible clusters from growing too large.

Given the two settings, writing works as follows:
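As a hedged illustration of "The default can be changed by the `RNTupleWriteOptions`" above, the following sketch overrides both cluster-size settings when creating a writer. The function name, the field, the output file name, and the 64 MiB target are made up for this example; the headers and the `ROOT::Experimental` namespace follow the RNTuple v7 layout at the time of this PR and may differ in other ROOT versions.

```cpp
#include <ROOT/RNTupleModel.hxx>
#include <ROOT/RNTupleWriteOptions.hxx>
#include <ROOT/RNTupleWriter.hxx>

#include <memory>
#include <utility>

// Hypothetical helper, for illustration only.
void WriteWithCustomClusterSizes()
{
   using namespace ROOT::Experimental;

   auto model = RNTupleModel::Create();
   auto ptrE = model->MakeField<float>("energy");

   RNTupleWriteOptions options;
   // Target roughly 64 MiB of compressed data per cluster instead of the 128 MiB default.
   options.SetApproxZippedClusterSize(64 * 1024 * 1024);
   // Keep the in-memory "emergency brake" at 10x the target, mirroring the new default ratio.
   options.SetMaxUnzippedClusterSize(10 * options.GetApproxZippedClusterSize());

   auto writer = RNTupleWriter::Recreate(std::move(model), "ntpl", "data.root", options);
   for (int i = 0; i < 1000; ++i) {
      *ptrE = 0.5f * i;
      writer->Fill(); // clusters are flushed automatically based on the options above
   }
   // The writer commits the dataset when it goes out of scope.
}
```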
4 changes: 2 additions & 2 deletions tree/ntuple/v7/inc/ROOT/RNTupleWriteOptions.hxx
@@ -61,10 +61,10 @@ public:
protected:
   int fCompression{RCompressionSetting::EDefaults::kUseGeneralPurpose};
   /// Approximation of the target compressed cluster size
-   std::size_t fApproxZippedClusterSize = 100 * 1000 * 1000;
+   std::size_t fApproxZippedClusterSize = 128 * 1024 * 1024;
   /// Memory limit for committing a cluster: with very high compression ratio, we need a limit
   /// on how large the I/O buffer can grow during writing.
-   std::size_t fMaxUnzippedClusterSize = 1024 * 1024 * 1024;
+   std::size_t fMaxUnzippedClusterSize = 10 * fApproxZippedClusterSize;
   /// Initially, columns start with a page large enough to hold the given number of elements. The initial
   /// page size is the given number of elements multiplied by the column's element size.
   /// If more elements are needed, pages are increased up until the byte limit given by fMaxUnzippedPageSize
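For reference, a small sketch (illustrative only, not code from this PR) of the arithmetic behind the new member defaults above and the "~1.2 GiB" wording in tuning.md:

```cpp
#include <cstddef>

// New default: 128 MiB target for the compressed cluster size.
constexpr std::size_t kApproxZippedClusterSize = 128 * 1024 * 1024; // 134'217'728 bytes

// New default: 10x the target as the hard limit on the uncompressed cluster size.
constexpr std::size_t kMaxUnzippedClusterSize = 10 * kApproxZippedClusterSize; // 1'342'177'280 bytes

static_assert(kMaxUnzippedClusterSize == 1280ULL * 1024 * 1024, "10 x 128 MiB = 1280 MiB");
// 1280 MiB / 1024 = 1.25 GiB, which tuning.md rounds to "~1.2 GiB".
```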