[ntuple] Improve RPagePool #16859

jblomer · 2024-11-07T17:03:43Z

Improve the lookup complexity for pages in the page pool from linear to constant in well-behaved cases, i.e. if there is a small number of pages per column and cluster. Also includes some smaller cleanups around the RPage/RPagePool logic.

github-actions · 2024-11-07T22:59:28Z

Test Results

18 files 18 suites 4d 4h 10m 19s ⏱️
2 678 tests 2 678 ✅ 0 💤 0 ❌
46 342 runs 46 342 ✅ 0 💤 0 ❌

Results for commit 15b33c0.

♻️ This comment has been updated with latest results.

hahnjo

This is great! Some stylistic comments inline.

It would be interesting to run the limits test, especially Limits_ManyFields, Limits_ManyPages, and Limits_ManyPagesOneEntry. It's possible that this PR addresses the quadratic complexity seen in there.

jblomer · 2024-11-14T22:27:35Z

No changes to the "many pages" test. The "many fields" unit test got significantly faster. The overall complexity is still super-linear but much more benign.

hahnjo · 2024-11-15T08:01:43Z

> No changes to the "many pages" test. The "many fields" unit test got significantly faster. The overall complexity is still super-linear but much more benign.

Ah right, now I remember that I had already profiled this before: The "many pages" tests are actually bound by RPageRange::Find, which has a TODO to use binary search. The case we are speeding up here is many pages distributed over many fields, in which case performance was bound by the page pool.
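For illustration, a binary search of the kind the TODO mentions could look like the sketch below. This is not the actual `RPageRange::Find` code: `PageSpan`, `FindPageIndex`, and the assumption that each page stores a precomputed, sorted first-element offset are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a page descriptor; not the ROOT data structure.
// Assumption: pages are sorted by fFirstElement and do not overlap.
struct PageSpan {
   std::uint64_t fFirstElement; // index of the page's first element in the cluster
   std::uint64_t fNElements;    // number of elements stored in the page
};

// Returns the index of the page that contains idxInCluster,
// or pages.size() if no page covers that element.
std::size_t FindPageIndex(const std::vector<PageSpan> &pages, std::uint64_t idxInCluster)
{
   // First page that starts strictly after the requested element: O(log n).
   auto it = std::upper_bound(pages.begin(), pages.end(), idxInCluster,
                              [](std::uint64_t value, const PageSpan &p) { return value < p.fFirstElement; });
   if (it == pages.begin())
      return pages.size(); // the element lies before the first page
   --it; // candidate: last page that starts at or before the element
   assert(it->fFirstElement <= idxInCluster);
   if (idxInCluster >= it->fFirstElement + it->fNElements)
      return pages.size(); // the element falls into a gap or past the end
   return static_cast<std::size_t>(it - pages.begin());
}
```

If the pages only store element counts, the offsets would first have to be accumulated once (a prefix sum) before such a search is possible.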

Edit: Hm, the "many pages" tests will also hit the linear loop over the page set in RPagePool::GetPage... For a future PR though.

hahnjo

LGTM, thanks!

hahnjo · 2024-11-15T08:15:52Z

tree/ntuple/v7/test/ntuple_limits.cxx

   FileRaii fileGuard("test_ntuple_limits_manyFields.root");

-   static constexpr int NumFields = 40'000;
+   static constexpr int NumFields = 100'000;


It is fantastic that we can now process models with 100k fields in reasonable time!

jblomer added the in:RNTuple label Nov 7, 2024

jblomer requested review from hahnjo, pcanal, silverweed and enirolf November 7, 2024 17:03

jblomer self-assigned this Nov 7, 2024

jblomer force-pushed the ntuple-fix-page-pool branch from 0350870 to ce1da4f Compare November 7, 2024 17:07

hahnjo reviewed Nov 8, 2024

silverweed reviewed Nov 8, 2024

jblomer and others added 13 commits November 14, 2024 22:55

[ntuple] move RPagePool's refcounter to page info

9eeeeb1

[ntuple] add RPagePool::RKey

85859df

[ntuple] remove RPage::fColumnId

af5ccec

[ntuple] remove RPage::MakePageZero()

6f1fa5c

Instead of mapping all synthesized zero pages to the same memory buffer, use real allocated and zeroed-out pages. This ensures that no special logic is required when adding pages to and removing pages from the page pool.
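As a rough sketch of the idea in this commit message, each synthesized zero page can own its own zero-initialized buffer. `OwnedPage` and `MakeZeroedPage` are hypothetical names for illustration and do not reflect the ROOT page allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a page that owns its memory; not the ROOT RPage class.
struct OwnedPage {
   std::unique_ptr<unsigned char[]> fBuffer;
   std::size_t fNBytes;
};

// Allocates a distinct buffer per call. std::make_unique<T[]>(n) value-initializes
// the array, so every byte starts out as zero.
OwnedPage MakeZeroedPage(std::size_t elementSize, std::size_t nElements)
{
   assert(elementSize > 0);
   const std::size_t nBytes = elementSize * nElements;
   return OwnedPage{std::make_unique<unsigned char[]>(nBytes), nBytes};
}
```

Because every such page has its own buffer address, a pool can presumably track it like any other page, at the cost of one extra allocation per zero page.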

[ntuple] add RPagePool::fLookupByBuffer

8e4ad89

Allows for O(1) page lookup when a page is returned to the page pool, instead of the O(n) linear search.
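A minimal sketch of such a buffer-address index, assuming a simplified pool that only tracks reference counts and does not own memory. `BufferIndexedPool`, `PooledPage`, and the method names are hypothetical and not the `RPagePool` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a pool entry; not the ROOT data structure.
struct PooledPage {
   void *fBuffer;
   std::int64_t fRefCounter;
};

class BufferIndexedPool {
   std::vector<PooledPage> fEntries;
   // Maps a page buffer address to its position in fEntries.
   std::unordered_map<void *, std::size_t> fLookupByBuffer;

public:
   // Registers a buffer or, if it is already known, bumps its reference counter.
   void Register(void *buffer)
   {
      auto it = fLookupByBuffer.find(buffer);
      if (it != fLookupByBuffer.end()) {
         fEntries[it->second].fRefCounter++;
         return;
      }
      fLookupByBuffer.emplace(buffer, fEntries.size());
      fEntries.push_back(PooledPage{buffer, 1});
   }

   // Average O(1): one hash lookup instead of a scan over all entries.
   void Release(void *buffer)
   {
      auto it = fLookupByBuffer.find(buffer);
      assert(it != fLookupByBuffer.end());
      const std::size_t idx = it->second;
      assert(fEntries[idx].fRefCounter >= 1);
      if (--fEntries[idx].fRefCounter > 0)
         return;
      fLookupByBuffer.erase(it);
      // Swap-and-pop: move the last entry into the hole and fix up its index.
      const std::size_t last = fEntries.size() - 1;
      if (idx != last) {
         fEntries[idx] = fEntries[last];
         fLookupByBuffer[fEntries[idx].fBuffer] = idx;
      }
      fEntries.pop_back();
   }

   std::size_t Size() const { return fEntries.size(); }

   // Returns 0 for buffers that are not in the pool.
   std::int64_t RefCount(void *buffer) const
   {
      auto it = fLookupByBuffer.find(buffer);
      return it == fLookupByBuffer.end() ? 0 : fEntries[it->second].fRefCounter;
   }
};
```

The swap-and-pop step keeps the entry vector dense, so the index map is the only structure that needs updating when a page is evicted.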

[ntuple] merge RPagePool::fPages and RPagePool::fPageInfos

c2a46e0

[ntuple] add RPagePool::fLookupByKey

2db1cd5

Use a hash map to filter the pages in the page pool by column ID and on-disk type on access.
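The filtering step could be sketched as follows, assuming a key made of a column ID plus an integer type tag. `PageKey`, `PageKeyHash`, `KeyedPagePool`, and the integer `fTypeId` are hypothetical simplifications and not the actual `RPagePool::RKey` definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical key: column ID plus a type tag (an integer here for simplicity).
struct PageKey {
   std::uint64_t fColumnId;
   std::uint32_t fTypeId;
   bool operator==(const PageKey &other) const
   {
      return fColumnId == other.fColumnId && fTypeId == other.fTypeId;
   }
};

struct PageKeyHash {
   std::size_t operator()(const PageKey &k) const noexcept
   {
      // Combine the two member hashes (the widely used "hash_combine" mixing step).
      std::size_t h = std::hash<std::uint64_t>{}(k.fColumnId);
      h ^= std::hash<std::uint32_t>{}(k.fTypeId) + 0x9e3779b9 + (h << 6) + (h >> 2);
      return h;
   }
};

// Hypothetical stand-in for a pooled page: the range of elements it covers.
struct PageEntry {
   std::uint64_t fFirstElement;
   std::uint64_t fNElements;
};

class KeyedPagePool {
   std::vector<PageEntry> fEntries;
   // For each key, the positions in fEntries of the pages that belong to it.
   std::unordered_map<PageKey, std::vector<std::size_t>, PageKeyHash> fLookupByKey;

public:
   void AddPage(const PageKey &key, const PageEntry &entry)
   {
      assert(entry.fNElements > 0);
      fLookupByKey[key].push_back(fEntries.size());
      fEntries.push_back(entry);
   }

   // Returns the page covering globalIndex for the given key, or nullptr.
   // The pointer is only valid until the next call to AddPage().
   const PageEntry *FindPage(const PageKey &key, std::uint64_t globalIndex) const
   {
      auto it = fLookupByKey.find(key);
      if (it == fLookupByKey.end())
         return nullptr;
      // Only the pages of this key are scanned, not the whole pool.
      for (std::size_t idx : it->second) {
         const PageEntry &e = fEntries[idx];
         if (globalIndex >= e.fFirstElement && globalIndex < e.fFirstElement + e.fNElements)
            return &e;
      }
      return nullptr;
   }
};
```

With few pages per key the remaining scan is short, which matches the "constant in well-behaved cases" wording of the PR description. With many pages under one key the scan is still linear.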

[ntuple] minor code simplification

55fc1ca

Co-authored-by: Jonas Hahnfeld <hahnjo@hahnjo.de>

[ntuple] make RPagePool::REntry::fRefCounter 64bit

605e6db

[ntuple] consolidate more logic in RPagePool::AddPage()

6057f54

[ntuple] minor performance improvement in RPagePool::ReleasePage()

a3a2322

[ntuple] add assert()s in RPage(Pool) logic

1635fcd

[ntuple] update limits test

15b33c0

jblomer force-pushed the ntuple-fix-page-pool branch from 91e1921 to 15b33c0 Compare November 14, 2024 22:26

jblomer requested review from hahnjo and silverweed November 14, 2024 22:28

hahnjo approved these changes Nov 15, 2024

jblomer merged commit ca0d725 into root-project:master Nov 15, 2024
21 checks passed

jblomer deleted the ntuple-fix-page-pool branch November 15, 2024 10:47
