feat: Infrastructure for writing of RNTuple (incomplete functionality) #705
The bytes blob inside an RNTuple is prepended not by a TKey but
In [34]: t = ROOT.TFile("/home/akako/Documents/github/scikit-hep-testdata/src/skhep_testdata/data/test_ntuple_int_10.root")
In [35]: t.Map()
20220628/101853 At:100 N=112 TFile
20220628/101853 At:212 N=405 StreamerInfo CX = 2.73
20220628/101853 At:617 N=120 ROOT::Experimental::RNTuple
20220628/101853 At:737 N=95 KeysList
20220628/101853 At:832 N=167 RBlob CX = 1.16
20220628/101853 At:999 N=74 RBlob
20220628/101853 At:1073 N=94 RBlob
20220628/101853 At:1167 N=115 RBlob CX = 1.20
20220628/101853 At:1282 N=39 FreeSegments
20220628/101853 At:1321 N=1 END
In [49]: rn = up.open("/home/akako/Documents/github/scikit-hep-testdata/src/skhep_testdata/data/test_ntuple_int_10.root")["ntuple"]
In [50]: rn._members
Out[50]:
{'fCheckSum': 1700499286,
'fVersion': 0,
'fSize': 48,
'fSeekHeader': 866,
'fNBytesHeader': 133,
'fLenHeader': 159,
'fSeekFooter': 1201,
'fNBytesFooter': 81,
'fLenFooter': 104,
'fReserved': 0} First
|
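The offsets in the two listings above can be cross-checked with a little arithmetic. The sketch below hard-codes the numbers printed by t.Map() and rn._members and computes the gap between the start of each RBlob record and the matching seek position in the anchor. The names map_records, anchor, and gap_and_end are made up for this illustration, and the resulting gap is only what these particular numbers give, not a statement about the format in general.

```python
# Values copied from the t.Map() output: (At, N) for the first and last RBlob records.
map_records = {"header": (832, 167), "footer": (1167, 115)}

# Values copied from rn._members: (seek position, compressed size, uncompressed length).
anchor = {"header": (866, 133, 159), "footer": (1201, 81, 104)}


def gap_and_end(name):
    """Return (bytes between record start and seek position, whether both end at the same byte)."""
    at, n = map_records[name]
    seek, nbytes, _length = anchor[name]
    return seek - at, (at + n) == (seek + nbytes)


results = {name: gap_and_end(name) for name in map_records}
for name, (gap, same_end) in results.items():
    print(f"{name}: blob starts {gap} bytes into its record; ends coincide: {same_end}")
```

For this file, both blobs start 34 bytes after their record begins and end exactly where the record ends, so whatever precedes each blob occupies those 34 bytes.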
In [20]: akform = ak._v2.from_iter([{"one": 1, "two": 2.0}]).layout.form
In [21]: with up.recreate("/tmp/test.root") as file:
...: file.mktree("Events", {"one":"int"})
...: file.mkntuple("ntuple", akform)
In [23]: up.open("/tmp/test.root")["ntuple"]._members
Out[23]:
{'fCheckSum': 1700499286,
'fVersion': 0,
'fSize': 48,
'fSeekHeader': 866,
'fNBytesHeader': 133,
'fLenHeader': 159,
'fSeekFooter': 1201,
'fNBytesFooter': 81,
'fLenFooter': 104,
'fReserved': 0}
In [22]: rn = up.open("/home/akako/Documents/github/scikit-hep-testdata/src/skhep_testdata/data/test_ntuple_int_10.root")["ntuple"]
In [24]: rn._members
Out[24]:
{'fCheckSum': 1700499286,
'fVersion': 0,
'fSize': 48,
'fSeekHeader': 866,
'fNBytesHeader': 133,
'fLenHeader': 159,
'fSeekFooter': 1201,
'fNBytesFooter': 81,
'fLenFooter': 104,
'fReserved': 0} |
for more information, see https://pre-commit.ci
in a different file, the
In [48]: rn.page_list_envelopes.pagelinklist[0][0].chunk
Out[48]: <Chunk 0-1104>
In [49]: len(rn.page_list_envelopes.pagelinklist[0])
Out[49]: 30 |
In [227]: with up.recreate("/tmp/test.root") as file:
...: file.mkntuple("ntuple", akform)
In [228]: up.open("/tmp/test.root")["ntuple"].header
Out[228]: MetaData('HeaderReader', env_header={'env_version': 1, 'min_version': 1}, feature_flag=0, rc_tag=1, name='ntuple', ntuple_description='', writer_identifier='uproot 5.0.0rc2', field_records=[MetaData('FieldRecordFrame', field_version=0, type_version=0, parent_field_id=0, struct_role=0, flags=0, field_name='one_integers', type_name='std::int32_t', type_alias='', field_desc='')], column_records=[MetaData('ColumnRecordFrame', type=11, nbits=32, field_id=0, flags=0)], alias_columns=[], extra_type_infos=[], crc32=3689245460) |
In [444]: with up.recreate("/tmp/test.root") as file:
...: akform = ak._v2.forms.RecordForm([ak._v2.forms.NumpyForm('float64'), ak._v2.forms.NumpyForm('int32'), ak._v2.forms.NumpyForm('bool')], ['one', 'two', 'three'])
...: file.mkntuple("ntuple", akform)
In [445]: up.open("/tmp/test.root")["ntuple"].header.field_records
Out[445]:
[MetaData('FieldRecordFrame', field_version=0, type_version=0, parent_field_id=0, struct_role=0, flags=0, field_name='one', type_name='double', type_alias='', field_desc=''),
MetaData('FieldRecordFrame', field_version=0, type_version=0, parent_field_id=1, struct_role=0, flags=0, field_name='two', type_name='std::int32_t', type_alias='', field_desc=''),
MetaData('FieldRecordFrame', field_version=0, type_version=0, parent_field_id=2, struct_role=0, flags=0, field_name='three', type_name='bit', type_alias='', field_desc='')]
In [446]: up.open("/tmp/test.root")["ntuple"].header.column_records
Out[446]:
[MetaData('ColumnRecordFrame', type=7, nbits=64, field_id=0, flags=0),
MetaData('ColumnRecordFrame', type=11, nbits=32, field_id=1, flags=0),
MetaData('ColumnRecordFrame', type=6, nbits=1, field_id=2, flags=0)] |
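The header output above implies a mapping from primitive dtypes to RNTuple type names and column records. A minimal sketch of that mapping as a lookup table follows. It contains only the three combinations visible in the output, and the names OBSERVED_COLUMNS and describe_fields are hypothetical, not part of Uproot.

```python
# dtype name -> (type_name, column type code, nbits), copied from the printed records only.
OBSERVED_COLUMNS = {
    "float64": ("double", 7, 64),
    "int32": ("std::int32_t", 11, 32),
    "bool": ("bit", 6, 1),
}


def describe_fields(fields):
    """Build (field_records, column_records) as plain dicts for flat (name, dtype) pairs."""
    field_records, column_records = [], []
    for field_id, (name, dtype) in enumerate(fields):
        type_name, column_type, nbits = OBSERVED_COLUMNS[dtype]
        # In the output above, each top-level field lists its own index as parent_field_id.
        field_records.append(
            {"field_name": name, "type_name": type_name, "parent_field_id": field_id}
        )
        column_records.append({"type": column_type, "nbits": nbits, "field_id": field_id})
    return field_records, column_records


fields, columns = describe_fields([("one", "float64"), ("two", "int32"), ("three", "bool")])
for field, column in zip(fields, columns):
    print(field, column)
```

Any dtype outside the three shown raises a KeyError here, which seems preferable to guessing a type code that the output does not confirm.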
In [149]: array
Out[149]: <Array [{one_integers: 9}, {...}, ..., {...}] type='10 * {one_integers: int32}'>
In [150]: with up.recreate("/tmp/test.root") as file:
...: akform = ak._v2.forms.RecordForm([ak._v2.forms.NumpyForm('int32')], ['one_integers'])
...: file.mkntuple("ntuple", akform)
...: a = file["ntuple"]
...: a.extend(array)
...: file.close()
...: rn = up.open("/tmp/test.root")["ntuple"]
...: assert ak.all(rn.arrays()["one_integers"] == np.array([9,8,7,6,5,4,3,2,1,0]))
Now we can do a round trip with ourselves, but ROOT doesn't like it. There are two visible problems:
For 1, I looked at the source code and found https://github.com/root-project/root/blob/ef62da7335eecdea3df98269f2d0cccbda600b98/tree/ntuple/v7/inc/ROOT/RNTuple.hxx#L196, where fSource is a pointer to an RPageSource, which eventually calls https://github.com/root-project/root/blob/ef62da7335eecdea3df98269f2d0cccbda600b98/tree/ntuple/v7/src/RPageStorage.cxx#L94. The problem: I've never seen "descriptor" before, and I'm not sure what it refers to or whether it's part of the old world.
For 2, I suspect it is related to 1: if ROOT doesn't detect that our RNTuple has X events, we probably have some of the cluster metadata wrong, because I would expect the total number of entries to be just the sum over the cluster_summaries in the footer. |
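The expectation about entry counts can be written down as a short check. This is only a sketch: ClusterSummary and total_entries are hypothetical stand-ins for whatever the footer reader returns, and the field names num_first_entry and num_entries are assumptions, not Uproot's actual attributes.

```python
from dataclasses import dataclass


@dataclass
class ClusterSummary:
    # Hypothetical fields: index of the first entry in the cluster, and how many entries it holds.
    num_first_entry: int
    num_entries: int


def total_entries(cluster_summaries):
    """Sum entries over clusters, raising if the clusters are not contiguous from entry 0."""
    expected_first = 0
    for summary in cluster_summaries:
        if summary.num_first_entry != expected_first:
            raise ValueError(
                f"cluster starts at {summary.num_first_entry}, expected {expected_first}"
            )
        expected_first += summary.num_entries
    return expected_first


# A single cluster holding the 10-entry array written above.
print(total_entries([ClusterSummary(num_first_entry=0, num_entries=10)]))
```

Running a check like this on the footer that was written, before opening the file in ROOT, would show whether the cluster summaries alone already add up to the number of entries passed to extend.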
@Moelf, I'm marking this PR as "inactive." When the RNTuple-writing feature is eventually implemented, it might need to be overlaid on a more recent version of |
looks like we can still just update? I will try to make test pass somewhat, but since it's not usable I won't try to advocate for a merge |
Got it, thanks! We might be doing a lot of bug-fixes in the coming weeks when a wave of non-early-adopter users get Uproot 5. It shouldn't impact the RNTuple submodule, but |
I'm in favor of merging this PR as soon as the tests pass.
- It includes important code for reading `std::array<T>`. We want to get all of the reading updates in.
- We don't want the code for writing to get out of date with `main`, even if it is incomplete, so that it will be easier to finish later.
- None of this should break things for regular Uproot users, since they'd have to go out of their way to even try to use this. At the very least, they'd have to get access to a file with an RNTuple in it, and those are well-advertised as being experimental.
To add to the above, it only touches a few files used by the rest of Uproot:
None of this should cause any problems for current users of Uproot. |
seems weird
only happens on >3.7 |
dask-awkward only runs in Python 3.7+. However, it shouldn't fail. I'm going to run the tests in We'll see if it's broken overall or only in this branch. |
looks like manually triggered CI also failed on |