This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

[WIP] Drop ROOT files in tests/samples in favor of scikit-hep-testdata #237

Closed
wants to merge 10 commits into from


jpivarski
Member

Most files should come from a version dependence on scikit-hep-testdata, but some, like Event.root, have to come from an HTTP request. (tests_require includes requests, so uproot.open("http://...") will work.)

@jpivarski jpivarski changed the title Drop ROOT files in tests/samples in favor of scikit-hep-testdata [WIP] Drop ROOT files in tests/samples in favor of scikit-hep-testdata Feb 27, 2019
@jpivarski
Member Author

We're going to need

  • issue187.root
  • issue213.root
  • issue232.root

and it would be really nice to find a more consistent solution to Event.root, but I don't think it can be put in any PyPI repository.

@jpivarski
Member Author

I think this test failure is due to scikit-hep-testdata. Posting as issue scikit-hep/scikit-hep-testdata#14.

@jpivarski
Member Author

Many bugs have been fixed, but ultimately, this will have to wait for scikit-hep/scikit-hep-testdata#14.

jpivarski added a commit that referenced this pull request Feb 28, 2019
…for-character identical, I hope that the PR can be merged automatically.
jpivarski added a commit that referenced this pull request Feb 28, 2019
Just take the first commit of #237 because the rest won't work until scikit-hep-testdata is fixed
@jpivarski
Member Author

This is happening in uproot4.

@jpivarski jpivarski closed this Jun 4, 2020
@tamasgal
Contributor

tamasgal commented Jun 4, 2020

Ah nice, I didn’t know about this test-data repo. I should use it for UnROOT.jl

@jpivarski
Member Author

That way, we don't all have to keep the same test files clogging up our git repos.

I also found uses for it in writing tutorials. When I needed an example of complex data, there was uproot-issue399.root with a jagged array of histograms!

Also very useful in developing a ROOT-file reader: user-submitted issues are the only examples of files I have with TBaskets embedded inside the TTree. It's a situation I haven't been able to cause—I don't know how the files ended up this way. But the following "issues files" have good examples:

  • uproot-issue21.root has flat arrays (e.g. nllscan/mH)
  • uproot-issue327.root has jagged arrays (e.g. DstTree/fTracks.fCharge)
  • uproot-issue232.root and uproot-issue187.root have jagged arrays (e.g. fTreeV0/V0s.fV0pt and MCparticles.nbodies)
  • uproot-from-geant4.root has jagged arrays (e.g. Details/numgood and TrackedRays/Event and phi)

The way jaggedness (offsets, the fNevBuffer) is encoded is different in embedded baskets than free baskets, so it's important to have these tests. As for branches, I tend to go for the integer ones because small (< 256) integers are easy to recognize in a byte stream. Also, the files made by Geant4 are all a little weird—they have obscure class versions because the Geant4 file-writer is an independent implementation, not ROOT.
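To illustrate why small integers are easy to pick out, here is a minimal sketch. It assumes the values are serialized as big-endian 32-bit integers (ROOT's byte order); the sample values are made up and do not come from any of the files listed above.

```python
import struct

# Made-up sample values, all below 256 (an assumption for illustration).
values = [3, 17, 200]

# Serialize each value as a big-endian 32-bit signed integer.
stream = b"".join(struct.pack(">i", v) for v in values)

# Render the stream as hex: each value shows up as three zero bytes
# followed by a single nonzero byte, which is easy to spot by eye.
hex_view = " ".join(f"{b:02x}" for b in stream)
print(hex_view)  # 00 00 00 03 00 00 00 11 00 00 00 c8
```

The repeating `00 00 00 xx` pattern stands out in a hex dump, whereas floating-point values or large integers fill all four bytes and are harder to recognize.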

@jpivarski jpivarski deleted the use-scikit-hep-testfiles branch June 4, 2020 18:56
@tamasgal
Contributor

tamasgal commented Jun 4, 2020

Yes I see, thanks for the explanations! It's indeed extremely useful.

The jaggedness is still something I struggle with, so it’s very helpful to have these examples consolidated.

@jpivarski
Member Author

This may be helpful:

https://github.com/scikit-hep/uproot4/blob/9f2c50466fb781f6f0a1fff9ed1b1a30e360c5c6/uproot4/models/TBasket.py#L30-L103

In Uproot 1‒3, basket handling was distributed in many places because I was still figuring it out. This time, I managed to consolidate it. Strictly speaking, this is both the TBasket and its TKey, but it makes sense to handle this TKey in a special way, mixed with the TBasket handling.

Even though I could consolidate it, the embedded/non-embedded cases have to be partially split in a big if/else clause. After the header, they're just different.

Once you've identified what I call the data (array of uint8) and the byte_offsets (array of int32, specifying byte positions), you're ready to move on. These are the only two things you need to know from a TBasket. Even if you're struggling with interpreting bytes within an entry, knowing absolutely where the entry starts is a big advantage.
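As a rough sketch of how those two pieces fit together (not the code from the linked TBasket.py): the example below assumes byte_offsets holds one more value than there are entries, with the last offset marking the end of data, and that each entry is a bare run of big-endian int32 values with no per-entry header. Real std::vector entries carry extra bytes, so the helper names `split_entries` and `decode_int32_entry` are hypothetical and for illustration only.

```python
import struct

def split_entries(data, byte_offsets):
    """Slice data into one bytes object per entry, using consecutive offsets as boundaries."""
    return [data[start:stop] for start, stop in zip(byte_offsets[:-1], byte_offsets[1:])]

def decode_int32_entry(entry):
    """Interpret an entry as a run of big-endian int32 values (assumes no header bytes)."""
    count = len(entry) // 4
    return list(struct.unpack(f">{count}i", entry))

# Made-up basket contents: three entries holding [1, 2], [], and [3].
data = struct.pack(">3i", 1, 2, 3)
byte_offsets = [0, 8, 8, 12]

entries = split_entries(data, byte_offsets)
jagged = [decode_int32_entry(e) for e in entries]
print(jagged)  # [[1, 2], [], [3]]
```

Two equal consecutive offsets give an empty entry, which is how a zero-length list appears in this layout.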

@tamasgal
Contributor

tamasgal commented Jun 4, 2020

I got it, thanks, that looks quite concise! Yes, so currently the data and the byte_offsets are something I already got working for many cases (not the TTree Baskets, but I put that on low priority for now). The interpretation is really what's missing, but for that I need to go through the streamer interpretation process first. It's still quite foggy how to see whether something is jagged or not, but I got a few clues, like the classname (when it includes std::vector) or a specific kConst value, etc.
I'll see how far I get next week. So far I am happy that I can already do my physics analysis for my PhD with UnROOT.jl, so I'm trying to focus on some results first; I have to finish by October ;)
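The classname clue could be sketched as a simple string check. This is only a heuristic under stated assumptions: the function name `looks_jagged` is hypothetical, the sample class names are made up, and it covers neither the kConst clue nor the streamer-based interpretation.

```python
def looks_jagged(classname):
    """Guess that a branch is jagged if its class name mentions a std::vector.

    Heuristic only: it says nothing about other variable-length types.
    """
    normalized = classname.replace(" ", "")
    return "vector<" in normalized

# Made-up class names for illustration.
for name in ["std::vector<int>", "vector<float>", "TLorentzVector", "Int_t"]:
    print(name, looks_jagged(name))
```

Matching on `vector<` rather than `vector` avoids false positives on names like TLorentzVector.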
