promote LazyBranch as go-to accessing pattern
Moelf committed Jul 8, 2021
1 parent 0338002 commit d5a2c0f
Showing 8 changed files with 151 additions and 81 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "UnROOT"
uuid = "3cd96dde-e98d-4713-81e9-a4a1b0235ce9"
authors = ["Tamas Gal", "Jerry Ling"]
version = "0.2.0"
version = "0.2.1"

[deps]
CodecLz4 = "5ba52731-8f18-5e0d-9241-30f10d1ec561"
59 changes: 29 additions & 30 deletions README.md
@@ -27,52 +27,50 @@ documentation on uproot's issue page.
## Status
We support reading all scalar branch and jagged branch of "basic" types, provide
indexing interface (thus iteration too) with basket-cache. As
a metric, UnROOT can read all branches of CMS NanoAOD:
a metric, UnROOT can read all branches of CMS NanoAOD.

The most easy way to access data is through `LazyBranch` which will be constructed
when you index a `ROOTFile` with `"treename/branchname"`. It acts just like an array --
you can index it, iterate through it, `map` over it etc:

``` julia
using UnROOT

julia> t = ROOTFile("test/samples/NanoAODv5_sample.root")
ROOTFile("test/samples/NanoAODv5_sample.root") with 2 entries and 21 streamers.

julia< b = rf["Events/Electron_dxy"];
julia> t = ROOTFile("test/samples/NanoAODv5_sample.root");

julia> BA = LazyBranch(rf, b);
julia> LB = t["Events/Electron_dxy"]
LazyBranch{Vector{Float32}, UnROOT.Nooffsetjagg}:
File: ./test/samples/NanoAODv5_sample.root
Branch: Electron_dxy
Description: dxy (with sign) wrt first PV, in cm
NumEntry: 1000
Entry Type: Vector{Float32}

# you can access a branch by index, this is fairly fast, memory footprint ~ single basket
# while `t["tree"]["branch"]` will give you the branch object itself
julia> LB = t["Events/Electron_dxy"]

julia> for i = 5:8
@show BA[i]
@show LB[i]
end
ab[i] = Float32[]
ab[i] = Float32[-0.0012559891]
ab[i] = Float32[0.06121826, 0.00064229965]
ab[i] = Float32[0.005870819, 0.00054883957, -0.00617218]

# or a range
julia> BA[5:8]
julia> LB[5:8]
4-element Vector{Vector{Float32}}:
[]
[-0.0012559891]
[0.06121826, 0.00064229965]
[0.005870819, 0.00054883957, -0.00617218]

# or dump a branch
julia> array(t, "Events/HLT_Mu3_PFJet40")
1000-element BitVector:
0
1
0
0
0
...

# a jagged branch
julia> array(t, "Events/Electron_dxy")
julia> collect(LB)
1000-element Vector{Vector{Float32}}:
[0.00037050247]
[-0.009819031]
[]
[-0.0015697479]
...

# reading branch is also thread-safe, although may not be much faster depending to disk I/O and cache
@@ -81,17 +79,15 @@ julia> using ThreadsX
julia> branch_names = keys(t["Events"])

julia> all(
map(bn->array(rf, "Events/$bn"; raw=true), branch_names) .==
ThreadsX.map(bn->array(rf, "Events/$bn"; raw=true), branch_names)
map(bn->UnROOT.array(rf, "Events/$bn"; raw=true), branch_names) .==
ThreadsX.map(bn->UnROOT.array(rf, "Events/$bn"; raw=true), branch_names)
)
true
```

If you have custom C++ struct inside you branch, reading raw data is also possible.
The raw data consists of two vectors: the bytes
and the offsets and are available using the
`UnROOT.array(f::ROOTFile, path; raw=true)` method. This data can
be reinterpreted using a custom type with the method
If you have custom C++ struct inside you branch, reading raw data is also possible
using the `UnROOT.array(f::ROOTFile, path; raw=true)` method. The output can
be then reinterpreted using a custom type with the method
`UnROOT.splitup(data, offsets, T::Type; skipbytes=0)`.

You can then define suitable Julia `type` and `readtype` method for parsing these data.
@@ -114,7 +110,9 @@ julia> UnROOT.splitup(data, offsets, UnROOT.KM3NETDAQHit)
[UnROOT.KM3NETDAQHit(1073742790, 0x00, 9, 0x60)......
```
<details><summary>This is what happens behind the scenes with some additional debug output: </summary>
#3 Behind the scene
<details><summary>Some additional debug output: </summary>
<p>
@@ -214,10 +212,11 @@ Pick one ;)
- [x] Parsing the file header
- [x] Read the `TKey`s of the top level dictionary
- [x] Reading the available trees
- [ ] Reading the available streamers
- [x] Reading the available streamers
- [x] Reading a simple dataset with primitive streamers
- [x] Reading of raw basket bytes for debugging
- [ ] Automatically generate streamer logic
- [ ] Clean up `Cursor` use
- [x] Reading `TNtuple` #27
## Acknowledgements
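The README text in this diff describes reading custom C++ structs as raw bytes plus offsets via `UnROOT.array(f, path; raw=true)` and then reinterpreting them with `UnROOT.splitup`. The sketch below is a hypothetical helper, `split_entries`, and not UnROOT's implementation. It assumes that `offsets` holds the 0-based start of each entry plus one final end position, that each entry begins with `skipbytes` header bytes, and that the payload is a run of big-endian `Int32` values.

```julia
# Hypothetical helper, NOT UnROOT's API. Assumptions (not taken from UnROOT):
#   * `offsets` holds the 0-based start of each entry plus one final end position;
#   * every entry starts with `skipbytes` header bytes that are discarded;
#   * the remaining payload is a run of big-endian Int32 values.
function split_entries(data::Vector{UInt8}, offsets::Vector{Int32}; skipbytes::Int = 0)
    entries = Vector{Vector{Int32}}(undef, length(offsets) - 1)
    for k in 1:length(offsets) - 1
        lo = offsets[k] + 1 + skipbytes   # 1-based index of the first payload byte
        hi = Int(offsets[k + 1])          # 1-based index of the last byte of entry k
        payload = data[lo:hi]
        # Reinterpret 4 bytes at a time, then convert from big-endian to host order.
        entries[k] = ntoh.(reinterpret(Int32, payload))
    end
    return entries
end

# Two entries, [1, 2] and [3], each preceded by a 2-byte header.
data = UInt8[0xff, 0xff, 0, 0, 0, 1, 0, 0, 0, 2,
             0xff, 0xff, 0, 0, 0, 3]
offsets = Int32[0, 10, 16]
entries = split_entries(data, offsets; skipbytes = 2)
```

In the real package the per-entry payload would be decoded into a user-defined type (such as `UnROOT.KM3NETDAQHit` in the README) rather than plain `Int32`s.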
18 changes: 10 additions & 8 deletions src/UnROOT.jl
@@ -1,6 +1,6 @@
module UnROOT

export ROOTFile, array, LazyBranch
export ROOTFile, LazyBranch

import Base: keys, get, getindex, show, length, iterate, position, ntoh, lock, unlock
using Base.Threads: SpinLock
@@ -12,6 +12,14 @@ using Mixers
using Parameters
using StaticArrays

@static if VERSION < v"1.1"
fieldtypes(T::Type) = [fieldtype(T, f) for f in fieldnames(T)]
end

@static if VERSION < v"1.2"
hasproperty(x, s::Symbol) = s in fieldnames(typeof(x))
end

include("constants.jl")
include("io.jl")
include("types.jl")
@@ -22,13 +30,7 @@ include("root.jl")
include("arrayapi.jl")
# include("itr.jl")
include("custom.jl")
include("precompile.jl")

@static if VERSION < v"1.1"
fieldtypes(T::Type) = [fieldtype(T, f) for f in fieldnames(T)]
end

@static if VERSION < v"1.2"
hasproperty(x, s::Symbol) = s in fieldnames(typeof(x))
end

end # module
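The `src/UnROOT.jl` change moves the two `@static if VERSION < ...` compatibility shims above the `include` calls. Here is a self-contained illustration of that pattern. The module and struct names are made up. `fieldtypes` was added to Base in Julia 1.1 and `hasproperty` in Julia 1.2. `@static` resolves the version check at macro-expansion time, so on newer Julia the fallback methods are never defined and the Base functions are used instead.

```julia
module CompatDemo

# Fallbacks for older Julia: `@static` drops these blocks entirely when the
# condition is false, so newer versions use Base.fieldtypes / Base.hasproperty.
@static if VERSION < v"1.1"
    fieldtypes(T::Type) = [fieldtype(T, f) for f in fieldnames(T)]
end

@static if VERSION < v"1.2"
    hasproperty(x, s::Symbol) = s in fieldnames(typeof(x))
end

# Made-up example type.
struct Hit
    channel::Int32
    time::Float64
end

# Pairs each field name with its declared type; works with either definition.
describe(T::Type) = collect(zip(fieldnames(T), fieldtypes(T)))

has_time(x) = hasproperty(x, :time)

end # module
```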
16 changes: 13 additions & 3 deletions src/arrayapi.jl
@@ -18,7 +18,7 @@ end
Reads an array from a branch. Set `raw=true` to return raw data and correct offsets.
"""
array(f::ROOTFile, path::AbstractString; raw=false) = array(f::ROOTFile, f[path]; raw=raw)
array(f::ROOTFile, path::AbstractString; raw=false) = array(f::ROOTFile, _getindex(f, path); raw=raw)

function array(f::ROOTFile, branch; raw=false)
ismissing(branch) && error("No branch found at $path")
@@ -79,7 +79,7 @@ julia> ab[begin:end]
...
```
"""
mutable struct LazyBranch{T, J}
mutable struct LazyBranch{T, J} <: AbstractVector{T}
f::ROOTFile
b::Union{TBranch, TBranchElement}
L::Int64
@@ -95,10 +95,20 @@ mutable struct LazyBranch{T, J}
new{T, J}(f, b, length(b), b.fBasketEntry, -1, T[])
end
end
Base.size(ba::LazyBranch) = (ba.L,)
Base.length(ba::LazyBranch) = ba.L
Base.firstindex(ba::LazyBranch) = 1
Base.lastindex(ba::LazyBranch) = ba.L
Base.length(ba::LazyBranch) = ba.L
Base.eltype(ba::LazyBranch{T,J}) where {T,J} = T
function Base.show(io::IO, ba::LazyBranch)
summary(io, ba)
println(":")
println(" File: $(ba.f.filename)")
println(" Branch: $(ba.b.fName)")
println(" Description: $(ba.b.fTitle)")
println(" NumEntry: $(ba.L)")
print(" Entry Type: $(eltype(ba))")
end

function Base.getindex(ba::LazyBranch{T, J}, idx::Integer) where {T, J}
# I hate 1-based indexing
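`LazyBranch{T, J}` now subtypes `AbstractVector{T}`. Defining `size` and scalar `getindex` is then enough for Base to supply iteration, range indexing, `collect`, `map`, `sum` and so on. The toy type below, `ChunkedLazyVector`, is hypothetical and not part of UnROOT. It shows the same mechanism with a one-chunk cache that loosely mirrors the "memory footprint ~ single basket" behaviour described in the README. The chunk loader is an arbitrary function standing in for basket reading.

```julia
# Hypothetical toy type (not part of UnROOT): a read-only vector whose elements
# are produced chunk by chunk on demand, keeping only the last chunk in memory.
mutable struct ChunkedLazyVector{T} <: AbstractVector{T}
    loadchunk::Function   # chunk number -> Vector{T} (stands in for basket reading)
    chunksize::Int        # elements per chunk (the last chunk may be shorter)
    len::Int              # total number of elements
    cached_chunk::Int     # number of the chunk held in `buffer`, -1 if none
    buffer::Vector{T}     # contents of the cached chunk
end

ChunkedLazyVector{T}(loadchunk, chunksize::Integer, len::Integer) where {T} =
    ChunkedLazyVector{T}(loadchunk, chunksize, len, -1, T[])

# Use linear indexing with a single integer index.
Base.IndexStyle(::Type{<:ChunkedLazyVector}) = IndexLinear()

# The two methods the AbstractArray interface requires for a read-only vector.
Base.size(v::ChunkedLazyVector) = (v.len,)

function Base.getindex(v::ChunkedLazyVector, i::Int)
    @boundscheck checkbounds(v, i)
    chunk = (i - 1) ÷ v.chunksize + 1           # which chunk holds element i
    if chunk != v.cached_chunk                  # cache miss: load and remember it
        v.buffer = v.loadchunk(chunk)
        v.cached_chunk = chunk
    end
    return v.buffer[(i - 1) % v.chunksize + 1]  # position inside the chunk
end

# One-line description written to `io`, so it also works with `sprint` and `string`.
Base.show(io::IO, v::ChunkedLazyVector) =
    print(io, "ChunkedLazyVector with ", length(v), " elements in chunks of ", v.chunksize)

# Usage: the squares 1^2 .. 10^2, generated three at a time; `loads` counts chunk reads.
const loads = Ref(0)
function load_squares(chunk)
    loads[] += 1
    start = (chunk - 1) * 3 + 1
    return [k^2 for k in start:min(start + 2, 10)]
end

v = ChunkedLazyVector{Int}(load_squares, 3, 10)
x = v[5]               # reads chunk 2 (elements 4:6)
r = v[4:6]             # served from the cached chunk, no new read
reads_so_far = loads[]
total = sum(v)         # iteration comes for free from AbstractVector
```

Subtyping `AbstractVector` also means generic code that expects an array (plotting, `map`, broadcasting) accepts the lazy object without extra methods.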
42 changes: 42 additions & 0 deletions src/precompile.jl
@@ -0,0 +1,42 @@
# Use
# @warnpcfail precompile(args...)
# if you want to be warned when a precompile directive fails
macro warnpcfail(ex::Expr)
modl = __module__
file = __source__.file === nothing ? "?" : String(__source__.file)
line = __source__.line
quote
$(esc(ex)) || @warn """precompile directive
$($(Expr(:quote, ex)))
failed. Please report an issue in $($modl) (after checking for duplicates) or remove this directive.""" _file=$file _line=$line
end
end


Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TLeafB}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TBranch}})
Base.precompile(Tuple{typeof(decompress_datastreambytes),Vector{UInt8},TBasketKey})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TStreamerBasicPointer}})
Base.precompile(Tuple{typeof(compressed_datastream),IOStream,TBasketKey})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TStreamerSTL}})
Base.precompile(Tuple{typeof(interped_data),Vector{UInt8},Vector{Int32},TBranch_13,Type{Nooffsetjagg},Type{Vector{Float32}}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TStreamerBasicType}})
Base.precompile(Tuple{typeof(getindex),ROOTFile,String})
Base.precompile(Tuple{typeof(getindex),TTree,String})
Base.precompile(Tuple{typeof(interp_jaggT),TBranch_13,TLeafF})
Base.precompile(Tuple{Type{ROOTFile},String})
Base.precompile(Tuple{typeof(getindex),LazyBranch{Vector{Float32}, Nooffsetjagg},UnitRange{Int64}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TLeafF}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TStreamerObject}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TLeafL}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TStreamerObjectPointer}})
Base.precompile(Tuple{typeof(basketarray),ROOTFile,TBranch_13,Int64})
Base.precompile(Tuple{Type{TTree},IOStream,TKey32,Dict{Int32, Any}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TStreamerString}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TLeafO}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TStreamerObjectAny}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TLeafI}})
Base.precompile(Tuple{Type{LazyBranch},ROOTFile,TBranch_13})
Base.precompile(Tuple{typeof(readfields!),Cursor,Dict{Symbol, Any},Type{TBranch_13}})
Base.precompile(Tuple{typeof(unpack),IOBuffer,TKey32,Dict{Int32, Any},Type{TStreamerBase}})
Base.precompile(Tuple{Core.kwftype(typeof(Type)),NamedTuple{(:cursor, :fFirstEntry, :fIOFeatures, :fFillColor, :fMaxBaskets, :fWriteBasket, :fEntryOffsetLen, :fBaskets, :fTitle, :fZipBytes, :fSplitLevel, :fCompress, :fBasketSize, :fName, :fTotBytes, :fBasketEntry, :fLeaves, :fBasketSeek, :fFillStyle, :fBasketBytes, :fEntries, :fBranches, :fFileName, :fEntryNumber, :fOffset), Tuple{Cursor, Int64, ROOT_3a3a_TIOFeatures, Int16, UInt32, Int32, Int32, TObjArray, String, Int64, Int32, Int32, Int32, String, Int64, Vector{Int64}, TObjArray, Vector{Int64}, Int16, Vector{Int32}, Int64, TObjArray, String, Int64, Int32}},Type{TBranch_13}})
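`precompile.jl` consists of `Base.precompile` directives. The `@warnpcfail` helper at its top can work because `precompile` reports success as a `Bool` rather than throwing. Here is a minimal illustration with a made-up function `double` standing in for UnROOT internals.

```julia
# A made-up function standing in for UnROOT internals such as `unpack`.
double(x::Integer) = 2x

# Argument-tuple form: compile `double` for an `Int` argument without calling it.
ok_args = precompile(double, (Int,))

# Tuple-type form, the style used throughout precompile.jl.
ok_type = Base.precompile(Tuple{typeof(double), Int})

# No method of `double` accepts a String, so this directive returns false.
ok_string = precompile(double, (String,))

# The same kind of check `@warnpcfail` expands to: warn when a directive fails.
ok_string || @warn "precompile directive failed" signature=Tuple{typeof(double), String}
```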
13 changes: 12 additions & 1 deletion src/root.jl
@@ -86,7 +86,18 @@ function streamerfor(f::ROOTFile, name::AbstractString)
end


@memoize LRU(maxsize = 2000) function Base.getindex(f::ROOTFile, s::AbstractString)
function Base.getindex(f::ROOTFile, s::AbstractString)
S = _getindex(f, s)
if S isa Union{TBranch, TBranchElement}
try # if we can't construct LazyBranch, just give up (maybe due to custom class)
return LazyBranch(f, S)
catch
end
end
S
end

@memoize LRU(maxsize = 2000) function _getindex(f::ROOTFile, s)
if '/' ∈ s
@debug "Splitting path '$s' and getting items recursively"
paths = split(s, '/')
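The new `getindex` memoizes only the path lookup (`_getindex`). On every call it tries to wrap a branch in a `LazyBranch` and returns the raw object if that fails. The toy below mimics that control flow. `RawNode`, `LazyView`, `_lookup` and `lookup` are hypothetical names, and a plain `Dict` stands in for the `LRU` cache used with `@memoize`. Unlike the commit's bare `catch`, this variant only swallows `ArgumentError` and rethrows anything else.

```julia
# Hypothetical stand-ins (not UnROOT types): `RawNode` plays the parsed branch
# object and `LazyView` plays `LazyBranch`. Nodes whose path ends in "custom"
# pretend to be custom classes that cannot be wrapped.
struct RawNode
    path::String
    wrappable::Bool
end

mutable struct LazyView
    node::RawNode
    buffer::Vector{Float64}   # per-view mutable state, like a cached basket
    function LazyView(node::RawNode)
        node.wrappable || throw(ArgumentError("cannot wrap $(node.path)"))
        return new(node, Float64[])
    end
end

# Memoized lookup; a plain Dict stands in for the LRU cache.
const PARSED = Dict{String, RawNode}()
const PARSE_COUNT = Ref(0)

function _lookup(path::AbstractString)
    key = String(path)
    return get!(PARSED, key) do
        PARSE_COUNT[] += 1   # only runs on a cache miss
        RawNode(key, !endswith(key, "custom"))
    end
end

# Public accessor: wrap the object when possible, otherwise return it unchanged.
function lookup(path::AbstractString)
    node = _lookup(path)
    try
        return LazyView(node)
    catch err
        err isa ArgumentError || rethrow()   # only swallow the expected failure
    end
    return node
end

view1 = lookup("Events/Electron_dxy")
view2 = lookup("Events/Electron_dxy")   # fresh view, parse result reused
fallback = lookup("Events/custom")       # cannot be wrapped, raw node returned
```

Because only the lookup is cached, each call returns a new mutable view while the parsed object is shared. The same holds for the commit's code, where `getindex` itself is no longer memoized.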
24 changes: 24 additions & 0 deletions test/precompile_script.jl
@@ -0,0 +1,24 @@
using UnROOT
const HERE = @__DIR__
const a = ROOTFile("$HERE/samples/NanoAODv5_sample.root")
const b = a["Events"]["Electron_dxy"]
const lb = a["Events/Electron_dxy"]

@show a,b,lb
lb[1:3]

for i in lb
i
break
end

function f()
for n in keys(a["Events"])
lb = a["Events/$n"]
lb[1]
for i in lb
break
end
end
end


2 comments on commit d5a2c0f

@Moelf (Member, Author) commented on d5a2c0f Jul 8, 2021

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/40530

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.2.1 -m "<description of version>" d5a2c0fff5d42663ca2528f555aacb7d4a1dd06d
git push origin v0.2.1
