RDataFrame (awkward?) interface to input data to RootInteractive as a replacement for tree2Panda (& uproot) JIRA ATO-263 #243
@miranov25 FYI, the Awkward <-> RDataFrame interface has been completed and fully integrated into the Awkward master branch. Please see the tutorial draft here.
Dear @ianna, thank you for the link. I went through the tutorial briefly and have an idea for further steps. We could simplify the creation of a reusable library of functions, similar to what we did in the past with the declaration of TTree::SetAlias. Perhaps this is already possible in ROOT v6.26. If this works as we envisioned, we could offer our RootInteractive visualisation and interactive ND histogramming/aggregation package (+ ML) for general use.

Before offering it within scikit-hep, we need to improve the documentation and state clearly which parts of our package are experimental (ML) and which are stable. We considered PyHEP 2022, but we are too late (https://indico.cern.ch/e/PyHEP2022). We can use the time now for a full, round-trip integration of RDataFrame <-> Awkward into our RootInteractive tool, including the possibility of client queries via joins.

If you are interested, I would like to ask for your advice and help. To get an idea of what the project is about, please see the RootInteractive tutorial (March 2022). There are quite a few use cases, so a more detailed example presentation, Jupyter notebook, dashboard and accompanying video are also available (links below).

The tutorial is six months old; in the meantime we have improved the speed, aiming at interactive detector physics and physics analysis.

I plan to visit CERN in September. If you are interested, we can meet beforehand on Zoom or at CERN.
Marian
Hi Marian,
Let’s meet at CERN. Please contact me by e-mail: iannadotosborneatcerndotch
Kind regards,
Ianna
Links from @miranov25's message of 2 Sep 2022:
* RootInteractive tutorial (March 2022):
  * https://indico.cern.ch/event/1135398/
* Interactive dashboard:
  * https://indico.cern.ch/event/1135398/contributions/4764024/subcontributions/370740/attachments/2402507/4114266/CMITSimulationsGEMTPC.html
* Presentation:
  * https://indico.cern.ch/event/1135398/contributions/4764014/attachments/2405453/4114991/PWGPP-485NDPipelineRootInteractive_Tutorial10032022.pdf
  * https://indico.cern.ch/event/1135398/contributions/4764024/subcontributions/370740/attachments/2402507/4114272/CMITSimulGEMTPC_RootInteractiveTutorial10032022.pdf
* Demo film (RootInteractive part starts at 16:00):
  * https://indico.cern.ch/event/1135398/contributions/4764024/subcontributions/370740/attachments/2402507/4109039/CMITSimulationsGEMTPC.mp4
* Updated dashboard (improved speed, aimed at interactive detector physics and physics analysis):
  * https://indico.cern.ch/event/1135398/contributions/4950038/attachments/2474468/4245987/test_EffTrack.html
* Updated notebook:
  * https://indico.cern.ch/event/1135398/#preview:4265612
Related JIRA ticket: testing with the "ToyMC" dEdx algorithm scan, tested in a RootInteractive (RI) notebook using:
a non-trivial RDataFrame used for the dEdx algorithm optimization, translated via the RDataFrame <-> Awkward interface.
CPU times: user 1min 44s, sys: 884 ms, total: 1min 45s
Wall time: 10.2 s
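A rough reading of the timing above: dividing the total CPU time by the wall-clock time gives the average number of cores kept busy during the run. This is only a back-of-the-envelope interpretation; the thread does not say how many threads were used (for example, whether ROOT implicit multithreading was enabled with ROOT.EnableImplicitMT()).

```math
\frac{t_{\mathrm{CPU}}}{t_{\mathrm{wall}}} = \frac{1\,\mathrm{min}\;45\,\mathrm{s}}{10.2\,\mathrm{s}} = \frac{105\,\mathrm{s}}{10.2\,\mathrm{s}} \approx 10.3
```

That is, roughly ten cores were busy on average.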
RDataFrame <-> Awkward questions: support for Jupyter notebook and dashboard:
Open questions:
Answers to questions:
Please try:
```python
>>> import awkward as ak
>>> from sklearn.ensemble import RandomForestClassifier
>>> clf = RandomForestClassifier(random_state=0)
>>> xarr = ak.Array([[1,2,3], [11,22,33]])
>>> yarr = ak.Array([0,1])
>>> clf.fit(xarr, yarr)
RandomForestClassifier(random_state=0)
>>> clf.predict(xarr)
array([0, 1])
>>> clf.predict([[4, 5, 6], [14, 15, 16]])
array([0, 1])
```
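For ragged input, pad the rows to a common length with ak.pad_none and replace the missing values with ak.fill_none (see https://awkward-array.readthedocs.io/en/latest/_auto/ak.pad_none.html):
```python
>>> xarr = ak.Array([[1, 2, 3], [11, 22]])  # ragged: the second row has only two entries
>>> xarr_padded = ak.pad_none(xarr, 3)
>>> xarr_padded
<Array [[1, 2, 3], [11, 22, None]] type='2 * var * ?int64'>
>>> xarr_padded_n = ak.fill_none(xarr_padded, 0)
>>> xarr_padded_n
<Array [[1, 2, 3], [11, 22, 0]] type='2 * var * int64'>
```
Please also see this answer to the question of whether existing ML libraries can be used with Awkward Array.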
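To show what the pad-then-fill step produces without Awkward, here is a minimal NumPy sketch of the same idea: ragged rows are padded to a fixed width and the gaps filled with a constant, giving the rectangular 2D matrix that scikit-learn expects. The helper name pad_and_fill is hypothetical (not part of Awkward or RootInteractive), and its handling of rows longer than the width is our own choice.

```python
import numpy as np


def pad_and_fill(rows, width, fill_value=0, clip=False):
    """Build a (len(rows), width) NumPy matrix from ragged rows.

    Hypothetical helper: rows shorter than `width` are padded with `fill_value`.
    Longer rows are truncated when clip=True and rejected otherwise, because a
    rectangular matrix cannot hold them (ak.pad_none with clip=False would
    instead leave such rows longer).
    """
    rows = [np.asarray(r) for r in rows]
    # common dtype of the non-empty rows and the fill value, e.g. int64 for int rows
    dtype = np.result_type(fill_value, *(r.dtype for r in rows if r.size))
    out = np.full((len(rows), width), fill_value, dtype=dtype)
    for i, row in enumerate(rows):
        if len(row) > width and not clip:
            raise ValueError(f"row {i} has {len(row)} entries, more than width={width}")
        n = min(len(row), width)
        out[i, :n] = row[:n]  # copy the available entries; padding stays after them
    return out
```

For example, pad_and_fill([[1, 2, 3], [11, 22]], 3) gives the same values as the ak.fill_none result above, as a plain NumPy array that can be passed to RandomForestClassifier.fit.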
Link to tutorials:
Link to chat in scikit-hep: discussion about limitations and possible improvements (array of structures)
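To illustrate the array-of-structures point with plain NumPy (no ROOT or Awkward involved): a NumPy structured array stores each record contiguously (AoS), so a single field is a strided view, whereas a structure-of-arrays (SoA) layout, which is how Awkward Array stores records, keeps each field in its own contiguous buffer. This is a generic sketch of the two layouts, not a description of the specific limitation discussed in the chat.

```python
import numpy as np

# Array of structures (AoS): each record (x, y, z) is stored contiguously,
# so the record size (itemsize) is 3 * 8 = 24 bytes.
aos = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
               dtype=[("x", "f8"), ("y", "f8"), ("z", "f8")])

# Selecting one field gives a strided view: consecutive x values are 24 bytes apart.
x_view = aos["x"]

# Structure of arrays (SoA): one contiguous array per field (stride 8 bytes).
soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}
```

Columnar (SoA) storage is what makes vectorized per-field operations cheap, since each field can be scanned without skipping over the other members of the record.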
RDataFrame column filter:
```
In [86]: filterRDFColumns?
Signature:
filterRDFColumns(
    rdf,
    selectList=['.*'],
    excludeList=[],
    selectTypeList=['.*'],
    excludeTypeList=['.*AliExternal.*'],
    verbose=0,
)
Docstring:
function to filter available columns in RDataFrame
:param rdf: - input RDataFrame
:param selectList: - columns to select (regExp)
:param excludeList: - columns to reject (regExp)
:param selectTypeList: - types to accept (regExp)
:param excludeTypeList: - types to reject (regExp)
:param verbose: - verbosity bitmask: 0x1 print all status, 0x2 print selected, 0x4 print rejected
:return: filtered list of columns
example:
filterRDFColumns(rdf1, ["param.*","delta","covar"],["part.",".*Refit.*"],[".*"],[""], verbose=1)
```
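The docstring above specifies the interface but not the exact matching rules. As an illustration only, here is a ROOT-free sketch of the filtering logic operating on a plain {column name: type name} dict; in PyROOT such a dict could be built with {str(c): rdf.GetColumnType(str(c)) for c in rdf.GetColumnNames()}. Assumptions not taken from the RootInteractive code: patterns are applied with re.fullmatch, so an empty pattern such as [""] matches only an empty string and therefore excludes nothing; the name filter_columns and the printed format are hypothetical.

```python
import re


def filter_columns(column_types, select_list=(".*",), exclude_list=(),
                   select_type_list=(".*",), exclude_type_list=(".*AliExternal.*",),
                   verbose=0):
    """Return the column names that pass the name and type regex filters.

    column_types: dict mapping column name -> C++ type name (str).
    A column is kept if its name fully matches at least one select_list pattern
    and no exclude_list pattern, and its type fully matches at least one
    select_type_list pattern and no exclude_type_list pattern.
    verbose is a bitmask: 0x1 print all, 0x2 print selected, 0x4 print rejected.
    """
    def any_match(patterns, text):
        return any(re.fullmatch(p, text) for p in patterns)

    selected = []
    for name, ctype in column_types.items():
        keep = (any_match(select_list, name)
                and not any_match(exclude_list, name)
                and any_match(select_type_list, ctype)
                and not any_match(exclude_type_list, ctype))
        if verbose & 0x1 or (verbose & 0x2 and keep) or (verbose & 0x4 and not keep):
            print(f"{'selected' if keep else 'rejected'}: {name} ({ctype})")
        if keep:
            selected.append(name)
    return selected
```

With the example arguments from the docstring, filter_columns(cols, ["param.*", "delta", "covar"], ["part.", ".*Refit.*"], [".*"], [""]) keeps the param*, delta and covar columns and drops anything containing "Refit".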
Discussion in Gitter chat: https://gitter.im/matrix/5ba1f93bd73408ce4fa8a265/@agoose77:matrix.org?at=639e235d967c8305843b106c
List of currently supported features:
Discussion in Gitter about simplifying RDataFrame: https://gitter.im/matrix/5ba1f93bd73408ce4fa8a265/@agoose77:matrix.org?at=639f83faa151003b5a7550f4
Indices in the RDataFrame: finding the relation array1 (N1) -> array2 (N2), i.e. N1 indices pointing to the closest values in array2.
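A minimal NumPy sketch of that index search, assuming plain 1D numeric arrays rather than RDataFrame columns: sort array2 once, locate each array1 value with np.searchsorted, and pick the nearer of its two neighbours. This costs O(N2 log N2 + N1 log N2) instead of the O(N1·N2) brute-force comparison. The function name closest_indices is hypothetical.

```python
import numpy as np


def closest_indices(array1, array2):
    """For each value of array1 (length N1), return the index into array2
    (length N2) of the element with the closest value.

    Hypothetical helper; ties are resolved towards the smaller array2 value.
    """
    a1 = np.asarray(array1, dtype=float)
    a2 = np.asarray(array2, dtype=float)
    if a2.size == 0:
        raise ValueError("array2 must not be empty")
    order = np.argsort(a2, kind="stable")   # sort array2 once: O(N2 log N2)
    sorted2 = a2[order]
    pos = np.searchsorted(sorted2, a1)      # insertion points: O(N1 log N2)
    # neighbours on either side of each insertion point, clipped to valid indices
    left = np.clip(pos - 1, 0, a2.size - 1)
    right = np.clip(pos, 0, a2.size - 1)
    use_right = np.abs(sorted2[right] - a1) < np.abs(sorted2[left] - a1)
    return order[np.where(use_right, right, left)]  # map back to array2 positions
```

For example, with array2 = [10, 0, 5], the values [1, 4, 9, -3, 20] map to the indices [1, 2, 0, 1, 0].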
The functionality of the tree->Draw queries (used internally in tree2Panda) and of RDataFrame does not overlap. To be able to use complex C++ structures in O2, we have to switch to RDataFrame.
New functionality within scikit-hep: scikit-hep/awkward#1295
tree->Draw
-- (+) dependency trees created automatically
-- (-) caching not supported
-- (-) only simple variables supported
-- (-) slower, as a C interpreter is used
RDataFrame
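The "(+) dependency trees created automatically" point of tree->Draw can be made concrete. With RDataFrame, each Define is applied on top of the columns that already exist, so a library of TTree::SetAlias-style aliases has to be defined in dependency order. A minimal pure-Python sketch, assuming the aliases are given as a {name: expression string} dict; define_order is a hypothetical helper, and its identifier scan is deliberately naive (it does not parse C++).

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def define_order(aliases):
    """Return alias names ordered so that every alias comes after the aliases it uses.

    aliases: dict alias name -> expression string (TTree::SetAlias-style).
    Identifiers that are not aliases (branches, functions such as sqrt) are ignored.
    Raises graphlib.CycleError (a ValueError subclass) for circular definitions.
    """
    graph = {
        name: {tok for tok in IDENTIFIER.findall(expr) if tok in aliases}
        for name, expr in aliases.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

The resulting order could then drive a loop such as `for name in define_order(aliases): rdf = rdf.Define(name, aliases[name])`.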