-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RF][PyROOT] New functions for conversion between RooDataHist and NumPy arrays #8784
Conversation
4c4db88
to
61570e4
Compare
61570e4
to
61f7c05
Compare
61f7c05
to
ae0a5cc
Compare
7c99dc8
to
6118536
Compare
|
||
@property | ||
def shape(self): | ||
import ROOT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this import as it is not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! You're right.
return self._to_array(self.sumW2Array()) | ||
|
||
@staticmethod | ||
def from_numpy(hist_weights, variables, bins, ranges=None, weights_squared_sum=None, name=None, title=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we name it from_numpy
and to_numpy
or FromNumpy
and ToNumpy
to comply with the C++ naming (and also other non-C++ cases such as RDataFrame.AsNumpy
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, but let me argue why I'd rather stick with from_numpy
and to_numpy
.
- It gives a warm fuzzy feeling to the Python user that the function names follow the established Python convention (maybe sounds ridiculous but I think it is psychologically very important because it makes ROOT feel more native to Python)
- It highlights the distinction between pythonized C++ functions and pure Python functions
- We have already advertised these names in the PPP and ACAT:
- For RooDataSet, we already used the
snake_case
names: [RF][PyROOT] New functions for conversion between RooDataSet and NumPy arrays or Pandas dataframes #9346 - At this point, the
to_numpy
andfrom_numpy
names are almost conventional in the Python ecosystem, see e.g. Pandas or PyTorch. So a user would tryd.to_numpy()
intuitively, just like they would tryd.shape
and assume that it works
By the way, RooFit follows camelCase
, unlike the rest of ROOT. So if we try to comply with the C++ names, it would rather be toNumpy
and fromNumpy
.
def _to_array(self, buffer): | ||
import numpy as np | ||
|
||
# check if buffer is nullptr (for some reason comparing with ROOT.nullptr doesn't work) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the type of buffer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps adding some docs would be good too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is meant to create a numpy array from a double *
. The type of buffer usually a cppyy.LowLevelView
, which is what you get when you call a C++ function that returns a double *
. I'll write some doc!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks, my question was more in the direction: if a cppyy.LowLevelView
points to null, it's not equal to ROOT.nullptr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked that, and indeed:
>>> ROOT.gInterpreter.Declare("double * foo() { return nullptr; }")
True
>>> buffer = ROOT.foo()
>>> buffer == ROOT.nullptr
False
but what you can do is use the boolean value of the cppyy.LowLevelView
:
if not buffer: return None
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Nice, I will to de boolean check then, as it's much cleaner 👍
return a.reshape(self.shape) | ||
|
||
def _var_is_category(self): | ||
""" """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want some docs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Indeed it looked like I forgot that...
Build failed on mac11/cxx17. Failing tests:
And 4 more |
Starting build on |
Build failed on ROOT-debian10-i386/cxx14. |
Build failed on ROOT-performance-centos8-multicore/default. |
I think the CI failing is unrelated to the changes in this PR and should be fixed by #8784. |
Starting build on |
Build failed on ROOT-debian10-i386/cxx14. |
Build failed on ROOT-performance-centos8-multicore/default. |
The interface of these functions was strongly inspired by `numpy.histogramdd`: https://numpy.org/doc/stable/reference/generated/numpy.histogramdd.html#numpy.histogramdd A unit test that makes a closure check by converting a RooDataHist to numpy and back was also implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thank you Jonas for this nice improvement
Build failed on mac11/cxx17. Failing tests:
And 4 more |
ab55ed0
to
3222a6b
Compare
Starting build on |
Autosquashed fixup commits. |
Build failed on ROOT-performance-centos8-multicore/default. Warnings:
|
Build failed on mac11/cxx17. Failing tests:
And 4 more |
Build failed on windows10/cxx14. Errors:
|
This PR introduces new PyROOT features that allow for conversion between the RooDataHist and the NumPy arrays, following up a PR that already introduced similar functionality for the RooDataSet:
#9346
The new methods are (with checkmarks if they are already implemented in this PR):
RooDataHist.to_numpy()
RooDataHist.from_numpy()
These new methods are also advertised in the release notes, and the existing
rf409_NumPyPandasToRooFit.py
tutorial is extended to also explain the functionality introduced in this PR.Note that any new Python functions prefixed with an underscore are not meant to be part of the public stable user interface, which is why they are not advertised in the release notes and also don't have docstrings.