awkward to rdataframe #1374

ianna · 2022-03-17T17:15:50Z

The user side looks as follows:

    array = ak._v2.Array(
        [
            [{"x": 1, "y": [1.1]}, {"x": 2, "y": [2.0, 0.2]}],
            [],
            [{"x": 3, "y": [3.0, 0.3, 3.3]}],
        ]
    )
    ak_array_1 = array["x"]
    ak_array_2 = array["y"]

    # An 'awkward' namespace will be added to the column name
    data_frame = ak._v2.operations.convert.to_rdataframe(
        {"x": ak_array_1, "y": ak_array_2}
    )

    done = compiler(
        """
        template <typename Array>
        struct MyFunctor {
            void operator()(const Array& a) {
                cout << "#6. Call user function: " << a << endl;
            }
        };
        """
    )
    assert done is True

    f = ROOT.MyFunctor[data_frame.GetColumnType("awkward:x")]()
    data_frame.Foreach(f, ["awkward:x"])

The array passed as an "x" column is:

ak_array_1 = <Array [[1, 2], [], [3]] type='3 * var * int64'>

An AwkwardArrayColumnReader iterates over 3 ranges to apply Foreach functor:

CONSTRUCTED AwkwardArrayColumnReader_NumpyArray_int64_Nt5aOZLMgX4_x of a ROOT::VecOps::RVec<Long64_t> at 0x7fefaec0ef48
#5. SetEntry
#6. Call user function: { 1, 2 }
#5. SetEntry
#6. Call user function: {  }
#5. SetEntry
#6. Call user function: { 3 }
#4. GetEntryRanges
DESTRUCTED AwkwardArrayColumnReader_NumpyArray_int64_Nt5aOZLMgX4_x of a ROOT::VecOps::RVec<Long64_t> at 0x7fefaec0ef48

codecov · 2022-03-18T17:24:37Z

Codecov Report

Merging #1374 (57ede36) into main (edfce38) will increase coverage by 0.65%.
The diff coverage is 52.66%.

Impacted Files	Coverage Δ
src/awkward/_v2/_connect/cuda/__init__.py	`0.00% <0.00%> (ø)`
src/awkward/_v2/_connect/jax/__init__.py	`0.00% <0.00%> (ø)`
...c/awkward/_v2/_connect/rdataframe/to_rdataframe.py	`0.00% <0.00%> (ø)`
src/awkward/_v2/contents/recordarray.py	`82.26% <ø> (+0.76%)`	⬆️
src/awkward/_v2/contents/unionarray.py	`86.50% <ø> (ø)`
src/awkward/_v2/operations/convert/ak_from_cupy.py	`50.00% <0.00%> (+23.33%)`	⬆️
src/awkward/_v2/operations/convert/ak_from_iter.py	`93.75% <ø> (ø)`
src/awkward/_v2/operations/convert/ak_from_jax.py	`50.00% <0.00%> (-25.00%)`	⬇️
...wkward/_v2/operations/convert/ak_to_arrow_table.py	`100.00% <ø> (ø)`
src/awkward/_v2/operations/convert/ak_to_cupy.py	`33.33% <0.00%> (+23.95%)`	⬆️
... and 77 more

ianna · 2022-04-20T15:29:51Z

@jpivarski - I think, this is pretty much done. I'm still not quite sure how to select the range for the different length ak.Arrays. So far it's defined by the length of the first column's ak.Array:
https://github.com/scikit-hep/awkward-1.0/pull/1374/files#diff-2e47d1ea25ddbe4f77fe23a2e0cc5897d74b5389c659e527d7c6b07a8166588bR254-R256

jpivarski

So, I'm trying this out (using the .py files in the ROOT RDF tutorials as a guide), and it hasn't been designed in such a way that RDataFrame users can use it as they would any other RDataFrame.

For context, this is basic usage:

>>> import ROOT
>>> ROOT.RDF.MakeTrivialDataFrame(50).AsNumpy(["col0"])
{'col0': ndarray([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
         16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
         32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
         48, 49], dtype=uint64)}

where MakeTrivialDataFrame is essentially np.arange (with dtype=np.uint64) and AsNumpy lets you select a subset of columns and convert them into NumPy arrays. If the columns have primitive types, these are regular NumPy arrays, otherwise they'll be dtype=object arrays with PyROOT objects.

Here's another one that takes a NumPy array as input:

>>> import numpy as np
>>> rdf = ROOT.RDF.MakeNumpyDataFrame({"x": np.arange(10) * 1.1})
>>> rdf.AsNumpy(["x"])
{'x': ndarray([0. , 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])}

Here's an example of defining a new column from an old column:

>>> rdf.Define("y", "10 * x").AsNumpy(["y"])
{'y': ndarray([ 0., 11., 22., 33., 44., 55., 66., 77., 88., 99.])}

At any point, we can look at the head of the RDataFrame, for all defined columns. That's a good diagnostic.

>>> rdf.Display().Print()
x         | 
0.0000000 | 
1.1000000 | 
2.2000000 | 
3.3000000 | 
4.4000000 | 
>>> rdf.Define("y", "10 * x").Display().Print()
y         | x         | 
0.0000000 | 0.0000000 | 
11.000000 | 1.1000000 | 
22.000000 | 2.2000000 | 
33.000000 | 3.3000000 | 
44.000000 | 4.4000000 |

The first Awkward Array that I tried had primitive type, and it didn't work. I know, this only becomes useful when the types are non-primitive, but I'm starting with the basics.

>>> import awkward as ak
>>> array = ak._v2.Array([1.1, 2.2, 3.3, 4.4, 5.5])
>>> rdf = ak._v2.to_rdataframe({"some_array": array})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward/_v2/operations/convert/ak_to_rdataframe.py", line 24, in to_rdataframe
    return _impl(
  File "/home/jpivarski/irishep/awkward-1.0/awkward/_v2/operations/convert/ak_to_rdataframe.py", line 36, in _impl
    rdf = ak._v2._connect.rdataframe.to_rdataframe.to_rdataframe(
  File "/home/jpivarski/irishep/awkward-1.0/awkward/_v2/_connect/rdataframe/to_rdataframe.py", line 86, in to_rdataframe
    type(rdf_array_view_entries[key]).__cpp_name__,
AttributeError: type object 'float' has no attribute '__cpp_name__'

Okay, it's easy to see how that slipped by if all of the arrays used in tests happen to be record arrays. Line 86, type(rdf_array_view_entries[key]).__cpp_name__ presumes that rdf_array_view_entries[key] is a PyROOT object, but PyROOT only gives you one of its objects (lines 80‒82) if it can't be resolved to a Python builtin. Actually, relying on that is dangerous because many C++ types convert to a single Python type (float and double going to a Python float or all the int*_t going to a Python int). Fortunately, you don't have to create an entry (which might not work in a dataset with zero entries, anyway) to find out what its type is going to be; you can use the generator's entry_type method:

https://github.com/scikit-hep/awkward-1.0/blob/ec9eec9aa5fae80dac44c300c6a48e3766afd1cc/src/awkward/_v2/_connect/cling.py#L524-L526

While we're at it, you also don't need to create a get_entry_{generated_type}_{key}_{flatlist_as_rvec} function just to use it in the AwkwardArrayColumnReader_{generated_type}_{key}_{flatlist_as_rvec} class. You can create the entry directly in GetColumnReadersImpl.

(I also don't understand the need for either of the two levels of wrapping: ArrayWrapper and AwkwardArrayColumnReader_{generated_type}_{key}_{flatlist_as_rvec}. All you really need is the one {rdf_array_data_source_class_name} class, which supplies a batch of entries from GetColumnReadersImpl as a vector of void* pointers that the RDataSource superclass will cast based on the return value of GetTypeName.)

So I made another simple example that has record structure so that it will pass the __cpp_name__ check.

>>> array = ak._v2.Array([{"x": 1.1}, {"x": 2.2}, {"x": 3.3}, {"x": 4.4}, {"x": 5.5}])
>>> rdf = ak._v2.to_rdataframe({"some_array": array})

So far, so good.

>>> list(rdf.GetColumnNames())
['awkward:some_array']

Okay—if the column name has a colon in it, we can't use it as an argument of any functions, for instance

>>> rdf.Define("y", "some_array.x()")
input_line_122:2:28: error: use of undeclared identifier 'some_array'
auto lambda1 = [](){return some_array.x()

doesn't work because some_array is not a column name, and

>>> rdf.Define("y", "awkward:some_array.x()")
input_line_130:2:35: error: unexpected ':' in nested name specifier; did you mean '::'?
auto lambda1 = [](){return awkward:some_array.x()
                                  ^
                                  ::
input_line_130:2:36: error: no member named 'some_array' in namespace 'awkward'

RDataFrame generates code from the string passed to its functors (Define, Filter, etc.) that uses identifiers to look up columns and know what to put in the arguments of the functions.

The current tests build functions manually, and presumably they work because arguments are passed in the right order, but that's not how users would use RDataFrame—they'd generate code from strings like the examples in the tutorials directory.

I'm going to see if I can write up a minimally working example.

jpivarski · 2022-04-21T00:25:14Z

I managed to get a fully working example by modifying the "WonkyDS" from #588 (comment). It's emulating a two-column Awkward RDataSource, in which one column contains a primitive type and the other contains records.

import json

import ROOT
import awkward as ak
import awkward._v2._connect.cling

example1 = ak._v2.Array([1.1, 2.2, 3.3, 4.4, 5.5])
example2 = ak._v2.Array([{"x": 1.1}, {"x": 2.2}, {"x": 3.3}, {"x": 4.4}, {"x": 5.5}])

assert len(example1) == len(example2)

It's essential that they have the same lengths, something that ak.to_rdataframe should check.

First, I'll generate the ArrayViews and RecordViews and get the data types for them as strings. RDataFrame is not involved yet.

generator1 = ak._v2._connect.cling.togenerator(example1.layout.form)
lookup1 = ak._v2._lookup.Lookup(example1.layout)
generator1.generate(ROOT.gInterpreter.Declare, flatlist_as_rvec=True)

generator2 = ak._v2._connect.cling.togenerator(example2.layout.form)
lookup2 = ak._v2._lookup.Lookup(example2.layout)
generator2.generate(ROOT.gInterpreter.Declare, flatlist_as_rvec=True)

dataset_type_one = generator1.class_type((True,))
entry_type_one = generator1.entry_type(flatlist_as_rvec=True)
dataset_type_two = generator2.class_type((True,))
entry_type_two = generator2.entry_type(flatlist_as_rvec=True)

The following code is going to be generated in the awkward namespace to ensure that it doesn't collide with anything else. The type strings above are unqualified, so they'll work in the namespace, but they won't work in the RDataFrame code, when it tries to cast the column values. To be concrete, here's what the dataset type names look like:

>>> dataset_type_one, dataset_type_two
('NumpyArray_float64_S3W7Jysru0Y', 'RecordArray_TdrfhV0YO8')

and here's what the entry type names look like:

>>> entry_type_one, entry_type_two
('double', 'Record_SlxdvzA8mjY')

When providing RDataFrame with these type names (i.e. the string that GetTypeName returns), the Record will need to be qualified with "awkward::" but the double must not. You can tell which to qualify by checking the Python type of the generators: generator1 is a NumpyArrayGenerator, so its entry type will always be a primitive (must not qualify) and generator2 is a RecordArrayGenerator, so its entry type will be something in the awkward namespace (must qualify). Since we're using flatlist_as_rvec=True everywhere (and it's important to do that consistently), a ListArrayGenerator whose content is a NumpyArrayGenerator also must not be qualified because it will be an RVec, rather than an ArrayView. There should be tests in which the top-level type is a list of primitive and in which the top-level type is a list of records.

(Also, note that the hashes aren't the same in different Python processes.)

The following is almost entirely hard-coded, but it should be clear how to generate the parts that depend on "one" and "two".

ROOT.gInterpreter.Declare(f"""
namespace awkward {{

class RWonkyDS final : public ROOT::RDF::RDataSource {{
private:
   unsigned int fNSlots = 0U;
   std::vector<std::pair<ULong64_t, ULong64_t>> fEntryRanges;

   ULong64_t fSize = 0ULL;
   ULong64_t fPtrs_one = 0;
   ULong64_t fPtrs_two = 0;

   std::vector<std::string> fColNames{{"one", "two"}};
   std::vector<{entry_type_one}>  slots_one;
   std::vector<{entry_type_one}*> addrs_one;
   std::vector<{entry_type_two}>  slots_two;
   std::vector<{entry_type_two}*> addrs_two;

   std::vector<void *> GetColumnReadersImpl(std::string_view name, const std::type_info &);

protected:
   std::string AsString() {{ return "trivial data source"; }};

public:
   RWonkyDS(ULong64_t size, ULong64_t ptrs_one, ULong64_t ptrs_two);
   RWonkyDS();
   ~RWonkyDS();
   const std::vector<std::string> &GetColumnNames() const;
   bool HasColumn(std::string_view colName) const;
   std::string GetTypeName(std::string_view) const;
   std::vector<std::pair<ULong64_t, ULong64_t>> GetEntryRanges();
   bool SetEntry(unsigned int slot, ULong64_t entry);
   void SetNSlots(unsigned int nSlots);
   void Initialise();
   std::string GetLabel();
}};

// Make a RDF wrapping a RWonkyDS with the specified amount of entries
ROOT::RDF::RInterface<ROOT::RDF::RDFDetail::RLoopManager, RWonkyDS> MakeWonkyDataFrame(ULong64_t size, ULong64_t ptrs_one, ULong64_t ptrs_two);
// Make a RDF wrapping a broken RWonkyDS... because we need a zero-argument constructor?
ROOT::RDF::RInterface<ROOT::RDF::RDFDetail::RLoopManager, RWonkyDS> MakeWonkyDataFrame();

std::vector<void *> RWonkyDS::GetColumnReadersImpl(std::string_view colName, const std::type_info &ti) {{
   std::vector<void *> ret;

   if (colName == "one") {{
      for (auto i : ROOT::TSeqU(fNSlots)) {{
         addrs_one[i] = &slots_one[i];
         ret.emplace_back((void *)(&addrs_one[i]));
      }}
   }}
   else if (colName == "two") {{
      for (auto i : ROOT::TSeqU(fNSlots)) {{
         addrs_two[i] = &slots_two[i];
         ret.emplace_back((void *)(&addrs_two[i]));
      }}
   }}
   else {{
      for (auto i : ROOT::TSeqU(fNSlots)) {{
         ret.emplace_back(nullptr);
      }}
   }}

   return ret;
}}

RWonkyDS::RWonkyDS(ULong64_t size, ULong64_t ptrs_one, ULong64_t ptrs_two) : fSize(size), fPtrs_one(ptrs_one), fPtrs_two(ptrs_two) {{
}}

RWonkyDS::RWonkyDS() : fSize(0), fPtrs_one(0), fPtrs_two(0) {{
}}

RWonkyDS::~RWonkyDS() {{
}}

const std::vector<std::string> &RWonkyDS::GetColumnNames() const {{
   return fColNames;
}}

bool RWonkyDS::HasColumn(std::string_view colName) const {{
   for (auto name : fColNames) {{
      if (colName == name) {{
         return true;
      }}
   }}
   return false;
}}

std::string RWonkyDS::GetTypeName(std::string_view colName) const {{
   if (colName == "one") {{
      return {json.dumps(entry_type_one)};
   }}
   else if (colName == "two") {{
      return {json.dumps("awkward::" + entry_type_two)};
   }}
   else {{
      return "no such column";   // should break whatever tries to use it as a type
   }}
}}

std::vector<std::pair<ULong64_t, ULong64_t>> RWonkyDS::GetEntryRanges() {{
   // empty fEntryRanges so we'll return an empty vector on subsequent calls
   auto ranges = std::move(fEntryRanges);
   return ranges;
}}

bool RWonkyDS::SetEntry(unsigned int slot, ULong64_t entry) {{

   slots_one[slot] = {dataset_type_one}(0, fSize, 0, reinterpret_cast<ssize_t*>(fPtrs_one))[entry];
   slots_two[slot] = {dataset_type_two}(0, fSize, 0, reinterpret_cast<ssize_t*>(fPtrs_two))[entry];

   return true;
}}

void RWonkyDS::SetNSlots(unsigned int nSlots) {{
   R__ASSERT(0U == fNSlots && "Setting the number of slots even if the number of slots is different from zero.");
   fNSlots = nSlots;

   slots_one.resize(fNSlots);
   addrs_one.resize(fNSlots);
   slots_two.resize(fNSlots);
   addrs_two.resize(fNSlots);

}}

void RWonkyDS::Initialise() {{
   // initialize fEntryRanges
   const auto chunkSize = fSize / fNSlots;
   auto start = 0UL;
   auto end = 0UL;
   for (auto i : ROOT::TSeqUL(fNSlots)) {{
      start = end;
      end += chunkSize;
      fEntryRanges.emplace_back(start, end);
      (void)i;
   }}
   // TODO: redistribute reminder to all slots
   fEntryRanges.back().second += fSize % fNSlots;
}}

std::string RWonkyDS::GetLabel() {{
   return "WonkyDS";
}}

ROOT::RDF::RInterface<ROOT::RDF::RDFDetail::RLoopManager, RWonkyDS> MakeWonkyDataFrame(ULong64_t size, ULong64_t ptrs_one, ULong64_t ptrs_two) {{
   auto lm = std::make_unique<ROOT::RDF::RDFDetail::RLoopManager>(std::make_unique<RWonkyDS>(size, ptrs_one, ptrs_two), ROOT::RDF::RDFInternal::ColumnNames_t{{}});
   return ROOT::RDF::RInterface<ROOT::RDF::RDFDetail::RLoopManager, RWonkyDS>(std::move(lm));
}}

}} // namespace awkward
""")

rdf = ROOT.awkward.MakeWonkyDataFrame(
    len(example1), lookup1.arrayptrs.ctypes.data, lookup2.arrayptrs.ctypes.data
)

Now we can use this rdf as an RDataFrame.

>>> rdf.Display().Print()
one       | two             | 
1.1000000 | @0x564bafb68100 | 
2.2000000 | @0x564bafb68100 | 
3.3000000 | @0x564bafb68100 | 
4.4000000 | @0x564bafb68100 | 
5.5000000 | @0x564bafb68100 |

The "one" column has readable numbers. It seems that RDataFrame always prints objects as pointers. It's the same pointer in each entry because RDataFrame gets an entry by overwriting memory. The above implementation should work with multiple slots (parallel processing), but I haven't tested it. With two ROOT threads, the entries should get divided among the threads with one pointer for half the entries and another pointer for the other half of the entries, and the two threads only modify their own data in place.

Now let's try calculating some new variables on these columns. Column "one" really ought to work because we can see that the numbers are properly represented in the display above, but let's do it anyway:

>>> rdf.Define("newvar1", "one * 10").AsNumpy(["newvar1"])
{'newvar1': ndarray([11., 22., 33., 44., 55.])}

Okay, good. Now for column "two", we can't pass the record-valued data into NumPy (unless it makes PyROOT objects? I'll try that next), but we can get numbers from it through its x method:

>>> rdf.Define("newvar2", "two.x()").AsNumpy(["newvar2"])
{'newvar2': ndarray([1.1, 2.2, 3.3, 4.4, 5.5])}

Let's be adventurous and try actually sending the record objects to NumPy. In principle, it should know how to make PyROOT objects for our ArrayViews and RecordViews because they were compiled by Cling.

>>> will_it_crash = rdf.AsNumpy(["two"])
>>> will_it_crash
{'two': ndarray([<cppyy.gbl.awkward.Record_dlJfZCDGSwY object at 0x564bb4fac690>,
         <cppyy.gbl.awkward.Record_dlJfZCDGSwY object at 0x564bb4fac6a8>,
         <cppyy.gbl.awkward.Record_dlJfZCDGSwY object at 0x564bb4fac6c0>,
         <cppyy.gbl.awkward.Record_dlJfZCDGSwY object at 0x564bb4fac6d8>,
         <cppyy.gbl.awkward.Record_dlJfZCDGSwY object at 0x564bb4fac6f0>],
        dtype=object)}
>>> will_it_crash["two"][2]
<cppyy.gbl.awkward.Record_dlJfZCDGSwY object at 0x564bb4fac6c0>
>>> will_it_crash["two"][2].x()
3.3

It works! However, we'll probably need some significant disclaimers on these objects because they're based on borrowed references. If the lookups ever go out of scope, will_it_crash["two"][2].x() will definitely crash.

In fact, that reminds me, the outgoing RDataFrame that ak.to_rdataframe returns will have to maintain a reference to the lookups. Fortunately, this is Python and we can just stash them in the object:

rdf._lookups = (lookup1, lookup2)

As long as this rdf object (which ak.to_rdataframe returns) is alive, the ArrayViews and RecordViews it makes in internal calculations and PyROOT objects in dtype=object NumPy arrays will be valid. As soon as the rdf goes out of scope, the lookups will be deleted and there are no memory leaks. That covers all of the cases in which a user interacts with it through the rdf methods (which is the proper way, and the only way that runs at compiled speed), but it doesn't cover cases in which the user extracted a dtype=object NumPy array, then deleted the rdf, then interacts with the data in the NumPy array. If that ever comes up, we'll tell them not to do that.

Anyway, I think the above example should help. As you can see, there was no need to define any wrappers, only the single RWonkyDS class and its "Make" function. There are no templates anywhere. The class and function names need to be generated with hashes so that a different set of column names and types don't try to reuse this one. The hash should be based on a tuple of name, generator pairs:

>>> hash((("one", generator1), ("two", generator2)))
6386967581103326234

which is unchanged when we recreate them with the same Forms (but not different Forms, and not in a different Python process):

>>> generator1 = ak._v2._connect.cling.togenerator(example1.layout.form)
>>> generator2 = ak._v2._connect.cling.togenerator(example2.layout.form)
>>> hash((("one", generator1), ("two", generator2)))
6386967581103326234

Be sure to use the helper functions to generate the code, so that the code-generator remains readable:

>>> generator1.entry(length="fSize", ptrs="fPtrs_one", entry="entry", flatlist_as_rvec=True)
'awkward::NumpyArray_float64_uXYDMncQqaw(0, fSize, 0, reinterpret_cast<ssize_t*>(fPtrs_one))[entry]'
>>> generator2.entry(length="fSize", ptrs="fPtrs_two", entry="entry", flatlist_as_rvec=True)
'awkward::RecordArray_nd3xGJcxmAg(0, fSize, 0, reinterpret_cast<ssize_t*>(fPtrs_two))[entry]'

Preparing generated code parts before the big f-string and inserting them with single-curly brackets will help to keep the code-generator readable. Inline expressions when they're simple enough is better than following names.

A good process would be to reproduce the example described in this comment, with all its hard-codedness, and slowly replace the hard-coded parts with auto-generated parts, testing between every invocation. Segfaults can usually be debugged by inserting std::cout in the right places, including the code generators in cling.py, since the code generated by cling.py is used in the RDataFrame loop, and you can get closer to the actual error.

Let me know if there's anything else I can help with.

…nction

ianna · 2022-04-25T16:19:18Z

@jpivarski - please, have a look when you have time. The flatlist_as_rvec is a property of the generators now, so there should not be possible to mix RVecs and vectors. I did not test this change with numba, but I did not see explicit use of the flag there. The default value of it is still the same.
I've removed the ArrayWrapper and ArrayReader classes. The column dependent code is generated in a Python loop. These extra fragments can be implemented as functions, though, I hope, it is more or less readable.
I've finished for today, please, feel free to modify the PR. Thanks.

…od argument (to ensure consistency).

…efault.

jpivarski

This is much better!

There are still some rough edges, so this will need more testing. I saw some cases in which nonsense values came out of an integer field and repeated calculations had history dependence, but couldn't reproduce them. Many of the tests here are print-outs, rather than automated checks to ensure that the output is right. We should at least test the full suite of structures, the way that the test for #1300 does, using RDataFrame's AsNumpy to get data back out.

I completed the flatlist_as_rvec refactoring (which someday should become "options", maybe a namedtuple to keep it hashable). I also made the inputs a bit more robust (use ak._v2.to_layout rather than just assuming ak._v2.Arrays). The default (for ROOT) should be flatlist_as_rvec=True.

The testing/hardening can be in a new PR, in which case, you can merge this PR now.

ianna · 2022-04-26T21:10:53Z

@jpivarski - thanks! I have automated the test checks and will add the changes to this PR.

ianna marked this pull request as draft March 17, 2022 17:16

ianna changed the title ~~awkward to rdaframe~~ awkward to rdataframe Mar 17, 2022

ianna marked this pull request as ready for review April 20, 2022 15:30

ianna requested a review from jpivarski April 20, 2022 15:30

jpivarski requested changes Apr 20, 2022

View reviewed changes

ianna force-pushed the ianna/awkward_to_rdataframe branch from b6ce7c8 to 1f7fb24 Compare April 22, 2022 09:26

ianna added 21 commits April 22, 2022 14:28

awkward to rdaframe

4bdf6dc

hide root import and rename the test

3fde565

correct a typo

03c8851

correct a typo

941a16e

pass generated types to rdf source

ff5a4ea

test rdf source from awkward array and try to apply a user-defined fu…

b7af4a5

…nction

add debugging info

4f08e7c

last commit

9449eeb

more debugging printout

a5f7764

move array view out of the loop

c92c260

add pointer holder models

6f385ae

hide dicts

c76546f

forbid copy and assignment

eeb16ed

forbid copy and assignment

f203d47

extend lifetime of the array views

f81ced6

check array view pointers

ec6c83c

column readers

8e29ba0

column readers

5ace2e7

add a simple one column data source example

49481d4

wrap array view entry in a unique_ptr

6200eae

wrap array view entry in a unique_ptr

753318e

ianna added 9 commits April 22, 2022 14:28

test functors for both columns

30f73bf

remove all remaining debugging printouts

7e29abe

add a test without rvecs

482122e

one more test including records

81b6239

change default away from rvecs

7a3990b

check that there is at least one array passed as column

eeff724

check on primitive type and add Jim's examples

7a6381f

fix tests

99969a2

add reminder to has the class name

3e1467d

ianna force-pushed the ianna/awkward_to_rdataframe branch from a289334 to 3e1467d Compare April 22, 2022 12:30

jpivarski and others added 7 commits April 22, 2022 08:51

Zero-argument constructors for all ArrayView and RecordView classes.

bc9a70b

address Jim's comments

af3b052

fix pylint warning

52aae0b

fix all tests

6df3753

flatlist_as_rvec as property

0e74901

final cleanup

130da66

restructure for better readability

b443f57

ianna requested a review from jpivarski April 25, 2022 16:19

jpivarski added 5 commits April 26, 2022 10:33

Merge branch 'main' into ianna/awkward_to_rdataframe

f17c643

Make flatlist_as_rvec a Generator class attribute, rather than a meth…

23c4e41

…od argument (to ensure consistency).

Ensure that it works for flatlist_as_rvec=True, which should be the d…

18cb6af

…efault.

Documentation and interface.

fa995e0

Put auto-generated class and function into the awkward namespace.

57ede36

jpivarski approved these changes Apr 26, 2022

View reviewed changes

jpivarski mentioned this pull request Apr 26, 2022

Awkward to RDataFrame: to start a discussion #1295

Closed

ianna merged commit 73c24fa into main Apr 27, 2022

ianna deleted the ianna/awkward_to_rdataframe branch April 27, 2022 06:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

awkward to rdataframe #1374

awkward to rdataframe #1374

ianna commented Mar 17, 2022 •

edited

Loading

codecov bot commented Mar 18, 2022 •

edited

Loading

ianna commented Apr 20, 2022

jpivarski left a comment

jpivarski commented Apr 21, 2022

ianna commented Apr 25, 2022

jpivarski left a comment

ianna commented Apr 26, 2022

awkward to rdataframe #1374

awkward to rdataframe #1374

Conversation

ianna commented Mar 17, 2022 • edited Loading

codecov bot commented Mar 18, 2022 • edited Loading

Codecov Report

ianna commented Apr 20, 2022

jpivarski left a comment

Choose a reason for hiding this comment

jpivarski commented Apr 21, 2022

ianna commented Apr 25, 2022

jpivarski left a comment

Choose a reason for hiding this comment

ianna commented Apr 26, 2022

ianna commented Mar 17, 2022 •

edited

Loading

codecov bot commented Mar 18, 2022 •

edited

Loading