-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Templated LayoutBuilder #1494
Templated LayoutBuilder #1494
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Please, see minor comments. Thanks.
Codecov Report
|
Done. Should I add the descriptions of the methods too? |
Yes, please, it's a good idea! |
This is looking good! One thing, though: the GrowableBuffer will always (?) be used by external code that wants to do "copy to an external buffer" and "concatenation" as one operation. That's why we think the panels approach is a step up in performance: we know that external code will have to copy the data out of a GrowableBuffer so that that external code can own it, and if we have to do a copy anyway, we might as well concatenate in the same operation. With concatenation being "free," having no extra performance cost, that's why we can let the data be discontiguous as it grows and letting it be discontiguous as it grows lets us avoid more expensive ways of growing a buffer (reallocating and copying). So the class needs to have an interface like void concatenate(PRIMITIVE* external_pointer) const noexcept {
// always iterates over panels, copying data into external_pointer
// assumes that the buffer; external code should have allocated it with the right length()
// this function does not change any of the GrowableBuffer's internal panels; they're left as-is
} for external code to be able to use concatenate-and-copy in one operation. For use in ArrayBuilder, it will need ArrayBuilder will be an immediate use-case, AwkwardForth is down the road. Only worry about the ArrayBuilder methods for now. |
1ada767
to
764c20f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! Please, see my comments inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks good. The tests run.
@jpivarski - please, have a look. Thanks!
@jpivarski - I've added a |
@ManasviGoyal - please, cherry-pick this commit that clears the warnings when the tests are built with the project flags on MacOS. Thanks. |
Done! |
@ManasviGoyal - I've stared testing the new I think, we need to extend the There are several cases where it is needed for building arrays. Here is one example how a current one is used: GrowableBuffer<std::complex<double>> buffer =
GrowableBuffer<std::complex<double>>::empty(initial, old.reserved());
int64_t* oldraw = old.ptr().get();
std::complex<double>* newraw = buffer.ptr().get();
for (size_t i = 0; i < 2*old.length(); i++) {
newraw[i] = {static_cast<double>(oldraw[i]), 0};
}
buffer.set_length(old.length());
old.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GrowableBuffer is great and I would sign off on it right now.
The LayoutBuilder, on the other hand, is about halfway between the simple/fast tool that it could be and where ArrayBuilder is now. I have a lot of inline comments, but I think I could explain what I'm thinking a little better with a prototype. Most of this code can be used as-is—what I'll show with the prototype is a way to prune off some methods that are getting in the way and show how it can all fit together. I'll write it up in Python because it will be faster for me to write it that way; you have a good handle on the C++isms, and I think you'll have no trouble translating.
I'll post that here as soon as I can.
If you want to split this PR into one PR for GrowableBuffer and another for LayoutBuilder, then we'll be able to merge the GrowableBuffer part right away.
Great! Thanks.
Thanks. The prototype really gives a better idea to improve upon the implementation. I'll update the PR with the changes that you suggested.
Yes, it will be better to split the PR so that GrowableBuffer can be merged. I'll do the same. Thanks. |
I finished writing the prototype. Along the way, I realized that there are some subtle issues, such as dealing with records that have no fields and regular arrays with inner size 0. There also needs to be a different builder for records with named fields ("records") and records whose fields are not named ("tuples"). I ran into some hard choices when trying to write Python that is intended to illustrate how it would be implemented in C++. Specifically, there were a few places where the C++ should pass references to integers and strings to be able to modify them remotely. You just can't do that in Python. To simulate this, I made a class Ref:
def __init__(self, value):
self.value = value because Python passes all mutable objects as references, so " // reference has C++ type &int
reference = 3; I hope my meaning is clear. I'm looking forward to the full C++ version of all of this! |
Yes. It is clear. I'll make the requested changes and update the soon. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ManasviGoyal and @jpivarski - the PR looks good. It implements all types and their tests defined in Jim's prototype. There are some places where I'd suggest to use a type aliases. Such as:
using Contents = typename std::tuple<BUILDERS...>;
and declare:
Contents contents_;
and more tests are needed to check that parameters are correctly placed in the forms.
I'll do a more thorough review after @ManasviGoyal has a chance to do any final touches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! There are just a few things to fix, described inline.
PRIMITIVE | ||
last() const { | ||
if (ptr_->length() == 0) { | ||
throw std::runtime_error("Buffer is empty"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, these functions should be noexcept
and have some default action for such cases. However, doing that after a performance-test framework is set up would be more useful, so that we can see how much it matters.
Builder builder; | ||
}; | ||
|
||
template <unsigned INITIAL, typename PRIMITIVE> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the initial size of the buffer a compile-time constant?
This should become a runtime value, passed down through instance constructors, not template parameters.
With it in the template parameter, every C++ user is going to have to specify INITIAL
at every node of the tree. That's a lot of boilerplate for a value they usually don't care about. They'll usually take a default that gets passed in from above, or through a BuilderOptions
.
The reason it's a parameter at all, instead of hard-coded into the code, is so that we have some leverage when doing performance tests. Users, even C++ users, will trust whatever optimal value we come up with from those performance tests. They're not going to want to have to type it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The users will not see it - we can type alias it as in the test examples. The reason why we need it as a template parameter - we use the default constructors. The builders are allocated on the stack and so their GrowableBuffers. We could change it, but let’s discuss it on Wednesday first. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, the initial GrowableBuffer size has to be a compile-time constant for a C++ reason: you need to use default constructors. (Why is that?)
And if GrowableBuffers are parameterized by more than one parameter—if they need to have a different first panel size from subsequent panel sizes as we discussed in the last meeting—do all of those have to become template parameters?
If this is the case, then we need to provide type aliases as part of the header-only package, with some choice of GrowableBuffer parameters hard-coded. C++ users of this package need to be able to say LayoutBuilder::Numpy<float>
without having to set up an alias on their own.
@ianna What kind of parameters the users might use? One I think will be |
I forgot to check that all the node types can have parameters, all different from each other. Some of them, like At the C++ level, the parameters can just be a string. In Python, it will have to resolve to something JSON-formatted, but the C++ doesn't have to enforce that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ManasviGoyal - beginning identifiers with an underscore is considered poor programming style.
Hi, @jpivarski. Should we keep the type of offsets, starts, stops, mask, etc. fixed in the code or should we make them templated parameters so that users can use different types as needed? |
The types of all of these Indexes (offsets, starts, stops, mask, etc.) should be templated. Most of them can be You could check the v1 Content subclasses, in src/libawkward/array/*.cc, to see which templates are actually instantiated. LayoutBuilder should be reproducing the same set of types. |
I'd suggest that it goes to the next PR. |
Actually, I already added the changes. |
Fantastic! Is it ready for review? Thanks! |
Besides the BitMasked part that we were discussing, yes, everything else is ready for review. |
Thanks! Yes, I have a solution for that, but let's do it in the next PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have a question about including GrowableBuffer parameters as compile-time template parameters, but this is looking good for merging now. Any minor issues can be picked up in later PRs.
Builder builder; | ||
}; | ||
|
||
template <unsigned INITIAL, typename PRIMITIVE> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, the initial GrowableBuffer size has to be a compile-time constant for a C++ reason: you need to use default constructors. (Why is that?)
And if GrowableBuffers are parameterized by more than one parameter—if they need to have a different first panel size from subsequent panel sizes as we discussed in the last meeting—do all of those have to become template parameters?
If this is the case, then we need to provide type aliases as part of the header-only package, with some choice of GrowableBuffer parameters hard-coded. C++ users of this package need to be able to say LayoutBuilder::Numpy<float>
without having to set up an alias on their own.
Thanks, I'll merge this one and rebase #1560 that restores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks.
* Added GrowableBuffer.h * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added namespace * added methods and description * overloaded append method for adding an array * changed path of GrowableBuffer.h * start working on LayoutBuilder.h * some tests for LayoutBuilder.h * remove templete parameter for offset in ListOffsetLayoutBuilder * added forms * modified RecordLayoutBuilder and some other changes * modified previous tests and added new tests * changes to fix form * changes in layout builder and growable buffer * change back * fixed ListOffsetLayoutBuilder for nested lists * added ListLayoutBuilder * added IndexLayoutBuilder * added UnmaskedLayoutBuilder * added IndexedOptionLayoutBuilder * changed back to vector of unique_ptr * fixed clear() and dump() in recordlayoutbuilder * fixed form and indexedlayoutbuilder index type * fixed some tests and added more tests * added test-listoffset-of-numpy * introduce form_key_id parameter * add licence info * nontype template id parameter * added default value for forms and other miscellaneous changes * overload append method and fix indexedoptionlayoutbuilder * modified and added more tests for different builders * fixed the order of methods * removed redundant code * fix error in type_to_form() * modify to_buffers, remove dump() and other changes * add dump() * fix ListOffsetLayoutBuilder length() * add EmptyLayoutBuilder * addressed Ianna's comments * removed extra headers * cleanup warnings * Start writing prototype for all Forms. * Just RecordForm and UnionForm are left. * Finished RecordForm and UnionForm. * Correctly handling records/tuples with no fields and regular arrays of size 0. * Nicely formatted. * add more types to fix NumpyLayoutBuilder form * remove begun_ * fix append functions and rename append array as extend * make methods noexcept and fix tests * rename builders and add sub namespace * added set_id method in builders * added is_valid method * added buffer_nbytes method * added parameters * some fixes * added RegularLayoutBuilder and other changes * added ByteMaskedLayoutBuilder * added EmptyRecordLayoutBuilder and set_parameters * added BitMasked * added BitMaskedLayoutBuilder * update BitMasked * fix merge issues * misc changes * add layout builder test to cmake * add field and field_append to Record, restructure and cleanup * remove variant and builder pointers * remove -> syntax, use . * fix record and modify tests * add IndexedOption_Record test * fix record and modify tests * fix headers and method names * use enum and user-defined map to std::string for the field names * is_valid for record * use error std::string for validity checks * add Tuple and its test * add Union and a test * fix union and test * add union form and test * test tuple * fixes and cleanup * remove underscore * remove append method from class Field * fix form parameters and add tests * add current_length() * templated builder index and fixed form * fix include path Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ianna Osborne <ianna.osborne@cern.ch> Co-authored-by: Jim Pivarski <jpivarski@gmail.com>
To develop a templated GrowableBuffer and the Layout builders that use this buffer. There will be two headers: GrowableBuffer.h and LayoutBuilder.h. This will allow using them in an external project without linking to the awkward libraries.
This can be a separate PR:
Documentation of LayoutBuilder.h