-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[c++] Integrate SOMAColumn
: Arrow adapter methods, part 1
#3405
base: main
Are you sure you want to change the base?
[c++] Integrate SOMAColumn
: Arrow adapter methods, part 1
#3405
Conversation
SOMAColumn
in Arrow adapter methods [WIP]SOMAColumn
in Arrow adapter methods Part 2
SOMAColumn
in Arrow adapter methods Part 2SOMAColumn
in Arrow adapter methods Part 2
SOMAColumn
in Arrow adapter methods Part 2SOMAColumn
in Arrow adapter methods, part 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what the Skip
and Take
are for and document it? It looks like Take
is the index of the column to retrieve and Skip
is relevant only for geometry columns (where it's always 2)?
Also is there a way to use std::variant
or a templated type instead of std::any
or would that make things too complicated?
/** | ||
* Return a copy of the data in a specified column of an arrow table. | ||
* Complex column types are supported. The for each sub column are an | ||
* std::array<T, 2> casted as an std::any object. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | |
* Return a copy of the data in a specified column of an arrow table. | |
* Complex column types are supported. The for each sub column are an | |
* std::array<T, 2> casted as an std::any object. | |
*/ | |
/** | |
* Return a copy of the data in a specified column of an arrow table. | |
* Complex column types are supported. The type for each sub column is | |
* an std::array<T, 2> casted as an std::any object. | |
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip
and Take
are used in 2 places with 2 specific sets on values (either Skip=3
and Take=2
or Skip=0
and Take=2
) and are independent of the geometry column. Their usage is to extract specific subranges of ArrowArray data and they come in handy during ArrowSchema -> TileDBSchema
where the arrow array provided has 5 values per dimension and we only need the last 2 to set the current domain.
As to using std::variant
, adding more SOMAColumn
types would require changing multiple variants. The use of std::any
here is to enable runtime polymorphism and indirectly introduces a runtime type check (via any_cast
, make_any
) between the templated function and the actual dimension type. std::variant
can provide all the above it is just a different style I am open to discuss further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XanthosXanthopoulos to be clear, I'm not asking you to use a different solution. I'm asking to be clear to the reader the (very good!) reasons you chose the solution you're using.
Please take both paragraphs you just wrote, and put them in the code as comments as well. They are highly informative. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry if I was unclear too. This explaination is good, and I was just looking for it to also be documented in the code.
804c87e
to
a58361a
Compare
5485141
to
0e69ed7
Compare
SOMAColumn
in Arrow adapter methods, part 2SOMAColumn
: Arrow adapter methods, part 1
0e69ed7
to
d6d6187
Compare
8daf17e
to
a426c7a
Compare
d6d6187
to
af1f010
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3405 +/- ##
==========================================
+ Coverage 86.22% 86.32% +0.09%
==========================================
Files 55 55
Lines 6410 6375 -35
==========================================
- Hits 5527 5503 -24
+ Misses 883 872 -11
Flags with carried forward coverage won't be shown. Click here to find out more.
|
af1f010
to
2401416
Compare
…nt domain checks, replace vector with span when selecting points
…nt domain checks, replace vector with span when selecting points
…nt domain checks, replace vector with span when selecting points
2401416
to
77a01e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review the get_table_column_by_name
function, but everything else looks good to me. However, someone with more familiarity with the arrow adapter code should look over this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @XanthosXanthopoulos !
index_column_schema->children[i]->name)); | ||
} | ||
|
||
if ((*column)->tiledb_dimensions().has_value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit awkward.
column
needs to becolumn_it
or some such -- it is not a column, it is an iterator- Then,
const auto column = *column_it
after you check thatcolumn _it != columns.end()
- Then the rest of these
(*column)->foo
becomecolumn->foo
as they should be
std::string name); | ||
/** | ||
* Return a copy of the data in a specified column of an arrow table. | ||
* Complex column types are supported. The type for each sub column is an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Complex column types are supported. The type for each sub column is an | |
* Complex column types are supported. The type for each subcolumn is an |
} | ||
|
||
/** | ||
* Read a part of am Arrow array to an std::array and cast in to and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Read a part of am Arrow array to an std::array and cast in to and | |
* Read a part of an Arrow array to an std::array and cast in to an |
* | ||
* @example get_table_any_column<3, 2>(array, schema) will return an | ||
* std::array<T, 3> where T is the appropriate type based on the Arrow array | ||
* format casted as an std::any object. The std::array will skip 2 element |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* format casted as an std::any object. The std::array will skip 2 element | |
* format casted as an std::any object. The std::array will skip 2 elements |
REQUIRE(str_range[0] < " "); | ||
REQUIRE(str_external.first <= " "); | ||
REQUIRE(str_range[1] > "~"); | ||
// REQUIRE(str_external.second >= "~"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove
@@ -98,4 +98,22 @@ std::vector<uint8_t> cast_bit_to_uint8(ArrowSchema* schema, ArrowArray* array) { | |||
return casted; | |||
} | |||
|
|||
std::shared_ptr<SOMAColumn> find_column_by_name( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 😎
* @tparam Take The number of elements to read. | ||
* @tparam Skip The number of elements to skip. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering why Take
and Skip
are template parameters and not simply function arguments. (The latter seems simpler.)
After having read through your code here, the answer seems to be: because you are constructing std::array
objects (which are fixed-size) in various places, rather than std::vector
(which are not fixed-size).
It's a good reason to have Take
and Skip
be implemented as template parameters, but I found it non-obvious.
Please come out and state this reasoning, right here, as a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the remaining change requests are addressed.
This PR replaces the Arrow schema to TileDB schema transformation to use the
SOMAColumn
create methods.Also there are a set of new data converters from arrow arrays to
std::array
for simplification.This migration also enforces a current domain restriction for string dimensions to libtiledbsoma in addition to the restriction being present only on the R and Python APIs.