-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Support for Intervals #258
Conversation
src/nanoarrow/nanoarrow_types.h
Outdated
enum ArrowIntervalUnit unit; | ||
|
||
/// \brief Pointer containing the underlying data for the interval | ||
void* data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly a union
is preferable here to avoid an extra pointer indirection?
Codecov Report
@@ Coverage Diff @@
## main #258 +/- ##
==========================================
- Coverage 87.17% 87.15% -0.03%
==========================================
Files 66 63 -3
Lines 10061 9753 -308
==========================================
- Hits 8771 8500 -271
+ Misses 1290 1253 -37
... and 4 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! I have one suggestion about simplifying the ArrowInterval
but am excited to have support for this!
src/nanoarrow/nanoarrow_types.h
Outdated
/// \brief A representation of an interval. | ||
struct ArrowInterval { | ||
/// \brief The type of interval being used | ||
enum ArrowIntervalUnit unit; | ||
|
||
/// \brief Pointer containing the underlying data for the interval | ||
union { | ||
int32_t year_month; | ||
struct ArrowIntervalDayTime day_time; | ||
struct ArrowIntervalMonthDayNano month_day_nano; | ||
} data; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ArrowDecimal
we do something more along the lines of "declare more fields than needed since the caller probably stack-allocated one of these for whatever loop they're in". The definition here is OK too, but for the sake of argument:
struct ArrowInterval {
enum ArrowType type;
int32_t month;
int32_t day;
int64_t time;
int64_t nano;
};
...might be simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time
is supposed to represent milliseconds in that struct right? Not a strong preference but I think using the unions is a little easier as an end user. With the current design I was thinking an anonymous union would be the best, though that would require C11 instead of just C99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the time being can we parallel the ArrowDecimal
pattern (i.e., one flat struct
+ an ArrowIntervalInit()
helper) for consistency? There are almost certainly better names than month
/day
/type
/nano
that perhaps give some hint as to how the spec defines them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea makes sense. Actually looking in CI I don't think Cython supports using a union within a struct, so this is probably the only way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with flattening to one struct I think we will still need to keep structs like ArrowIntervalDayTime
and ArrowIntervalMonthDayNano
to unpack the data buffer - is that in line with your expectation or am I misinterpreting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now I see reinterpret_cast<const struct ArrowIntervalMonthDayNano*>
and friends. In #214 we opted to memcpy()
instead of reinterpret_cast<>
( #214 (comment) ), although I'm aware we reinterpret_cast<>
and make it easy for others to do so with the int/float/double primitives. I think the idea was to not continue that for anything more exotic than an int64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we still need something to memcpy()
into though...I'm much more interested in having interval support than having the struct be exactly like ArrowDecimal
(although ArrowIntervalUnit
can probably just be ArrowType
unless there's another distinction I'm missing?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch - simplified that
Any idea what I might be overlooking with the tests and how they allocate memory? Locally I get this valgrind traceback: ==710574== at 0x4843828: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==710574== by 0x1C6BFD: ArrowMalloc (utils.c:194)
==710574== by 0x1BF55A: ArrowArrayInitFromType (array.c:139)
==710574== by 0x143D36: ArrayTest_ArrayTestAppendToIntervalArrayMonthDayNano_Test::TestBody() (array_test.cc:934)
==710574== by 0x2048CA: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2607)
==710574== by 0x1FC940: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2643)
==710574== by 0x1D002F: testing::Test::Run() (gtest.cc:2682)
==710574== by 0x1D0AFD: testing::TestInfo::Run() (gtest.cc:2861)
==710574== by 0x1D1426: testing::TestSuite::Run() (gtest.cc:3015)
==710574== by 0x1E13F4: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:5855)
==710574== by 0x2058C9: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2607)
==710574== by 0x1FDAD2: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2643) Not immediately clear what is different from the other tests in terms of building the array |
Ignore comment above - I see that |
Yes, you'll need to call |
The Python issue is because we auto-generate the |
I don't think the Python error is related? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the Python failure is happening on main too (maybe Cython 3 releated).
Just a few nits!
src/nanoarrow/nanoarrow_types.h
Outdated
static inline void ArrowIntervalInit(struct ArrowInterval* interval, enum ArrowType unit, | ||
int32_t months, int32_t days, int32_t ms, | ||
int64_t ns) { | ||
interval->unit = unit; | ||
interval->months = months; | ||
interval->days = days; | ||
interval->ms = ms; | ||
interval->ns = ns; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can just zero-initialize and set unit
and encourage direct field access like ArrowDecimal
? Most users won't use all the fields at once.
int32_t days; | ||
int32_t ms; | ||
int64_t ns; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
I think this will need the fields documented for them to show up in the generated documentation (and \ingroup nanoarrow-utils
for it to show up in the right place).
We have a lot of ArrowType
variables that are named type
...would that be appropriate here?
dist/nanoarrow.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert the change to this file? I should add the hook to modify it to pre-commit but right now it's updated via a nightly job (for better or worse).
As noted in #258, the Python tests are failing! I'm not sure the details of why this started now, but `StopIteration` was being raised in a generator which should just return instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
(I should have merged this before the pre-commit one...I'll take care of that 😬 )
No description provided.