Skip to content

ARROW-835: [Format][C++][Java] Create a new Duration type #3644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

emkornfield
Copy link
Contributor

@emkornfield emkornfield commented Feb 14, 2019

  • Create a new DurationInterval type in format.
  • Implement all interval types in C++ and Java including integration test.

Once this is checked in, I think https://issues.apache.org/jira/browse/ARROW-352 can be resolved as a won't fix?

@emkornfield emkornfield changed the title ARROW-352: [WIP][Format] Add time resolution to Interval by creating a new type ARROW-352:[Format] Add time resolution to Interval by creating a new type Feb 15, 2019
@codecov-io
Copy link

codecov-io commented Feb 15, 2019

Codecov Report

Merging #3644 into master will decrease coverage by <.01%.
The diff coverage is 84.24%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3644      +/-   ##
==========================================
- Coverage   88.07%   88.06%   -0.01%     
==========================================
  Files         773      774       +1     
  Lines       96887    97162     +275     
  Branches     1251     1251              
==========================================
+ Hits        85329    85566     +237     
- Misses      11322    11360      +38     
  Partials      236      236
Impacted Files Coverage Δ
cpp/src/arrow/ipc/read-write-test.cc 99.81% <ø> (ø) ⬆️
cpp/src/arrow/ipc/json-test.cc 98.46% <ø> (ø) ⬆️
cpp/src/arrow/visitor.h 50% <ø> (ø) ⬆️
cpp/src/arrow/scalar.h 88.23% <ø> (ø) ⬆️
cpp/src/arrow/array/builder_primitive.h 95.86% <ø> (ø) ⬆️
cpp/src/arrow/type_traits.h 85.71% <0%> (-7.25%) ⬇️
cpp/src/arrow/array/builder_dict.cc 86.06% <0%> (-1.44%) ⬇️
cpp/src/arrow/pretty_print.cc 83.48% <0%> (-1.6%) ⬇️
cpp/src/arrow/visitor.cc 0% <0%> (ø) ⬆️
cpp/src/arrow/type-test.cc 99.21% <100%> (+0.05%) ⬆️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ca6fe8...973d80f. Read the comment docs.

@emkornfield emkornfield changed the title ARROW-352:[Format] Add time resolution to Interval by creating a new type ARROW-352:[Format][Java] Add time resolution to Interval by creating a new type Feb 23, 2019
@emkornfield
Copy link
Contributor Author

@wesm could you look at the format changes to see if you are ok with them?
@pravindra Can you look at the impact on the java side (I added all the necessary java classes I think) to confirm this is acceptable for the Java?

@emkornfield
Copy link
Contributor Author

Failure was NodeJs

@kou kou changed the title ARROW-352:[Format][Java] Add time resolution to Interval by creating a new type ARROW-352: [Format][Java] Add time resolution to Interval by creating a new type Feb 24, 2019
@pravindra
Copy link
Contributor

since this is a format change (backward compatible though), I'll wait for @wesm to approve too before merging.

@emkornfield
Copy link
Contributor Author

@pravindra thanks for the review

/// millisecond, but can be any of the other supported TimeUnit values as
/// with Timestamp and Time types. This type is always represented as
/// an 8-byte integer.
table DurationInterval {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking over the old e-mail threads it might be that TimeDelta or TimestampDelta is a better name for this type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you happy with "DurationInterval"? It could also be called simply "Duration" or "TimeDuration"

@emkornfield
Copy link
Contributor Author

@wesm based on comments on the other JIRAs CLs it seems like it would be desirable to add an implementation for C++, I can add that to this CL but would you mind taking a look to see if you are ok with naming/modeling (I would prefer not to have to refactor just due to naming issues if possible).

@emkornfield emkornfield changed the title ARROW-352: [Format][Java] Add time resolution to Interval by creating a new type ARROW-352: [WIP][Format][C++] Add time resolution to Interval by creating a new type Apr 12, 2019
@emkornfield emkornfield changed the title ARROW-352: [WIP][Format][C++] Add time resolution to Interval by creating a new type ARROW-835: [WIP][Format][C++] Add time resolution to Interval by creating a new type Apr 29, 2019
@emkornfield emkornfield changed the title ARROW-835: [WIP][Format][C++] Add time resolution to Interval by creating a new type ARROW-835: [WIP][Format][C++][Java] Create a new DurationInterval type Apr 29, 2019
@emkornfield emkornfield changed the title ARROW-835: [WIP][Format][C++][Java] Create a new DurationInterval type ARROW-835: [Format][C++][Java] Create a new DurationInterval type Apr 30, 2019
@emkornfield
Copy link
Contributor Author

I believe the Python on OS/X error is spurious (it was green on previous build).

@wesm
Copy link
Member

wesm commented May 3, 2019

Will work on reviewing the C++ side of this

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ side of this, and the changes to the Flatbuffers files, all looks good to me. Thanks @emkornfield for slogging through this and implementing C++, Java, and integration tests in one go!

One question I have is whether we want to commit to the name "DurationInterval" rather than alternatives such as "Duration" or "TimeDelta" or "TimeDuration". I don't think the names are sacred

Can someone (@siddharthteotia ?) carve out a little time to review the Java so we can try to get this merged soon?

{ class: "DurationIntervalMicro", javaType: "long", friendlyType: "Duration",
arrowType: "org.apache.arrow.vector.types.pojo.ArrowType.DurationInterval"}
{ class: "DurationIntervalNano", javaType: "long", friendlyType: "Duration",
arrowType: "org.apache.arrow.vector.types.pojo.ArrowType.Duration"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's possible to collapse this into a single class? There was an effort to do this for Timestamp (with a TimeUnit class) but it failed only because Dremio had taken on dependency with the TimestampUNIT classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially, I followed the pattern already established in the codebase, but I'm open to trying if it is desirable. @pravindra @siddharthteotia do you think this is desirable/possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds okay to me. Let's see what @siddharthteotia and @pravindra say.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, having a single class is definitely desirable. Single this one is starting on a fresh slate, it's worth a try.

@@ -650,6 +650,36 @@ class ARROW_EXPORT FixedSizeBinaryArray : public PrimitiveArray {
int32_t byte_width_;
};

/// DayTimeArray
/// ---------------------
/// \brief Array of Day and Millisecond values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm a bit disappointed that we aren't deprecating this type =/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to get dragged into what it means deprecate this, should be easy enough to do if we want it. As noted below I tried to say it was optional which I seem to recall seeing this was ok on a previous ML thread.

using Time32Builder = NumericBuilder<Time32Type>;
using Time64Builder = NumericBuilder<Time64Type>;
using Date32Builder = NumericBuilder<Date32Type>;
using Date64Builder = NumericBuilder<Date64Type>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this redundant with type_fwd.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it appears so, I'll push a commit removing them. It looks like there might be more redundancy in builder_primitive.h

@@ -111,6 +112,18 @@ Status MakeBuilder(MemoryPool* pool, const std::shared_ptr<DataType>& type,
DictionaryBuilderCase visitor = {pool, dict_type, out};
return VisitTypeInline(*dict_type.dictionary()->type(), &visitor);
}
case Type::INTERVAL: {
const auto& interval_type = dynamic_cast<const IntervalType&>(*type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think static_cast (or checked_cast) is sufficient here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

std::shared_ptr<DataType> ARROW_EXPORT day_time_interval();

/// \brief Return an MonthIntervalType instance
std::shared_ptr<DataType> ARROW_EXPORT month_interval();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: as far as naming I could see an argument for using

  • interval_duration
  • interval_day_time
  • interval_month

but I'm OK with the way things are here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see this to, but I'm happier without the reverse notation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note "duration" conflicts with some of our time classes so I renamed to "duration_type", hopefully that is ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"some of our time classes"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to look back I think it was in the vendored time library we were using. I can look deeper if you don't think there should be a conflict

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking again they shouldn't conflict, I will try to fix this today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed. had to touch some of the vendored code.

// 4-byte integers.
// DAY_TIME - Indicates the number of elapsed days and milliseconds,
// stored as 2 contiguous 32-bit integers (8-bytes in total). Support
// of this IntervalUnit is not required for full arrow compatibility.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. If it isn't being deprecated then it's reasonable to implement it in C++ and have integration tests

/// millisecond, but can be any of the other supported TimeUnit values as
/// with Timestamp and Time types. This type is always represented as
/// an 8-byte integer.
table DurationInterval {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you happy with "DurationInterval"? It could also be called simply "Duration" or "TimeDuration"

@emkornfield
Copy link
Contributor Author

@wesm Thanks for the review I'll try to address comments in the next few days. In terms of naming, I don't know the etiquette for changing the name once it gets approved on the ML. If was going to change it, I think Duration would be my choice.

@pravindra already reviewed most of the java code, the main changes that happened where plumbing through support for the integration test.

@@ -349,6 +355,9 @@ template class DictionaryBuilder<Date64Type>;
template class DictionaryBuilder<Time32Type>;
template class DictionaryBuilder<Time64Type>;
template class DictionaryBuilder<TimestampType>;
template class DictionaryBuilder<DurationIntervalType>;
template class DictionaryBuilder<DayTimeIntervalType>;
template class DictionaryBuilder<MonthIntervalType>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for your information.

These changes will be conflicted with #4255.
If #4255 is merged, we will be able to just remove these changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, they were removed.

@emkornfield emkornfield changed the title ARROW-835: [Format][C++][Java] Create a new DurationInterval type ARROW-835: [Format][C++][Java] Create a new Duration type May 13, 2019
@emkornfield
Copy link
Contributor Author

@wesm I think I addressed all your feedback. The OS/X build seems to in an infinite CMake loop with something like

  the following targets are still affected by RPATH settings:
   arrow_python_shared
   arrow_shared
   arrow_testing_shared
   parquet_shared
   plasma_shared

I'm not sure why my change would have caused this. Any thoughts?

@pravindra I was able to consolidate all the java vectors into one, could you take a look?

@emkornfield emkornfield force-pushed the new_type branch 3 times, most recently from 2913b09 to 773a2b0 Compare May 15, 2019 19:28
@wesm
Copy link
Member

wesm commented May 15, 2019

Baffling

@wesm
Copy link
Member

wesm commented May 15, 2019

It must have something to do with this being an older PR. Is Appveyor enabled on your fork? If not I can push this to my fork to get an Appveyor build going

@emkornfield
Copy link
Contributor Author

No appveyor isn't enabled on my fork. let me try squashing everything, see if that helps.

and mark DayTimeInterval as deprecated.

Provides implementation for Duration, DayTimeInterval and YearMonth
interval in C++ and Java (for duration).  Adds integration test.
@emkornfield
Copy link
Contributor Author

Ahh, squash seems to have worked, lets hope things stay green.

@wesm
Copy link
Member

wesm commented May 15, 2019

@emkornfield it looks you made some changes to vendored code which I think we should try to avoid doing in general, can we open a JIRA to restore to the vendored version and address the name conflict another way? I don't want this issue to block the patch since I need this to go in and then rebase #4316 on top

@wesm
Copy link
Member

wesm commented May 15, 2019

Appveyor appears to be about ~6 hours backed up so I'm going to push a branch on my fork to try to get a quicker thumbs up to merge this

@wesm
Copy link
Member

wesm commented May 15, 2019

Here's an Appveyor build starting now on my fork which will take a while to run https://ci.appveyor.com/project/wesm/arrow/builds/24577591

@emkornfield
Copy link
Contributor Author

Thanks. https://issues.apache.org/jira/browse/ARROW-5346 opened to revert the changes. Right now the only solution seems like reordering includes in some cases or fixing upstream. I will have to think about this.

@wesm
Copy link
Member

wesm commented May 16, 2019

Appveyor build looks OK, I'm going to wait a little while yet to make sure the MinGW build works before merging

@emkornfield
Copy link
Contributor Author

emkornfield commented May 16, 2019

@wesm appveyor on your branch looks all green.

@wesm wesm closed this in 6f80ea4 May 16, 2019
@wesm
Copy link
Member

wesm commented May 16, 2019

Awesome. Thanks @emkornfield!!

@emkornfield emkornfield deleted the new_type branch May 16, 2019 03:19
emkornfield added a commit to emkornfield/arrow that referenced this pull request May 16, 2019
- Create a new DurationInterval type in format.
- Implement all interval types in C++ and Java including integration test.

Once this is checked in, I think https://issues.apache.org/jira/browse/ARROW-352 can be resolved as a  won't fix?

Author: Micah Kornfield <emkornfield@gmail.com>

Closes apache#3644 from emkornfield/new_type and squashes the following commits:

cc6d39c <Micah Kornfield> remove system clock
daf8b4f <Micah Kornfield> duration_type->duration
b064816 <Micah Kornfield> Introduce a new Duration type can that represent time deltas, and mark DayTimeInterval as deprecated.
emkornfield added a commit to emkornfield/arrow that referenced this pull request May 17, 2019
- Create a new DurationInterval type in format.
- Implement all interval types in C++ and Java including integration test.

Once this is checked in, I think https://issues.apache.org/jira/browse/ARROW-352 can be resolved as a  won't fix?

Author: Micah Kornfield <emkornfield@gmail.com>

Closes apache#3644 from emkornfield/new_type and squashes the following commits:

cc6d39c <Micah Kornfield> remove system clock
daf8b4f <Micah Kornfield> duration_type->duration
b064816 <Micah Kornfield> Introduce a new Duration type can that represent time deltas, and mark DayTimeInterval as deprecated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants