apacheGH-37572: [MATLAB] Add arrow.array.Date64Array class (apache#37581)

### Rationale for this change

Now that the `arrow.type.Date64Type` class has been added to the MATLAB Interface (apache#37578), we can add the `arrow.array.Date64Array` class.

`Date64Array`s can be created from MATLAB [`datetime`](https://www.mathworks.com/help/matlab/ref/datetime.html) values. 

### What changes are included in this PR?

1. Added a new `arrow.array.Date64Array` class.
2. Added a new `arrow.type.traits.Date64Traits` class.
3. Added `arrow.type.Date64Type` support to the `arrow.type.traits.traits` function.
4. Factored out the `convertToEpochTime` method on `TimestampArray` into the internal helper function `arrow.array.internal.temporal.convertDatetimeToEpochTime`.
5. Updated `arrow.internal.test.tabular.createAllSupportedArrayTypes` to include `Date64Array`.
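The conversion performed by the helper function in item 4 can be approximated in plain MATLAB without the Arrow interface. The following is a minimal sketch, assuming MATLAB R2021a or later (for `Name=Value` syntax); the input values and the variable names `isValid` and `epochTime` are illustrative and not taken from this PR.

```matlab
% Hypothetical input: one valid datetime, one NaT (missing) value, and
% a datetime exactly one day after the UNIX epoch.
datetimes = [datetime(2023, 9, 5, 16, 47, 12, 933); NaT; datetime(1970, 1, 2)];

% Date64 values count milliseconds since the UNIX epoch (1000 ticks per second).
ticksPerSecond = 1000;

% Preallocate int64 zeros so that NaT slots keep a placeholder value of 0.
epochTime = zeros(size(datetimes), "int64");
isValid = ~isnat(datetimes);

% Convert only the non-NaT elements. For unzoned datetimes, convertTo
% measures ticks from the unzoned epoch Jan-1-1970.
epochTime(isValid) = convertTo(datetimes(isValid), "epochtime", ...
    TicksPerSecond=ticksPerSecond);
```

The placeholder zeros are not meaningful on their own; in the PR, validity is tracked separately through the `Valid` argument passed to the proxy.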

`Date64Array`s can be created from MATLAB [`datetime`](https://www.mathworks.com/help/matlab/ref/datetime.html) values using the `fromMATLAB` method. `Date64Array`s can be converted to MATLAB `datetime` values using the `toMATLAB` method.

**Example**
```matlab
>> dates = datetime + milliseconds(1:5)'

dates = 

  5×1 datetime array

   05-Sep-2023 16:47:12
   05-Sep-2023 16:47:12
   05-Sep-2023 16:47:12
   05-Sep-2023 16:47:12
   05-Sep-2023 16:47:12

% "SSS" displays fractional seconds (i.e. milliseconds)
>> dates.Format = "MMM dd, yyyy HH:mm:ss SSS"

dates = 

  5×1 datetime array

   Sep 05, 2023 16:47:12 933
   Sep 05, 2023 16:47:12 934
   Sep 05, 2023 16:47:12 935
   Sep 05, 2023 16:47:12 936
   Sep 05, 2023 16:47:12 937

>> array = arrow.array.Date64Array.fromMATLAB(dates)

array = 

[
  2023-09-05,
  2023-09-05,
  2023-09-05,
  2023-09-05,
  2023-09-05
]

>> array.toMATLAB

ans = 

  5×1 datetime array

   05-Sep-2023 16:47:12
   05-Sep-2023 16:47:12
   05-Sep-2023 16:47:12
   05-Sep-2023 16:47:12
   05-Sep-2023 16:47:12

% Milliseconds are preserved on round-trip
>> ans.Format = "MMM dd, yyyy HH:mm:ss SSS"

ans = 

  5×1 datetime array

   Sep 05, 2023 16:47:12 933
   Sep 05, 2023 16:47:12 934
   Sep 05, 2023 16:47:12 935
   Sep 05, 2023 16:47:12 936
   Sep 05, 2023 16:47:12 937
```
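The example above suggests that the stored values keep millisecond precision even though the array display shows only the date. The sketch below mirrors the `toMATLAB` direction in plain MATLAB, assuming R2021a or later; the sample value and the names `ms`, `msWithinDay`, and `roundTrip` are illustrative assumptions, and `TimeZone="UTC"` is passed explicitly here so that the result can be compared with the zoned input.

```matlab
% Hypothetical zoned input with a nonzero millisecond component.
dt = datetime(2023, 9, 5, 16, 47, 12, 933, TimeZone="UTC");

% Milliseconds since the UNIX epoch, returned as int64.
ms = convertTo(dt, "epochtime", TicksPerSecond=1000);

% The remainder within the day is nonzero, so the value carries more
% information than a calendar date alone.
msPerDay = int64(24 * 60 * 60 * 1000);
msWithinDay = mod(ms, msPerDay);

% Rebuild the datetime from the raw tick count relative to the UNIX epoch.
unixEpoch = datetime(0, ConvertFrom="posixtime", TimeZone="UTC");
roundTrip = datetime(ms, ConvertFrom="epochtime", Epoch=unixEpoch, ...
    TimeZone="UTC", TicksPerSecond=1000);
roundTrip.Format = "MMM dd, yyyy HH:mm:ss SSS";
```

Comparing `roundTrip` with `dt` using `isequal` is one way to check that no precision was lost in the conversion.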

### Are these changes tested?

1. Added a new `tDate64Array` test class.
2. Added a `Date64`-related test to `ttraits.m`.
3. Added a new `tDate64Traits.m` test class.

### Are there any user-facing changes?

Yes.

1. Users can now create `arrow.array.Date64Array`s from MATLAB `datetime`s.

### Future Directions

1. Add round-trip precision tests for `TimestampArray` (i.e. similar to the test case `TestInt64MaxMilliseconds`).
2. Add a way to extract the raw `int64` values from an `arrow.array.Date64Array` without converting to a MATLAB `datetime` using `toMATLAB`.

### Notes

1. Thank you @ sgilmore10 for your help with this pull request!
* Closes: apache#37572

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney and sgilmore10 authored Sep 6, 2023
1 parent ad7f6ef commit 33b714e
Showing 14 changed files with 537 additions and 25 deletions.
2 changes: 2 additions & 0 deletions matlab/src/cpp/arrow/matlab/array/proxy/wrap.cc
@@ -57,6 +57,8 @@ namespace arrow::matlab::array::proxy {
return std::make_shared<proxy::NumericArray<arrow::Time64Type>>(std::static_pointer_cast<arrow::Time64Array>(array));
case ID::DATE32:
return std::make_shared<proxy::NumericArray<arrow::Date32Type>>(std::static_pointer_cast<arrow::Date32Array>(array));
case ID::DATE64:
return std::make_shared<proxy::NumericArray<arrow::Date64Type>>(std::static_pointer_cast<arrow::Date64Array>(array));
case ID::STRING:
return std::make_shared<proxy::StringArray>(std::static_pointer_cast<arrow::StringArray>(array));
default:
1 change: 1 addition & 0 deletions matlab/src/cpp/arrow/matlab/proxy/factory.cc
@@ -57,6 +57,7 @@ libmexclass::proxy::MakeResult Factory::make_proxy(const ClassName& class_name,
REGISTER_PROXY(arrow.array.proxy.Time32Array , arrow::matlab::array::proxy::NumericArray<arrow::Time32Type>);
REGISTER_PROXY(arrow.array.proxy.Time64Array , arrow::matlab::array::proxy::NumericArray<arrow::Time64Type>);
REGISTER_PROXY(arrow.array.proxy.Date32Array , arrow::matlab::array::proxy::NumericArray<arrow::Date32Type>);
REGISTER_PROXY(arrow.array.proxy.Date64Array , arrow::matlab::array::proxy::NumericArray<arrow::Date64Type>);
REGISTER_PROXY(arrow.array.proxy.ChunkedArray , arrow::matlab::array::proxy::ChunkedArray);
REGISTER_PROXY(arrow.tabular.proxy.RecordBatch , arrow::matlab::tabular::proxy::RecordBatch);
REGISTER_PROXY(arrow.tabular.proxy.Schema , arrow::matlab::tabular::proxy::Schema);
6 changes: 6 additions & 0 deletions matlab/src/cpp/arrow/matlab/type/proxy/traits.h
@@ -24,6 +24,7 @@
#include "arrow/matlab/type/proxy/time32_type.h"
#include "arrow/matlab/type/proxy/time64_type.h"
#include "arrow/matlab/type/proxy/date32_type.h"
#include "arrow/matlab/type/proxy/date64_type.h"
#include "arrow/matlab/type/proxy/string_type.h"

namespace arrow::matlab::type::proxy {
@@ -105,4 +106,9 @@ namespace arrow::matlab::type::proxy {
struct Traits<arrow::Date32Type> {
using TypeProxy = Date32Type;
};

template <>
struct Traits<arrow::Date64Type> {
using TypeProxy = Date64Type;
};
}
3 changes: 3 additions & 0 deletions matlab/src/cpp/arrow/matlab/type/proxy/wrap.cc
@@ -22,6 +22,7 @@
#include "arrow/matlab/type/proxy/time32_type.h"
#include "arrow/matlab/type/proxy/time64_type.h"
#include "arrow/matlab/type/proxy/date32_type.h"
#include "arrow/matlab/type/proxy/date64_type.h"
#include "arrow/matlab/type/proxy/string_type.h"

namespace arrow::matlab::type::proxy {
@@ -59,6 +60,8 @@ namespace arrow::matlab::type::proxy {
return std::make_shared<Time64Type>(std::static_pointer_cast<arrow::Time64Type>(type));
case ID::DATE32:
return std::make_shared<Date32Type>(std::static_pointer_cast<arrow::Date32Type>(type));
case ID::DATE64:
return std::make_shared<Date64Type>(std::static_pointer_cast<arrow::Date64Type>(type));
case ID::STRING:
return std::make_shared<StringType>(std::static_pointer_cast<arrow::StringType>(type));
default:
@@ -0,0 +1,31 @@
% Converts MATLAB datetime values to integer "Epoch time" values which
% represent the number of "ticks" since the UNIX Epoch (Jan-1-1970) with
% respect to the specified TimeUnit / DateUnit.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied. See the License for the specific language governing
% permissions and limitations under the License.

function epochTime = convertDatetimeToEpochTime(datetimes, unit)
epochTime = zeros(size(datetimes), "int64");
indices = ~isnat(datetimes);

% convertTo uses the Unzoned UNIX Epoch Jan-1-1970 as the default Epoch.
% If the input datetime array has a TimeZone, then a Zoned UNIX Epoch
% of Jan-1-1970 UTC is used instead.
%
% TODO: convertTo may error if the datetime is 2^63-1 before or
% after the epoch. We should throw a custom error in this case.
epochTime(indices) = convertTo(datetimes(indices), "epochtime", TicksPerSecond=ticksPerSecond(unit));
end
77 changes: 77 additions & 0 deletions matlab/src/matlab/+arrow/+array/Date64Array.m
@@ -0,0 +1,77 @@
% arrow.array.Date64Array

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Date64Array < arrow.array.Array

properties(Access=private)
NullSubstitutionValue = NaT
end

methods

function obj = Date64Array(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.array.proxy.Date64Array")}
end
import arrow.internal.proxy.validate
obj@arrow.array.Array(proxy);
end

function dates = toMATLAB(obj)
epochTime = obj.Proxy.toMATLAB();

ticksPerSecond = obj.Type.DateUnit.ticksPerSecond();

% UNIX Epoch (January 1st, 1970).
unixEpoch = datetime(0, ConvertFrom="posixtime", TimeZone="UTC");
dates = datetime(epochTime, ConvertFrom="epochtime", Epoch=unixEpoch, ....
TicksPerSecond=ticksPerSecond);

dates(~obj.Valid) = obj.NullSubstitutionValue;
end

function dates = datetime(obj)
dates = obj.toMATLAB();
end

end

methods(Static)

function array = fromMATLAB(data, opts)
arguments
data
opts.InferNulls(1, 1) logical = true
opts.Valid
end

import arrow.array.Date64Array

arrow.internal.validate.type(data, "datetime");
arrow.internal.validate.shape(data);

validElements = arrow.internal.validate.parseValidElements(data, opts);
epochTime = arrow.array.internal.temporal.convertDatetimeToEpochTime(data, arrow.type.DateUnit.Millisecond);

args = struct(MatlabArray=epochTime, Valid=validElements);
proxy = arrow.internal.proxy.create("arrow.array.proxy.Date64Array", args);
array = Date64Array(proxy);
end

end

end
32 changes: 8 additions & 24 deletions matlab/src/matlab/+arrow/+array/TimestampArray.m
@@ -30,15 +30,15 @@
end

function dates = toMATLAB(obj)
-time = obj.Proxy.toMATLAB();
+epochTime = obj.Proxy.toMATLAB();

-epoch = datetime(1970, 1, 1, TimeZone="UTC");
+timeZone = obj.Type.TimeZone;
+ticksPerSecond = obj.Type.TimeUnit.ticksPerSecond();

-tz = obj.Type.TimeZone;
-ticsPerSecond = ticksPerSecond(obj.Type.TimeUnit);

-dates = datetime(time, ConvertFrom="epochtime", Epoch=epoch, ...
-TimeZone=tz, TicksPerSecond=ticsPerSecond);
+% UNIX Epoch (January 1st, 1970).
+unixEpoch = datetime(0, ConvertFrom="posixtime", TimeZone="UTC");
+dates = datetime(epochTime, ConvertFrom="epochtime", Epoch=unixEpoch, ...
+TimeZone=timeZone, TicksPerSecond=ticksPerSecond);

dates(~obj.Valid) = obj.NullSubstitutionValue;
end
@@ -48,21 +48,6 @@
end
end

-methods (Static, Access = private)
-function time = convertToEpochTime(dates, units)

-time = zeros(size(dates), "int64");
-indices = ~isnat(dates);

-% convertTo uses Jan-1-1970 as the default epoch. If the input
-% datetime array has a TimeZone, the epoch is Jan-1-1970 UTC.
-%
-% TODO: convertTo may error if the datetime is 2^63-1 before or
-% after the epoch. We should throw a custom error in this case.
-time(indices) = convertTo(dates(indices), "epochtime", TicksPerSecond=ticksPerSecond(units));
-end
-end

methods(Static)
function array = fromMATLAB(data, opts)
arguments
@@ -74,9 +59,8 @@

arrow.internal.validate.type(data, "datetime");
arrow.internal.validate.shape(data);

validElements = arrow.internal.validate.parseValidElements(data, opts);
-epochTime = arrow.array.TimestampArray.convertToEpochTime(data, opts.TimeUnit);
+epochTime = arrow.array.internal.temporal.convertDatetimeToEpochTime(data, opts.TimeUnit);
timezone = string(data.TimeZone);

args = struct(MatlabArray=epochTime, Valid=validElements, TimeZone=timezone, TimeUnit=string(opts.TimeUnit));
@@ -102,7 +102,7 @@
end

function dateClasses = getDateArrayClasses()
-dateClasses = compose("arrow.array.Date%dArray", 32);
+dateClasses = compose("arrow.array.Date%dArray", [32 64]);
end

function number = randomNumbers(numberType, numElements)
30 changes: 30 additions & 0 deletions matlab/src/matlab/+arrow/+type/+traits/Date64Traits.m
@@ -0,0 +1,30 @@
% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Date64Traits < arrow.type.traits.TypeTraits

properties (Constant)
ArrayConstructor = @arrow.array.Date64Array
ArrayClassName = "arrow.array.Date64Array"
ArrayProxyClassName = "arrow.array.proxy.Date64Array"
ArrayStaticConstructor = @arrow.array.Date64Array.fromMATLAB
TypeConstructor = @arrow.type.Date64Type;
TypeClassName = "arrow.type.Date64Type"
TypeProxyClassName = "arrow.type.proxy.Date64Type"
MatlabConstructor = @datetime
MatlabClassName = "datetime"
end

end
2 changes: 2 additions & 0 deletions matlab/src/matlab/+arrow/+type/+traits/traits.m
@@ -54,6 +54,8 @@
typeTraits = Time64Traits();
case ID.Date32
typeTraits = Date32Traits();
case ID.Date64
typeTraits = Date64Traits();
otherwise
error("arrow:type:traits:UnsupportedArrowTypeID", "Unsupported Arrow type ID: " + type);
end
15 changes: 15 additions & 0 deletions matlab/src/matlab/+arrow/+type/DateUnit.m
@@ -21,4 +21,19 @@
Millisecond (1)
end

methods (Hidden)

function ticks = ticksPerSecond(obj)
import arrow.type.DateUnit
switch obj
case DateUnit.Millisecond
ticks = 1e3;
otherwise
error("arrow:dateunit:UnsupportedTicksPerSecond", ...
"The ticksPerSecond method can only be called on a DateUnit of type Millisecond.");
end
end

end

end