apacheGH-37367: [MATLAB] Add arrow.array.Date32Array class (apache#37445)

### Rationale for this change

Now that the `arrow.type.Date32Type` class has been added to the MATLAB Interface (apache#37348), we can add the `arrow.array.Date32Array` class.

`Date32Array`s can be created from MATLAB [`datetime`](https://www.mathworks.com/help/matlab/ref/datetime.html) values. 

### What changes are included in this PR?

1. Added a new `arrow.array.Date32Array` class.
2. Added a new `arrow.type.traits.Date32Traits` class.
3. Added `arrow.type.Date32Type` support to the `arrow.type.traits.traits` function.
4. Fixed typo `arrray` in `tTime32Array` test class.
5. Fixed bug in `numeric_array.h` where the `CType` rather than the `ArrowType` was being used to determine the `DataType` of an array class that is a `NumericArray<T>`.
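To illustrate item 5: `arrow::Date32Type` and `arrow::Int32Type` both store their values as `int32_t`, so a lookup keyed on the C type alone cannot tell them apart. The sketch below uses hypothetical stand-in structs (`Int32TypeSketch`, `Date32TypeSketch`, `CTypeTraitsSketch`) rather than the real Arrow headers, so it only shows the shape of the problem and is not the actual `numeric_array.h` code.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for two Arrow logical types that share one C storage type.
struct Int32TypeSketch {
    using c_type = std::int32_t;
    static std::string name() { return "int32"; }
};
struct Date32TypeSketch {
    using c_type = std::int32_t;
    static std::string name() { return "date32"; }
};

// Hypothetical C-type lookup: each C type maps to exactly one logical type,
// so int32_t can only ever resolve to the plain integer type.
template <typename CType> struct CTypeTraitsSketch;
template <> struct CTypeTraitsSketch<std::int32_t> { using Logical = Int32TypeSketch; };

// Resolve the data type through the C storage type (the lossy route).
template <typename ArrowType>
std::string type_from_ctype() {
    using Mapped = typename CTypeTraitsSketch<typename ArrowType::c_type>::Logical;
    return Mapped::name();
}

// Resolve the data type from the logical type itself (the route the fix takes).
template <typename ArrowType>
std::string type_from_arrow_type() {
    return ArrowType::name();
}
```

With these stand-ins, `type_from_ctype<Date32TypeSketch>()` yields `"int32"`, while `type_from_arrow_type<Date32TypeSketch>()` yields `"date32"`.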

`Date32Array`s can be created from MATLAB [`datetime`](https://www.mathworks.com/help/matlab/ref/datetime.html) values using the `fromMATLAB` method. `Date32Array`s can be converted to MATLAB `datetime` values using the `toMATLAB` method.

**Example**
```matlab
>> dates = [datetime(2021, 1, 2, 3, 4, 5), datetime(2023, 1, 1), datetime(1989, 2, 3, 10, 10, 10)]'

dates = 

  3x1 datetime array

   02-Jan-2021 03:04:05
   01-Jan-2023 00:00:00
   03-Feb-1989 10:10:10

>> array = arrow.array.Date32Array.fromMATLAB(dates)                                               

array = 

[
  2021-01-02,
  2023-01-01,
  1989-02-03
]

>> array.toMATLAB()                                                                                

ans = 

  3x1 datetime array

   02-Jan-2021
   01-Jan-2023
   03-Feb-1989
``` 
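The example above drops the time of day because a Date32 value stores only a whole number of days relative to the UNIX epoch (1970-01-01). The following C++ sketch shows that encoding on plain POSIX seconds, assuming the instants are interpreted as UTC. The helper names `to_date32` and `from_date32` are made up for illustration and are not part of Arrow or the MATLAB interface.

```cpp
#include <cstdint>

constexpr std::int64_t kSecondsPerDay = 86400;

// Whole days since 1970-01-01. Integer division truncates toward zero,
// so step back one day when the remainder is negative to get a true floor.
std::int32_t to_date32(std::int64_t posix_seconds) {
    std::int64_t days = posix_seconds / kSecondsPerDay;
    if (posix_seconds % kSecondsPerDay < 0) {
        --days;
    }
    return static_cast<std::int32_t>(days);
}

// Midnight (UTC) of the encoded day, as POSIX seconds.
std::int64_t from_date32(std::int32_t days) {
    return static_cast<std::int64_t>(days) * kSecondsPerDay;
}
```

Under that assumption, `02-Jan-2021 03:04:05` is POSIX second 1609556645, which encodes to day 18629; decoding gives 1609545600, i.e. midnight on 02-Jan-2021, which matches the `toMATLAB` output shown above.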

### Are these changes tested?

Yes.

1. Added a new `tDate32Array` test class.
2. Added a `Date32`-related test to `ttraits.m`.
3. Added a new `tDate32Traits.m` test class.

### Are there any user-facing changes?

Yes.

1. Users can now create `arrow.array.Date32Array`s from MATLAB `datetime`s.

### Future Directions

1. apache#37230
2. Add `arrow.array.Date64Array`.
3. Add a way to extract the raw `int32` values from an `arrow.array.Date32Array` without converting to a MATLAB `datetime` using `toMATLAB`.

### Notes

1. Thank you @sgilmore10 for your help with this pull request!
* Closes: apache#37367

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
2 people authored and loicalleyne committed Nov 13, 2023
1 parent 690d920 commit ecb7897
Showing 13 changed files with 492 additions and 7 deletions.
2 changes: 1 addition & 1 deletion matlab/src/cpp/arrow/matlab/array/proxy/numeric_array.h
@@ -56,7 +56,7 @@ class NumericArray : public arrow::matlab::array::proxy::Array {

auto data_buffer = std::make_shared<MatlabBuffer>(numeric_mda);

-const auto data_type = arrow::CTypeTraits<CType>::type_singleton();
+const auto data_type = arrow::TypeTraits<ArrowType>::type_singleton();
const auto length = static_cast<int64_t>(numeric_mda.getNumberOfElements()); // cast size_t to int64_t

// Pack the validity bitmap values.
2 changes: 2 additions & 0 deletions matlab/src/cpp/arrow/matlab/array/proxy/wrap.cc
@@ -55,6 +55,8 @@ namespace arrow::matlab::array::proxy {
return std::make_shared<proxy::NumericArray<arrow::Time32Type>>(std::static_pointer_cast<arrow::Time32Array>(array));
case ID::TIME64:
return std::make_shared<proxy::NumericArray<arrow::Time64Type>>(std::static_pointer_cast<arrow::Time64Array>(array));
+case ID::DATE32:
+return std::make_shared<proxy::NumericArray<arrow::Date32Type>>(std::static_pointer_cast<arrow::Date32Array>(array));
case ID::STRING:
return std::make_shared<proxy::StringArray>(std::static_pointer_cast<arrow::StringArray>(array));
default:
1 change: 1 addition & 0 deletions matlab/src/cpp/arrow/matlab/proxy/factory.cc
@@ -54,6 +54,7 @@ libmexclass::proxy::MakeResult Factory::make_proxy(const ClassName& class_name,
REGISTER_PROXY(arrow.array.proxy.TimestampArray, arrow::matlab::array::proxy::NumericArray<arrow::TimestampType>);
REGISTER_PROXY(arrow.array.proxy.Time32Array , arrow::matlab::array::proxy::NumericArray<arrow::Time32Type>);
REGISTER_PROXY(arrow.array.proxy.Time64Array , arrow::matlab::array::proxy::NumericArray<arrow::Time64Type>);
+REGISTER_PROXY(arrow.array.proxy.Date32Array , arrow::matlab::array::proxy::NumericArray<arrow::Date32Type>);
REGISTER_PROXY(arrow.tabular.proxy.RecordBatch , arrow::matlab::tabular::proxy::RecordBatch);
REGISTER_PROXY(arrow.tabular.proxy.Schema , arrow::matlab::tabular::proxy::Schema);
REGISTER_PROXY(arrow.type.proxy.Field , arrow::matlab::type::proxy::Field);
6 changes: 6 additions & 0 deletions matlab/src/cpp/arrow/matlab/type/proxy/traits.h
@@ -23,6 +23,7 @@
#include "arrow/matlab/type/proxy/timestamp_type.h"
#include "arrow/matlab/type/proxy/time32_type.h"
#include "arrow/matlab/type/proxy/time64_type.h"
+#include "arrow/matlab/type/proxy/date32_type.h"
#include "arrow/matlab/type/proxy/string_type.h"

namespace arrow::matlab::type::proxy {
@@ -99,4 +100,9 @@ namespace arrow::matlab::type::proxy {
struct Traits<arrow::Time64Type> {
using TypeProxy = Time64Type;
};

+template <>
+struct Traits<arrow::Date32Type> {
+using TypeProxy = Date32Type;
+};
}
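As a rough, self-contained illustration of the specialization pattern in this hunk: each Arrow type gets an explicit `Traits` specialization that names its proxy class, and generic code looks the proxy up at compile time. The namespaces `arrow_sketch` and `proxy_sketch`, the empty structs, and the `TypeProxyFor` alias are hypothetical stand-ins, not the real Arrow or MATLAB interface declarations.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the Arrow type classes.
namespace arrow_sketch {
struct Time64Type {};
struct Date32Type {};
}  // namespace arrow_sketch

namespace proxy_sketch {
// Hypothetical stand-ins for the MATLAB-facing proxy classes.
struct Time64Type {};
struct Date32Type {};

// Primary template is declared but not defined, so using an Arrow type
// without a specialization fails at compile time.
template <typename ArrowType>
struct Traits;

template <>
struct Traits<arrow_sketch::Time64Type> {
    using TypeProxy = Time64Type;
};

template <>
struct Traits<arrow_sketch::Date32Type> {
    using TypeProxy = Date32Type;
};

// Convenience alias for generic code.
template <typename ArrowType>
using TypeProxyFor = typename Traits<ArrowType>::TypeProxy;
}  // namespace proxy_sketch
```

Leaving the primary template undefined is one possible design choice here: a forgotten specialization then surfaces as a compile error instead of a silent default.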
3 changes: 3 additions & 0 deletions matlab/src/cpp/arrow/matlab/type/proxy/wrap.cc
@@ -21,6 +21,7 @@
#include "arrow/matlab/type/proxy/timestamp_type.h"
#include "arrow/matlab/type/proxy/time32_type.h"
#include "arrow/matlab/type/proxy/time64_type.h"
+#include "arrow/matlab/type/proxy/date32_type.h"
#include "arrow/matlab/type/proxy/string_type.h"

namespace arrow::matlab::type::proxy {
@@ -56,6 +57,8 @@ namespace arrow::matlab::type::proxy {
return std::make_shared<Time32Type>(std::static_pointer_cast<arrow::Time32Type>(type));
case ID::TIME64:
return std::make_shared<Time64Type>(std::static_pointer_cast<arrow::Time64Type>(type));
+case ID::DATE32:
+return std::make_shared<Date32Type>(std::static_pointer_cast<arrow::Date32Type>(type));
case ID::STRING:
return std::make_shared<StringType>(std::static_pointer_cast<arrow::StringType>(type));
default:
92 changes: 92 additions & 0 deletions matlab/src/matlab/+arrow/+array/Date32Array.m
@@ -0,0 +1,92 @@
% arrow.array.Date32Array

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Date32Array < arrow.array.Array

    properties(Access=private)
        NullSubstitutionValue = NaT
    end

    methods

        function obj = Date32Array(proxy)
            arguments
                proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.array.proxy.Date32Array")}
            end
            import arrow.internal.proxy.validate
            obj@arrow.array.Array(proxy);
        end

        function dates = toMATLAB(obj)
            import arrow.type.DateUnit

            matlabArray = obj.Proxy.toMATLAB();
            % UNIX Epoch (January 1st, 1970).
            unixEpoch = datetime(0, ConvertFrom="posixtime");
            % A Date32 value encodes a certain number of whole days
            % before or after the UNIX Epoch.
            dates = unixEpoch + days(matlabArray);
            dates(~obj.Valid) = obj.NullSubstitutionValue;
        end

        function dates = datetime(obj)
            dates = obj.toMATLAB();
        end

    end

    methods(Static)

        function array = fromMATLAB(data, opts)
            arguments
                data
                opts.InferNulls(1, 1) logical = true
                opts.Valid
            end

            import arrow.array.Date32Array

            arrow.internal.validate.type(data, "datetime");
            arrow.internal.validate.shape(data);

            validElements = arrow.internal.validate.parseValidElements(data, opts);

            % If the input MATLAB datetime array is zoned (i.e. has a TimeZone),
            % then the datetime representing the UNIX Epoch must also have a TimeZone.
            if ~isempty(data.TimeZone)
                unixEpoch = datetime(0, ConvertFrom="posixtime", TimeZone="UTC");
            else
                unixEpoch = datetime(0, ConvertFrom="posixtime");
            end

            % Explicitly round down (i.e. floor) to the nearest whole number
            % of days because durations and datetimes are not guaranteed
            % to encode "whole" number dates / times (e.g. 1.5 days is a valid duration)
            % and the int32 function rounds to the nearest whole number.
            % Rounding to the nearest whole number without flooring first would result in a
            % "round up error" of 1 whole day in cases where the fractional part of
            % the duration is large enough to result in rounding up (e.g. 1.5 days would
            % become 2 days).
            numDays = int32(floor(days(data - unixEpoch)));
            args = struct(MatlabArray=numDays, Valid=validElements);
            proxy = arrow.internal.proxy.create("arrow.array.proxy.Date32Array", args);
            array = Date32Array(proxy);
        end

    end

end
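The rounding comment in `fromMATLAB` can be checked with a small C++ sketch. Here `std::lround` (round to nearest, ties away from zero) stands in for the behavior of MATLAB's `int32` conversion; that equivalence is an assumption of this example, and both function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Floor first, then narrow to 32 bits: the order the comment above describes.
std::int32_t days_floor_then_cast(double fractional_days) {
    return static_cast<std::int32_t>(std::floor(fractional_days));
}

// Round to the nearest whole day (ties away from zero), standing in for
// what a bare int32() conversion would do without the explicit floor.
std::int32_t days_round_to_nearest(double fractional_days) {
    return static_cast<std::int32_t>(std::lround(fractional_days));
}
```

For 1.5 days the floored result is 1, while rounding to nearest gives 2, which is the one-day "round up error" the comment warns about. Flooring also handles instants before the epoch: -0.25 days floors to -1 (the previous calendar day), whereas rounding gives 0.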
@@ -32,6 +32,7 @@
matlabData = cell(numClasses, 1);

timeClasses = getTimeArrayClasses();
+dateClasses = getDateArrayClasses();
numericArrayToMatlabTypeDict = getNumericArrayToMatlabDictionary();

for ii = 1:numel(classes)
@@ -54,6 +55,10 @@
matlabData{ii} = randomDurations(opts.NumRows);
cmd = compose("%s.fromMATLAB(matlabData{ii})", name);
arrowArrays{ii} = eval(cmd);
+elseif ismember(name, dateClasses)
+matlabData{ii} = randomDatetimes(opts.NumRows);
+cmd = compose("%s.fromMATLAB(matlabData{ii})", name);
+arrowArrays{ii} = eval(cmd);
else
error("arrow:test:SupportedArrayCase", ...
"Missing if-branch for array class " + name);
@@ -86,6 +91,10 @@
timeClasses = compose("arrow.array.Time%dArray", [32 64]);
end

+function dateClasses = getDateArrayClasses()
+dateClasses = compose("arrow.array.Date%dArray", 32);
+end

function number = randomNumbers(numberType, numElements)
number = cast(randi(255, [numElements 1]), numberType);
end
@@ -107,4 +116,4 @@
function dates = randomDatetimes(numElements)
day = days(randi(255, [numElements 1]));
dates = datetime(2023, 8, 23) + day;
-end
+end
30 changes: 30 additions & 0 deletions matlab/src/matlab/+arrow/+type/+traits/Date32Traits.m
@@ -0,0 +1,30 @@
% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef Date32Traits < arrow.type.traits.TypeTraits

    properties (Constant)
        ArrayConstructor = @arrow.array.Date32Array
        ArrayClassName = "arrow.array.Date32Array"
        ArrayProxyClassName = "arrow.array.proxy.Date32Array"
        ArrayStaticConstructor = @arrow.array.Date32Array.fromMATLAB
        TypeConstructor = @arrow.type.Date32Type;
        TypeClassName = "arrow.type.Date32Type"
        TypeProxyClassName = "arrow.type.proxy.Date32Type"
        MatlabConstructor = @datetime
        MatlabClassName = "datetime"
    end

end
2 changes: 2 additions & 0 deletions matlab/src/matlab/+arrow/+type/+traits/traits.m
@@ -52,6 +52,8 @@
typeTraits = Time32Traits();
case ID.Time64
typeTraits = Time64Traits();
+case ID.Date32
+typeTraits = Date32Traits();
otherwise
error("arrow:type:traits:UnsupportedArrowTypeID", "Unsupported Arrow type ID: " + type);
end