NamedDimsArray is a zero-cost abstraction to add names to the dimensions of an array.
For nda = NamedDimsArray{(:x, :y, :z)}(rand(10, 20, 30)):
- Indexing: nda[y=2] is the same as nda[x=:, y=2, z=:] which is the same as nda[:, 2, :].
- Functions taking a dims keyword: sum(nda; dims=:y) is the same as sum(nda; dims=2).
- Accessing Names: dimnames(nda) returns (:x, :y, :z), a tuple with the dimension names.
- Identifying a dimension by name: dim(nda, :y) returns 2, the numerical dimension named :y. Similarly dim(nda, (:y, :z)) returns (2, 3).
- Unwrapping: parent(nda) returns the underlying AbstractArray that is wrapped by the NamedDimsArray.
- Unnaming: unname(a) ensures an AbstractArray is not a NamedDimsArray; if passed a NamedDimsArray it unwraps it, otherwise it just returns the given AbstractArray.
- Renaming: rename(nda, new_names) returns a new NamedDimsArray with the new_names but still wrapping the same data.
- Refining Names: NamedDimsArray(nda, names) returns a new NamedDimsArray with any unnamed dimensions of nda getting their names from names. It errors if any names present in both disagree.
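As a rough sketch, the operations listed above can be tried out together as follows. It assumes the NamedDims package is installed; the variable names (slice, renamed, partial, refined) are only illustrative.

```julia
using NamedDims

nda = NamedDimsArray{(:x, :y, :z)}(rand(10, 20, 30))

# Indexing by name: all three forms select the same slice.
slice = nda[y=2]
same_slice = (slice == nda[x=:, y=2, z=:]) && (slice == nda[:, 2, :])

# dims keyword by name or by number.
by_name = sum(nda; dims=:y)
by_number = sum(nda; dims=2)

# Names and their positions.
names = dimnames(nda)          # (:x, :y, :z)
y_position = dim(nda, :y)      # 2
yz_positions = dim(nda, (:y, :z))  # (2, 3)

# Unwrapping, unnaming and renaming all leave the data itself untouched.
raw = parent(nda)
unnamed = unname(nda)
renamed = rename(nda, (:a, :b, :c))

# Refining: the wildcard dimension picks up its name from the given tuple.
partial = NamedDimsArray{(:x, :_, :z)}(rand(10, 20, 30))
refined = NamedDimsArray(partial, (:x, :y, :z))
```

Here raw and unnamed are plain arrays, while renamed and refined are still NamedDimsArrays.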
Any operation on multiple NamedDimsArrays must have compatible dimension names.
For example, trying NamedDimsArray{(:time,)}(ones(5)) + NamedDimsArray{(:place,)}(ones(5)) will throw an error.
If you perform an operation between another AbstractArray and a NamedDimsArray, then the result will take its names from the NamedDimsArray.
You can use this to bypass the protection, e.g. NamedDimsArray{(:time,)}(ones(5)) + parent(NamedDimsArray{(:place,)}(ones(5))) is allowed.
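The name check and the parent bypass described above can be sketched like this. The variable names are illustrative, and the example deliberately catches any exception rather than assuming a particular error type.

```julia
using NamedDims

times = NamedDimsArray{(:time,)}(ones(5))
places = NamedDimsArray{(:place,)}(ones(5))

# Mismatched names: the addition is rejected.
threw = try
    times + places
    false
catch
    true
end

# Mixing with a plain array: the names come from the NamedDimsArray.
mixed = times + parent(places)
mixed_names = dimnames(mixed)  # (:time,)
```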
To allow for arrays where only some dimensions have names, the name :_ is treated as a wildcard.
Dimensions named with :_ will not be protected against operating between dimensions of different names; in these cases the result will take the name from the non-wildcard name, if any of the operands had such a concrete name.
For example:
NamedDimsArray{(:time,:_)}(ones(5,2)) + NamedDimsArray{(:_, :place,)}(ones(5,2))
is allowed, and would have a result of:
NamedDimsArray{(:time,:place)}(2*ones(5,2))
As such, unless you want this wildcard behaviour, you should not use :_ as a dimension name.
(Also that is a terrible dimension name, and goes against the whole point of this package.)
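A small sketch of the wildcard behaviour, using the same arrays as the example above; the variable names a, b and c are only for illustration.

```julia
using NamedDims

a = NamedDimsArray{(:time, :_)}(ones(5, 2))
b = NamedDimsArray{(:_, :place)}(ones(5, 2))

# Each wildcard is filled in by the concrete name from the other operand.
c = a + b
c_names = dimnames(c)  # (:time, :place)
```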
When you perform matrix multiplication between an AbstractArray and a NamedDimsArray, the new dimension's name is given as the wildcard :_.
Similarly, when you take the transpose of an AbstractVector, the new first dimension is named :_.
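As a hedged illustration of where these wildcard names appear, the sketch below multiplies a plain matrix by a named one and transposes a named vector. The array sizes and variable names are arbitrary choices for the example.

```julia
using NamedDims

named = NamedDimsArray{(:a, :b)}(ones(4, 5))
plain = ones(3, 4)

# The plain matrix contributes no name for the first dimension of the product.
product = plain * named
product_names = dimnames(product)  # (:_, :b)

# Transposing a named vector introduces a new, unnamed first dimension.
v = NamedDimsArray{(:a,)}(ones(3))
transposed_names = dimnames(transpose(v))  # (:_, :a)
```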
It is a common desire to be able to write code that anyone can call, whether they are using NamedDimsArrays or not, while still being able to use NamedDimsArrays internally in its definition, and while getting an assertion, when a NamedDimsArray is passed in, that it has the expected dimensions.
The way to do this is to call the NamedDimsArray constructor, with the expected names, within the function.
This operation corresponds to PyTorch's refine_names, as in the following example:
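using NamedDims
using Statistics: var  # var comes from the Statistics standard library

function total_variance(data::AbstractMatrix)
    # name the dimensions, or check the names if data is already named
    n_data = NamedDimsArray(data, (:times, :locations))
    location_variance = var(n_data; dims=:times)  # calculate variance at each location
    return sum(location_variance; dims=:locations)  # total them
end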
If this function is given (say) a Matrix, then it will apply the names to it in n_data.
Thus the function will just work on unnamed types.
If data is a NamedDimsArray with incompatible names, an error will be thrown.
This happens, for example, if data was mistakenly transposed and so had the dimension names (:locations, :times) instead of (:times, :locations).
If data was partially named, e.g. (:_, :locations), then those names would be allowed to be combined with the names from the constructor, yielding n_data with the expected names: (:times, :locations).
This pattern allows both assertions of correctness (for named inputs), and convenience and compatibility (for unnamed input).
And since NamedDimsArray is a zero-cost abstraction, this will basically compile out of existence, most of the time.
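The cases described above can be exercised with a short sketch. The function is repeated so that the block is self-contained; the input raw and the try/catch check are illustrative additions, and the catch deliberately does not assume a particular error type.

```julia
using NamedDims
using Statistics: var

function total_variance(data::AbstractMatrix)
    n_data = NamedDimsArray(data, (:times, :locations))
    location_variance = var(n_data; dims=:times)
    return sum(location_variance; dims=:locations)
end

raw = rand(100, 3)

from_plain = total_variance(raw)                                        # unnamed input
from_named = total_variance(NamedDimsArray{(:times, :locations)}(raw))  # matching names
from_partial = total_variance(NamedDimsArray{(:_, :locations)}(raw))    # partially named

# Transposed names are rejected when the constructor refines them.
rejected = try
    total_variance(NamedDimsArray{(:locations, :times)}(raw))
    false
catch
    true
end
```

All three accepted calls operate on the same underlying data, so they should agree on the result.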
There are two common things to do to make a function support NamedDimsArrays.
These are:
- Adding support for referring to a dimension by name to an existing function.
- Making the operation return a NamedDimsArray rather than an Array. (Many operations fall back to dropping the names.)
Often they are done together.
They are illustrated by the following example:
function foo(nda::NamedDimsArray, args...; dims=:)
    numerical_dims = dim(nda, dims)  # convert any form of dims into numerical dims
    raw_result = foo(parent(nda), args...; dims=numerical_dims)  # call it on the backing data
    new_names = determine_foo_names(nda, args...)  # work out what the new names will be
    return NamedDimsArray{new_names}(raw_result)  # wrap the result up
end
You can do this to your own functions in your own packages, to add NamedDimsArray support.
If you implement it for any functions in a standard library, a PR would be much appreciated.
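As one concrete instance of this pattern, here is a minimal sketch for a made-up function. spread is hypothetical (it is not part of NamedDims or Base) and is assumed to accept only a numbered dimension on plain arrays, so the NamedDimsArray method adds both name-based dims and a named result.

```julia
using NamedDims

# Hypothetical function on plain arrays: max minus min along one numbered dimension.
spread(x::AbstractArray; dims::Integer) = maximum(x; dims=dims) .- minimum(x; dims=dims)

# NamedDimsArray method following the pattern described above.
function spread(nda::NamedDimsArray; dims)
    numerical_dims = dim(nda, dims)                        # name -> dimension number
    raw_result = spread(parent(nda); dims=numerical_dims)  # work on the wrapped data
    # The reduced dimension is kept with size 1, so the names are unchanged.
    return NamedDimsArray{dimnames(nda)}(raw_result)
end

nda = NamedDimsArray{(:x, :y)}([1 5; 2 9])
result = spread(nda; dims=:y)
```

In this sketch the new names are simply the old ones, because the reduction keeps the number of dimensions; a function that drops or reorders dimensions would need its own rule for the new names.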
If multiple dimensions have the same names, indexing by name is considered undefined behaviour and should not be relied upon.