-
Notifications
You must be signed in to change notification settings - Fork 401
Arrays: What Are Your Options?
Arrays are the most fundamental data structure in computer science and the only data structure that computer hardware explicitly optimizes for. Computer languages, computer hardware, and arrays evolved together. It makes sense, therefore, that the choice of array organization and array implementation is an important one.
Here’s a trivia question. When it comes to arrays, what is the biggest difference between Fortran and C++? The most common answer you will get is “C++ arrays start at 0, Fortran arrays start at 1”. This answer is wrong. Another answer is “Fortran has multi dimensional arrays and C++ does not” and “Fortran arrays are column major and C++ arrays are row major”. These are also wrong.
The correct answer is “Fortran has language support for arrays and C++ does not”. But “what about my beloved []
operator?” you say, “isn’t that support for arrays?” Not really, no. In C++, array[i]
is syntactic sugar for *(Array+i)
. If the C preprocessor could do such things, you could redefine X[Y]
as *(X+Y)
and everything would still work. Anyways, the point is that support for array dereference is not enough. Support for arrays implies support for operators like adding two arrays to form a third array. C++ does not have this in part because in C++ arrays are just pointers and do not carry information about their own length with them. In C++, the application is supposed to carry information about the length of the array, the C++ language itself does not. There cannot be a C++ operator to add two arrays element-wise because it would not know how many elements there are. Fortran arrays carry their length with them and so Fortran can have language-based array operators.
Of course, you can use C++ templates to build arrays that track their own length, and you can overload C++ operators to
support array operators on those templates. This is called std::array
and std::vector
. Or EPVector. Or ObjexxFCL. Or Eigen. Or
YourFavoriteArrayLibraryHere. If you see anything in C++ that looks like deep support for arrays, it’s not C++ itself, it’s a
library. And that makes a difference. What are the advantages of array template libraries vis-a-vis native C-style hand-rolled arrays?
- Memory management is hidden within template constructors/destructors.
- Resizing (if needed) is supported in a clean efficient function, with copying minimized speculatively.
- Range checking asserts for memory safety.
- Utility functions and iterators.
These specific advantages are the reason that there is a general consensus that modern code should use array template libraries as opposed to hand-rolled C-style arrays. Okay.
Just so we are clear, what are the disadvantages of array template libraries vis-a-vis C-style arrays?
- The compiler cannot easily optimize across library calls, even for template libraries. This is the most important drawback in my opinion. Remember, this is not a problem for Fortran compilers because Fortran arrays are part of the language so the compiler knows how to reason about them.
- Array object overhead. At this point you are thinking, "but it's only the
size
,max_size
, anddatapointer
fields, how much overhead is that? You're right, it's not that much overhead in terms of space. But it is overhead in terms of heap allocations because the data storage has to be allocated separately. Allocating an array object actually requires two allocations, not one. And this overhead is repeated for many of arrays, many of which conceptually share thesize
andmax_size
fields. - Compiler hints and directives have to “reach into” the actual data portions of the array and may not work.
- Use of heavier, math-oriented array templates for objects instead of for scalars increases compilation time because the compiler will have to instantiate all operators associated with a template, even the arithmetic ones that don't necessarily make sense.
- Array slicing or subsetting requires at the very least creating a new object and potentially copying data rather than handing off a pointer. UPDATE: the C++20
std::span
template reduces the cost of slicing substantially. This is why this disadvantage is moved to the end.
Not all array template libraries are created equal. std::vector
and EPVector
are both lightweight and only give you the basic functionality described above, i.e., allocation/reallocation/deallocation, range checking, and some iterators. A library like ObjexxFCL
is heavier because it is designed to emulate Fortran and so it also supports:
- Simple arithmetic operators like addition of two arrays.
- Basic allocation/reallocation/deallocation, indexing, range checking and iterators for multi-dimensional arrays.
- Simple arithmetic operators on multi-dimensional arrays, e.g., scaling multi-dimensional array by constant.
A library like Eigen is even heavier in that it also gives you:
- Multi-dimensional array algorithms like matrix multiplication, inversion, decomposition.
If we agree that we should use array an array template library, which one should we use? Or rather which one should we use when? It is of course possible to use multiple array libraries in a single program.
One obvious take away is that you should use a light-weight library like std::vector
of EPVector
when dealing with an array of objects where arithmetic operations are not relevant. There may not be runtime overhead to using a heavier library but there will be compilation overhead for having to instantiate larger templates.
What about for arrays of numbers on which you want to do arithmetic? Should you use std::vector
/EPVector
or something like ObjexxFCL
or Eigen
. The choice comes down to how you feel about array arithmetric operators and multi-dimensional arrays.
A library like ObjexxFCL
lets you do something like this:
Real64 A, B;
class ObjexxFCL::Array1D<Real64> X, Y;
Y = A*X + B;
This handy little shorthand actually results in suboptimal code, why? Because it requires a temporary array T
to hold the results of A*X
. This code is essentially equivalent to:
Real64 A, B;
class ObjexxFCL::Array1D<Real64> X, Y, T;
T = A*X;
Y = T + B;
Which expands to:
for (int i = 1; i <= X.size(); ++i)
T(i) = A*X(i);
for (int i = 1; i <= Y.size(); ++i)
Y(i) = T(i) + B;
Two loops instead of one and an extra array. You are much better off doing this by hand:
for (int i = 1; i <= X.size(); ++i)
Y(i) = A*X(i) + B;
The lesson here?
We've now reached the final topic, multi-dimensional arrays. Here are the options.
- Do you really need a multi dimensional array? Are you sure?
- A library with built-in support for multi-dimensional arrays like ObjexxFCL or Eigen.
- Nested 1D arrays using a library like
std::vector
orEPVector
. - A multi-dimensional indexing scheme inside a 1D array.
Consider this array from EnergyPlus:
ObjexxFCL::Array2D<Real64> TodayOutDryBulbTemp;
TodayOutDryBulbTemp.allocate(state.dataGlobal->NumOfTimeStepInHour, 24);
Ignore for a second that TimeStepInHour
and HourInDay
dimensions should be flipped to improve spatial locality, are two dimensions even necessary? Both dimensions are time after all. Could this be a valid implementation?
ObjexxFCL::Array1D<Real64> TodayOutDryBulbTemp;
TodayOutDryBulbTemp.allocate(24 * state.dataGlobal->NumOfTimeStepInHour);
Here is another example, which at this point is historical, i.e., has already been changed in EnergyPlus.
ObjexxFCL::Array2D<Real64> SurfSunCosHourly;
SurfSunCosHourly.allocate(HoursInDay, 3);
The second dimension of this array are the X,Y,Z axes. However, there is already an object ObjexxFCL::Vector3
with X,Y,Z fields. This code now uses a 1D array of Vector3
objects. Note, this can actually be a std::array
since HoursInDay is a compile-time constant (I'm 99% sure).
ObjexxFCL::Array1D<ObjexxFCL::Vector3<Real64>> SurfSunCosHourly;
SurfSunCosHourly.allocate(HoursInDay);
Here is another historical example, the array that holds master temperature histories of surfaces.
ObjexxFCL::Array3D<Real64> THM(2, Construction::MaxCTFTerms, state.dataSurface->TotSurfaces);
The outermost dimension in this array distinguishes the outside of the surface (1) from the inside of the surface (2). Having this as an extra array dimension doesn’t buy you very much, because this dimension is treated as a variable in only one spot, a two-line loop that updates the array TH
from the array THM
. The rest of the time, this extra dimension just adds complexity and address calculation overhead. It’s better to just split this array into two 2D arrays.
ObjexxFCL::Array2D<Real64> THMinside (Construction::MaxCTFTerms, state.dataSurface->TotSurfaces);
ObjexxFCL::Array2D<Real64> THMoutside (Construction::MaxCTFTerms, state.dataSurface->TotSurfaces);
In this instance, we didn’t successfully convert a multi-dimensional array into a 1D array, but we did reduce the array dimension by one. Which is still good.
If you have library or language support for 1D arrays, you can create multi-dimensional arrays by making arrays of arrays. Let’s go back to the now 2D example of THMinside. Instead of:
ObjexxFCL::Array2D<Real64> THMinside(Constructions::MaxCTFTerms, state.dataSurfaces->TotSurfaces);
You can do:
EPVector<EPVector<Real64>> THMinside (Constructions::MaxCTFTerms, std::vector<Real64>(state.dataSurfaces->TotSurfaces));
Your loops then look like this:
for (int term = 0; term < Construction::MaxCTFTerms ; ++term) {
for (int surf = 0; surf < TotSurfaces ; ++surf) {
... THMinside(term)(surf) ...;
}
}
Actually, you may want to do this to avoid having to do a double dereference each time, because depending on what is going on with the code the compiler may not be able to figure out that this is an okay thing to do.
for (int term = 0; term < Construction::MaxCTFTerms ; ++term) {
EPVector<Real64> &THMinsideTerm(THMinside(term));
for (int surf = 0; surf < TotSurfaces ; ++surf) {
... THMinsideTerm(surf) ...;
}
}
We just hit on the downside of arrays-of-arrays. Each element dereference is essentially a chain of dereferences.
Sometimes the compiler can hoist these out of loops. Sometimes you can help the compiler as I did in the example above.
But sometimes you are just stuck with things like THMinside[term][surf]
. Which is not ideal. For performance, you do not
want chains of stuff. Especially chains of dereferences. [Ed: this is why linked lists are slow and not used anywhere other
than as examples in CS101 classes.]
One nice thing about nested arrays is that you can “shift” rows quickly. This comes into play in the UpdateThermalHistory
code.
Rather than copying the contents of every row to the next row over, you can use the fact that the interior arrays contain pointers to the heap element storage to move those pointers around using the move constructor, which is activated using the std::move
template.
EPVector<Real64> THMinsideTemp(std::move(THMinside(Constructions::MaxCTFTerms)));
for (int term = Constructions::MaxCTFTerms; term >= 2; --term) {
THMinside(term) = std::move(THMinside[term - 1]);
}
THMinside(1) = std::move(THMinsideTemp);
EnergyPlus already uses this "trick".
Another option for implementing a multi-dimensional array is as a 1D array with multi-dimensional indexing. This is actually what ObjexxFCL and other array libraries do internally.
std::vector<Real64> THMinside (Constructions::MaxCTFTerms * state.dataSurfaces->TotSurfaces);
for (int term = 0; term < Construction:: MaxCTFTerms ; ++term) {
for (int surf = 0; surf < TotSurfaces ; ++surf) {
... THMinside[state.dataSurfaces->TotSurfaces*term + surf] ...;
}
}
This can be a higher performance option than nested arrays because loading an element requires only a single dereference. And after a while, writing down this kind of indexing code becomes almost natural!
I would say that the only compelling reason to use a heavyweight array library with built-in support for multi-dimensional arrays is if you are getting some nice optimized implementations of useful algorithms like matrix inversion, LU decomposition, etc. Given that library implementations of simple array arithmetic operations are not really worthwhile, your best bet subsequently is to build your own multi-dimensional arrays out of a lightweight 1D array template library using either nesting or explicit multi-dimensional indexing.