Lightweight header-only tensor library for C++17 and later.
Create a virtual tensor and allocate the memory:
Tensor<float> a = Tensor<float>({1024, 1024}).allocate();
Or fill it with special values:
Tensor<float> b = Tensor<float>({1024, 1024}).randn(); // defaults: up = 1, down = 0, Box-Muller normal
Tensor<float> c = Tensor<float>({1024, 1024}).fill(0.f);
Tensor<float> d = Tensor<float>({1024, 1024}).eye();
Copy existing tensors:
Tensor<float> e = Tensor<>::like(a); // returns virtual tensor with no memory
Tensor<float> f = a.copy(); // new tensor that shares memory with a
Tensor<float> g = a.clone(); // new tensor with its own copy of a's memory
Tensor<float> h = Tensor<>::like(a).eye(); // identity tensor with the same shape as a
Tensor<float> i = Tensor<>::like(a).fill(0.f); // zero-filled tensor with the same shape as a
All memory allocated by the library is aligned to MEMORY_ALIGNMENT bytes, which defaults to 32 (the width of a 256-bit AVX register), unless the block is smaller than the alignment requirement.
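The alignment rule above can be sketched with the C++17 standard library. This is a minimal illustration under stated assumptions, not the library's alloc<T>: `kAlignment` is a hypothetical stand-in for MEMORY_ALIGNMENT, `aligned_block` and `is_aligned` are hypothetical names, and the platform is assumed to provide `std::aligned_alloc` (MSVC does not).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the library's MEMORY_ALIGNMENT macro (32 bytes by default).
constexpr std::size_t kAlignment = 32;

// Sketch of the rule above: blocks of at least kAlignment bytes come from
// std::aligned_alloc, smaller blocks from std::malloc. Release both with std::free.
template <typename T>
T* aligned_block(std::size_t count) {
    const std::size_t bytes = count * sizeof(T);
    assert(bytes > 0 && "zero-sized allocations are not handled in this sketch");
    if (bytes < kAlignment) {
        return static_cast<T*>(std::malloc(bytes));
    }
    // std::aligned_alloc expects a size that is a multiple of the alignment,
    // so round the byte count up to the next multiple.
    const std::size_t rounded = (bytes + kAlignment - 1) / kAlignment * kAlignment;
    return static_cast<T*>(std::aligned_alloc(kAlignment, rounded));
}

// True if p lies on a kAlignment-byte boundary.
inline bool is_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}
```

Rounding the size up matters because `aligned_alloc` may reject sizes that are not a multiple of the alignment on some C libraries.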
Allocate the data using the included alloc<T> function. This ensures the memory block is aligned correctly:
float* data = alloc<float>(1024*1024);
Tensor<float> a = Tensor<float>(data, 1024*1024, {1024, 1024});
If you must allocate the memory yourself, use aligned_alloc, or malloc if the number of items is small. Never use new[]: the library releases memory with free(), and passing memory obtained from new[] to free() is undefined behavior.
float* data = static_cast<float*>( aligned_alloc(MEMORY_ALIGNMENT, 1024*1024*sizeof(float)) );
Tensor<float> b = Tensor<float>(data, 1024*1024, {1024/2, 2, 1024});
If the macro FORCE_ALIGNMENT is defined, all input arrays will be copied into new aligned memory blocks. You are free to use whatever array allocation you want:
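#define FORCE_ALIGNMENT // must appear before the library header is included
static float data[1024*1024]; // plain float array; static keeps the 4 MiB off the stack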
Tensor<float> c = Tensor<float>(data, 1024*1024, {1024, 1024});
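Because the library releases memory with free(), any block it owns must come from malloc or aligned_alloc. The sketch below models that ownership with a `std::unique_ptr` whose deleter calls `std::free`, and shows a copy step in the spirit of FORCE_ALIGNMENT. `FreeDeleter`, `aligned_ptr`, `copy_to_aligned` and `kAlignment` are hypothetical names; the library's actual copy logic may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical stand-in for MEMORY_ALIGNMENT.
constexpr std::size_t kAlignment = 32;

// Deleter that releases memory with std::free, matching the cleanup described
// above. Memory obtained from new[] must never be handed to it.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

template <typename T>
using aligned_ptr = std::unique_ptr<T[], FreeDeleter>;

// Sketch of a FORCE_ALIGNMENT-style copy: duplicate any caller-owned array into
// a fresh aligned block. The caller's array is left untouched.
template <typename T>
aligned_ptr<T> copy_to_aligned(const T* src, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value, "memcpy requires trivially copyable T");
    assert(src != nullptr && count > 0);
    const std::size_t bytes = count * sizeof(T);
    // Round up so the size is a multiple of the alignment, as aligned_alloc expects.
    const std::size_t rounded = (bytes + kAlignment - 1) / kAlignment * kAlignment;
    T* dst = static_cast<T*>(std::aligned_alloc(kAlignment, rounded));
    if (dst == nullptr) throw std::bad_alloc();
    std::memcpy(dst, src, bytes);
    return aligned_ptr<T>(dst);
}
```

Copying costs one extra pass over the data, which is the price of accepting arrays from any allocator.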
Elements can also be entered manually:
Tensor<float> a = Tensor<float>({5}, {1}); // values {5}, shape {1}
Tensor<float> b = Tensor<float>({0, 1, 2, 3, 4, 5}, {2, 3}); // values 0 to 5, shape {2, 3}
Tensor<float> a = Tensor<float>({2, 1024, 2048});
a.ndim() returns the number of dimensions.
a.numel() returns the number of elements in the tensor.
a.device() returns the device on which the memory is stored.
a.memsize() returns the number of elements allocated in memory.
a.has_grad() returns a bool that tracks whether the tensor keeps a gradient.
a.is_initialized() returns a bool that tracks whether the underlying tensor data exists.
a.is_allocated() returns a bool that tracks whether all the tensor values exist in memory.
a.is_eye() returns a bool that tracks whether the tensor is an identity tensor.
a.is_sub() returns a bool that tracks whether the tensor is a subtensor.
a.storage() returns a const T* to the underlying tensor data.
a.shape() returns a const uint32_t* to the tensor's shape (view) data.
a.strides() returns a const uint32_t* to the tensor's stride data.
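To illustrate how the shape, numel() and strides() relate, here is a small sketch that assumes row-major (C-order) strides counted in elements. That stride convention is an assumption, and `numel_of` and `contiguous_strides` are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of elements: the product of all dimensions (64-bit to avoid overflow).
inline std::uint64_t numel_of(const std::vector<std::uint32_t>& shape) {
    std::uint64_t n = 1;
    for (std::uint32_t d : shape) n *= d;
    return n;
}

// Row-major strides, in elements: the last dimension is contiguous and each
// earlier stride is the product of all later dimensions.
inline std::vector<std::uint32_t> contiguous_strides(const std::vector<std::uint32_t>& shape) {
    assert(!shape.empty());
    std::vector<std::uint32_t> strides(shape.size());
    std::uint32_t running = 1;
    for (std::size_t k = shape.size(); k-- > 0;) {
        strides[k] = running;
        running *= shape[k];
    }
    return strides;
}
```

For the {2, 1024, 2048} tensor above this gives 4,194,304 elements and strides {2097152, 2048, 1}.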
Helpers for constructing special tensors. They throw if the tensor data already exists.
fill: fill a tensor with one value.
auto a = Tensor<float>::fill({2048, 2048}, 1.0);
auto b = Tensor<float>({2048, 2048}).fill(0.0);
auto c = Tensor<>::like(a).fill(1.f); // shallow copy, then fill
This stores only one value in memory and redirects every valid index to it. Because the view does not correspond to contiguous memory, the data can only be accessed through operator(), and bresolved is false. Reading the data() array directly will yield garbage values or memory errors.
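A fill() tensor can be pictured as a single stored value plus a shape, with index checking done on access. The hypothetical `ConstantView` below models that idea; it is not the library's implementation, which may redirect indices differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical model of a fill() tensor: one stored value plus a shape.
// Every valid index resolves to the same value, so no per-element storage exists.
template <typename T>
struct ConstantView {
    T value;
    std::vector<std::uint32_t> shape;

    // Element access in the spirit of operator(): check each index against its
    // dimension, then return the single stored value.
    T at(const std::vector<std::uint32_t>& idx) const {
        if (idx.size() != shape.size()) throw std::out_of_range("wrong number of indices");
        for (std::size_t k = 0; k < idx.size(); ++k) {
            if (idx[k] >= shape[k]) throw std::out_of_range("index past dimension");
        }
        return value;
    }

    // Only one element is ever held in memory, whatever the shape.
    std::size_t memsize() const { return 1; }
};
```

This is also why raw pointer access cannot work for such a tensor: there is no contiguous block of numel() values behind it.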
arange: fill a tensor with evenly spaced values in a given interval.
static Tensor<T> arange(T stop, T start=0, T step=1, Device device=CPU);
auto a = Tensor<float>::arange(50);
auto b = Tensor<float>::arange(50, 10); // stop = 50, start = 10
auto c = Tensor<float>::arange(50, 10, 5); // stop = 50, start = 10, step = 5
All resulting values are stored in memory, so bresolved is true.
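Note that arange takes stop first, then start and step. The sketch below reproduces that argument order on a plain std::vector; `arange_values` is hypothetical, and it assumes NumPy-style semantics (stop excluded, positive step), which this section does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of arange's documented argument order: stop first, then start and step.
// Assumes (NumPy-style) that stop itself is excluded and that step > 0.
template <typename T>
std::vector<T> arange_values(T stop, T start = 0, T step = 1) {
    assert(step > 0);
    std::vector<T> out;
    // Computing start + k * step from the index avoids accumulating
    // floating-point rounding error across iterations.
    for (std::size_t k = 0; start + static_cast<T>(k) * step < stop; ++k) {
        out.push_back(start + static_cast<T>(k) * step);
    }
    return out;
}
```

Under those assumptions, arange(50, 10, 5) would yield the eight values 10, 15, ..., 45.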
randn: fill a tensor with randomly generated values that follow a given distribution.
Distributions can be either NORMAL, UNIFORM or CHI_SQUARED.
void randn(T up=1.f, T down=0.f, uint32_t seed=0, Distribution dist=NORMAL);
static Tensor<T> randn(std::initializer_list<uint32_t> shp, T up=1.f, T down=0.f,
uint32_t seed=0, Device device=CPU, Distribution dist=NORMAL);
auto a = Tensor<float>::randn({2048, 2048}, 3.14, -3.14);
auto b = Tensor<>::like(a).randn(3.14); // shallow copy, then fill with randn()
All resulting values are stored in memory, so bresolved is true.
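When you need comparable random data without the library, the standard `<random>` header provides all three distribution families. The sketch below is a reference generator, not the library's randn(): `Dist` is a hypothetical mirror of the documented Distribution enum, and only UNIFORM honours up/down, because how the library maps those parameters onto the normal and chi-squared cases is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical mirror of the documented Distribution enum.
enum class Dist { NORMAL, UNIFORM, CHI_SQUARED };

// Reference generator built on <random>. UNIFORM draws from (nominally) [down, up);
// NORMAL draws N(0, 1) and CHI_SQUARED uses one degree of freedom, since the
// library's use of up/down for those two is an unknown here.
inline std::vector<float> reference_random(std::size_t count, float up, float down,
                                           std::uint32_t seed, Dist dist) {
    assert(up > down);
    std::mt19937 gen(seed);  // same seed, same sequence
    std::uniform_real_distribution<float> uni(down, up);
    std::normal_distribution<float> norm(0.f, 1.f);
    std::chi_squared_distribution<float> chi(1.f);
    std::vector<float> out(count);
    for (float& x : out) {
        switch (dist) {
            case Dist::UNIFORM:     x = uni(gen);  break;
            case Dist::NORMAL:      x = norm(gen); break;
            case Dist::CHI_SQUARED: x = chi(gen);  break;
        }
    }
    return out;
}
```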
You can also generate plain arrays drawn from the same distributions:
static std::unique_ptr<float[]> f32_generate_uniform_distribution(uint32_t count, float up=1.f,
float down=0.f, double seed=0, bool bepsilon=false, float epsilon=0);
static std::unique_ptr<float[]> f32_generate_chi_squared_distribution(uint32_t count,
float up=1.f, float down=0.f, double seed=0);
static std::unique_ptr<float[]> f32_generate_box_muller_normal_distribution(uint32_t count,
float up=1.f, float down=0.f, double seed=0);
auto a = Tensor<>::f32_generate_uniform_distribution(2048*2048, 3.14, -3.14);
auto b = Tensor<>::f32_generate_chi_squared_distribution(2048*2048, 3.14, -3.14);
auto c = Tensor<>::f32_generate_box_muller_normal_distribution(2048*2048, 3.14, -3.14);
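The name f32_generate_box_muller_normal_distribution refers to the Box-Muller transform, which turns pairs of uniform samples into pairs of independent standard normal samples. Below is a minimal sketch of the transform that produces N(0, 1) values only (the library's up/down scaling is not reproduced); `box_muller_standard_normal` is a hypothetical name, and std::mt19937 is an assumed source of uniforms.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <random>

// Box-Muller transform: independent uniforms u1 in (0, 1) and u2 in [0, 1)
// give two independent standard normal samples
//   z0 = sqrt(-2 ln u1) * cos(2 pi u2),   z1 = sqrt(-2 ln u1) * sin(2 pi u2).
inline std::unique_ptr<float[]> box_muller_standard_normal(std::size_t count, std::uint32_t seed) {
    assert(count > 0);
    constexpr double kTwoPi = 6.283185307179586;
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);  // nominally [0, 1)
    auto out = std::make_unique<float[]>(count);
    for (std::size_t k = 0; k < count; k += 2) {
        double u1;
        do { u1 = uni(gen); } while (u1 <= 0.0);  // reject 0 so log(u1) stays finite
        const double u2 = uni(gen);
        const double r = std::sqrt(-2.0 * std::log(u1));
        out[k] = static_cast<float>(r * std::cos(kTwoPi * u2));
        if (k + 1 < count) {
            out[k + 1] = static_cast<float>(r * std::sin(kTwoPi * u2));  // second sample of the pair
        }
    }
    return out;
}
```

Each pair of uniforms yields two normals, so an odd count simply discards the last sine sample.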
eye: creates an identity tensor.
static Tensor<T> eye(uint32_t size, uint32_t dims=2, Device device=CPU);
auto a = Tensor<>::eye(4096, 4);
auto b = Tensor<>::like(a).eye(); // not yet implemented
Values are generated on indexing, so bresolved is false.
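Because eye() values are generated on indexing, nothing needs to be stored: an element can be computed from its indices alone. The sketch below assumes that for dims > 2 the "diagonal" means all indices are equal; that generalization and the `eye_value` helper are assumptions, not confirmed library behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical on-demand identity lookup: an element is 1 when every index is
// equal and 0 otherwise, so a size^dims tensor needs no stored data at all.
template <typename T>
T eye_value(const std::vector<std::uint32_t>& idx, std::uint32_t size) {
    assert(!idx.empty());
    for (std::uint32_t i : idx) {
        assert(i < size);  // each index must lie inside the dimension
        if (i != idx.front()) return T(0);
    }
    return T(1);
}
```

Computing values this way is what makes a 4096^4 tensor such as eye(4096, 4) feasible at all.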
Virtual tensors are tensors that have no underlying data:
auto a = Tensor<float>({2048, 2048});
sized_array<uint32_t> s { std::unique_ptr<uint32_t[]>(new uint32_t[2]()), 2};
s.ptr[0] = 2;
s.ptr[1] = 3;
Tensor<float> b(s);
View operations change the view (shape) and strides of the tensor but make no changes to the underlying data.
reshape: change the view of the tensor without changing the number of elements inside.
auto a = Tensor<float>(data, 2048*2048, {2048, 2048});
a.reshape({2, 1024, 2048});
a.reshape({1024, 2, 1024, 2});
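A reshape only needs to check that the element count is preserved and recompute the strides. The sketch assumes contiguous row-major data; `Shape`, `element_count` and `reshape_view` are hypothetical names, and the library may treat non-contiguous views differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

using Shape = std::vector<std::uint32_t>;

// Product of the dimensions, computed in 64 bits to avoid overflow.
inline std::uint64_t element_count(const Shape& s) {
    return std::accumulate(s.begin(), s.end(), std::uint64_t{1},
                           std::multiplies<std::uint64_t>());
}

// Valid only when the element count is unchanged. For contiguous data the new
// row-major strides follow from the new shape alone; no element is moved.
inline bool reshape_view(Shape& shape, Shape& strides, const Shape& new_shape) {
    assert(!new_shape.empty());
    if (element_count(new_shape) != element_count(shape)) return false;
    strides.assign(new_shape.size(), 1);
    for (std::size_t k = new_shape.size(); k-- > 1;) {
        strides[k - 1] = strides[k] * new_shape[k];
    }
    shape = new_shape;
    return true;
}
```

A mismatched request such as {1000, 2} is rejected and leaves the shape and strides unchanged.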
permute: change the order of the dimensions inside of the tensor.
auto a = Tensor<float>(data, 2048*2048, {1024, 2, 1024, 2});
a.permute({1, 3, 0, 2}); // {2, 2, 1024, 1024}
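permute moves no data either: it reorders the shape and stride entries. The hypothetical `permute_view` below shows that bookkeeping, assuming shape and strides are stored as parallel arrays, as the shape() and strides() accessors suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Dims = std::vector<std::uint32_t>;

// Output dimension i takes the size and stride of input dimension perm[i].
// The data itself is not moved, so a permuted view is generally not contiguous.
inline void permute_view(Dims& shape, Dims& strides, const Dims& perm) {
    assert(perm.size() == shape.size() && strides.size() == shape.size());
    std::vector<bool> seen(perm.size(), false);
    Dims new_shape(perm.size()), new_strides(perm.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
        assert(perm[i] < shape.size() && !seen[perm[i]]);  // each axis exactly once
        seen[perm[i]] = true;
        new_shape[i] = shape[perm[i]];
        new_strides[i] = strides[perm[i]];
    }
    shape = new_shape;
    strides = new_strides;
}
```

With row-major strides {4096, 2048, 2, 1} for {1024, 2, 1024, 2}, the permutation {1, 3, 0, 2} gives shape {2, 2, 1024, 1024} and strides {2048, 1, 4096, 2}.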