API Reference · NNHelferlein.jl

The API documentation of all exported functions is listed here:

Chains

NNHelferlein.AbstractNN (Type)
abstract type AbstractNN

Supertype of the AbstractNN hierarchy, with an implementation for a chain of layers.

Signatures:

  • (m::AbstractNN)(x): run the AbstractArray x through all layers and return the output
  • (m::AbstractNN)(x,y): calculate the loss for one minibatch x and teaching input y
  • (m::AbstractNN)(d::Knet.Data): calculate the loss for all minibatches in d
  • (m::AbstractNN)(d::Tuple): calculate the loss for all minibatches in d
  • (m::AbstractNN)(d::NNHelferlein.DataLoader): calculate the loss for all minibatches in d if teaching input is included (i.e. the elements of d are tuples); otherwise, return the output of all minibatches as one array with samples as columns.
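The calling conventions listed above can be illustrated with a minimal sketch in plain Julia. ToyNN, its layer and its loss are hypothetical stand-ins, not the NNHelferlein implementation. Called with one argument, the model runs the chain. Called with two arguments, it returns the loss of one minibatch. Called with a vector of (x, y) tuples, it returns the mean loss over all minibatches.

```julia
# Hypothetical stand-in for an AbstractNN subtype; not NNHelferlein's code.
struct ToyNN
    layers::Vector{Function}   # each layer maps an array to an array
    loss::Function             # called as loss(model(x), y)
end

# (m)(x): run x through all layers in order
(m::ToyNN)(x::AbstractArray) = foldl((a, layer) -> layer(a), m.layers; init=x)

# (m)(x, y): loss for one minibatch
(m::ToyNN)(x::AbstractArray, y) = m.loss(m(x), y)

# (m)(d): mean loss over a vector of (x, y) minibatches
(m::ToyNN)(d::AbstractVector{<:Tuple}) = sum(m(x, y) for (x, y) in d) / length(d)

# Usage: a single "layer" that doubles its input, with a mean-squared-error loss
mse(ŷ, y) = sum(abs2, ŷ .- y) / length(y)
mdl = ToyNN([x -> 2 .* x], mse)
x = [1.0 2.0; 3.0 4.0]          # 2 features × 2 samples (samples as columns)
mdl(x)                           # doubled values
mdl(x, 2 .* x)                   # 0.0: the prediction matches the target
mdl([(x, 2 .* x), (x, x)])       # mean of 0.0 and mse(2x, x) = 7.5
```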

NNHelferlein.AbstractChain (Type)
abstract type AbstractChain

Supertype of the AbstractChain hierarchy, with an implementation for a chain of layers. By default, every AbstractChain has a property layers that holds an iterable list of AbstractLayers or AbstractChains, which are executed recursively.

Non-standard chains whose layers are not executed sequentially (such as ResNetBlocks) must provide a custom implementation with the signature chain(x).

Signatures:

  • (m::AbstractChain)(x): run the AbstractArray x through all layers and return the output

NNHelferlein.add_layer! (Function)
function add_layer!(n::Union{NNHelferlein.AbstractNN, NNHelferlein.AbstractChain}, l)

Add a layer l or a chain to a model n. The layer is always added at the end of the chain. The modified model is returned.

Base.:+ (Function)
function +(n::Union{NNHelferlein.AbstractNN, NNHelferlein.AbstractChain}, l::Union{AbstractLayer, AbstractChain})
 function +(l1::AbstractLayer, l2::Union{AbstractLayer, AbstractChain})

The plus operator is overloaded so that layers and chains can be added to a network.

The second form returns a new chain when two layers are added.

Example:

julia> mdl = Classifier() + Dense(2,5)
 julia> print_network(mdl)
 
 ...
     Dense layer 5 → 1 with identity,                                 6 params

 Total number of layers: 3
 Total number of parameters: 51
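The following sketch shows how overloading + can append layers. It uses hypothetical toy types (ToyDense, ToyChain), not NNHelferlein's own. The sketch counts a dense layer's parameters as i*j weights plus j biases. This reproduces the 6 params listed above for the layer 5 → 1 (5*1 + 1). The three-layer chain in the usage lines is made up for illustration.

```julia
# Hypothetical toy types illustrating operator overloading; not NNHelferlein's code.
struct ToyDense
    w::Matrix{Float64}
    b::Vector{Float64}
end
ToyDense(i::Int, j::Int) = ToyDense(randn(j, i), zeros(j))   # j neurons, i inputs
(l::ToyDense)(x) = l.w * x .+ l.b
nparams(l::ToyDense) = length(l.w) + length(l.b)

struct ToyChain
    layers::Vector{Any}
end
(c::ToyChain)(x) = foldl((a, l) -> l(a), c.layers; init=x)
nparams(c::ToyChain) = sum(nparams, c.layers; init=0)

# chain + layer: append the layer and return the modified chain
Base.:+(c::ToyChain, l) = (push!(c.layers, l); c)
# layer + layer: return a new chain
Base.:+(l1::ToyDense, l2::ToyDense) = ToyChain(Any[l1, l2])

mdl = ToyChain(Any[]) + ToyDense(2, 5) + ToyDense(5, 5) + ToyDense(5, 1)
length(mdl.layers)   # 3
nparams(mdl)         # 15 + 30 + 6 = 51 for this hypothetical chain
```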
NNHelferlein.Classifier (Type)
struct Classifier <: AbstractNN

Classifier with the negative log-likelihood (nll) as default loss. An alternative loss function can be supplied as a keyword argument; it must be callable as loss(model(x), y).

Constructors:

Classifier(layers...; loss=Knet.nll)

Signatures:

(m::Classifier)(x,y) = m.loss(m(x), y)
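For intuition about the default loss, here is a minimal sketch of a negative log-likelihood computed from unnormalised scores and averaged over the minibatch. toy_nll is a hypothetical helper that assumes classes along the first dimension and integer labels. It is not Knet.nll itself.

```julia
# Hypothetical helper, not Knet.nll: mean negative log-likelihood of the
# correct class, computed from raw scores (classes × samples) via log-softmax.
function toy_nll(scores::AbstractMatrix, labels::AbstractVector{<:Integer})
    m = maximum(scores; dims=1)                          # for numerical stability
    logZ = m .+ log.(sum(exp.(scores .- m); dims=1))      # log-sum-exp per column
    logp = scores .- logZ                                  # log-softmax
    return -sum(logp[labels[j], j] for j in eachindex(labels)) / length(labels)
end

scores = [2.0 0.0; 0.0 2.0]   # 2 classes × 2 samples
toy_nll(scores, [1, 2])        # small loss: both samples favour the correct class
toy_nll(scores, [2, 1])        # larger loss: both samples favour the wrong class
```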
NNHelferlein.Regressor (Type)
struct Regressor <: AbstractNN

Regression network with the mean squared error as loss function.

Constructors:

Regressor(layers...; loss=mean_squared_error)

Signatures:

(m::Regressor)(x,y) = mean(abs2, Array(m(x)) - y)
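The default regression loss written out above is the mean of squared differences. The following is a minimal sketch in plain Julia with a hypothetical helper name. It is equivalent to mean(abs2, ŷ - y) but does not need the Statistics standard library.

```julia
# Mean squared error, equivalent to Statistics.mean(abs2, ŷ - y).
toy_mse(ŷ::AbstractArray, y::AbstractArray) = sum(abs2, ŷ .- y) / length(y)

ŷ = [1.0 2.0 3.0]   # predictions: 1 output × 3 samples
y = [1.0 2.0 5.0]   # teaching input
toy_mse(ŷ, y)        # (0 + 0 + 4) / 3 ≈ 1.333
```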
NNHelferlein.Transformer (Type)
mutable struct Transformer

A Bert-like transformer network consisting of an encoder and a decoder stack.

Constructor:

Transformer(n_layers, depth, heads; drop_rate=0.1)
  • n_layers: number of layers in encoder and decoder
  • depth: embedding depth
  • heads: number of heads for the multi-head attention
  • drop_rate: dropout rate used in all layers

Signature:

(tf::Transformer)(x, y; enc_mask=nothing, dec_mask=nothing)

The transformer is called with two 3-d arrays of embedded sequences x and y of size [depth, seq_len, n_minibatch] and returns a tensor of size [depth, seq_len_y, n_minibatch]. Sequences x and y may be of different lengths; the output always has the same dimensions as y.

Attention factors of the last run are stored in the field α of the transformer object.

enc_mask and dec_mask are optional padding masks for the encoder and decoder input, respectively. They must be of size [seq_len, n_minibatch].
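Attention factors such as α come from scaled dot-product attention. The following is a generic single-head sketch, not NNHelferlein's implementation. It uses the column-major layout of the documentation ([depth, seq_len] per sample), and each column of α holds the softmax weights of one query position over all key positions. Masking and multiple heads are omitted, and the helper names are hypothetical.

```julia
# Generic single-head scaled dot-product attention for one sample.
# q: [depth, len_q]; k and v: [depth, len_k]; sequence positions are columns.
function toy_softmax(a; dims=1)
    e = exp.(a .- maximum(a; dims=dims))
    return e ./ sum(e; dims=dims)
end

function toy_attention(q::AbstractMatrix, k::AbstractMatrix, v::AbstractMatrix)
    depth = size(q, 1)
    scores = (k' * q) ./ sqrt(depth)   # [len_k, len_q]: key × query similarities
    α = toy_softmax(scores; dims=1)    # each column sums to 1 over the keys
    return v * α, α                    # output [depth, len_q] and weights α
end

depth, len_x, len_y = 4, 5, 3
x = randn(depth, len_x)   # encoder-side sequence (keys/values)
y = randn(depth, len_y)   # decoder-side sequence (queries)
out, α = toy_attention(y, x, x)
size(out)   # (4, 3): same shape as y
```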

NNHelferlein.TokenTransformer (Type)
mutable struct TokenTransformer

A wrapper around the Transformer object that takes sequences of token ids as input.

Constructor:

TokenTransformer(n_layers, depth, heads, 
                  x_vocab, y_vocab;
                  drop_rate=0.1)
  • n_layers: number of layers in encoder and decoder
  • depth: embedding depth
  • heads: number of heads for the multi-head attention
  • x_vocab: vocabulary size of the input sequences as an integer value or a WordTokenizer object
  • y_vocab: vocabulary size of the output sequences as an integer value or a WordTokenizer object
  • drop_rate: dropout rate used in all layers

Signature:

    (tt::TokenTransformer)(x, y; enc_mask=nothing, dec_mask=nothing,
                           embedded=true)

The transformer is called with two 2-d arrays of token ids x and y of size [seq_len, n_minibatch], which may be of different lengths. It returns a tensor of size [y_vocab, seq_len_y, n_minibatch] with the raw activations of the output neurons or, if embedded is set to false, a 2-d array of size [seq_len_y, n_minibatch] with the sequences of generated tokens.
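Turning raw activations of shape [y_vocab, seq_len_y, n_minibatch] into token ids of shape [seq_len_y, n_minibatch] amounts to an argmax over the first dimension. The following is a minimal sketch of that step in plain Julia. The helper name is hypothetical, and whether the library decodes greedily in exactly this way is an assumption.

```julia
# Greedy decoding sketch: pick the most active output neuron per position.
function toy_tokens(a::AbstractArray{<:Real,3})
    idx = argmax(a; dims=1)                         # CartesianIndex per position and sample
    return dropdims(map(i -> i[1], idx); dims=1)    # keep only the vocab index
end

a = zeros(10, 3, 2)        # y_vocab=10, seq_len_y=3, n_minibatch=2
a[7, 1, 1] = 1.0; a[2, 2, 1] = 1.0; a[4, 3, 1] = 1.0
a[1, :, 2] .= 1.0
toy_tokens(a)              # 3×2 matrix: first column [7, 2, 4], second [1, 1, 1]
```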

NNHelferlein.Chain (Type)
struct Chain <: AbstractChain

Simple wrapper to chain layers and execute them one after another.

NNHelferlein.VAE (Type)
struct VAE   <: AbstractNN

Type for a generic variational autoencoder.

Constructor:

VAE(encoder, decoder)

Separate predefined chains (ideally, but not necessarily, of type Chain) must be specified for the encoder and the decoder. The VAE needs two parameters, mean and variance, to define the distribution of each code neuron in the bottleneck layer. Consequently, the encoder output must be twice the size of the decoder input. For dense layers: if the encoder output is an 8-value vector, 4 codes are defined and the decoder input is a 4-value vector. For convolutional layers, the number of encoder output channels must be twice the number of decoder input channels (see the examples).
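The factor of two comes from the reparameterisation trick commonly used in VAEs. Half of the encoder output is read as the means of the codes and half as their (log-)variances, and each code is sampled from these. The sketch below is generic and uses hypothetical names. Two points are assumptions, not taken from the documentation: that μ occupies the first half, and that the second half holds log σ².

```julia
# Generic VAE bottleneck sketch (dense case): an encoder output of 2n values
# defines n codes. Assumes rows 1:n hold μ and rows n+1:2n hold log σ².
function toy_sample_codes(enc_out::AbstractMatrix)
    n = size(enc_out, 1) ÷ 2
    μ = enc_out[1:n, :]
    logσ² = enc_out[n+1:2n, :]
    ε = randn(size(μ))                    # standard normal noise
    return μ .+ exp.(logσ² ./ 2) .* ε     # z = μ + σ ⋅ ε
end

enc_out = randn(8, 16)          # 8-value encoder output for 16 samples
z = toy_sample_codes(enc_out)   # 4 codes per sample, i.e. the decoder input
size(z)                          # (4, 16)
```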

Signatures:

(vae::VAE)(x)
 (vae::VAE)(x,y)

Called with one argument, the VAE performs a prediction; called with two arguments (x and y should be identical for an autoencoder), it returns the loss.

Details:

The loss is calculated as the sum of element-wise squared errors plus the Kullback-Leibler divergence, which adapts the distributions of the bottleneck codes:

\[\mathcal{L} = \frac{1}{2} \sum_{i=1}^{n_{outputs}} (t_{i}-o_{i})^{2} - \frac{1}{2} \sum_{j=1}^{n_{codes}}(1 + \ln\sigma_{c_j}^{2}-\mu_{c_j}^{2}-\sigma_{c_j}^{2}) \]
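The loss defined above can be evaluated term by term. The sketch assumes that the encoder provides μ and log σ² for each code. The function name is hypothetical, and the β weighting described below is left out.

```julia
# Evaluates  L = 1/2 Σ (t - o)²  -  1/2 Σ (1 + ln σ² - μ² - σ²)
# t: input/teaching data, o: cropped reconstruction, μ and logσ²: code parameters.
function toy_vae_loss(t, o, μ, logσ²)
    reconstruction = 0.5 * sum(abs2, t .- o)
    kl = -0.5 * sum(1 .+ logσ² .- μ .^ 2 .- exp.(logσ²))
    return reconstruction + kl
end

t = [1.0, 2.0, 3.0]
o = [1.0, 2.0, 2.0]
toy_vae_loss(t, o, zeros(4), zeros(4))   # 0.5: the KL term vanishes for μ = 0, σ² = 1
toy_vae_loss(t, t, ones(4), zeros(4))    # 2.0: only the KL term, 4 codes × 0.5
```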

The output of the autoencoder is cropped to the size of the input before loss calculation (and before prediction), i.e. the output always has the same dimensions as the input, even if the last layer generates a bigger shape.

KL-training parameters:

The parameter β is set to 1.0 by default, i.e. the mean-squared error and the KL divergence have the same weight. The functions set_beta!(vae, beta) and get_beta(vae) can be used to set and get the β used in training. With β=0.0, no KL loss is used.

NNHelferlein.get_beta (Function)
function get_beta(vae::VAE; ramp=false)

Return a Dict with the current VAE-parameters beta and ramp-up.

Arguments:

  • ramp=false: if true, a vector of β for all ramp-up steps is returned, so that the ramp-up phase can be visualised (figure: ./assets/vae-beta-range.png).
NNHelferlein.set_beta! (Function)

function set_beta!(vae::VAE, βmax; ramp_up=false, steps=0)

Helper to set the current value of the VAE-parameter beta and ramp-up settings.

VAE loss is calculated as (mean of error squares) + β * (mean of KL divergence).

Ramp-up:

If ramp_up=true, β starts at almost 0.0 (sigm(-10.0) ≈ 4.5e-5) and, following a sigmoid curve, reaches almost 1.0 after steps steps. steps should be greater than 25 to avoid rounding errors in the calculation of the derivative of the sigmoid function.
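One way to obtain such a schedule is to sample the logistic function on an evenly spaced grid from -10 to +10. This sketch makes an assumption about the shape of the ramp, because the exact grid used by set_beta! is not documented here. It reproduces the stated starting value sigm(-10.0) ≈ 4.5e-5 and ends close to βmax. The helper name is hypothetical.

```julia
# Hypothetical sigmoid ramp-up for β over `steps` training steps.
sigm(x) = 1 / (1 + exp(-x))

function toy_beta_ramp(βmax::Real, steps::Integer)
    xs = range(-10.0, 10.0; length=steps)   # evenly spaced sigmoid arguments
    return βmax .* sigm.(xs)
end

βs = toy_beta_ramp(1.0, 50)
first(βs)   # ≈ 4.54e-5
last(βs)    # ≈ 0.99995
```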


Layers

NNHelferlein.AbstractLayer (Type)
abstract type AbstractLayer
abstract type Layer

Supertype of the layer hierarchy. (The type Layer is kept for backward compatibility.)

Fully connected layers

NNHelferlein.Dense (Type)
struct Dense  <: AbstractLayer

Default Dense layer.

Constructors:

  • Dense(w, b, actf): default constructor, w are the weights and b the bias.
  • Dense(i::Int, j::Int; actf=sigm, init=..): layer of j neurons with i inputs. The initialiser is xavier_uniform for actf=sigm and xavier_normal otherwise.
  • Dense(h5::HDF5.File, group::String; trainable=false, actf=sigm): kernel and bias are loaded from the specified group.
  • Dense(h5::HDF5.File, kernel::String, bias::String; trainable=false, actf=sigm): layer imported from a TensorFlow HDF5 file with the HDF5 object h5, with kernel and bias specifying the paths to weights and biases, respectively.
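For a minibatch x with samples as columns, the forward pass of a dense layer computes actf.(w * x .+ b). The sketch below pairs this with a Xavier/Glorot-uniform initialisation, uniform in ±sqrt(6 / (fan_in + fan_out)). The helper names are hypothetical, and the library's exact initialiser details may differ.

```julia
# Hypothetical dense-layer sketch: j neurons with i inputs.
sigm(x) = 1 / (1 + exp(-x))

function xavier_uniform_sketch(j::Int, i::Int)
    s = sqrt(6 / (i + j))               # Glorot-uniform bound
    return (2 .* rand(j, i) .- 1) .* s  # uniform in [-s, s)
end

struct ToyDenseLayer
    w::Matrix{Float64}   # j × i
    b::Vector{Float64}   # j
    actf::Function
end
ToyDenseLayer(i::Int, j::Int; actf=sigm) =
    ToyDenseLayer(xavier_uniform_sketch(j, i), zeros(j), actf)
(l::ToyDenseLayer)(x::AbstractMatrix) = l.actf.(l.w * x .+ l.b)

layer = ToyDenseLayer(3, 2)
x = rand(3, 5)            # 3 inputs × 5 samples
size(layer(x))            # (2, 5)
```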
NNHelferlein.Linear (Type)
struct Linear  <: AbstractLayer

Almost standard dense layer, with functionality inspired by the TensorFlow layer:

  • works with input tensors of any number of dimensions
  • default activation function identity
  • optionally without biases.

The shape of the input tensor is preserved; only the size of the first dim is changed from in to out.

Constructors:

  • Linear(i::Int, j::Int; bias=true, actf=identity, init=xavier_normal), where i is fan-in and j is fan-out.

Keyword arguments:

  • bias=true: if false, biases are fixed to 0.0
  • actf=identity: activation function.
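A dense transformation can be applied to a tensor of any rank by flattening all trailing dimensions into one, multiplying, and then restoring the shape. The following is a minimal sketch in plain Julia. toy_linear is hypothetical, and it is an assumption that Linear is implemented this way.

```julia
# Apply w (out × in) and an optional bias b to the first dimension of x,
# preserving all other dimensions.
function toy_linear(w::AbstractMatrix, b, x::AbstractArray; actf=identity)
    sz = size(x)
    x2 = reshape(x, sz[1], :)            # [in, everything else]
    y2 = w * x2
    if b !== nothing
        y2 = y2 .+ b
    end
    return actf.(reshape(y2, size(w, 1), sz[2:end]...))   # [out, sz[2:end]...]
end

w = randn(6, 4)                   # fan-in 4, fan-out 6
x = randn(4, 7, 3, 2)             # e.g. [depth, seq_len, heads, minibatch]
size(toy_linear(w, zeros(6), x))  # (6, 7, 3, 2)
size(toy_linear(w, nothing, x))   # (6, 7, 3, 2), without biases
```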
NNHelferlein.Embed (Type)
struct Embed <: AbstractLayer

Simple embedding layer that embeds a virtual one-hot vector into a smaller number of neurons by linear combination. The one-hot vector is virtual because only the index of the "one" in the vector (not the vector itself) has to be provided, as an Integer value (or a minibatch of integers) between 1 and the vocab size.

Constructors:

  • Embed(v,d; actf=identity, mask=nothing): with vocab size v, embedding depth d and default activation function identity. mask defines the padding token (see below).

Signatures:

  • (l::Embed)(x): default embedding of input tensor x.

Value:

The embedding is constructed by adding a first dimension to the input tensor, with the number of rows equal to the embedding depth. If x is a column vector, the value is a matrix. If x is a row vector or a matrix, the value is a 3-d array, etc.

Padding and masking:

If a token value is defined as mask, its occurrences are embedded as zero vectors. This can be used to pad sequences with zeros. The masking/padding token counts towards the vocab size. If padding tokens are not masked, their embedding is optimised during training (which is not recommended but still possible for many applications).

Zero may be used as padding token, but it must count towards the vocab size (i.e. the vocab size must be one larger than the number of tokens), and the keyword argument mask=0 must be specified.
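Because the one-hot vector is virtual, embedding reduces to selecting columns of the weight matrix, which also adds the leading depth dimension described above. The following is a hypothetical sketch, not the library's code. Masked tokens become zero vectors, and because the mask is checked before indexing, mask=0 works here too.

```julia
# Hypothetical embedding sketch: w is [depth, vocab]; x holds token ids.
function toy_embed(w::AbstractMatrix, x::AbstractArray{<:Integer}; mask=nothing)
    depth = size(w, 1)
    out = zeros(eltype(w), depth, size(x)...)   # adds a leading depth dimension
    for (k, id) in enumerate(x)                 # k: linear index into x
        if id != mask                           # masked tokens stay zero vectors
            out[:, k] = w[:, id]
        end
    end
    return out
end

w = [1.0 2.0 3.0; 4.0 5.0 6.0]   # depth 2, vocab 3
x = [1 3; 2 1]                    # matrix of token ids → 3-d embedding
e = toy_embed(w, x; mask=2)       # size (2, 2, 2); x[2, 1] == 2 is masked
```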


Convolutional

NNHelferlein.Conv (Type)
struct Conv  <: AbstractLayer

Default Conv layer.

Constructors:

  • Conv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for an input of i channels.
  • Conv(w1::Int, w2::Int, w3::Int, i::Int, o::Int; actf=relu, kwargs...): layer with 3-dimensional kernels for 3D convolution (requires 5-dimensional input).
  • Conv(w1::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (1,w1) for an input of i channels. This 1-dimensional convolution uses a 2-dimensional kernel with a first dimension of size 1; input and output contain an empty first dimension of size 1. If padding, stride or dilation are specified, 2-tuples must be given to correspond with the 2-dimensional kernel (e.g. padding=(0,1) for a padding of 1 along the 1D sequence).

Constructors to read parameters from Tensorflow/Keras HDF-files:

  • Conv(h5::HDF5.File, kernel::String, bias::String; trainable=false, actf=Knet.relu, use_bias=true, kwargs...): Import parameters from HDF file h5 with kernel and bias specifying the full path to weights and biases, respectively.
  • Conv(h5::HDF5.File, group::String; trainable=false, actf=relu, tf=true, use_bias=true): import a conv layer from a default TF/Keras HDF5 file. If tf=false, group defines the full path to the parameters group/kernel:0 and group/bias:0. If tf=true, group defines only the group name, and parameters are addressed as model_weights/group/group/kernel:0 and model_weights/group/group/bias:0.

Keyword arguments:

  • padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension.
  • stride=1: the number of elements to slide to reach the next filtering window.
  • dilation=1: dilation factor for each dimension.
  • ... See the Knet documentation for details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keyword arguments of the Knet function conv4() are supported.
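Padding, stride and dilation together determine the output size along each dimension. The usual formula for cuDNN-style convolutions is out = floor((n + 2p - d(k-1) - 1) / s) + 1. A small helper (hypothetical name) makes it easy to check shapes, including the 1-D case with its size-1 first dimension.

```julia
# Output length of a convolution along one dimension.
# n: input size, k: kernel size, p: padding, s: stride, d: dilation
conv_out(n, k; p=0, s=1, d=1) = (n + 2p - d * (k - 1) - 1) ÷ s + 1

conv_out(28, 3)             # 26: "valid" 3×3 convolution
conv_out(28, 3; p=1)        # 28: padding of 1 keeps the size
conv_out(28, 3; p=1, s=2)   # 14
# 1-D convolution with kernel (1, 5) and padding=(0,1) on an input of size (1, 100):
(conv_out(1, 1; p=0), conv_out(100, 5; p=1))   # (1, 98)
```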
NNHelferlein.DeConv (Type)
struct DeConv  <: AbstractLayer

Default deconvolution layer.

Constructors:

  • DeConv(w, b, actf, kwargs...): default constructor
  • DeConv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for an input of i channels.
  • DeConv(w1::Int, w2::Int, w3::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2,w3) for an input of i channels.

Keyword arguments:

  • padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension (applied to the output).
  • stride=1: the number of elements to slide to reach the next filtering window (applied to the output).
  • ... See the Knet documentation for details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keyword arguments of the Knet function deconv4() are supported.
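In a deconvolution (transposed convolution), padding and stride act on the output. The usual size relation inverts the convolution formula: out = s(n - 1) + d(k - 1) + 1 - 2p. The sketch below uses hypothetical helpers to check that a DeConv restores the size that a Conv with the same settings reduced.

```julia
# Output length of a transposed convolution along one dimension.
deconv_out(n, k; p=0, s=1, d=1) = s * (n - 1) + d * (k - 1) + 1 - 2p
conv_out(n, k; p=0, s=1, d=1) = (n + 2p - d * (k - 1) - 1) ÷ s + 1

deconv_out(14, 2; s=2)                     # 28: upsampling by a factor of 2
conv_out(deconv_out(14, 2; s=2), 2; s=2)   # 14: the matching convolution undoes it
deconv_out(26, 3)                          # 28: inverse of a "valid" 3×3 convolution
```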
NNHelferlein.ResNetBlock (Type)
struct ResNetBlock <: AbstractChain

Executable type for one block of a ResNet-type network.

Constructors:

  • ResNetBlock(layers; shortcut=[identity], post=[identity]): three chains form the block: the main chain, the shortcut, and a chain of layers applied after the confluence. All chains must be specified as lists, even if they are empty ([]) or comprise only one layer ([BatchNorm]).
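Conceptually, a ResNet block sends the input through both the main chain and the shortcut, combines the two results at the confluence, and then applies the post chain. The following hypothetical sketch uses plain functions in place of layers. It assumes summation at the confluence, the standard ResNet choice.

```julia
# Hypothetical ResNet-block sketch; layers are plain functions.
run_chain(layers, x) = foldl((a, l) -> l(a), layers; init=x)

struct ToyResBlock
    layers::Vector{Any}     # main chain
    shortcut::Vector{Any}   # shortcut chain ([identity] by default)
    post::Vector{Any}       # applied after the confluence
end
ToyResBlock(layers; shortcut=Any[identity], post=Any[identity]) =
    ToyResBlock(layers, shortcut, post)

function (b::ToyResBlock)(x)
    y = run_chain(b.layers, x) .+ run_chain(b.shortcut, x)   # confluence
    return run_chain(b.post, y)
end

relu(x) = max(x, zero(x))
blk = ToyResBlock(Any[x -> 2 .* x]; post=Any[a -> relu.(a)])
blk([-1.0, 1.0])   # main 2x plus shortcut x gives [-3, 3]; relu gives [0, 3]
```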
NNHelferlein.DepthwiseConv (Type)
DepthwiseConv  <: AbstractLayer

Conv layer with separate filters per input channel. Each of the o output feature maps is created by a convolution over only one input channel; o must be a multiple of i.

Constructors:

  • DepthwiseConv(w, b, actf; kwargs): default constructor
  • DepthwiseConv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for a 2-d input with i channels. o must be a multiple of i; if o == i, each output feature map is generated from one channel. If o == n*i, n feature maps are generated from each channel.

Keyword arguments:

  • padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension.
  • stride=1: the number of elements to slide to reach the next filtering window.
  • dilation=1: dilation factor for each dimension.
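The channel bookkeeping of a depthwise convolution can be sketched with plain loops, where each input channel is convolved only with its own n = o ÷ i kernels. This is a hypothetical, unoptimised sketch. It computes a "valid" cross-correlation (no kernel flipping, padding or stride). The output-channel ordering, with all maps of channel 1 first, is an assumption.

```julia
# x: [h, w, i] input; k: [kh, kw, o] kernels with o a multiple of i.
function toy_depthwise(x::Array{Float64,3}, k::Array{Float64,3})
    h, w, i = size(x)
    kh, kw, o = size(k)
    o % i == 0 || throw(ArgumentError("o must be a multiple of i"))
    n = o ÷ i                                   # feature maps per input channel
    y = zeros(h - kh + 1, w - kw + 1, o)
    for c in 1:i, j in 1:n
        oc = (c - 1) * n + j                    # output channel fed by input channel c
        for q in 1:size(y, 2), p in 1:size(y, 1)
            y[p, q, oc] = sum(x[p:p+kh-1, q:q+kw-1, c] .* k[:, :, oc])
        end
    end
    return y
end

x = rand(5, 5, 2)         # 2 input channels
k = ones(3, 3, 4)         # o = 4, i.e. 2 maps per input channel
y = toy_depthwise(x, k)
size(y)                   # (3, 3, 4); channel 3 is computed from input channel 2 only
```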
NNHelferlein.Pool (Type)
struct Pool <: AbstractLayer

Pooling layer.

Constructors:

  • Pool(;kwargs...): max pooling; without kwargs, pooling with a window of 2 is performed.

Keyword arguments:

  • window=2: pooling window size (same for all directions)
  • ...: See the Knet documentation for details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keyword arguments of the Knet function pool() are supported.
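Max pooling with a window of 2 keeps the largest value of each non-overlapping 2×2 patch, assuming the stride equals the window. The following is a hypothetical sketch for a single 2-d feature map.

```julia
# Max pooling of a 2-d map with window w and stride w; edges that do not fill
# a complete window are dropped.
function toy_maxpool(x::AbstractMatrix, w::Int=2)
    h, v = size(x) .÷ w
    return [maximum(x[(r-1)*w+1:r*w, (c-1)*w+1:c*w]) for r in 1:h, c in 1:v]
end

x = [1 2 3 4;
     5 6 7 8;
     9 10 11 12;
     13 14 15 16]
toy_maxpool(x)   # [6 8; 14 16]
```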
NNHelferlein.UnPool (Type)
struct UnPool <: AbstractLayer

Unpooling layer.

Constructors:

  • UnPool(;kwargs...): user-defined unpooling
NNHelferlein.Pad (Type)
struct Pad     <: AbstractLayer

Pad an n-dimensional array along its dimensions with one of the modes :zeros (default) or :ones.

Constructors:

  • Pad(padding::Int...; mode=:zeros): pad with padding along all specified dims. If padding is a single integer, it is applied to all but the last 2 dims (i.e. in the context of a CNN, the channel and minibatch dimensions are excluded from padding). If more than one padding value is specified, the values are applied to the dims in the order they are specified, and missing values are filled with zeros.

Keyword arguments:

  • mode: one of
    • :zeros: zero-padding
    • :ones: one-padding
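The single-integer padding rule can be sketched in plain Julia. This hypothetical helper pads the first ndims-2 dimensions with zeros or ones. It assumes the padding is added at both the start and the end of each dimension, which the description above does not state.

```julia
# Hypothetical sketch of Pad with a single integer: pad all but the last two
# dims (channels, minibatch) by `p` on both sides.
function toy_pad(x::AbstractArray, p::Int; mode=:zeros)
    fill_value = if mode == :zeros
        zero(eltype(x))
    elseif mode == :ones
        one(eltype(x))
    else
        throw(ArgumentError("mode must be :zeros or :ones"))
    end
    nd = ndims(x)
    pads = ntuple(d -> d <= nd - 2 ? p : 0, nd)
    y = fill(fill_value, size(x) .+ 2 .* pads)
    y[ntuple(d -> pads[d]+1:pads[d]+size(x, d), nd)...] = x   # copy x into the centre
    return y
end

x = ones(3, 3, 2, 4)            # [h, w, channels, minibatch]
size(toy_pad(x, 1))             # (5, 5, 2, 4)
toy_pad(zeros(2, 2, 1, 1), 1; mode=:ones)[1, 1, 1, 1]   # 1.0
```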

Recurrent

NNHelferlein.RecurrentUnit (Type)
abstract type RecurrentUnit end

Supertype for all recurrent unit types. Self-defined recurrent units that are subtypes of RecurrentUnit can be used inside the Recurrent layer.

Interface

All subtypes of RecurrentUnit must provide the following:

  • a constructor with signature Type(n_inputs, n_units; kwargs) and arbitrary keyword arguments.
  • an implementation of the signature (o::Type)(x), where x is a 3d- or 2d-array of shape [fan-in, mb-size, 1] or [fan-in, mb-size]. The function must perform the forward computation for one step, return the hidden state, and set the internal field h (and optionally c).
  • a field h (to store the last hidden state)
  • an optional field c, if the cell state is to be stored, as in an LSTM unit.
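A self-defined unit that follows this interface might look like the sketch below: a plain Elman-type unit with a constructor Type(n_inputs, n_units), a field h, and a callable that performs one step. ToyRecurrentUnit and ToyElman are hypothetical. In real code, the type would subtype NNHelferlein.RecurrentUnit, and wrapping the parameters for training is omitted here.

```julia
# Stand-in for NNHelferlein.RecurrentUnit so that the sketch runs on its own.
abstract type ToyRecurrentUnit end

mutable struct ToyElman <: ToyRecurrentUnit
    w::Matrix{Float64}    # n_units × n_inputs
    u::Matrix{Float64}    # n_units × n_units (recurrent weights)
    b::Vector{Float64}
    h                     # last hidden state (nothing before the first step)
end

ToyElman(n_inputs, n_units; init_scale=0.1) =
    ToyElman(init_scale .* randn(n_units, n_inputs),
             init_scale .* randn(n_units, n_units), zeros(n_units), nothing)

# One step for x of shape [fan-in, mb-size] or [fan-in, mb-size, 1].
function (o::ToyElman)(x)
    x2 = reshape(x, size(x, 1), size(x, 2))
    h_prev = o.h === nothing ? zeros(size(o.u, 1), size(x2, 2)) : o.h
    o.h = tanh.(o.w * x2 .+ o.u * h_prev .+ o.b)
    return o.h
end

unit = ToyElman(3, 4)
h1 = unit(rand(3, 5))          # hidden state [4, 5]
h2 = unit(rand(3, 5, 1))       # second step reuses unit.h
size(h2)                        # (4, 5)
```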
NNHelferlein.Recurrent (Type)
struct Recurrent <: AbstractLayer

One-layer RNN that works with minibatches of (time) series data. A minibatch can be a 2- or 3-dimensional Array. If 2-d, the inputs for one step are in one column and the Array has as many columns as steps. If 3-d, the last dimension iterates the samples of the minibatch.

The result is an array with the outputs of the units for all steps and all samples of the minibatch (with the model depth as the first and the samples of the minibatch as the last dimension).
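For a 3-d minibatch [fan-in, steps, samples], running the layer amounts to feeding one step at a time to the unit and stacking the hidden states. The following is a hypothetical sketch in plain Julia. A fixed tanh step stands in for a recurrent unit, and the output is laid out as [units, steps, samples].

```julia
# Hypothetical sketch of running a recurrent unit over a 3-d minibatch
# x of shape [fan_in, steps, n_samples]; returns [n_units, steps, n_samples].
function toy_run_rnn(w, u, b, x::AbstractArray{<:Real,3})
    fan_in, steps, mb = size(x)
    n_units = size(w, 1)
    h = zeros(n_units, mb)                      # initial hidden state
    out = zeros(n_units, steps, mb)
    for t in 1:steps
        xt = x[:, t, :]                         # [fan_in, mb]: all samples at step t
        h = tanh.(w * xt .+ u * h .+ b)         # one step of an Elman-type unit
        out[:, t, :] = h
    end
    return out
end

w, u, b = 0.1 .* randn(4, 3), 0.1 .* randn(4, 4), zeros(4)
x = rand(3, 10, 8)                # 3 features, 10 steps, 8 samples
y = toy_run_rnn(w, u, b, x)
size(y)                            # (4, 10, 8)
```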

Constructors:

Recurrent(n_inputs::Int, n_units::Int; u_type=:lstm, 
+               \frac{1}{2} \sum_{j=1}^{n_{codes}}(1 + ln\sigma_{c_j}^{2}-\mu_{c_j}^{2}-\sigma_{c_j}^{2}) \]

Output of the autoencoder is cropped to the size of input before loss calculation (and before prediction); i.e. the output has always the same dimensions as the input, even if the last layer generates a bigger shape.

KL-training parameters:

The parameter β is by default set to 1.0, i.e. mean-squared error and KL has the same weights. The functions set_beta(vae, beta) and get_beta(vae) can be used to set and get the β used in training. With β=0.0 no KL-loss will be used.

source
NNHelferlein.get_betaFunction
function get_beta(vae::VAE; ramp=false)

Return a Dict with the current VAE-parameters beta and ramp-up.

Arguments:

  • ramp=false: if true, a vector of β for all ramp-up steps is returned. This way, the ramp-up phase can be visualised: <img src="./assets/vae-beta-range.png"/>
source
NNHelferlein.set_beta!Function

function setbeta!(vae::VAE, βmax; ramp_up=false, steps=0)

Helper to set the current value of the VAE-parameter beta and ramp-up settings.

VAE loss is calculated as (mean of error squares) + β * (mean of KL divergence).

Ramp-up:

In case of ramp_up=true, β starts with almost 0.0 (sigm(-10.0) ≈4.5e-5) and reaches almost 1.0 after steps steps, following a sigmoid curve. steps should be more than 25, to avoid rounding errors in the calculation of the derivative of the sigmoid function.

source

Layers

NNHelferlein.AbstractLayerType
abstract type AbstractLayer
+abstract type Layer

Mother type for layers hierarchy. (The type Layer is kept for backward compatibility)

source

Fully connected layers

NNHelferlein.DenseType
struct Dense  <: AbstractLayer

Default Dense layer.

Constructors:

  • Dense(w, b, actf): default constructor, w are the weights and b the bias.
  • Dense(i::Int, j::Int; actf=sigm, init=..): layer of j neurons with i inputs. Initialiser is xavieruniform for actf=sigm and xaviewnormal otherwise.
  • Dense(h5::HDF5.File, group::String; trainable=false, actf=sigm): kernel and bias are loaded by the specified group.
  • Dense(h5::HDF5.File, kernel::String, bias::String; trainable=false, actf=sigm): layer imported from a hdf5-file from TensorFlow with the hdf-object h5 and the group name group.
source
NNHelferlein.LinearType
struct Linear  <: AbstractLayer

Almost standard dense layer, but functionality inspired by the TensorFlow-layer:

  • capable to work with input tensors of any number of dimensions
  • default activation function identity
  • optionally without biases.

The shape of the input tensor is preserved; only the size of the first dim is changed from in to out.

Constructors:

  • Linear(i::Int, j::Int; bias=true, actf=identity, init=xaview_normal) where i is fan-in and j is fan-out.

Keyword arguments:

  • bias=true: if false biases are fixed to 0.0
  • actf=identity: activation function.
source
NNHelferlein.EmbedType
struct Embed <: AbstractLayer

Simple type for an embedding layer to embed a virtual onehot-vector into a smaller number of neurons by linear combination. The onehot-vector is virtual, because not the vector, but only the index of the "one" in the vector has to be provided as Integer value (or a minibatch of integers) with values between 1 and the vocab size.

Constructors:

  • Embed(v,d; actf=identity, mask=nothing): with vocab size v, embedding depth d and default activation function identity. mask defines the padding token (see below).

Signatures:

  • (l::Embed)(x): default embedding of input tensor x.

Value:

The embedding is constructed by adding a first dimension to the input tensor with number of rows = embedding depth. If x is a column vector, the value is a matrix. If x is as row-vector or a matrix, the value is a 3-d array, etc.

Padding and masking:

If a token value is defined as mask, occurences are embedded as zero vector. This can be used for padding sequence with zeros. The masking/padding token counts to the vocab size. If padding tokens are not masked, their embedding will be optimised during training (which is not recommended but still possible for many applications).

Zero may be used as padding token, but it must count to the vocab size (i.e. the vocab size must be one larger than the number of tokens) and the keyword arg mask=0 must be specified.

source

Convolutional

NNHelferlein.ConvType
struct Conv  <: AbstractLayer

Default Conv layer.

Constructors:

  • Conv(w1::Int, w2::Int, i::Int, o::Int; actf=relu; kwargs...): layer with o kernels of size (w1,w2) for an input of i channels.
  • Conv(w1::Int, w2::Int, w3::Int, i::Int, o::Int; actf=relu; kwargs...): layer with 3-dimensional kernels for 3D convolution (requires 5-dimensional input)
  • Conv(w1::Int, i::Int, o::Int; actf=relu; kwargs...): layer with o kernels of size (1,w1) for an input of i channels. This 1-dimensional convolution uses a 2-dimensional kernel with a first dimension of size 1. Input and output contain an empty firfst dimension of size 1. If padding, stride or dilation are specified, 2-tuples must be specified to correspond with the 2-dimensional kernel (e.g. padding=(0,1) for a 1-padding along the 1D sequence).

Constructors to read parameters from Tensorflow/Keras HDF-files:

  • Conv(h5::HDF5.File, kernel::String, bias::String; trainable=false, actf=Knet.relu, use_bias=true, kwargs...): Import parameters from HDF file h5 with kernel and bias specifying the full path to weights and biases, respectively.
  • Conv(h5::HDF5.File, group::String; trainable=false, actf=relu, tf=true, use_bias=true): Import a conv-layer from a default TF/Keras HDF5 file. If tf=false, group defines the full path to the parameters group/kernel:0 and group/bias:0. If tf=true, group defines the only the group name and parameters are addressed as model_weights/group/group/kernel:0 and model_weights/group/group/bias:0.

Keyword arguments:

  • padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension.
  • stride=1: the number of elements to slide to reach the next filtering window.
  • dilation=1: dilation factor for each dimension.
  • ... See the Knet documentation for Details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keywords to the Knet function conv4() are supported.
source
NNHelferlein.DeConvType
struct DeConv  <: AbstractLayer

Default deconvolution layer.

Constructors:

  • DeConv(w, b, actf, kwargs...): default constructor
  • DeConv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for an input of i channels.
  • DeConv(w1::Int, w2::Int, w3::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2,w3) for an input of i channels.

Keyword arguments:

  • padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension (applied to the output).
  • stride=1: the number of elements to slide to reach the next filtering window (applied to the output).
  • ... See the Knet documentation for Details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keywords to the Knet function deconv4() are supported.
source
NNHelferlein.ResNetBlockType
struct ResNetBlock <: AbstractChain

Executable type for one block of a ResNet-type network.

Constructors:

  • ResNetBlock(layers; shortcut=[identity], post=[identity]): 3 chains to form the block: the main chain, the shortcut and a chain of layers to be added after the confluence. All chains must be specified as lists, even if they are empty ([]) or comprise only one layer ([BatchNorm]).
source
NNHelferlein.DepthwiseConvType
DepthwiseConv  <: AbstractLayer

Conv layer with seperate filters per input channel. o output feature maps will be created by performing a convolution on only one input channel. o must be a multiple of i.

Constructors:

  • DepthwiseConv(w, b, actf; kwargs): default constructor
  • Conv(w1::Int, w2::Int, i::Int, o::Int; actf=relu, kwargs...): layer with o kernels of size (w1,w2) for every input channel of an 2-d input of i layers. o must be a multiple of i; if o == i, each output feature map is generated from one channel. If o == n*i, n feature maps are generated from each channel.

Keyword arguments:

  • padding=0: the number of extra zeros implicitly concatenated at the start and end of each dimension.
  • stride=1: the number of elements to slide to reach the next filtering window.
  • dilation=1: dilation factor for each dimension.
source
NNHelferlein.PoolType
struct Pool <: AbstractLayer

Pooling layer.

Constructors:

  • Pool(;kwargs...): max pooling; without kwargs, 2-pooling is performed.

Keyword arguments:

  • window=2: pooling window size (same for all directions)
  • ...: see the Knet documentation for details: https://denizyuret.github.io/Knet.jl/latest/reference/#Convolution-and-Pooling. All keywords of the Knet function pool() are supported.
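Here is a plain-Julia sketch of 2-d max pooling with the default window of 2. It assumes that the stride equals the window size and that no padding is used, so any leftover rows or columns are dropped. maxpool2d is a hypothetical helper, not the Knet pool() function.

```julia
# Max pooling of a 4d input [X1, X2, channels, mb] with a square window;
# stride is assumed to equal the window size, no padding.
function maxpool2d(x::Array{T,4}; window::Int=2) where T
    X1, X2, C, N = size(x)
    Y1, Y2 = X1 ÷ window, X2 ÷ window
    y = similar(x, Y1, Y2, C, N)
    for n in 1:N, c in 1:C, q in 1:Y2, p in 1:Y1
        r1 = (p-1)*window+1 : p*window          # rows of this pooling window
        r2 = (q-1)*window+1 : q*window          # columns of this pooling window
        y[p, q, c, n] = maximum(@view x[r1, r2, c, n])
    end
    return y
end

x = reshape(Float32.(1:16), 4, 4, 1, 1)
maxpool2d(x)[:, :, 1, 1]     # → [6 14; 8 16]
```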
NNHelferlein.UnPool (Type)
struct UnPool <: AbstractLayer

Unpooling layer.

Constructors:

  • UnPool(;kwargs...): user-defined unpooling
NNHelferlein.Pad (Type)
struct Pad     <: AbstractLayer

Pad an n-dimensional array along its dimensions, using one of the modes :zeros (default) or :ones.

Constructors:

  • Pad(padding::Int...; mode=:zeros): pad with padding along all specified dims. If padding is a single integer, it is applied to all but the last 2 dims (i.e. in the context of a CNN, the channel and minibatch dimensions are excluded from padding). If more than one padding value is specified, the values are applied to the dims in the order they are specified, and missing values are filled with zeros.

Keyword arguments:

  • mode: one of
    • :zeros: zero-padding
    • :ones: one-padding
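The padding rule above can be sketched as follows. pad_sketch is a hypothetical helper. It assumes that each padding value is added at both the start and the end of its dimension, because the text above does not say how Pad distributes the padding.

```julia
# Pad x following the rule described above (assumption: padding is added on both sides).
function pad_sketch(x::AbstractArray{T}, padding::Int...; mode=:zeros) where T
    n = ndims(x)
    # one value: all but the last 2 dims; several values: in order, missing ones are 0
    p = length(padding) == 1 ? ntuple(d -> d <= n - 2 ? padding[1] : 0, n) :
                               ntuple(d -> d <= length(padding) ? padding[d] : 0, n)
    fill_value = mode == :ones ? one(T) : zero(T)
    y = fill(fill_value, size(x) .+ 2 .* p)
    ranges = ntuple(d -> (p[d]+1):(p[d]+size(x, d)), n)   # position of x inside y
    y[ranges...] = x
    return y
end

size(pad_sketch(ones(Float32, 2, 2, 3, 1), 1))   # → (4, 4, 3, 1)
```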

Recurrent

NNHelferlein.RecurrentUnit (Type)
abstract type RecurrentUnit end

Supertype for all recurrent unit types. Self-defined recurrent units that are subtypes of RecurrentUnit can be used inside the Recurrent layer.

Interface

All subtypes of RecurrentUnit must provide the following:

  • a constructor with signature Type(n_inputs, n_units; kwargs) and arbitrary keyword arguments.
  • an implementation of signature (o::Type)(x) where x is a 3d- or 2d-array of shape [fan-in, mb-size, 1] or [fan-in, mb-size]. The function must perform the forward computation for one step, return the hidden state and set the internal field h (and optionally c).
  • a field h (to store the last hidden state).
  • an optional field c, if the cell state is to be stored, as in an LSTM unit.
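A minimal custom unit that satisfies the interface listed above could look like this. The sketch defines its own stand-in RecurrentUnit so that it runs on its own; in a real model the type would subtype NNHelferlein.RecurrentUnit instead. ElmanUnit, its fields W, U and b, and the keyword init_scale are all hypothetical. The sketch does not check whether Recurrent needs anything beyond the documented interface.

```julia
# Stand-in for NNHelferlein.RecurrentUnit, so that this sketch runs on its own.
abstract type RecurrentUnit end

# Hypothetical simple (Elman-type) unit: h_t = tanh.(W x_t .+ U h_{t-1} .+ b)
mutable struct ElmanUnit <: RecurrentUnit
    W::Matrix{Float32}                  # input weights     [n_units, n_inputs]
    U::Matrix{Float32}                  # recurrent weights [n_units, n_units]
    b::Vector{Float32}                  # bias              [n_units]
    h::Union{Nothing, Matrix{Float32}}  # last hidden state [n_units, mb]
end

# Constructor with the required signature Type(n_inputs, n_units; kwargs...)
function ElmanUnit(n_inputs::Int, n_units::Int; init_scale::Float32=0.1f0)
    W = init_scale .* randn(Float32, n_units, n_inputs)
    U = init_scale .* randn(Float32, n_units, n_units)
    return ElmanUnit(W, U, zeros(Float32, n_units), nothing)
end

# One step for x of shape [fan-in, mb] or [fan-in, mb, 1]:
# computes the new hidden state, stores it in the field h and returns it.
function (o::ElmanUnit)(x::AbstractArray{Float32})
    x2 = ndims(x) == 3 ? reshape(x, size(x, 1), size(x, 2)) : x
    h_prev = o.h === nothing ? zeros(Float32, size(o.W, 1), size(x2, 2)) : o.h
    o.h = tanh.(o.W * x2 .+ o.U * h_prev .+ o.b)
    return o.h
end
```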
NNHelferlein.Recurrent (Type)
struct Recurrent <: AbstractLayer

One-layer RNN that works with minibatches of (time) series data. A minibatch can be a 2- or 3-dimensional array. If 2-d, the inputs for one step are in one column and the array has as many columns as steps. If 3-d, the last dimension iterates the samples of the minibatch.

The result is an array with the outputs of the units for all steps and all samples of the minibatch (with the number of units, i.e. the model depth, as the first and the samples of the minibatch as the last dimension).

Constructors:

Recurrent(n_inputs::Int, n_units::Int; u_type=:lstm, 
           bidirectional=false, allow_mask=false, o...)
  • n_inputs: number of inputs
  • n_units: number of units
  • u_type: unit type; can be one of the Knet unit types (:relu, :tanh, :lstm, :gru) or a type which must be a subtype of RecurrentUnit and fulfil the respective interface (see the docs for RecurrentUnit).
  • bidirectional=false: if true, 2 layers of n_units units will be defined and run in forward and backward direction respectively. The hidden state is [2*n_units, mb], or [2*n_units, steps, mb] if return_all==true.
  • allow_mask=false: if masking is allowed, a slower algorithm is used to be able to ignore any masked step. Arbitrary sequence positions may be masked for any sequence.

Any keyword argument of Knet.RNN or a self-defined RecurrentUnit type may be provided.

Signatures:

function (rnn::Recurrent)(x; c=nothing, h=nothing, return_all=false, 
          mask=nothing)

The layer is called either with a 2-dimensional array of the shape [fan-in, steps] or a 3-dimensional array of [fan-in, steps, batchsize].

Arguments:

  • c=nothing, h=nothing: initialise the cell and hidden states. If nothing, the states h and c keep their values. If c=0 or h=0, the states are reset to 0; otherwise an array of states of the correct dimensions can be supplied to be used as initial states.
  • return_all=false: if true an array with all hidden states of all steps is returned (size is [units, time-steps, minibatch]). Otherwise only the hidden states of the last step are returned ([units, minibatch]).
  • mask: optional mask for the input sequence minibatch of shape [steps, minibatch]. Values in the mask must be 1.0 for masked positions or 0.0 otherwise and of type Float32 or CuArray{Float32} for GPU context. Appropriate masks can be generated with the NNHelferlein function mk_padding_mask().

Bidirectional layers can be constructed by specifying bidirectional=true, if the unit-type supports it (Knet.RNN does). Please be aware that the actual number of units is 2 x n_units for bidirectional layers and the output dimension is [2 x units, steps, mb] or [2 x units, mb].
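NNHelferlein's mk_padding_mask() is the documented way to build such masks, but its signature is not given in this section. The sketch below therefore uses a hypothetical helper, mask_from_lengths, that builds a mask of the documented form ([steps, minibatch], Float32, 1.0 = masked) from sequence lengths. It assumes that padding sits at the end of each sequence.

```julia
# Build a [steps, minibatch] Float32 mask for sequences of different lengths:
# 1.0 marks masked (padded) positions, 0.0 marks real steps.
function mask_from_lengths(lengths::AbstractVector{<:Integer}, steps::Integer)
    mask = zeros(Float32, steps, length(lengths))
    for (j, len) in enumerate(lengths)
        mask[len+1:steps, j] .= 1f0     # all steps after the end of sequence j
    end
    return mask
end

m = mask_from_lengths([3, 1], 4)
# m == Float32[0 0; 0 1; 0 1; 1 1]
# rnn(x; mask=m)   # with an NNHelferlein Recurrent layer rnn and input x
```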

NNHelferlein.get_hidden_states (Function)
function get_hidden_states(l::<RNN_Type>; flatten=true)

Return the hidden states of one or more layers of an RNN. <RNN_Type> is one of NNHelferlein.Recurrent, Knet.RNN.

Arguments:

  • flatten=true: if the states tensor is 3d with a 3rd dim > 1, the array is transformed to [units, mb, 1] to represent all current states after the last step.
NNHelferlein.get_cell_states (Function)
function get_cell_states(l::<RNN_Type>; unbox=true, flatten=true)

Return the cell states of one or more layers of an RNN, only if it is an LSTM (long short-term memory).

Arguments:

  • unbox=true: by default, c is unboxed when called in @diff context (while AutoGrad is recording) to avoid unwanted dependencies in the computation graph (backprop should run via the hidden states, not the cell states).
  • flatten=true: if the states tensor is 3d with a 3rd dim > 1, the array is transformed to [units, mb, 1] to represent all current states after the last step.

Transformers

NNHelferlein.TFEncoder (Type)
TFEncoder

A Bert-like encoder to be used as part of a transformer. The encoder is built as a stack of TFEncoderLayers, which is entered after embedding, positional encoding and generation of a padding mask.

Constructor:

TFEncoder(n_layers, depth, n_heads; drop_rate=0.1)

Signature:

(e::TFEncoder)(x)

The encoder is called with a 3-dimensional array of embedded tokens of size [depth, seq_len, n_minibatch] and returns a tensor of size [depth, seq_len, n_minibatch].

NNHelferlein.TFEncoderLayer (Type)
TFEncoderLayer

A Bert-like encoder layer to be used as part of a Bert-like transformer. The layer consists of a multi-head attention sub-layer followed by a feed-forward network of size depth -> 4*depth -> depth. Both parts have separate residual connections and layer normalisation.

The design follows the original paper "Attention is all you need" by Vaswani et al., 2017.

Constructor:

TFEncoderLayer(depth, n_heads, drop)
  • depth: Embedding depth
  • n_heads: number of heads for the multi-head attention
  • drop: dropout rate

Signature:

(el::TFEncoderLayer)(x; mask=nothing)

Objects of type TFEncoderLayer are callable and expect a 3-dimensional array of size [embeddingdepth, seqlen, minibatchsize] as input. The optional mask must be of size [seqlen, minibatch_size] and mark masked positions with 1.0.

It returns a tensor of the same size as the input and the self-attention factors of size [seqlen, seqlen, minibatch_size].
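The core of the attention sub-layer, scaled dot-product self-attention with a padding mask, can be sketched in plain Julia as below. This is a single-head illustration without residual connections, layer normalisation or the feed-forward part. Wq, Wk and Wv are hypothetical parameters, the [keys, queries, mb] layout of α is an assumption that may differ from TFEncoderLayer, and at least one unmasked position per sample is assumed.

```julia
# Column-wise softmax (along dimension 1).
function softmax_cols(s::AbstractMatrix)
    e = exp.(s .- maximum(s; dims=1))
    return e ./ sum(e; dims=1)
end

# Single-head scaled dot-product self-attention.
# x: [depth, seq_len, mb]; mask: [seq_len, mb] with 1.0 at masked positions.
function self_attention(x::Array{Float32,3}, Wq, Wk, Wv; mask=nothing)
    depth, seq_len, mb = size(x)
    out = similar(x)
    α = zeros(Float32, seq_len, seq_len, mb)       # [keys, queries, mb]
    for b in 1:mb
        xb = x[:, :, b]
        Q, K, V = Wq * xb, Wk * xb, Wv * xb        # each [depth, seq_len]
        s = (K' * Q) ./ sqrt(Float32(depth))       # scaled scores [keys, queries]
        if mask !== nothing
            s[mask[:, b] .== 1, :] .= -Inf32       # masked keys receive no attention
        end
        α[:, :, b] = softmax_cols(s)
        out[:, :, b] = V * α[:, :, b]              # attention-weighted sum of values
    end
    return out, α
end
```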

NNHelferlein.TFDecoder (Type)
TFDecoder

A Bert-like decoder to be used as part of a transformer. The decoder is built as a stack of TFDecoderLayers, which is entered after embedding, positional encoding and generation of a padding mask and a peek-ahead mask.

Constructor:

TFDecoder(n_layers, depth, n_heads, vocab_size; 
          pad_id=NNHelferlein.TOKEN_PAD, drop_rate=0.1)

Signature:

(e::TFDecoder)(x)

The decoder is called with a matrix of token ids of size [seq_len, n_minibatch] and returns a tensor of size [depth, seq_len, n_minibatch] and the attention factors.

NNHelferlein.TFDecoderLayer (Type)
TFDecoderLayer

A Bert-like decoder layer to be used as part of a Bert-like transformer. The layer consists of a multi-head self-attention sub-layer, a multi-head attention sub-layer followed by a feed-forward network of size depth -> 4*depth -> depth. All three parts have separate residual connections and layer normalisation.

The design follows the original paper "Attention is all you need" by Vaswani et al., 2017.

Constructor:

TFDecoderLayer(depth, n_heads, drop)
  • depth: Embedding depth
  • n_heads: number of heads for the multi-head attention
  • drop: dropout rate

Signature:

(el::TFDecoderLayer)(x, h_encoder; enc_m_pad=nothing, m_combi=nothing)

Objects of type TFDecoderLayer are callable and expect a minibatch of embedded sequences as input.

  • x: 3-dimensional array of size [embeddingdepth, seqlen, minibatch_size]
  • h_encoder: output of the encoder stack
  • enc_m_pad: optional padding mask for the encoder output
  • m_combi: optional mask for the decoder self-attention combining padding and peek-ahead mask.

It returns a tensor of the same size as the input, the self-attention factors and the decoder-encoder attention factors.
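The decoder self-attention uses a mask that combines padding and peek-ahead masking (m_combi). The sketch below shows one way to build such a combined mask with the convention used above, where 1.0 marks masked positions. The helper names and the [keys, queries, mb] layout are assumptions and do not describe the internal format used by TFDecoder.

```julia
# Peek-ahead (causal) mask for a sequence of length n: entry [k, q] is 1.0 if
# key position k lies in the future of query position q (k > q), else 0.0.
peek_ahead_mask(n::Int) = Float32[k > q ? 1 : 0 for k in 1:n, q in 1:n]

# Combine with a padding mask [seq_len, mb] (1.0 = padded) into a
# [seq_len, seq_len, mb] mask: a key is masked if it is padded OR in the future.
function combined_mask(pad_mask::AbstractMatrix{Float32})
    n, mb = size(pad_mask)
    return max.(peek_ahead_mask(n), reshape(pad_mask, n, 1, mb))
end

peek_ahead_mask(3)    # → Float32[0 0 0; 1 0 0; 1 1 0]
```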


These layers are used by the Transformer and TokenTransformer types to build Bert-like transformer networks.

Others

NNHelferlein.Flat (Type)
struct Flat <: AbstractLayer

Default flatten layer.

Constructors:

  • Flat(): with no options.
NNHelferlein.PyFlat (Type)
struct PyFlat <: AbstractLayer

Flatten layer with optional Python-style flattening (row-major). This layer can be used if pre-trained weight matrices from TensorFlow are applied after the flatten layer.

Constructors:

  • PyFlat(; python=true): if true, row-major flatten is performed.
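The difference between column-major (Julia) and row-major (Python) flattening of one sample can be shown in plain Julia. flat_colmajor and flat_rowmajor are hypothetical helpers. That PyFlat reorders exactly like this, by reversing the order of the non-batch dimensions, is an assumption.

```julia
# Column-major (Julia-style) flatten: the first dimension varies fastest.
flat_colmajor(x::AbstractArray) = reshape(x, :, size(x, ndims(x)))

# Row-major (Python-style) flatten: the last non-batch dimension varies fastest.
function flat_rowmajor(x::AbstractArray)
    n = ndims(x)
    perm = (reverse(1:n-1)..., n)         # reverse feature dims, keep batch dim last
    return reshape(permutedims(x, perm), :, size(x, n))
end

x = collect(reshape(1:8, 2, 2, 2, 1))     # one sample of size 2x2x2
vec(flat_colmajor(x))                     # → [1, 2, 3, 4, 5, 6, 7, 8]
vec(flat_rowmajor(x))                     # → [1, 5, 3, 7, 2, 6, 4, 8]
```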
NNHelferlein.FeatureSelection (Type)
struct FeatureSelection  <: AbstractLayer

Simple feature selection layer that maps input to output with one-by-one connections; i.e. a layer of size 128 has 128 weights (plus optional biases).

Biases and activation functions are disabled by default.

Constructors:

  • FeatureSelection(i; bias=false, actf=identity): with the same input and output size i, where i is an integer or a Tuple of the input dimensions.
NNHelferlein.Activation (Type)
struct Activation <: AbstractLayer

Simple activation layer with the desired activation function as argument.

Constructors:

  • Activation(actf)
  • Relu()
  • Sigm()
  • Swish()
NNHelferlein.Softmax (Type)
struct Softmax <: AbstractLayer

Simple softmax layer to compute softmax probabilities.

Constructors:

  • Softmax()
NNHelferlein.Logistic (Type)
struct Logistic <: AbstractLayer

Logistic (sigmoid) activation layer with an additional temperature parameter T to control the slope of the curve. Low temperatures (such as T=0.001) result in a step-like activation function, whereas high temperatures (such as T=10) make the activation almost linear.

Constructors:

  • Logistic(; T=1.0)
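Here is a sketch of the temperature-scaled logistic function. It assumes the common form σ(x/T) = 1/(1 + exp(-x/T)). logistic is a hypothetical helper, not the layer itself. The calls show the step-like and almost-linear regimes described above.

```julia
# Logistic (sigmoid) with temperature T, assumed form: 1 / (1 + exp(-x / T))
logistic(x::Real; T::Real=1.0) = 1 / (1 + exp(-x / T))

logistic(0.0)              # → 0.5 (for any T)
logistic(1.0; T=0.001)     # ≈ 1.0: almost a step function
logistic(1.0; T=10.0)      # ≈ 0.525: almost linear around 0
logistic.([-1.0, 1.0]; T=2.0)   # elementwise application via broadcasting
```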
NNHelferlein.Dropout (Type)
struct Dropout <: AbstractLayer

Dropout layer. Implemented with the help of Knet's dropout() function, which evaluates AutoGrad.recording() to detect whether it is in training or in prediction. Dropout is applied only in training, not in prediction.

Constructors:

  • Dropout(p) with the dropout rate p.
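The usual inverted-dropout technique can be sketched in plain Julia as below. The explicit training flag stands in for the AutoGrad.recording() check. The 1/(1-p) scaling of the kept values is the common convention and is assumed here rather than taken from Knet's documentation. dropout_sketch is hypothetical.

```julia
using Random

# Inverted dropout: in training, each value is set to 0 with probability p and the
# kept values are scaled by 1/(1-p); in prediction the input is returned unchanged.
function dropout_sketch(x::AbstractArray, p::Real; training::Bool=true,
                        rng::AbstractRNG=Random.default_rng())
    (training && p > 0) || return x
    keep = rand(rng, size(x)...) .>= p        # true with probability 1-p
    return x .* keep ./ (1 - p)
end

y = dropout_sketch(ones(10), 0.5)   # every entry is either 0.0 or 2.0
```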
NNHelferlein.BatchNorm (Type)
struct BatchNorm <: AbstractLayer

Batch normalisation layer. Implemented with the help of Knet's batchnorm() function, which evaluates AutoGrad.recording() to detect whether it is in training or in prediction. In training the moments are updated to record the running averages; in prediction the moments are applied, but not modified.

In addition, optional trainable factor a and bias b are applied:

\[y = a \cdot \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + b\]

Constructors:

  • BatchNorm(; scale=true, channels=0) will initialise the moments with Knet.bnmoments() and the trainable parameters β and γ only if scale==true (in this case the number of channels should be given; for CNNs this is the number of feature maps. If channels==0, the number of channels is inferred from the first minibatch, as described below).

Constructors to read parameters from TensorFlow/Keras HDF5 files:

  • BatchNorm(h5::HDF5.File, β_path, γ_path, μ_path, var_path; scale=false, trainable=true, momentum=0.1, ε=1e-5, dims=4): Import parameters from HDF file h5 with β_path, γ_path, μ_path and var_path specifying the full path to β, γ, μ and variance respectively.

  • BatchNorm(h5::HDF5.File, group::String; scale=false, trainable=true, momentum=0.1, ε=1e-5, dims=4, tf=true): Import parameters from HDF file h5 with parameters in the group group. Paths to β, γ, μ and variance are constructed if tf=true as model_weights/group/group/beta:0, etc. If tf=false group must define the full group path: group/beta:0. dims specifies the number of dimensions of the input and may be 2, 4 or 5. The default (4) applies to standard CNNs (imgsize, imgsize, channels, batchsize).

Keyword arguments:

  • scale=true: if true, the trainable scale parameters β and γ are used.
  • trainable=true: only used with HDF5 import. If true, the parameters β and γ are initialised as Param and trained in training.

Details:

2d, 4d and 5d inputs are supported. Mean and variance are computed over dimensions (2), (1,2,4) and (1,2,3,5) for 2d, 4d and 5d arrays, respectively.

If scale=true and channels != 0, trainable parameters β and γ will be initialised for each channel.

If scale=true and channels == 0 (i.e. BatchNorm(scale=true)), the params β and γ are not initialised by the constructor. Instead, the number of channels is inferred when the first minibatch is normalised: size(x)[1] for 2d, size(x)[3] for 4d and size(x)[4] for 5d input, or 0 otherwise.
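Here is a training-mode sketch of the normalisation for a 4d input. It computes mean and variance per channel over dimensions (1,2,4), as described above. It assumes the usual √(σ² + ε) denominator and omits the running moments used in prediction. batchnorm_sketch is a hypothetical helper, not Knet's batchnorm().

```julia
using Statistics

# Batch normalisation of a 4d input [W, H, C, mb] in training mode:
# per-channel mean and variance over dims (1, 2, 4), then scale γ and shift β.
function batchnorm_sketch(x::Array{T,4}, γ::Vector{T}, β::Vector{T};
                          ε::Real=1e-5) where T<:AbstractFloat
    μ  = mean(x; dims=(1, 2, 4))                   # [1, 1, C, 1]
    σ2 = var(x; dims=(1, 2, 4), corrected=false)   # [1, 1, C, 1]
    xhat = (x .- μ) ./ sqrt.(σ2 .+ T(ε))
    return reshape(γ, 1, 1, :, 1) .* xhat .+ reshape(β, 1, 1, :, 1)
end

x = randn(Float32, 5, 5, 3, 8)
y = batchnorm_sketch(x, ones(Float32, 3), zeros(Float32, 3))  # per channel: mean ≈ 0, var ≈ 1
```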

NNHelferlein.LayerNorm (Type)
struct LayerNorm  <: AbstractLayer

Simple layer normalisation (inspired by TensorFlow's LayerNormalization). The implementation is taken from Deniz Yuret's answer to a Knet feature request (https://github.com/denizyuret/Knet.jl/issues/492).

The layer performs a normalisation within each sample, not batchwise. The normalised values are modified by two trainable parameters, a factor a (scale) and a bias b (shift), which are applied to every value of the sample vector.

Constructors:

  • LayerNorm(depth; eps=1e-6): depth is the number of activations for one sample of the layer.

Signatures:

  • function (l::LayerNorm)(x; dims=1): normalise x along the given dimensions. The size of the specified dimension must match the initialised depth.
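Here is a sketch of layer normalisation over dims=1, with one sample per column. It assumes the form a·(x - μ)/(σ + ε) + b, where σ is the standard deviation of each sample. layernorm_sketch is hypothetical and may differ in detail from the LayerNorm layer.

```julia
using Statistics

# Layer normalisation along dims=1 (each column is one sample), followed by
# the trainable scale a and shift b (one value per row).
function layernorm_sketch(x::AbstractMatrix, a::AbstractVector, b::AbstractVector;
                          ε::Real=1e-6)
    μ = mean(x; dims=1)                 # [1, mb]: mean of each sample
    σ = std(x; dims=1)                  # [1, mb]: standard deviation of each sample
    return a .* (x .- μ) ./ (σ .+ ε) .+ b
end

x = randn(8, 4) .* 3 .+ 5               # depth 8, minibatch of 4
y = layernorm_sketch(x, ones(8), zeros(8))   # each column: mean ≈ 0, std ≈ 1
```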
NNHelferlein.GaussianNoise (Type)
struct GaussianNoise

Gaussian noise layer. Multiplies each training value by a Gaussian-distributed random value with mean 1.0 and standard deviation σ.

Constructors:

  • GaussianNoise(σ; train_only=true)

Arguments:

  • σ: Standard deviation for the distribution of noise
  • train_only=true: if true, noise will only be applied in training.
NNHelferlein.GlobalAveragePooling (Type)
struct GlobalAveragePooling  <: AbstractLayer

Layer to return a matrix with the mean values of all but the last two dimensions for each sample of the minibatch. If the input is a stack of feature maps from a convolutional layer, the result can be seen as the mean value of each feature map. Number of output-rows equals number of input-featuremaps; number of output-columns equals size of minibatch.

Constructors:

GlobalAveragePooling()
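The pooling described above amounts to a mean over all but the last two dimensions, followed by a reshape to [channels, minibatch]. global_average_pooling is a hypothetical helper used only for illustration.

```julia
using Statistics

# Mean over all but the last two dimensions, returned as [channels, mb].
function global_average_pooling(x::AbstractArray)
    n = ndims(x)
    m = mean(x; dims=Tuple(1:n-2))            # e.g. dims=(1, 2) for a 4d input
    return reshape(m, size(x, n-1), size(x, n))
end

size(global_average_pooling(rand(Float32, 7, 7, 64, 16)))   # → (64, 16)
```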

Attention Mechanisms

NNHelferlein.AttentionMechanism (Type)
abstract type AttentionMechanism

Attention mechanisms follow the same interface and common signatures.

Where possible, the algorithm allows the projections of the encoder output in an encoder-decoder architecture (e.g. the accumulated encoder hidden states in case of an RNN encoder) to be precomputed.

By default attention scores are scaled according to Vaswani et al., 2017 (Vaswani et al., Attention Is All You Need, CoRR, 2017).

All algorithms use soft attention.

Constructors:

Attn*Mechanism*(dec_units, enc_units; scale=true)
Attn*Mechanism*(units; scale=true)

The one-argument version can be used if encoder and decoder dimensions are the same.

Common Signatures:

function (attn::AttentionMechanism)(h_t, h_enc; reset=false, mask=nothing)
function (attn::AttentionMechanism)(; reset=false)

Arguments:

  • h_t: decoder hidden state. If $h_t$ is a vector, its length equals the number of decoder units. If it is a matrix, $h_t$ includes the states for a minibatch of samples and has the size [units, mb].
  • h_enc: encoder hidden states, 2d or 3d. If 2d, $h_{enc}$ is a matrix [units, steps] with the hidden states of all encoder steps; if 3d, it has the size [units, mb, steps] and holds the encoder states for all samples of the minibatch.
  • mask: optional mask (e.g. padding mask) for masking input steps, of dimensions [mb, steps]. Attention factors for masked steps will be set to 0.0.
  • reset=false: if set to true, the projections of the encoder states are computed. By default, projections are stored in the object and reused until the object is reset. For attention mechanisms that do not allow precomputation, the argument is ignored.

The short form (::AttentionMechanism)(reset=true) can be used to reset the precomputed projections.

Return values

All functions return c and α where α is a matrix of size [mb,steps] with the attention factors for each step and minibatch. c is a matrix of size [units, mb] with the context vector for each sample of the minibatch, calculated as the α-weighted sum of all encoder hidden states $h_{enc}$ for each minibatch.

Attention Mechanisms:

All attention mechanisms calculate attention factors α from scores derived from projections of the encoder hidden states:

\[\alpha = \mathrm{softmax}\left(\mathrm{score}(h_{t},h_{enc}) \cdot \frac{1}{\sqrt{n}}\right)\]
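The formula above can be sketched for the dot-product and Luong "general" scores, using the documented shapes (h_t: [units, mb], h_enc: [units, mb, steps], mask: [mb, steps]). This is a plain-Julia illustration, not the library code. attend is a hypothetical function, n in the scaling factor is assumed to be the number of encoder units, and at least one step per sample is assumed to be unmasked.

```julia
# Softmax over the steps dimension (dims=2) of a [mb, steps] matrix.
function softmax_steps(s::AbstractMatrix)
    e = exp.(s .- maximum(s; dims=2))
    return e ./ sum(e; dims=2)
end

# Dot-product (W === nothing) or Luong "general" attention (score = h_tᵀ W h_enc).
# h_t: [dec_units, mb], h_enc: [enc_units, mb, steps], W: [dec_units, enc_units],
# mask: [mb, steps] with 1.0 = masked. Returns c: [enc_units, mb] and α: [mb, steps].
function attend(h_t::AbstractMatrix, h_enc::AbstractArray{<:Real,3};
                W=nothing, mask=nothing, scale::Bool=true)
    units, mb, steps = size(h_enc)
    q = W === nothing ? h_t : W' * h_t                     # [enc_units, mb]
    score = dropdims(sum(q .* h_enc; dims=1); dims=1)      # [mb, steps]
    if scale
        score = score ./ sqrt(eltype(score)(units))        # 1/sqrt(n) scaling
    end
    if mask !== nothing
        score = ifelse.(mask .== 1, typemin(eltype(score)), score)  # masked: α = 0
    end
    α = softmax_steps(score)
    c = dropdims(sum(reshape(α, 1, mb, steps) .* h_enc; dims=3); dims=3)  # α-weighted sum
    return c, α
end

# Usage with random states: 4 units, minibatch of 2, 5 encoder steps
c, α = attend(randn(4, 2), randn(4, 2, 5))    # c: [4, 2], α: [2, 5]
```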

Attention mechanisms implemented:

NNHelferlein.AttnBahdanau (Type)
mutable struct AttnBahdanau <: AttentionMechanism

Bahdanau-style (additive, concat) attention mechanism according to the paper:

D. Bahdanau, K. Cho, Y. Bengio, Neural Machine Translation by Jointly Learning to Align and Translate, ICLR, 2015.

\[\mathrm{score}(h_{t},h_{enc}) = v_{a}^{\top}\cdot\tanh(W[h_{t},h_{enc}])\]
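For the additive score, W can be split into a decoder part and an encoder part. The sketch below uses bahdanau_scores with hypothetical parameters W_dec, W_enc and v. It shows that the encoder projection does not depend on h_t, which is what makes precomputing it (see reset above) possible.

```julia
# Additive (Bahdanau) scores for h_t: [dec_units, mb] and h_enc: [enc_units, mb, steps].
# W = [W_dec W_enc] is split so that W * [h_t; h_enc] == W_dec * h_t + W_enc * h_enc.
# Returns the scores as a [mb, steps] matrix.
function bahdanau_scores(h_t::AbstractMatrix, h_enc::AbstractArray{<:Real,3},
                         W_dec::AbstractMatrix, W_enc::AbstractMatrix, v::AbstractVector)
    enc_units, mb, steps = size(h_enc)
    # encoder projection: independent of h_t, could be computed once and reused
    proj_enc = reshape(W_enc * reshape(h_enc, enc_units, :), :, mb, steps)  # [att, mb, steps]
    proj_dec = W_dec * h_t                                                  # [att, mb]
    e = tanh.(proj_enc .+ proj_dec)                                         # [att, mb, steps]
    return dropdims(sum(v .* e; dims=1); dims=1)                            # vᵀ tanh(...)
end
```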

Constructors:

AttnBahdanau(dec_units, enc_units; scale=true)
AttnBahdanau(units; scale=true)
NNHelferlein.AttnLuong (Type)
mutable struct AttnLuong <: AttentionMechanism

Luong-style (multiplicative) attention mechanism according to the paper (referred to as general attention): M.-T. Luong, H. Pham, C.D. Manning, Effective Approaches to Attention-based Neural Machine Translation, CoRR, 2015.

\[\mathrm{score}(h_{t},h_{enc}) = h_{t}^{\top} W h_{enc}\]

Constructors:

AttnLuong(dec_units, enc_units; scale=true)
AttnLuong(units; scale=true)
NNHelferlein.AttnDot (Type)
mutable struct AttnDot <: AttentionMechanism

Dot-product attention (without trainable parameters) according to Luong et al. (2015).

$\mathrm{score}(h_{t},h_{enc}) = h_{t}^{\top} h_{enc}$

Constructors:

AttnDot(; scale=true)
NNHelferlein.AttnLocation (Type)
mutable struct AttnLocation <: AttentionMechanism

Location-based attention that only depends on the current decoder state $h_t$ and not on the encoder states, according to Luong et al. (2015).

$\mathrm{score}(h_{t}) = W h_{t}$

Constructors:

AttnLocation(len, dec_units; scale=true)
  • len: maximum sequence length of the encoder to be considered for attention. If the actual length of $h_{enc}$ is bigger than the length of α, attention factors for the remaining states are set to 0.0. If the actual length of h_enc is smaller than α, only the matching attention factors are applied.
  • dec_units: number of decoder units.
NNHelferlein.AttnInFeed (Type)
mutable struct AttnInFeed <: AttentionMechanism

Input-feeding attention that depends on the current decoder state $h_t$ and the next input to the decoder $i_{t+1}$, according to Luong et al. (2015).

Infeed attention provides a semantic attention that depends on the next input token.

$\mathrm{score}(h_{t}, i_{t+1}) = W_h h_{t} + W_i i_{t+1} = W [h_t, i_{t+1}]$

Constructors:

AttnInFeed(len, dec_units, fan_in; scale=true)
  • len: maximum sequence length of the encoder to be considered for attention. If the actual length of $h_{enc}$ is bigger than the length of α, attention factors for the remaining states are set to 0.0. If the actual length of h_enc is smaller than α, only the matching attention factors are applied.
  • dec_units: number of decoder units.
  • fan_in: size of the decoder input.

Signature:

function (attn::AttnInFeed)(h_t, inp, h_enc; mask=nothing)
  • h_t: decoder hidden state. If $h_t$ is a vector, its length equals the number of decoder units. If it is a matrix, $h_t$ includes the states for a minibatch of samples and has the size [units, mb].
  • inp: next decoder input $i_{t+1}$ (e.g. next embedded token of sequence)
  • h_enc: encoder hidden states, 2d or 3d. If 2d, $h_{enc}$ is a matrix [units, steps] with the hidden states of all encoder steps. If 3d, it has the size [units, mb, steps] and contains the encoder states for all samples of the minibatch.
  • mask: Optional mask for input states of shape [mb, steps].
source

Data providers

NNHelferlein.SequenceData (Type)
struct SequenceData <: DataLoader

Type for a generic minibatch iterator. All NNHelferlein models accept minibatches of type DataLoader.

Constructors:

SequenceData(x; shuffle=true)
  • x: List, Array or other iterable object with the minibatches
  • shuffle: if true, minibatches are shuffled every epoch.
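An iterator that reshuffles a fixed list of minibatches at the start of every epoch can be sketched with Julia's iteration protocol. ShuffledBatches is a hypothetical type used only for illustration; it is not how SequenceData is implemented.

```julia
using Random

# Hypothetical minibatch iterator: if shuffle is true, a new random order is
# drawn each time iteration starts (i.e. once per epoch).
struct ShuffledBatches{T}
    batches::Vector{T}
    shuffle::Bool
end

Base.length(it::ShuffledBatches) = length(it.batches)
Base.eltype(::Type{ShuffledBatches{T}}) where {T} = T

function Base.iterate(it::ShuffledBatches)
    n = length(it.batches)
    order = it.shuffle ? randperm(n) : collect(1:n)   # fresh order per epoch
    return iterate(it, (order, 1))
end

function Base.iterate(it::ShuffledBatches, state)
    order, i = state
    i > length(order) && return nothing
    return it.batches[order[i]], (order, i + 1)
end

mbs = ShuffledBatches([[1, 2], [3, 4], [5, 6]], true)
for epoch in 1:2
    println(collect(mbs))   # same minibatches, possibly in a different order
end
```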
source

Iteration utilities

NNHelferlein.PartialIterator (Type)
struct PartialIterator <: DataLoader

A PartialIterator wraps any iterator and iterates only over the states specified in the list indices.

Constructors:

PartialIterator(inner, indices; shuffle=true)

The type of the states must match the state type of the wrapped iterator inner. A nothing element may be given to denote the first element of the iterator.

If shuffle==true, the list of indices is shuffled every time the PartialIterator is started.

source
NNHelferlein.split_minibatches (Function)
function split_minibatches(it, at=0.8; shuffle=true)

Return two iterators of type PartialIterator, each of which iterates over only part of the states of the iterator it. The partial iterators do not contain copies of the data; instead, they forward the data provided by the iterator it.

The function can be used to split an iterator of minibatches into training and validation iterators without copying any data. Because the PartialIterator objects work with the states of the inner iterator, the inner iterator must not be shuffled. Otherwise the composition of the partial iterators would change, and training and validation data could be mixed!

Arguments:

  • it: Iterator to be split. The list of allowed states is created by performing one full iteration.
  • at: Split point. The first returned iterator will include the given fraction (default: 80%) of the states.
  • shuffle: If true, the elements are shuffled at each restart of the iterator.
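The split described above needs no copies, because only iterator states are stored. Below is a rough sketch of that idea for any Julia iterator. split_states and elem_at are hypothetical names, and the rounding of the split point is an assumption.

```julia
# Hypothetical sketch: collect the states of `it` in one full iteration and
# divide them at fraction `at`. `nothing` stands for the first element.
function split_states(it, at=0.8)
    states = Any[nothing]
    next = iterate(it)
    while next !== nothing
        _, st = next
        push!(states, st)             # state that yields the following element
        next = iterate(it, st)
    end
    pop!(states)                      # the last state yields no element
    n = round(Int, length(states) * at)
    return states[1:n], states[n+1:end]
end

# Replay one element of `it` from a stored state (no data is copied beforehand).
elem_at(it, st) = st === nothing ? first(iterate(it)) : first(iterate(it, st))

it = [10, 20, 30, 40, 50]
trn, vld = split_states(it, 0.8)
println([elem_at(it, s) for s in trn])   # [10, 20, 30, 40]
println([elem_at(it, s) for s in vld])   # [50]
```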
source
NNHelferlein.MBNoiser (Type)
type MBNoiser

Iterator to wrap any Knet.Data iterator of minibatches in order to add random noise. Each value is multiplied by a random factor drawn from a Gaussian distribution with mean 1.0 and standard deviation σ.

Constructors:

MBNoiser(mbs::Knet.Data, σ)
MBNoiser(mbs::Knet.Data; σ=0.01)
  • mbs: iterator with minibatches
  • σ: standard deviation for the Gaussian noise

Example:

julia> trn = minibatch(x, y, 128)
julia> tb_train!(mdl, Adam, MBNoiser(trn, σ=0.1))
julia> mbs_noised = MBNoiser(mbs, 0.05)
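The multiplicative noise described above can be sketched for a single array. The helper name is hypothetical, and applying the noise to the minibatches delivered by mbs is not modelled here. Multiplying by 1 + σ·z, where z is drawn from a standard normal distribution, gives a factor with mean 1.0 and standard deviation σ.

```julia
using Random, Statistics

# Multiply every value by a factor drawn from N(1, σ²)
# (hypothetical helper, not the MBNoiser implementation).
noise_mult(x::AbstractArray{T}, σ) where {T<:AbstractFloat} =
    x .* (one(T) .+ T(σ) .* randn(T, size(x)))

Random.seed!(1)
x = ones(Float32, 10_000)
xn = noise_mult(x, 0.1)
println(mean(xn), " ", std(xn))   # close to 1.0 and 0.1
```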
source
NNHelferlein.MBMasquerade (Type)
struct MBMasquerade <: DataLoader

Iterator wrapper to partially mask training data of a minibatch iterator of type Knet.Data or NNHelferlein.DataLoader.

Constructors:

MBMasquerade(it, rho=0.1; mode=:noise, value=0)
MBMasquerade(it; ρ=0.1, mode=:noise, value=0)

The constructor may be called with the density either as the positional argument rho or as the keyword argument ρ.

Arguments:

  • it: Minibatch iterator that must deliver (x,y)-tuples of minibatches
  • ρ=0.1 or rho: Density of mask; a value of 1.0 will mask everything, a value of 0.0 nothing.
  • value=0: the value with which the masking is done.
  • mode=:noise: type of masking (currently only :noise is implemented):
    • :noise: randomly distributed single values of the training data are overwritten with value.
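The :noise mode amounts to drawing a random Boolean mask with density ρ and overwriting the selected values. Below is a minimal sketch on a single array. The helper is hypothetical, and applying the mask to the minibatches of it is not modelled.

```julia
using Random

# Overwrite each value of x with `value` with probability ρ (hypothetical helper).
function mask_noise(x::AbstractArray, ρ; value=zero(eltype(x)))
    masked = rand(Float64, size(x)) .< ρ   # true where a value is masked
    y = copy(x)
    y[masked] .= value
    return y
end

Random.seed!(2)
y = mask_noise(ones(10_000), 0.5; value=0.0)
println(count(iszero, y) / length(y))     # roughly 0.5
```

With ρ = 0.0 nothing is masked, and with ρ = 1.0 every value is replaced, matching the density semantics above.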

Examples:

julia> dtrn
26-element Knet.Train20.Data{Tuple{CuArray{Float32}, Array{UInt8}}}

julia> mtrn = MBMasquerade(dtrn, 0.5, value=2.0)
Masquerade(26-element Knet.Train20.Data{Tuple{CuArray{Float32}, Array{UInt8}}}, 0.5, 2.0, :noise)
source
NNHelferlein.GPUIterator (Type)
GPUIterator(iterator)

Wraps any iterator and makes it return CuArrays. Element types are preserved, except for floating-point types, which are cast to Float32 for performance reasons.

Constructor:

GPUIterator(iterator; y=:cpu)
  • iterator: any iterator
  • y: if :gpu, the labels of the iterator are also converted to CuArray{}. If :cpu, the labels are not converted. For a classifier (integer labels), keeping the labels on the CPU is more efficient. For regression (floating-point labels), keeping the labels on the GPU is recommended.

Deprecation warning:

Use of GPUIterator is deprecated in favour of CUDA.CuIterator, which offers similar functionality.

source

Tabular data

Tabular data is usually provided row-wise in table form (CSV, ODS), i.e. one sample per row. The helper functions read such tables and generate Knet-compatible iterators of minibatches.

NNHelferlein.dataframe_read (Function)
dataframe_read(fname; o...)

Read a data table from a CSV file with one sample per row and return a DataFrame with the data. (ODS support has been removed because of PyCall compatibility issues with the OdsIO package.)

All keyword arguments accepted by CSV.File() can be used.

source
NNHelferlein.dataframe_minibatch (Function)
dataframe_minibatch(data::DataFrames.DataFrame; size=256,
                     ignore=[], teaching=nothing, 
                     verbose=1, o...)
 
dataframe_minibatches()

Create Knet-compatible minibatches of type Knet.Data from a DataFrame with one sample per row.

dataframe_minibatches() is an alias kept for backward compatibility.

Arguments:

  • ignore: defines a list of column names to be ignored
  • teaching=nothing: defines the column name with the teaching input. The teaching input is handled differently depending on its type. If Int, it is interpreted as class IDs and used directly for training (this assumes that the values range from 1..n). If String, the values are interpreted as class labels and converted to numeric class IDs by calling mk_class_ids(); the list of valid labels and their order can be obtained by calling mk_class_ids(data.y)[2]. If the teaching input is any other scalar value, a regression context is assumed and the value is used unchanged for training. If teaching is nothing, no teaching input is used and minibatches of x-data only are returned.
  • verbose=1: if > 0, a summary of how the dataframe is used is echoed.
  • other keyword arguments: all keyword arguments accepted by Knet.minibatch() may be used.

Allowed column definitions for ignore and teaching include names (as Strings), column names (as Symbols) or column indices (as Integer values).

source
NNHelferlein.dataframe_split (Function)
function dataframe_split(df::DataFrames.DataFrame;
                         teaching="y", split=0.8, balanced=true)

Split data organised row-wise in a DataFrame into training and validation sets.

Arguments:

  • df: data
  • teaching="y": name or index of column with teaching input "y"
  • split=0.8: fraction of data to be used for the first returned sub-dataframe
  • shuffle=true: shuffle the rows of the dataframe.
  • balanced=true: if true, the resulting datasets are balanced by oversampling. The returned datasets will be bigger than expected but include the same number of samples for each class.
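Balancing by oversampling can be sketched on a vector of labels. The minority classes are filled up with randomly drawn duplicates until every class has as many rows as the largest one. The helper below returns row indices and is hypothetical; it is not the function used by dataframe_split.

```julia
# Return row indices in which every class appears equally often (hypothetical helper).
function oversample_indices(labels)
    classes = unique(labels)
    groups = Dict(c => findall(==(c), labels) for c in classes)
    nmax = maximum(length, values(groups))
    idx = Int[]
    for c in classes
        g = groups[c]
        append!(idx, g)                            # keep all original rows
        append!(idx, rand(g, nmax - length(g)))    # draw duplicates with replacement
    end
    return idx
end

labels = ["a", "a", "a", "b"]
idx = oversample_indices(labels)
println(labels[idx])   # three "a" and three "b"
```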
source
NNHelferlein.mk_class_ids (Function)
function mk_class_ids(labels)

Take a list of n class labels for n instances and return a list of n class IDs (of type Int) and an array of labels in which the array index of each label corresponds to its ID.

Arguments:

  • labels: List of labels (typically Strings)

Result values:

  • array of class-IDs in the same order as the input
  • array of unique labels, ordered by their ID.

Examples:

julia> labels = ["blue", "red", "red", "red", "green", "blue", "blue"]
 7-element Array{String,1}:
  "blue"
  "red"
 3-element Array{String,1}:
  "blue"
  "green"
 "red"
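The mapping from labels to IDs can be sketched as follows. The sketch assumes, as the example output suggests, that the unique labels are sorted alphabetically before IDs are assigned. class_ids is a hypothetical stand-in for mk_class_ids.

```julia
# Hypothetical sketch: IDs are indices into the sorted list of unique labels.
function class_ids(labels)
    classes = sort(unique(labels))                          # unique labels; index = ID
    id_of = Dict(c => i for (i, c) in enumerate(classes))   # label -> ID
    return [id_of[l] for l in labels], classes
end

ids, classes = class_ids(["blue", "red", "red", "red", "green", "blue", "blue"])
# ids == [1, 3, 3, 3, 2, 1, 1]; classes == ["blue", "green", "red"]
```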
source

Image data

Image data should be provided in directories whose names denote the class labels. The helpers read from the root of a directory tree in which the first level of subdirectories defines the class labels. All images in the tree below a class directory are read as instances of the respective class. The following tree will generate the classes daisy, rose and tulip:

image_dir/
 ├── daisy
 │   ├── 01
 │   │   ├── 01
     pre_proc
     pre_load
     i_images
end

Iterable image loader to provide minibatches of images as 4-d-arrays (x,y,rgb,mb).

source
NNHelferlein.mk_image_minibatch (Function)
function mk_image_minibatch(dir, batchsize; split=false, at=0.8,
                             balanced=false, shuffle=true, train=true,
                             pre_load=false,
                            aug_pipl=nothing, pre_proc=nothing)

Return one or two iterable image-loader objects that provide minibatches of images. For training, each minibatch is a tuple (x,y), where x is a 4-d array with the minibatch of data and y is a vector of class IDs of type Int.

Arguments:

  • dir: base-directory of the image dataset. The first level of sub-dirs are used as class names.
  • batchsize: size of minibatches

Keyword arguments:

  • split: return two iterators for training and validation
  • at: split fraction (for training; the rest is for validation).
  • balanced: return balanced data (i.e. the same number of instances for all classes). Balancing is achieved via oversampling.
  • shuffle: if true, shuffle the images every time the iterator restarts
  • train: if true, minibatches with (x,y) tuples are provided; if false, only x (for prediction)
  • aug_pipl: augmentation pipeline for Augmentor.jl. Augmentation is performed before the pre_proc function is applied
  • pre_proc: function with preprocessing and augmentation algorithms of type x = f(x). In contrast to the augmentation pipeline, which modifies images, pre_proc works on Array{Float32}.
  • pre_load=false: read all images from disk once when populating the loader (requires a lot of memory, but speeds up training).
source
NNHelferlein.image2array (Function)
function image2array(img)

Take an image and return a 3d-array for RGB images or a 2d-array for grayscale images, with the colour channels as the last dimension.
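The channel-last layout, and the 4-d minibatches built from it, can be illustrated with plain arrays. The sketch assumes that the image is available as a channel-first array, for example as produced by channelview from ImageCore.jl. Both helpers are hypothetical and are not package functions.

```julia
# channel-first (c, d1, d2) -> channel-last (d1, d2, c)
to_channel_last(a::AbstractArray{<:Real,3}) = permutedims(a, (2, 3, 1))

# stack channel-last images of equal size into a 4-d minibatch (d1, d2, c, mb)
stack_minibatch(imgs) = cat(imgs...; dims=4)

a = rand(Float32, 3, 4, 5)      # e.g. an RGB image with 4×5 pixels, channels first
x = to_channel_last(a)          # size (4, 5, 3)
mb = stack_minibatch([x, x])    # size (4, 5, 3, 2)
```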

source
NNHelferlein.array2image (Function)
function array2image(arr)

Take a 3d-array with colour channels as last dimension or a 2d-array and return an array of RGB or of Gray as Image.

source
NNHelferlein.array2RGB (Function)
function array2RGB(arr)

Take a 3d-array with colour channels as last dimension or a 2d-array and always return an array of RGB as Image.

source

Text data

NNHelferlein.WordTokenizer (Type)
mutable struct WordTokenizer
     len
     w2i
     i2w
 julia> vocab(["They love Julia", "I love Julia"])
 2-element Array{Array{Int64,1},1}:
  [7, 5, 8]
 [6, 5, 8]
source
NNHelferlein.get_tatoeba_corpus (Function)
function get_tatoeba_corpus(lang; force=false,
            url="https://www.manythings.org/anki/")

Download and read a bilingual text corpus from Tatoeba, as provided by ManyThings (https://www.manythings.org). All corpora are English-language pairs of varying size and quality. Available languages include:

  • fra: French-English, 180 000 sentences
  • deu: German-English, 227 000 sentences
  • heb: Hebrew-English, 126 000 sentences
  • por: Portuguese-English, 170 000 sentences
  • tur: Turkish-English, 514 000 sentences

The function returns two lists with corresponding sentences in both languages. Sentences are not processed/normalised/cleaned, but exactly as provided by Tatoeba.

The data is stored in the package directory and only downloaded once.

Arguments:

  • lang: language code
  • force=false: if true, the corpus is downloaded even if a data file is already saved.
  • url: base url of ManyThings.
source
NNHelferlein.sequence_minibatch (Function)
function sequence_minibatch(x, [y], batchsize; 
                             pad=NNHelferlein.TOKEN_PAD, 
                             seq2seq=true, pad_y=pad,
                             x_padding=false,
                            shuffle=true, partial=false)

Return an iterator of type DataLoader with (x,y) sequence minibatches from two lists of sequences.

All sequences within a minibatch in x and y are brought to the same length by padding with the token provided as pad.

The sequences are sorted by length before building minibatches in order to reduce padding (i.e. sequences of similar length are combined into a minibatch). If the same sequence length is needed for all minibatches, the sequences must be truncated or padded before calling sequence_minibatch() (see functions truncate_sequence() and pad_sequence()).

Arguments:

  • x: List of sequences of Int
  • y: List of sequences of Int or list of target values (i.e. teaching input)
  • batchsize: size of minibatches
  • pad=NNHelferlein.TOKEN_PAD, pad_y=pad: tokens used for padding x and y. The tokens must be compatible with the type of the sequence elements. If pad_y is omitted, it is set equal to pad.
  • seq2seq=true: if true and y is provided, sequence-to-sequence minibatches are created. Otherwise y is treated as scalar teaching input.
  • shuffle=true: The minibatches are shuffled as the last step. If false, the minibatches with short sequences will be at the beginning of the dataset.
  • partial=false: If true, a partial minibatch will be created if necessary to include all input data.
  • x_padding=false: if true, sequences in x are padded to make minibatches of the demanded size, even if there are not enough sequences of the same length in x. If false, partial minibatches are built (if partial == true) or remaining sequences are skipped (if partial == false).
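The length-sorting idea can be sketched as follows. The helper is hypothetical and simplified: it pads every group of batchsize length-sorted sequences to its longest member, which is closest to the x_padding=true case. It uses 0 as a stand-in padding token, stores one sequence per column, and omits the final shuffling. The package's actual layout and grouping may differ.

```julia
# Sort sequences by length, cut into groups of `batchsize` and pad each group
# into a matrix with one sequence per column (size [max_len, batchsize]).
function seq_minibatches(seqs::AbstractVector{<:AbstractVector{Int}}, batchsize;
                         pad=0, partial=false)
    order = sortperm(length.(seqs))           # shortest sequences first
    batches = Matrix{Int}[]
    for start in 1:batchsize:length(order)
        stop = min(start + batchsize - 1, length(order))
        # skip an incomplete last group unless partial minibatches are allowed
        (stop - start + 1 < batchsize && !partial) && continue
        group = seqs[order[start:stop]]
        mb = fill(pad, maximum(length, group), length(group))
        for (j, s) in enumerate(group)
            mb[1:length(s), j] = s            # remaining rows keep the pad token
        end
        push!(batches, mb)
    end
    return batches
end

seqs = [[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]]
mbs = seq_minibatches(seqs, 2)
# mbs[1] == [4 5; 0 6]; mbs[2] == [1 7; 2 8; 3 9; 0 10]
```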
source
NNHelferlein.pad_sequence (Function)
function pad_sequence(s, len; token=NNHelferlein.TOKEN_PAD)

Stretch a sequence to length len by adding the padding token.

source
NNHelferlein.truncate_sequence (Function)
function truncate_sequence(s, len; end_token=nothing)

Truncate a sequence to the length len. If end_token is not nothing, the last token of the truncated sequence is overwritten by end_token.
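Padding and truncation are simple enough to sketch directly. The helpers below are hypothetical. They use 0 as a stand-in for the padding token, because the actual value of NNHelferlein.TOKEN_PAD is not given here. They also leave sequences longer than len unchanged when padding and always overwrite the last kept token when end_token is given; both behaviours are assumptions.

```julia
# Append `token` until s has length len (longer sequences are returned unchanged).
pad_seq(s, len; token=0) = length(s) >= len ? copy(s) : vcat(s, fill(token, len - length(s)))

# Keep the first len tokens; optionally overwrite the last kept token with end_token.
function trunc_seq(s, len; end_token=nothing)
    t = s[1:min(len, length(s))]          # slicing makes a copy, so s is not modified
    if !isnothing(end_token) && !isempty(t)
        t[end] = end_token
    end
    return t
end

pad_seq([5, 6], 4)                           # [5, 6, 0, 0]
trunc_seq([5, 6, 7, 8, 9], 3)                # [5, 6, 7]
trunc_seq([5, 6, 7, 8, 9], 3; end_token=2)   # [5, 6, 2]
```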

source
NNHelferlein.clean_sentence (Function)
function clean_sentence(s)

Clean a sentence in a few simple steps:

  • normalise Unicode
  • remove punctuation
  • remove duplicate spaces
  • strip leading and trailing whitespace
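The cleaning steps listed above map onto standard Julia string functions. The helper below is a hypothetical sketch. It assumes NFC normalisation and treats punctuation as whatever the regex class [[:punct:]] matches; the package may make other choices.

```julia
using Unicode

# Hypothetical sketch of the four cleaning steps.
function simple_clean(s::AbstractString)
    s = Unicode.normalize(s, :NFC)        # normalise Unicode (assumed form: NFC)
    s = replace(s, r"[[:punct:]]" => "")  # remove punctuation
    s = replace(s, r"\s+" => " ")         # collapse runs of whitespace
    return String(strip(s))               # strip leading/trailing whitespace
end

simple_clean("  Hello,   world!  ")       # "Hello world"
```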
source

Training

NNHelferlein.tb_train! (Function)
function tb_train!(mdl, opti, trn, vld=nothing; epochs=1, split=nothing,
                   lr_decay=nothing, lrd_steps=5, lrd_linear=false,
                   l2=nothing, l1=nothing,
                   eval_size=0.2, eval_freq=1,
                   tb_dir="logs", tb_name="run",
                   tb_text="""Description of tb_train!() run.""",
                   resume=true, tensorboard=true, return_stats=false,
                  opti_args...)

Training function with TensorBoard integration. TensorBoard logs are written with the TensorBoardLogger.jl package. The model is updated in place, and the trained model is returned.

Arguments:

  • mdl: model; i.e. forward-function for the net
  • opti: Knet-style optimiser type
  • trn: training data; iterator to provide (x,y)-tuples with minibatches
  • vld: validation data; iterator to provide (x,y)-tuples with minibatches. Set to nothing, if not defined.

Keyword arguments:

Optimiser:

  • epochs=1: number of epochs to train
  • resume=true: if true, optimiser parameters (momentum or gradient moving average) from a previous run are used to enable a seemless continuation of the training. Be aware that in a resumeed training, the original optimizer will be used, even if a different one is specified for the continuation.
  • lr_decay=nothing: do a leraning rate decay if not nothing: the value given is the final learning rate after lrd_steps steps of decay (lr_decay may be bigger than lr; in this case the leraning rate is increased). lr_decay is only applied if both start learning rate lr and final learning rate lr_decay are defined explicitly. Example: lr=0.01, lr_decay=0.001 will reduce the lr from 0.01 to 0.001 during the training (by default in 5 steps). lr_decay is applied to l1 and l2 with the same decay rate.
  • lrd_steps=5: number of learning rate decay steps. Default is 5, i.e. modify the lr 4 times during the training (resulting in 5 different learning rates).
  • lrd_linear=false: type of learning rate decay; If false, lr is modified by a constant factor (e.g. 0.9) resulting in an exponential decay. If true, lr is modified by the same step size, i.e. linearly.
  • l1=nothing: L1 regularisation; implemented as weight decay per parameter. If learning-rate decay is used, L1 and L2 are also decayed.
  • l2=nothing: L2 regularisation; implemented as weight decay per parameter
  • opti_args...: optional keyword arguments for the optimiser can be specified (i.e. lr, gamma, ...).

Model evaluation:

  • split=nothing: if no validation data is specified and split is a fraction (between 0.0 and 1.0), the training dataset is splitted at the specified point (e.g.: if split=0.8, 80% of the minibatches are used for training and 20% for validation).
  • eval_size=0.2: fraction of validation data to be used for calculating loss and accuracy for train and validation data during training.
  • eval_freq=1: frequency of evaluation; default=1 means evaluation is calculated after each epoch. With eval_freq=10 eveluation is calculated 10 times per epoch.
  • acc_fun=nothing: function to calculate accuracy. The function must implement the following signature: fun(model; data) where data is an iterator that provides (x,y)-tuples of minibatches. For classification tasks, accuracy from the Knet package is a good choice. For regression a correlation or mean error may be preferred.
  • mb_loss_freq=100: frequency of training loss reporting. default=100 means that 100 loss-values per epoch will be logged to TensorBoard. If mblossfreq is greater then the number of minibatches, loss is logged for each minibatch.
  • checkpoints=nothing: frequency of model checkpoints written to disk. Default is nothing, i.e. no checkpoints are written. To write the model after each epoch with name model use cpepoch=1; to write every second epochs cpepoch=2, etc.
  • cp_dir="checkpoints": directory for checkpoints
  • return_stats=false: if true, a dictionary with losses and accuracies of training and validation data is returned instead of the model.

TensorBoard:

TensorBoard log-directory is created from 3 parts: tb_dir/tb_name/<current date time>.

  • tensorboard=true: if true, TensorBoard logs are written
  • tb_dir="logs": root directory for TensorBoard logs.
  • tb_name="run": name of training run. tb_name will be used as directory name and should not include whitespace
  • tb_text: description to be included in the TensorBoard log as text log.
source

Evaluation and accuracy

NNHelferlein.focal_nllFunction
function focal_nll(scores, labels::AbstractArray{<:Integer}; γ=2.0, dims=1)
-function focal_nll(mdl; data, γ=2.0, dims=1)

Calculate the negative log-likelihood (i.e. cross entropy) with increased weights on weekly classified samples. focal nll for sample j is defined as

\[- (1 - p_{j})^{\gamma} \cdot \ln p_{j} =\]

\[(1 - p_{j})^{\gamma} \cdot nll(p_{j})\]

where p is the softmax-scaled likelyhood for the true class of the j-th sample. The sample weight is high, if predicted p << 1.

The second signature can be used to caclulate the mean focus nll for a dataset of minibatches of (x,y)-tuples.

Arguments:

  • scores: unnormalised scores (i.e. activations of output neurons without applying an activation function), typically of a classifier with one neuron per class
  • labels: ground truth as integer values
  • γ=2.0: The parameter γ controls the strength of the effect: for γ=0, all weights become exactly 1.0; with higher values for γ, focus on mis-classified or weakly classified sample is increased.

dims=1: dimension in which the instances are organised.

source
NNHelferlein.focal_bceFunction
function focal_bce(scores, labels::AbstractArray{<:Integer}; 
-function focal_bce(mdl; data, γ=2.0, dims=1)

Calculate the biray crossentropywith increased weights on weekly classified samples. focal bce for sample j is defined as

\[(1 - p_{j})^{\gamma} \cdot bce(p_{j})\]

where p is the softmax-scaled likelyhood for the true class of the j-th sample. The sample weight is high, if predicted p << 1.

The second signature can be used to caclulate the mean focus bce for a dataset of minibatches of (x,y)-tuples.

For arguments and details, please refer to the documentation of focal_nll.

source
NNHelferlein.predictFunction
function predict(mdl; data, softmax=false)
-function predict(mdl, x; softmax=false )

Return the prediction for minibatches of data. The signature follows the standard call predict(model, data=xxx). The second signature predicts a single Array of data.

Arguments:

  • mdl: executable network model
  • data=iterator: iterator providing minibatches of input data; if the minibatches include y-values (i.e. teaching input), predictions (i.e. index of class with highest value and the y-values will be returned.
  • data: single Array of input data (i.e. input for one minibatch)
  • softmax: if true or if model is of type Classifier the predicted softmax probabilities are returned instead of raw activations.
source
NNHelferlein.predict_top5Function
function predict_top5(mdl; data, top_n=5, classes=nothing)

Run the model mdl for data in minibatches data and print the top 5 predictions as softmax probabilities.

Arguments:

  • top_n: print top n hits
  • classes: optional list of human readable class labels.
source
NNHelferlein.minibatch_evalFunction
function minibatch_eval(mdl, fun, data; o...)

Given an accuracy or loss function fun(p, y) that returns an accuracy meassure for n-dimensional arrays of predictions p and teaching input y (i.e. one minibatch of data), minibatch_eval() applies the fun() to all minibatches supplied by the minibatch iterator data.

Arguments:

  • mdl: model to compute predictions
  • fun: evaluation function for one minibatch that returns the mean of results for all samples of the minibatch
  • data: iterator that supplies a Tuple of (x,y) for each minibatch

o...: all additional keyword arguments are forwarded to fun().

source
NNHelferlein.squared_error_accFunction
function squared_error_acc(mdl; data)

Return the mean squared error between the predictions of the model mdl and the corresponding teaching input by providung the standard signature fun(model, data=iterator).

Arguments

  • mdl: model with the signature mdl(x) to generate predictions for one minibatch (i.e. array) of data.
  • data: iterator, providing (x,y)-tuples of training or validation data.
source
NNHelferlein.abs_error_accFunction
function abs_error_acc(mdl; data)

Return the mean absolute error between the predictions of the model mdl and the corresponding teaching input by providung the standard signature fun(model, data=iterator).

Arguments

  • mdl: model with the signature mdl(x) to generate predictions for one minibatch (i.e. array) of data.
  • data: iterator, providing (x,y)-tuples of training or validation data.
source
NNHelferlein.hamming_distFunction
function hamming_dist(p, t; accuracy=false, 
                   opti_args...)

Training function with TensorBoard integration. TB logs are written with the TensorBoardLogger.jl package. The model is updated (in-place) and the trained model is returned.

Arguments:

  • mdl: model; i.e. forward-function for the net
  • opti: Knet-style optimiser type
  • trn: training data; iterator to provide (x,y)-tuples with minibatches
  • vld: validation data; iterator to provide (x,y)-tuples with minibatches. Set to nothing, if not defined.

Keyword arguments:

Optimiser:

  • epochs=1: number of epochs to train
  • resume=true: if true, optimiser parameters (momentum or gradient moving average) from a previous run are used to enable a seamless continuation of the training. Be aware that in a resumed training the original optimiser will be used, even if a different one is specified for the continuation.
  • lr_decay=nothing: if not nothing, a learning rate decay is applied; the value given is the final learning rate after lrd_steps steps of decay (lr_decay may be bigger than lr; in this case the learning rate is increased). lr_decay is only applied if both the start learning rate lr and the final learning rate lr_decay are defined explicitly. Example: lr=0.01, lr_decay=0.001 will reduce the lr from 0.01 to 0.001 during the training (by default in 5 steps). lr_decay is applied to l1 and l2 with the same decay rate.
  • lrd_steps=5: number of learning rate decay steps. The default of 5 modifies the lr 4 times during the training (resulting in 5 different learning rates).
  • lrd_linear=false: type of learning rate decay. If false, lr is modified by a constant factor (e.g. 0.9), resulting in an exponential decay. If true, lr is modified by a constant step size, i.e. linearly.
  • l1=nothing: L1 regularisation; implemented as weight decay per parameter. If learning-rate decay is used, L1 and L2 are also decayed.
  • l2=nothing: L2 regularisation; implemented as weight decay per parameter.
  • opti_args...: optional keyword arguments for the optimiser (e.g. lr, gamma, ...).
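The two decay modes can be illustrated by computing the lrd_steps learning rates between lr and lr_decay. lr_schedule_sketch is a hypothetical helper. It only mirrors the description above: a constant factor for exponential decay and a constant step for linear decay. It is not taken from the tb_train! source.

```julia
# Hypothetical helper: the learning rates used in each of the lrd_steps phases.
function lr_schedule_sketch(lr, lr_decay; lrd_steps=5, lrd_linear=false)
    if lrd_linear
        # constant step size: linear decay (or increase)
        return collect(range(lr, lr_decay; length=lrd_steps))
    else
        # constant factor: equally spaced in log space, i.e. exponential decay
        return exp.(range(log(lr), log(lr_decay); length=lrd_steps))
    end
end

lr_schedule_sketch(0.01, 0.001)                   # ≈ [0.01, 0.00562, 0.00316, 0.00178, 0.001]
lr_schedule_sketch(0.01, 0.001; lrd_linear=true)  # ≈ [0.01, 0.00775, 0.0055, 0.00325, 0.001]
```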

Model evaluation:

  • split=nothing: if no validation data is specified and split is a fraction (between 0.0 and 1.0), the training dataset is split at the specified point (e.g.: if split=0.8, 80% of the minibatches are used for training and 20% for validation).
  • eval_size=0.2: fraction of validation data to be used for calculating loss and accuracy for train and validation data during training.
  • eval_freq=1: frequency of evaluation; the default of 1 means that evaluation is calculated after each epoch. With eval_freq=10, evaluation is calculated 10 times per epoch.
  • acc_fun=nothing: function to calculate accuracy. The function must implement the following signature: fun(model; data) where data is an iterator that provides (x,y)-tuples of minibatches. For classification tasks, accuracy from the Knet package is a good choice. For regression a correlation or mean error may be preferred.
  • mb_loss_freq=100: frequency of training loss reporting; the default of 100 means that 100 loss values per epoch are logged to TensorBoard. If mb_loss_freq is greater than the number of minibatches, the loss is logged for each minibatch.
  • checkpoints=nothing: frequency of model checkpoints written to disk. The default is nothing, i.e. no checkpoints are written. To write the model after each epoch use checkpoints=1; to write it every second epoch use checkpoints=2, etc.
  • cp_dir="checkpoints": directory for checkpoints
  • return_stats=false: if true, a dictionary with losses and accuracies of training and validation data is returned instead of the model.

TensorBoard:

The TensorBoard log directory is created from three parts: tb_dir/tb_name/<current date time>.

  • tensorboard=true: if true, TensorBoard logs are written
  • tb_dir="logs": root directory for TensorBoard logs.
  • tb_name="run": name of training run. tb_name will be used as directory name and should not include whitespace
  • tb_text: description to be included in the TensorBoard log as text log.

Evaluation and accuracy

NNHelferlein.focal_nll (Function)
function focal_nll(scores, labels::AbstractArray{<:Integer}; γ=2.0, dims=1)
function focal_nll(mdl; data, γ=2.0, dims=1)

Calculate the negative log-likelihood (i.e. cross-entropy) with increased weights on weakly classified samples. The focal nll for sample j is defined as

\[-(1 - p_{j})^{\gamma} \cdot \ln p_{j} = (1 - p_{j})^{\gamma} \cdot \mathrm{nll}(p_{j})\]

where p_j is the softmax-scaled likelihood of the true class of the j-th sample. The sample weight is high if the predicted p_j << 1.

The second signature can be used to calculate the mean focal nll for a dataset of minibatches of (x,y)-tuples.

Arguments:

  • scores: unnormalised scores (i.e. activations of output neurons without applying an activation function), typically of a classifier with one neuron per class
  • labels: ground truth as integer values
  • γ=2.0: the parameter γ controls the strength of the effect: for γ=0, all weights become exactly 1.0; with higher values of γ, the focus on misclassified or weakly classified samples is increased.
  • dims=1: dimension in which the instances are organised.
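A minimal plain-Julia sketch of the formula above. It assumes that scores is a [n_classes, n_samples] matrix, with classes in the first dimension, and that labels holds 1-based class indices. focal_nll_sketch is a hypothetical name and is not the NNHelferlein implementation. With γ=0 it reduces to the ordinary mean nll.

```julia
# Hypothetical sketch of the focal nll (not the NNHelferlein source).
# Assumes scores is [n_classes, n_samples] and labels are 1-based class indices.
function focal_nll_sketch(scores::AbstractMatrix, labels::AbstractVector{<:Integer}; γ=2.0)
    # numerically stable log-softmax over the class dimension
    shifted = scores .- maximum(scores; dims=1)
    logp_all = shifted .- log.(sum(exp.(shifted); dims=1))
    # ln p_j: log-probability of the true class of each sample j
    logp = [logp_all[labels[j], j] for j in eachindex(labels)]
    p = exp.(logp)
    # mean of (1 - p_j)^γ * nll(p_j), with nll(p_j) = -ln p_j
    return sum(@. (1 - p)^γ * -logp) / length(labels)
end
```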

NNHelferlein.focal_bce (Function)
function focal_bce(scores, labels::AbstractArray{<:Integer}; 
function focal_bce(mdl; data, γ=2.0, dims=1)

Calculate the binary cross-entropy with increased weights on weakly classified samples. The focal bce for sample j is defined as

\[(1 - p_{j})^{\gamma} \cdot bce(p_{j})\]

where p is the softmax-scaled likelihood for the true class of the j-th sample. The sample weight is high if the predicted p << 1.

The second signature can be used to calculate the mean focal bce for a dataset of minibatches of (x,y)-tuples.

For arguments and details, please refer to the documentation of focal_nll.

NNHelferlein.predict (Function)
function predict(mdl; data, softmax=false)
function predict(mdl, x; softmax=false)

Return the prediction for minibatches of data. The signature follows the standard call predict(model, data=xxx). The second signature predicts a single Array of data.

Arguments:

  • mdl: executable network model
  • data=iterator: iterator providing minibatches of input data; if the minibatches include y-values (i.e. teaching input), the predictions (i.e. the index of the class with the highest value) and the y-values will be returned.
  • x: single Array of input data (i.e. input for one minibatch)
  • softmax: if true or if model is of type Classifier the predicted softmax probabilities are returned instead of raw activations.
NNHelferlein.predict_top5 (Function)
function predict_top5(mdl; data, top_n=5, classes=nothing)

Run the model mdl on the minibatches in data and print the top top_n predictions (default: 5) as softmax probabilities.

Arguments:

  • top_n: print top n hits
  • classes: optional list of human-readable class labels.
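The core of a top-n report is a softmax followed by a descending sort. The sketch below handles the scores of a single sample. It returns the (label, probability) pairs instead of printing them. top_n_sketch is a hypothetical name.

```julia
# Hypothetical sketch (not the NNHelferlein source): top-n classes of one sample.
function top_n_sketch(scores::AbstractVector{<:Real}; top_n=5, classes=nothing)
    p = exp.(scores .- maximum(scores))
    p ./= sum(p)                                         # softmax probabilities
    idx = sortperm(p; rev=true)[1:min(top_n, length(p))]
    labels = isnothing(classes) ? string.(idx) : classes[idx]
    return collect(zip(labels, p[idx]))                  # (label, probability) pairs
end

top_n_sketch([1.0, 3.0, 2.0]; top_n=2, classes=["cat", "dog", "fox"])   # dog first, then fox
```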
NNHelferlein.minibatch_eval (Function)
function minibatch_eval(mdl, fun, data; o...)

Given an accuracy or loss function fun(p, y) that returns an accuracy measure for n-dimensional arrays of predictions p and teaching input y (i.e. one minibatch of data), minibatch_eval() applies fun() to all minibatches supplied by the minibatch iterator data.

Arguments:

  • mdl: model to compute predictions
  • fun: evaluation function for one minibatch that returns the mean of results for all samples of the minibatch
  • data: iterator that supplies a Tuple of (x,y) for each minibatch
  • o...: all additional keyword arguments are forwarded to fun().
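The loop over minibatches can be sketched in a few lines. The sketch assumes that the per-minibatch results are averaged with equal weight. The actual aggregation in minibatch_eval may differ. minibatch_eval_sketch and match_frac are hypothetical names.

```julia
using Statistics: mean

# Hypothetical sketch (not the NNHelferlein source). Assumption: the per-minibatch
# results of fun are averaged with equal weight for every minibatch.
function minibatch_eval_sketch(mdl, fun, data; o...)
    return mean(fun(mdl(x), y; o...) for (x, y) in data)
end

# example per-minibatch function: fraction of identical entries
match_frac(p, y) = mean(p .== y)

data = [([1, 2], [1, 2]), ([3, 4], [3, 0])]
minibatch_eval_sketch(identity, match_frac, data)   # 0.75
```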

NNHelferlein.squared_error_acc (Function)
function squared_error_acc(mdl; data)

Return the mean squared error between the predictions of the model mdl and the corresponding teaching input, providing the standard signature fun(model, data=iterator).

Arguments:

  • mdl: model with the signature mdl(x) to generate predictions for one minibatch (i.e. array) of data.
  • data: iterator, providing (x,y)-tuples of training or validation data.
NNHelferlein.abs_error_acc (Function)
function abs_error_acc(mdl; data)

Return the mean absolute error between the predictions of the model mdl and the corresponding teaching input, providing the standard signature fun(model, data=iterator).

Arguments:

  • mdl: model with the signature mdl(x) to generate predictions for one minibatch (i.e. array) of data.
  • data: iterator, providing (x,y)-tuples of training or validation data.
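Both error measures can be sketched by iterating over the (x,y)-tuples. The sketch assumes that the error is first averaged within each minibatch and then across minibatches with equal weight. The library may weight minibatches differently, e.g. by size. mse_sketch and mae_sketch are hypothetical names.

```julia
using Statistics: mean

# Hypothetical sketches (not the NNHelferlein source). Assumption: the error is averaged
# within each minibatch first and then across minibatches with equal weight.
mse_sketch(mdl; data) = mean(mean(abs2, mdl(x) .- y) for (x, y) in data)
mae_sketch(mdl; data) = mean(mean(abs, mdl(x) .- y) for (x, y) in data)

data = [([1.0, 2.0], [1.0, 4.0]), ([0.0], [1.0])]
mse_sketch(identity; data=data)   # 1.5
mae_sketch(identity; data=data)   # 1.0
```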
NNHelferlein.hamming_dist (Function)
function hamming_dist(p, t; accuracy=false,
                      ignore_ctls=false, vocab=nothing,
                      start=nothing, stop=nothing, pad=nothing, unk=nothing)

function hamming_acc(p, t; o...)
function hamming_acc(mdl; data=data, o...)

Return the Hamming distance between two sequences or two minibatches of sequences. Predicted sequences p and teaching input sequences t may be of different lengths, but the number of sequences in the minibatch must be the same.

Arguments:

  • p, t: n-dimensional arrays of type Int with predictions and teaching input for a minibatch of sequences. The shapes of the arrays must be identical except for the first dimension (i.e. the sequence length), which may differ between p and t.
  • accuracy=false: if false, the mean Hamming distance in the minibatch is returned (i.e. the average number of differences in the sequences). If true, the accuracy over all non-padded positions is returned (in the range 0.0 to 1.0).
  • ignore_ctls=false: a vocab is used to replace all '<start>', '<end>', '<unknown>' and '<pad>' tokens by '<pad>'. If true, padding and other control tokens are treated as normal codes and are not ignored.
  • vocab=nothing: target language vocabulary of type NNHelferlein.WordTokenizer. If defined, the padding token of vocab is used to mask all control tokens in the sequences (i.e. '<start>', '<end>', '<unknown>', '<pad>').
  • start, stop, pad, unk: may be used to define individual control tokens. The default is nothing.

Details:

The function hamming_acc() is a shortcut that returns the accuracy instead of the distance. The signature hamming_acc(mdl; data=data, o...) provides compatibility with the accuracy functions called during training (signature fun(model; data)).
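The distance and accuracy described above can be sketched for sequences stored column-wise. The sketch rests on three assumptions. First, the shorter array is padded with a single pad value so that p and t may differ in length. Second, positions where both p and t hold pad are ignored. Third, it does no vocab-based masking of other control tokens. hamming_sketch is a hypothetical name.

```julia
# Hypothetical sketch, not the NNHelferlein source. Sequences are stored column-wise
# ([seq_len, n_sequences]); the shorter array is padded with `pad` so that p and t
# may differ in length. Positions where both p and t hold `pad` are ignored.
function hamming_sketch(p::AbstractMatrix{<:Integer}, t::AbstractMatrix{<:Integer};
                        pad=0, accuracy=false)
    @assert size(p, 2) == size(t, 2) "number of sequences must be the same"
    len = max(size(p, 1), size(t, 1))
    padded(m) = vcat(m, fill(pad, len - size(m, 1), size(m, 2)))
    pp, tt = padded(p), padded(t)
    valid = .!((pp .== pad) .& (tt .== pad))    # drop positions that are padding in both
    if accuracy
        return sum((pp .== tt) .& valid) / sum(valid)
    else
        return sum(pp .!= tt) / size(t, 2)      # mean number of differences per sequence
    end
end

p = [1 2; 2 2; 3 0]   # two predicted sequences of length 3 (0 = padding)
t = [1 2; 2 3]        # two target sequences of length 2
hamming_sketch(p, t)                  # 1.0 (two differences in two sequences)
hamming_sketch(p, t; accuracy=true)   # 0.6 (3 of 5 non-padding positions match)
```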

NNHelferlein.peak_finder_acc (Function)
function peak_finder_acc(p, t; ret=:f1, verbose=0,
                         tolerance=1, limit=0.5)
function peak_finder_acc(mdl; data=data, o...)

Calculate an accuracy-like measure for data series consisting mainly of zeros and rare peaks. The function counts the number of peaks in y detected by p (true positives), peaks not detected (false negatives) and the number of peaks in p not present in y (false positives).

It is assumed that peaks in y are marked by a single value higher than the limit (typically 1.0). Peaks in p may be broader and are defined as local maxima with a value above the limit. If the tolerance is set to > 0, peaks at the first or last steps may not be evaluated (because evaluation stops at end-tolerance).

If requested, f1, G-mean and intersection over union are calculated from the raw values.

Arguments:

  • p, t: Predictions p and teaching input t (i.e. y) are mini-batches of 1-d series of data. The sequence must be in the 1st dimension (column). All other dims are treated as separate windows of length size(p/t,1).
  • ret: return value as Symbol; one of :peaks, :recall, :precision, :miss_rate, :f1, :g_mean, :iou or :all. If :all, a named tuple is returned.
  • verbose=0: if 0, no additional output is generated; if 1, composite measures are printed to stdout; if 2, all raw counts are printed.
  • tolerance=1: peak finder tolerance: a peak is counted as correct if it is detected within the tolerance.
  • limit=0.5: Only maxima with values above the limit are considered.
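The counting of true positives, false negatives and false positives can be sketched for a single 1-D series. The matching rule is an assumption: a target peak counts as found if a predicted local maximum above limit lies at most tolerance steps away. The exact rules of peak_finder_acc may differ. peak_counts_sketch is a hypothetical name.

```julia
# Hypothetical sketch for a single 1-D series (not the NNHelferlein source).
# Assumptions: a target peak is any t[i] > limit; a predicted peak is a local maximum
# of p above limit; peaks match if they are at most `tolerance` steps apart.
function peak_counts_sketch(p::AbstractVector, t::AbstractVector; tolerance=1, limit=0.5)
    n = length(p)
    true_peaks = findall(>(limit), t)
    pred_peaks = [i for i in 1:n if p[i] > limit &&
                  (i == 1 || p[i] >= p[i-1]) && (i == n || p[i] >= p[i+1])]
    near(i, js) = any(j -> abs(i - j) <= tolerance, js)
    tp = count(i -> near(i, pred_peaks), true_peaks)   # target peaks that were found
    fn = length(true_peaks) - tp                       # target peaks that were missed
    fp = count(j -> !near(j, true_peaks), pred_peaks)  # predicted peaks without target
    prec = tp / max(tp + fp, 1)
    rec = tp / max(tp + fn, 1)
    f1 = 2 * prec * rec / max(prec + rec, eps())
    return (tp=tp, fn=fn, fp=fp, precision=prec, recall=rec, f1=f1)
end

t = [0, 0, 1, 0, 0, 0, 1, 0]
p = [0, 0.2, 0.4, 0.9, 0.3, 0, 0, 0]
peak_counts_sketch(p, t)   # tp = 1, fn = 1, fp = 0, f1 ≈ 0.667
```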
NNHelferlein.confusion_matrix (Function)
function confusion_matrix(mdl; data, labels=nothing, pretty_print=true, accuracy=true)
function confusion_matrix(y, p; labels=nothing, pretty_print=true, accuracy=true)

Compute and display the confusion matrix of (x,y)-minibatches. Predictions are calculated with model mdl for which a signature mdl(x) must exist.

The second signature generates the confusion matrix from two vectors: the ground truth y and the predictions p.

The function is an interface to the function confusmat provided by the package MLBase.

Arguments:

  • mdl: model with signature mdl(x) to generate predictions
  • data: minibatches of (x,y)-tuples
  • pretty_print=true: if true, the matrix will be displayed on stdout
  • labels=nothing: a vector of human-readable labels can be provided
  • accuracy=true: if true, accuracy, precision and recall are printed for all classes.
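The matrix itself and the per-class figures can be sketched in plain Julia without MLBase. The orientation used here is chosen for this sketch only: rows are true classes and columns are predicted classes 1..k. confusion_sketch is a hypothetical name.

```julia
# Hypothetical sketch in plain Julia (not the MLBase/NNHelferlein code).
# Rows are true classes, columns are predicted classes; classes are 1..k.
function confusion_sketch(y::AbstractVector{<:Integer}, p::AbstractVector{<:Integer}, k::Integer)
    cm = zeros(Int, k, k)
    for (yt, yp) in zip(y, p)
        cm[yt, yp] += 1
    end
    correct = [cm[c, c] for c in 1:k]
    acc = sum(correct) / length(y)
    prec = correct ./ max.(vec(sum(cm; dims=1)), 1)   # per predicted class (columns)
    rec = correct ./ max.(vec(sum(cm; dims=2)), 1)    # per true class (rows)
    return (cm=cm, accuracy=acc, precision=prec, recall=rec)
end

confusion_sketch([1, 1, 2, 3, 3], [1, 2, 2, 3, 1], 3).cm   # [1 1 0; 0 1 0; 1 0 1]
```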

ImageNet tools

NNHelferlein.preproc_imagenet_vgg (Function)
function preproc_imagenet_vgg(img)
function preproc_imagenet_resnetv2(img)

Image preprocessing for pre-trained ImageNet examples. Preprocessing includes:

  • bringing RGB colour values into the range 0-255
  • standardising colour values by subtracting the mean colour values (103.939, 116.779, 123.68) from RGB
  • changing the colour channel order from RGB to BGR
  • normalising or scaling colour values.

Resizing is not performed, because it may be part of the augmentation pipeline.

Details:

Unfortunately, image preprocessing is not consistent across all pretrained TensorFlow/Keras applications. As a result, different preprocessing functions must be used for different pretrained applications:

  • VGG16, VGG19: preproc_imagenet_vgg (colour space: BGR, values: 0 to 255, centered according to the ImageNet training set)
  • RESNET: preproc_imagenet_resnet (identical to vgg)
  • RESNET V2: preproc_imagenet_resnetv2 (colour space: RGB, values: -1.0 to 1.0, scaled for each sample individually)
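The VGG-style steps can be sketched as a scale to 0-255, a swap from RGB to BGR, and a per-channel mean subtraction. The sketch assumes a channel-first Float32 input of size [3, height, width] with RGB values in 0.0-1.0, and it keeps that layout. The layout returned by preproc_imagenet_vgg may differ. The means are applied in BGR channel order here, which is the convention of the Keras "caffe" preprocessing mode. vgg_preproc_sketch is a hypothetical name.

```julia
# Hypothetical sketch of VGG/"caffe"-style preprocessing (not the NNHelferlein source).
# Assumption: img is [3, height, width] with RGB values in 0.0-1.0 (channel-first).
function vgg_preproc_sketch(img::AbstractArray{<:Real,3})
    @assert size(img, 1) == 3 "expected 3 colour channels in dimension 1"
    x = Float32.(img) .* 255f0                     # 0.0-1.0 -> 0-255
    x = x[[3, 2, 1], :, :]                         # RGB -> BGR
    bgr_mean = reshape(Float32[103.939, 116.779, 123.68], 3, 1, 1)
    return x .- bgr_mean                           # centre each channel
end

img = rand(Float32, 3, 224, 224)   # random stand-in for an RGB image
x = vgg_preproc_sketch(img)        # size (3, 224, 224), BGR, mean-centred
```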

Examples:

The function can be used with the image loader, e.g. for prediction with a trained model:

pipl = CropRatio(ratio=1.0) |> Resize(224,224)
images = mk_image_minibatch("./example_pics", 16;
                     shuffle=false, train=false,
@@ -219,8 +219,8 @@
                     split=true, at=0.8, balanced=false,
                     shuffle=true, train=true,
                     aug_pipl=pipl,
                     pre_proc=preproc_imagenet_vgg)

Other utils

Layers and helpers for transformers

NNHelferlein.PositionalEncoding (Type)
struct PositionalEncoding <: AbstractLayer

Positional encoding layer. Only sincos-style encoding (according to Vaswani et al., NIPS 2017) is implemented.

The layer takes an array of any number of dimensions (>=2), calculates the Vaswani-2017-style positional encoding and adds the encoding to each plane of the array.

NNHelferlein.positional_encoding_sincos (Function)
function positional_encoding_sincos(n_embed, n_seq)

Calculate and return a matrix of size [n_embed, n_seq] of positional encoding values following the sin and cos style in the paper Vaswani, A. et al.; Attention Is All You Need; 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017.
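The sin/cos encoding of the paper can be sketched as follows. In the notation of the paper, dimension 2i of position pos gets sin(pos / 10000^(2i/n_embed)) and dimension 2i+1 gets the matching cos. The sketch interleaves sin and cos rows as in the paper. The row layout used by positional_encoding_sincos is not confirmed here. pe_sincos_sketch is a hypothetical name.

```julia
# Hypothetical sketch of the Vaswani et al. (2017) encoding (not the NNHelferlein source).
# Assumption: interleaved rows; 0-based dims 2k get sin, dims 2k+1 get cos.
function pe_sincos_sketch(n_embed::Integer, n_seq::Integer)
    pe = zeros(Float32, n_embed, n_seq)
    for pos in 0:n_seq-1, i in 0:n_embed-1
        # dims 2k and 2k+1 share the frequency 1 / 10000^(2k / n_embed)
        angle = pos / 10000^(2 * (i ÷ 2) / n_embed)
        pe[i+1, pos+1] = iseven(i) ? sin(angle) : cos(angle)
    end
    return pe
end

pe_sincos_sketch(4, 3)[:, 1]   # Float32[0.0, 1.0, 0.0, 1.0] (position 0)
```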

source
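A minimal sketch of how such an encoding matrix can be computed and added. It assumes the standard formulation PE(pos, 2i) = sin(pos / 10000^(2i/n_embed)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/n_embed)), with positions counted from 0. The function name pe_sincos_sketch is hypothetical and not part of NNHelferlein, and the library's exact indexing may differ. The last lines show the broadcast that adds one [n_embed, n_seq] matrix to every plane of a [depth, seq_len, minibatch] array, which is what the PositionalEncoding layer is described as doing.

```julia
# Hypothetical sketch (not the NNHelferlein implementation) of sin/cos
# positional encoding as in Vaswani et al. (2017).
# Result layout: [n_embed, n_seq]; positions are counted from 0 (assumption).
function pe_sincos_sketch(n_embed::Int, n_seq::Int)
    pe = zeros(Float32, n_embed, n_seq)
    for pos in 0:n_seq-1, d in 0:n_embed-1
        i = d ÷ 2                                # a sin/cos pair shares one frequency
        angle = pos / 10000^(2i / n_embed)
        pe[d+1, pos+1] = iseven(d) ? sin(angle) : cos(angle)
    end
    return pe
end

pe = pe_sincos_sketch(8, 5)
x = randn(Float32, 8, 5, 3)   # [depth, seq_len, minibatch]
y = x .+ pe                   # the same encoding is added to every sample (plane)
```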
NNHelferlein.mk_padding_maskFunction
function mk_padding_mask(x; pad=TOKEN_PAD, add_dims=false)

Make a padding mask, i.e. return an array of type KnetArray{Float32} (or Array{Float32}) similar to x, with the value 1.0 at each position where x is pad and 0.0 otherwise. With add_dims=true, two additional dimensions of size 1 are inserted in the middle; in multi-head attention they represent the second seq_len and the number of heads.

The function can be used for creating padding masks for attention mechanisms.

Arguments:

  • x: Array of sequences (typically a matrix with ncols sequences of length nrows)
  • pad: value for the token to be masked
  • add_dims: if true, 2 additional dimensions are inserted to return a 4-D array as needed for transformer architectures. Otherwise the returned array has the same size as x.
source
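A sketch of the mask semantics described above, written in plain Julia on CPU arrays. padding_mask_sketch is a hypothetical name. The padding value 26 (the amino acid padding token used elsewhere in this package) is only an example, and the default TOKEN_PAD may be different. With add_dims=true, the two singleton dimensions make the mask broadcastable against 4-D multi-head scores of size [n_seq_v, n_seq_q, n_heads, n_mb].

```julia
# Hypothetical sketch of a padding mask: 1.0 where x equals the padding token,
# 0.0 elsewhere. With add_dims=true two singleton dimensions are inserted after
# the first one, e.g. [n_seq, n_mb] -> [n_seq, 1, 1, n_mb].
function padding_mask_sketch(x::AbstractArray; pad, add_dims=false)
    m = Float32.(x .== pad)
    if add_dims
        m = reshape(m, size(m, 1), 1, 1, size(m)[2:end]...)
    end
    return m
end

x = [ 3  5
     26  7
     26 26]                       # 2 sequences (columns) of length 3
m  = padding_mask_sketch(x; pad=26)
m4 = padding_mask_sketch(x; pad=26, add_dims=true)
```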
NNHelferlein.mk_peek_ahead_maskFunction
function mk_peek_ahead_mask(x; dim=1)
function mk_peek_ahead_mask(n_seq)

Return a matrix of size [n_seq, n_seq] filled with 1.0 and the upper triangle set to 0.0. The type is CuArray{Float32} in GPU context and Array{Float32} otherwise. The matrix can be used as a peek-ahead mask in transformers.

dim=1 specifies the dimension in which the sequence length is represented. For un-embedded data this is normally 1, i.e. the shape of x is [nseq, nmb]. After embedding, the shape is typically [depth, nseq, nmb], so dim=2 must be specified.

source
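A short sketch of such a mask (hypothetical name peek_ahead_sketch). It assumes that the zeroed upper triangle includes the diagonal, and that rows index key positions while columns index query positions, as in the [n_seq_v, n_seq_q] layout of the attention factors. Under these assumptions, the remaining 1.0 entries mark keys that lie in the future of the respective query. The second method mimics the dim keyword by reading the sequence length from one dimension of x.

```julia
using LinearAlgebra   # for triu

# Hypothetical sketch: ones everywhere except the upper triangle (diagonal
# included, an assumption), which is set to 0.0.
peek_ahead_sketch(n_seq::Integer) = 1f0 .- triu(ones(Float32, n_seq, n_seq))

# Variant that takes the sequence length from dimension `dim` of an array
peek_ahead_sketch(x::AbstractArray; dim=1) = peek_ahead_sketch(size(x, dim))

m = peek_ahead_sketch(3)
```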
NNHelferlein.dot_prod_attnFunction
function dot_prod_attn(q, k, v; mask=nothing)

Generic scaled dot product attention following the paper of Vaswani et al., (2017), Attention Is All You Need.

Arguments:

  • q: query of size [depth, n_seq_q, ...]
  • k: key of size [depth, n_seq_v, ...]
  • v: value of size [depth, n_seq_v, ...]
  • mask: mask for the attention factors. It may have different shapes but must be broadcastable for addition to the scores tensor (which has the same size as alpha, [n_seq_v, n_seq_q, ...]). In transformer context, typical masks are either a padding mask of size [n_seq_v, ...] or a peek-ahead mask of size [n_seq_v, n_seq_v] (the latter is only possible in self-attention, when all sequence lengths are identical).

q, k, v must have matching leading dimensions (i.e. same depth or embedding). k and v must have the same sequence length.

Return values:

  • c: context as alpha-weighted sum of values with size [depth, n_seq_q, ...]
  • alpha: attention factors of size [n_seq_v, n_seq_q, ...]
source
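A single-sample sketch of scaled dot-product attention (no minibatch dimension), using the shapes listed above. The scores are transpose(k) * q / sqrt(depth) with size [n_seq_v, n_seq_q]. The softmax runs over the key dimension, and the context is v * alpha. The function names are hypothetical. The way the mask is applied here is an assumption about how positions marked with 1.0 are suppressed: a large negative number is added to their scores. Note that v * alpha yields one context column per query; in self-attention, n_seq_q equals n_seq_v.

```julia
# Softmax over the first dimension (keys), stabilised by subtracting the column max
function softmax_dim1(s)
    e = exp.(s .- maximum(s, dims=1))
    return e ./ sum(e, dims=1)
end

# Hypothetical sketch: q is [depth, n_seq_q], k and v are [depth, n_seq_v]
function dot_prod_attn_sketch(q, k, v; mask=nothing)
    scores = (transpose(k) * q) ./ sqrt(Float32(size(q, 1)))   # [n_seq_v, n_seq_q]
    if mask !== nothing
        scores = scores .+ mask .* -1f9     # assumption: masked positions get a huge negative score
    end
    alpha = softmax_dim1(scores)            # every column (query) sums to 1
    c = v * alpha                           # [depth, n_seq_q]
    return c, alpha
end

q = randn(Float32, 4, 3)
k = randn(Float32, 4, 5)
v = randn(Float32, 4, 5)
pad = Float32[0, 0, 0, 1, 1]                # last two key positions are padding
c, alpha = dot_prod_attn_sketch(q, k, v; mask=pad)
```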
NNHelferlein.MultiHeadAttnType
struct MultiHeadAttn <: AbstractLayer

Multi-headed attention layer, designed following Vaswani et al. (2017).

Constructor:

MultiHeadAttn(depth, n_heads)
  • depth: Embedding depth
  • n_heads: number of heads for the attention.

Signature:

function(mha::MultiHeadAttn)(q, k, v; mask=nothing)

q, k, v are 3-dimensional tensors of the same size ([depth, seqlen, nminibatch]) and the optional mask must be of size [seqlen, nminibatch] and mark masked positions with 1.0.

source
NNHelferlein.separate_headsFunction
function separate_heads(x, n)

Helper function for multi-headed attention mechanisms: an additional second dimension is added to a tensor of minibatches by splitting the first (i.e. depth).

source
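A sketch of the splitting step as a plain reshape (hypothetical name separate_heads_sketch). It assumes that head h receives the h-th contiguous block of the depth dimension, so [depth, seq_len, n_mb] becomes [depth ÷ n, n, seq_len, n_mb]. The exact layout used by NNHelferlein is not stated here and may differ.

```julia
# Hypothetical sketch: split the first dimension (depth) into n heads.
# Column-major reshape: head h gets depth entries (h-1)*(depth÷n)+1 : h*(depth÷n).
function separate_heads_sketch(x::AbstractArray, n::Integer)
    depth = size(x, 1)
    depth % n == 0 || throw(ArgumentError("depth $depth is not divisible by n = $n"))
    return reshape(x, depth ÷ n, n, size(x)[2:end]...)
end

x = reshape(collect(1:24), 6, 2, 2)   # depth 6, seq_len 2, minibatch 2
y = separate_heads_sketch(x, 3)        # 3 heads of depth 2
```

Merging the heads back is the inverse reshape, e.g. reshape(y, 6, 2, 2).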

Utils for array manipulation

NNHelferlein.crop_arrayFunction
function crop_array(x, crop_sizes)

Crop an n-dimensional array to the given size. Cropping is always centered (i.e. margins are removed on both sides).

Arguments:

  • x: n-dim AbstractArray
  • crop_sizes: tuple of target sizes to which the array is cropped. Allowed values are Int or :. If crop_sizes defines fewer dimensions than x has, the remaining dimensions are not cropped (as if : were given). If a requested crop size is larger than the actual size of x, it is ignored.
source
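A sketch of the cropping rules described above (hypothetical name crop_array_sketch). Missing trailing entries behave like :, targets that are not smaller than the actual size are ignored, and the kept range is centered. How an odd margin is split is an assumption; here the extra element is removed at the end.

```julia
# Hypothetical sketch of centered cropping with Int or `:` targets per dimension
function crop_array_sketch(x::AbstractArray, crop_sizes::Tuple)
    ranges = map(1:ndims(x)) do d
        n = size(x, d)
        target = d <= length(crop_sizes) ? crop_sizes[d] : Colon()
        if target isa Colon || target >= n
            return 1:n                                  # this dimension is not cropped
        end
        first_idx = (n - target) ÷ 2 + 1                # centered start index
        return first_idx:(first_idx + target - 1)
    end
    return x[ranges...]
end

x  = reshape(collect(1:36), 6, 6)
c1 = crop_array_sketch(x, (2, 4))
c2 = crop_array_sketch(x, (2,))      # second dimension untouched
c3 = crop_array_sketch(x, (:, 10))   # nothing cropped
```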
NNHelferlein.blowup_arrayFunction

function blowup_array(x, n)

Blow up an array x with an additional dimension and repeat the content of the array n times.

Arguments:

  • x: Array of any dimension
  • n: number of repeats. n=1 returns an array with an additional dimension of size 1.

Examples:

julia> x = [1,2,3,4]; blowup_array(x, 3)
 4×3 Array{Int64,2}:
  1  1  1
  2  2  2
@@ -239,7 +239,7 @@
 
 [:, :, 3] =
  1  2
  3  4
source
NNHelferlein.recycle_arrayFunction

function recycle_array(x, n; dims=ndims(x))

Recycle an array x along the specified dimension (by default the last dimension) and repeat the content of the array n times. The number of dimensions stays unchanged, but the array values are repeated n times.

Arguments:

  • x: Array of any dimension
  • n: number of repeats. n=1 returns an unchanged array.
  • dims: dimension to be repeated.

Examples:

julia> recycle_array([1,2],3)
 6-element Array{Int64,1}:
  1
  2
@@ -262,7 +262,7 @@
 3x3 Array{Int64,2}:
  1 2 3
  1 2 3
  1 2 3
source
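A sketch of the recycling behaviour with Base.repeat (hypothetical name recycle_array_sketch). It assumes dims defaults to the last dimension, as stated above. With this sketch, a 2-element vector recycled 3 times gives 6 elements, and a 1×3 row recycled along dims=1 gives a 3×3 matrix.

```julia
# Hypothetical sketch: repeat x n times along dimension `dims`
# (default: the last one); the number of dimensions stays the same.
function recycle_array_sketch(x::AbstractArray, n::Integer; dims::Integer=ndims(x))
    counts = ntuple(d -> d == dims ? n : 1, ndims(x))   # repeat count per dimension
    return repeat(x, counts...)
end

v  = recycle_array_sketch([1, 2], 3)
m3 = recycle_array_sketch([1 2 3], 3; dims=1)
```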
NNHelferlein.de_embedFunction
function de_embed(x; remove_dim=false)

Replace the maximum of the first dimension of an n-dimensional array by its index (aka argmax()). If remove_dim is true, the result has the first dimension removed; otherwise the returned array has the first dimension with size 1 (default).

Examples:

> x = [1 1 1
        2 1 1
        1 2 1
        1 1 2]
@@ -274,19 +274,19 @@
 3-element Vector{Int64}:
  2
  3
  4
source

Utils for fixing types in GPU context

NNHelferlein.init0Function
function init0(siz...)

Initialise a vector or array of size siz with zeros. If a GPU is detected, the type of the returned value is KnetArray{Float32}, otherwise Array{Float32}.

Examples:

julia> init0(2,10)
 2×10 Array{Float32,2}:
  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 
  julia> init0(0,10)
 0×10 Array{Float32,2}
source
NNHelferlein.convert2CuArrayFunction
function convert2CuArray(x, innerType=Float32)
function convert2KnetArray(x, innerType=Float32)
function ifgpu(x, innerType=Float32)

Convert an array x to a CuArray{Float32} (or a CuArray of whatever element type is given as innerType) in GPU context (i.e. if CUDA.functional()), or to an Array{Float32} otherwise. In GPU context, the conversion copies the data to the GPU.

convert2KnetArray() is kept as an alias for backward compatibility.

ifgpu() is an alias/shortcut to convert2KnetArray().

source
NNHelferlein.convert2KnetArrayFunction
function convert2CuArray(x, innerType=Float32)
function convert2KnetArray(x, innerType=Float32)
function ifgpu(x, innerType=Float32)

Convert an array x to a CuArray{Float32} (or a CuArray of whatever element type is given as innerType) in GPU context (i.e. if CUDA.functional()), or to an Array{Float32} otherwise. In GPU context, the conversion copies the data to the GPU.

convert2KnetArray() is kept as an alias for backward compatibility.

ifgpu() is an alias/shortcut to convert2KnetArray().

source
NNHelferlein.ifgpuFunction
function convert2CuArray(x, innerType=Float32)
function convert2KnetArray(x, innerType=Float32)
function ifgpu(x, innerType=Float32)

Convert an array x to a CuArray{Float32} (or a CuArray of whatever element type is given as innerType) in GPU context (i.e. if CUDA.functional()), or to an Array{Float32} otherwise. In GPU context, the conversion copies the data to the GPU.

convert2KnetArray() is kept as an alias for backward compatibility.

ifgpu() is an alias/shortcut to convert2KnetArray().

source
NNHelferlein.emptyCuArrayFunction
function emptyCuArray(size...=(0,0);innerType=Float32)
 function emptyKnetArray(size...=(0,0);innerType=Float32)

Return an empty CuArray with the specified dimensions. Either the array is empty (i.e. at least one dimension is 0), or its elements are undefined.

By default an empty matrix is returned.

Examples:

>>> emptyKnetArray(0,0)
 0×0 Knet.KnetArrays.KnetMatrix{Float32}
 
@@ -294,7 +294,7 @@
 0×0 Knet.KnetArrays.KnetMatrix{Float32}
 
 >>> emptyKnetArray(0)
 0-element Knet.KnetArrays.KnetVector{Float32}
source

Utils for Bioinformatics

NNHelferlein.aminoacid_tokenizerFunction
aminoacid_tokenizer(sec; ignore_unknown=true)

Tokenize a protein sequence into amino acids using the following table:

    Amino acid | Token | Description
     --------------------------------
     C          | 1     | Cysteine
     S          | 2     | Serine
@@ -322,10 +322,10 @@
     J          | 23    | Leucine or Isoleucine
     U          | 24    | Selenocysteine
     X          | 25    | Unknown amino acid
     .          | 26    | padding token

Arguments:

  • sec: a string containing the protein sequence in uppercase or lowercase. All other letters or symbols are converted to the unknown token.
  • ignore_unknown: if true, unknown amino acids (i.e. "X") are converted to the padding token. If false, the embedding for "X" is trained like that of all other amino acids.
source
NNHelferlein.embed_blosum62Function
embed_blosum62(x)

Embed a protein sequence into a 21-dimensional vector using the BLOSUM62 amino acid substitution matrix. Amino acids must be encoded as by NNHelferlein's aminoacid tokenizer function. x can be any AbstractArray of Int; a dimension of size 21 is added as the first dimension.

source
NNHelferlein.embed_vhse8Function
embed_vhse8(x)

Embed a protein sequence into an 8-dimensional vector using the VHSE8 amino acid embedding scheme. Amino acids must be encoded as by NNHelferlein's aminoacid tokenizer function. x can be any AbstractArray of Int; a dimension of size 8 is added as the first dimension.

source
NNHelferlein.EmbedAminoAcidsType
EmbedAminoAcids <: AbstractLayer

Embed a protein sequence into a 21-dimensional vector using the BLOSUM62 amino acid substitution matrix, or into an 8-dimensional vector using the VHSE8 parameters. Amino acids must be encoded according to NNHelferlein's aminoacid tokenizer function.

The layer input is an n-dimensional array of an Integer type. The output is an (n+1)-dimensional array of type Float32 with a first (added) dimension of size 21 or 8.

Constructor:

  • EmbedAminoAcids(embedding::Symbol=:blosum62):
    • embedding=:blosum62: Either :blosum62 or :vhse8 to select the embedding scheme.
source
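A sketch of the kind of lookup such an embedding performs. It assumes a plain table lookup in which column t holds the vector for token t. The function name embed_lookup_sketch is hypothetical. The tables below are random placeholders with BLOSUM62-like (21) and VHSE8-like (8) depths; the real substitution and VHSE8 values are not reproduced. The sketch shows how an n-dimensional integer array becomes an (n+1)-dimensional Float32 array with the embedding depth as the new first dimension.

```julia
# Hypothetical sketch of a table-lookup embedding: column t of E is the
# vector for token t; the depth of E becomes the new first dimension.
function embed_lookup_sketch(E::AbstractMatrix, x::AbstractArray{<:Integer})
    out = E[:, vec(x)]                            # [depth, length(x)]
    return reshape(out, size(E, 1), size(x)...)   # [depth, size(x)...]
end

E21 = randn(Float32, 21, 26)   # placeholder table: 26 tokens, 21-dim vectors
E8  = randn(Float32, 8, 26)    # placeholder table: 26 tokens, 8-dim vectors
x   = rand(1:26, 100, 4)       # 4 tokenised sequences of length 100
y21 = embed_lookup_sketch(E21, x)
y8  = embed_lookup_sketch(E8, x)
```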

Saving, loading and inspection of models

NNHelferlein.save_networkFunction
save_network(fname, mdl)

Save a model as jld2-file.

Arguments:

  • fname: filename; if the name does not end with the extension .jld2, it will be added.
  • mdl: network model to be saved. The model will be copied to a cpu-based model via copy_network(mdl, to=:cpu) before saving, to remove hardware dependencies of parameters on the gpu.
source
NNHelferlein.load_networkFunction
load_network(fname; to=:gpu)

Load a model from a jld2-file.

Arguments:

  • fname: filename; if the name does not end with the extension .jld2, it will be added.
  • to=:gpu: by default, parameters are loaded as CuArrays, if a functional gpu is detected. If to=:cpu is specified parameters are loaded as cpu-arrays.
source
NNHelferlein.copy_networkFunction
copy_network(mdl::AbstractNN; to=:gpu)

Returns a copy of a Helferlein model. Caution: the copy is generated by Adapt.adapt() and is not a deep copy!

Arguments:

  • mdl: Network model of type AbstractNN.
  • to=:gpu: by default all parameters of the copy are CuArrays for GPU usage. If to=:cpu is specified, parameters are Arrays and the model is processed on the CPU.
source
Base.summaryFunction
function summary(mdl)

Print a network summary of any model of type AbstractNN, AbstractChain or AbstractLayer.

source

Datasets

NNHelferlein.dataset_mit_nsrFunction
function dataset_mit_nsr(records=nothing; force=false)

Retrieve the Physionet ECG data set: "MIT-BIH Normal Sinus Rhythm Database". If necessary the data is downloaded from Zenodo (and stored in the NNHelferlein data directory, DOI).

All 18 recordings are returned as a list of DataFrames.

The ECGs are taken from the MIT-NSR database, with some modifications that make them more suitable as a playground data set for machine learning:

  • all 18 ECGs are trimmed to approx. 50000 heart beats from a region without recording errors
  • scaled to a range -1 to 1 (non-linear/tanh)
  • heart beats annotation as time series with value 1.0 at the point of the annotated beat and 0.0 for all other times
  • additional heart beat column smoothed by applying a gaussian filter
  • provided as csv with columns "time in sec", "channel 1", "channel 2", "beat" and "smooth".

Arguments:

  • force=false: if true, the download is forced and local data are overwritten.
  • records: list of record names to be downloaded.

Examples:

nsr_16265 = dataset_mit_nsr("16265")
nsr_16265 = dataset_mit_nsr(["16265", "19830"])
nsr_all = dataset_mit_nsr()
source
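A sketch of two of the preprocessing steps listed for this data set: tanh squashing of a raw signal, and Gaussian smoothing of a 0/1 beat series. The scale factor, the kernel width sigma and all function names are illustrative assumptions. The values actually used to prepare the data set are not documented here.

```julia
# Hypothetical sketch; scale and sigma are illustrative, not the data set's values.
squash(signal, scale) = tanh.(signal ./ scale)       # non-linear scaling into [-1, 1]

function gaussian_kernel(sigma)
    r = ceil(Int, 3sigma)
    k = [exp(-t^2 / (2sigma^2)) for t in -r:r]
    return k ./ sum(k)                                # normalised to sum 1
end

# Convolve a 0/1 beat series with the kernel (zero padding at the borders)
function smooth_beats(beats::AbstractVector{<:Real}, sigma)
    k = gaussian_kernel(sigma)
    r = length(k) ÷ 2
    n = length(beats)
    out = zeros(Float64, n)
    for i in 1:n, j in -r:r
        if 1 <= i + j <= n
            out[i] += beats[i + j] * k[j + r + 1]
        end
    end
    return out
end

beats = zeros(200); beats[[50, 120]] .= 1.0   # two annotated beats
smoothed = smooth_beats(beats, 4.0)
ecg = squash(randn(200) .* 3, 2.0)
```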
NNHelferlein.dataset_mnistFunction
function dataset_mnist(; force=false)

Download the MNIST dataset with the help of MLDatasets.jl from Yann LeCun's official website. Four arrays xtrn, ytrn, xtst, ytst are returned.

xtrn and xtst will be the images as a multi-dimensional array, and ytrn and ytst the corresponding labels as integers.

The image(s) is/are returned in the horizontal-major memory layout as a single numeric array of eltype Float32. The values are scaled to be between 0 and 1. The labels are returned as a vector of Int8.

In the teaching input (i.e. y) the digit 0 is encoded as 10.

The data is stored in the Helferlein data directory and only downloaded if the files are not already saved there.

Ref.: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998 http://yann.lecun.com/exdb/mnist/.

Arguments:

  • force=false: if true, the dataset download will be forced.
source
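Because digit 0 is encoded as class 10, converting between class labels and digits is a common step. The sketch below uses hypothetical helper names and relies only on the encoding stated above.

```julia
# Hypothetical helpers: digits 1-9 keep their value, digit 0 is stored as class 10.
label_to_digit(y) = mod.(y, 10)                               # 10 -> 0, 1..9 unchanged
digit_to_label(d) = map(t -> t == 0 ? oftype(t, 10) : t, d)   # 0 -> 10

y = Int8[10, 3, 7, 10]
d = label_to_digit(y)
```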
NNHelferlein.dataset_fashion_mnistFunction
function dataset_fashion_mnist(; force=false)

Download Zalando's Fashion-MNIST dataset with the help of MLDatasets.jl from https://github.com/zalandoresearch/fashion-mnist.

Four arrays xtrn, ytrn, xtst, ytst are returned in the same structure as the original MNIST dataset.

The data is stored in the Helferlein data directory and only downloaded if the files are not already saved there.

Authors: Han Xiao, Kashif Rasul, Roland Vollgraf

Arguments:

  • force=false: if true, the dataset download will be forced.
source
NNHelferlein.dataset_pfamFunction
function dataset_pfam(records; force=false)

Retrieve the curated PFAM protein families database from Zenodo, including 46872 sequences from 62 families. Sequences are between 100 and 1000 amino acids long and families have between 100 and 200 members. Training and test data are padded to a length of 1000 amino acids with the padding token of the amino acid tokenizer (26).

More information about the data set can be found at https://zenodo.org/record/8138939, including PDB sequence IDs for each data table.

Available records:

  • :raw: dataframe with all (46872) rows of data and the columns ID (PDB-ID), family (family name) and sequence (amino acid sequence)
  • :families: list of all family names as dataframe with the columns class (numeric class ID 1-62), family (family name) and count (number of family members in the dataset)
  • :aminoacids: list of amino acid tokens as dataframe with the columns Token (aa token 1-26), One-Letter (one-letter code of the amino acid), and Amino acid (full name of the amino acid)
  • :train: dataframe with 42187 rows of training data and labels with the class ID as first column and the amino acid tokens as columns 2-1001 (padded to 1000 amino acids)
  • :test: dataframe with 4687 rows of test data in the same format as the training data
  • :balanced_train: dataframe with 111601 rows of balanced training data in the same format as the training data. The data is balanced by sampling 1800 sequences from each family.
  • :balanced_test: dataframe with 12401 rows of balanced test data in the same format as the training data.
source
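A sketch of the row layout of the :train and :test tables described above: the class ID in the first column, followed by 1000 token columns filled up with the padding token 26. pad_tokens is a hypothetical helper. Appending the padding after the sequence, rather than before it, is an assumption.

```julia
# Hypothetical sketch of one padded data row (class ID + 1000 token columns)
const PAD_TOKEN = 26
const SEQ_LEN = 1000

function pad_tokens(tokens::AbstractVector{<:Integer}; len=SEQ_LEN, pad=PAD_TOKEN)
    length(tokens) <= len || throw(ArgumentError("sequence longer than $len"))
    return vcat(tokens, fill(pad, len - length(tokens)))   # padding appended (assumption)
end

# Build one row and split it again into label and features
row = vcat(17, pad_tokens(rand(1:25, 340)))   # class 17, 340 amino acids
class_id, x = row[1], row[2:end]
```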

Pretrained networks

NNHelferlein.get_vgg16Function
function get_vgg16(; filters_only=false, trainable=true)

Return a VGG16 model with pretrained parameters from Tensorflow/Keras applications API. For details about original model and training see Keras Applications.

Arguments

  • filters_only=false: if true, only the filter stack is returned (without Flatten() and classifier) to be integrated into any chain.
  • trainable=true: if true, the filterstack is set trainable, otherwise only the classifier part is trainable and the filter weights are fixed.

Details:

The model weights are imported from the respective Keras Application, which is trained with preprocessed images of size 224x224 pixels. Images must use the colour channel order BGR and colour values in the range 0.0 to 1.0.

This preprocessing can be reproduced for a directory img_path with images by using a preprocessing pipeline and the Helferlein function preproc_imagenet_vgg():

using NNHelferlein, Augmentor   # CropRatio and Resize are Augmentor.jl operations
pipl = CropRatio(ratio=1.0) |> Resize(224,224)
mini_batches = mk_image_minibatch(img_path, 2, train=false,
        aug_pipl=pipl, pre_proc=preproc_imagenet_vgg)

Model structure: [Figure: VGG16 topology plot created by netron]

source
NNHelferlein.get_resnet50v2Function
function get_resnet50v2(; filters_only=false, trainable=true)

Return a ResNet50 v2 model with pretrained parameters from Tensorflow/Keras applications API. For details about original model and training see Keras Applications.

Arguments

  • filters_only=false: if true, only the filter stack is returned (without Flatten() and classifier) to be integrated into any chain.
  • trainable=true: if true, the filterstack is set trainable, otherwise only the classifier part is trainable and the filter weights are fixed.

Details:

The model weights are imported from the respective Keras Application, which is trained with images of size 224x224 pixels. Caution: the training set images have not been preprocessed with the ImageNet default procedure! Instead, images must use the colour channel order RGB and colour values in the range 0.0 to 1.0.

This preprocessing can be reproduced for a directory img_path with images by using a preprocessing pipeline and the Helferlein function preproc_imagenet_resnetv2():

using NNHelferlein, Augmentor   # CropRatio and Resize are Augmentor.jl operations
pipl = CropRatio(ratio=1.0) |> Resize(224,224)
mini_batches = mk_image_minibatch(img_path, 2, train=false,
        aug_pipl=pipl, pre_proc=preproc_imagenet_resnetv2)

Model structure: [Figure: ResNet50 V2 topology plot created by netron]

source
diff --git a/dev/api_overview/index.html b/dev/api_overview/index.html index 69a3e348..01f47c1b 100644 --- a/dev/api_overview/index.html +++ b/dev/api_overview/index.html @@ -1,2 +1,2 @@
API Overview · NNHelferlein.jl

Networks and chains

Network helpers

Layers

Fully connected layers

Convolutional

Layers for convolutional networks:

Recurrent

Layers for recurrent networks:

Helpers for recurrent networks

Other layers

Attention Mechanisms

Transformer API

Activation functions

The Helferlein style is to provide all functions (such as activation or loss functions) as function arguments. Therefore any function from any package, or any custom function, may be provided as actf to the layer constructors.

  • See the Knet documentation for all activation functions provided by Knet (elu, relu, selu, sigm, ...).

  • Helferlein provides some derived functions, such as leaky_relu, leaky_tanh, leaky_sigm or swish.

Data provider utilities

For tabular data

For image data

Image to array tools

ImageNet tools

Text data

Text corpus example data download

Iteration utilities

Training

  • tb_train! - high-level training utility with TensorBoard support and (maybe too) many optional arguments

Evaluation and accuracy

Loss functions

Accuracy functions

Other utils

Utils for array manipulation

Utils for fixing types in GPU context

Datasets

Pretrained networks

Pretrained network weights, derived from Keras applications.

diff --git a/dev/changelog/index.html
index f3830e61..29881c20 100644
--- a/dev/changelog/index.html
+++ b/dev/changelog/index.html
@@ -1,2 +1,2 @@
-Changelog · NNHelferlein.jl

ChangeLog of NNHelferlein package

todo

  • use CUDA.CuIterator in train?
  • padding no longer imported from NNlib (incompatibility with AutoGrad)

v1.3.2

  • tidy-up dependency jungle
  • Padding added to embedding layer

v1.3.1

  • l1 and l2 decay always parallel to learning rate decay
  • several bioinformatics tools (amino acid embedding, blosum, vhse8)
  • dataframe_minibatch default "y" changed to nothing.
  • Bioinformatics: amino acid tokenisation added
  • GPU selection added (not yet exported)
  • grouped convolutions fixed

v1.3

  • Transformer API added for Bert-like architectures
  • Transformer example
  • ramp-up of beta added to VAE
  • disambiguate vae signature

v1.2

  • imagenet preprocessing fixed for vgg and resnet
  • ResNetBlock added
  • ResNet added
  • Padding layer added
  • print_network changed to summary
  • Pretrained nets saved at zenodo and simplified constructors added
  • AbstractNN and AbstractLayer added
  • copy model and save/load as JLD2 added

v1.1.2

  • Depthwise conv-layer added (experimental)
  • focal loss functions added to classifier
  • FeatureSelection layer added
  • explicit signature added for 3d-convolution
  • train: possibility to disable tensorboard logs
  • train: possibility to return losses and accs for plotting after training

v1.1.1

  • some docstring cosmetics
  • Activation Layers added
  • layer GlobalAveragePooling added
  • pre-trained vgg example fixed for changed "import-HDF"-interface
  • hdf5 import with all kwargs possible
  • added: Layer + Layer = Chain
  • changelog added to documentation

v1.1.0

  • documentation for release added
  • split_minibatches() made stable (never returns an empty iterator)
  • docs slightly re-organised
  • Gaussian Layer added
  • minibatch iterator for masking added

v1.0.0

  • initial release
diff --git a/dev/examples/index.html
index 8f14368f..22c5c30d 100644
--- a/dev/examples/index.html
+++ b/dev/examples/index.html
@@ -1,2 +1,2 @@
-Examples · NNHelferlein.jl

Examples

Examples may be used as templates for new projects. All examples are available at GitHub/examples:

  • Simple MLP: A simple multi-layer perceptron for MNIST classification, built with Knet and Helferlein types in just one line of code (or so).
  • Vanilla Autoencoder: A simple autoencoder design with help of Knet in Helferlein-style.

  • Convolutional Autoencoder: A convolutional autoencoder design with help of Knet in Helferlein-style.

  • Variational Autoencoder: Example for a simple VAE utilising the NNHelferlein-type VAE and demonstrating the fascinating regularisation of a VAE.

  • Simple sequence-to-sequence network: Simple s2s network to demonstrate how to set up machine translation with an RNN.

  • Sequence-to-sequence RNN for machine translation: RNN to demonstrate how to set up machine translation with a bidirectional encoder RNN and attention.

  • RNN Sequence tagger for annotation of ECGs: RNN to demonstrate how to set up a sequence tagger to detect heartbeats. Only one layer with 8 units is necessary to achieve almost 100% correct predictions. The example includes the definition of peephole LSTMs to show how to integrate non-standard RNN units with the NNHelferlein framework.

  • Import a Keras model: The notebook shows the import of a pretrained VGG16 model from TensorFlow/Keras into a Knet-style CNN and its application to example images, utilising the Helferlein ImageNet utilities.

  • Transformer for machine translation: A simple transformer architecture is set up according to the 2017 Vaswani paper "Attention Is All You Need" with the help of NNHelferlein utilities.

Pretrained Nets

Based on the Keras import constructors, it is easy to import pretrained models from the TF/Keras ecosystem.

diff --git a/dev/index.html
index 5d9a28c7..f5473cec 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -135,4 +135,4 @@ y = Int8[5, 10, 4, 1, 9, 2, 1, 3] 2.3798099f0

... or with an iterator of minibatches to get the mean loss for the dataset:

julia> lenet(dtrn)
 
-2.6070921f0

The next step is to have a look at the examples in the GitHub repo:

Overview

Datasets

Some datasets are provided with the package as playground data. More may follow.

API Reference

Index

Changelog

The history can be found here: ChangeLog of NNHelferlein package

diff --git a/dev/license/index.html
index c79c00a8..ab975fe1 100644
--- a/dev/license/index.html
+++ b/dev/license/index.html
@@ -1,2 +1,2 @@
-License · NNHelferlein.jl

The NNHelferlein.jl package is licensed under the MIT License:

Copyright (c) 2023 Andreas Dominik, THM University of Applied Sciences, Gießen, Germany

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

diff --git a/dev/overview/index.html
index 586279d6..8bd2995a 100644
--- a/dev/overview/index.html
+++ b/dev/overview/index.html
@@ -1,2 +1,2 @@
-Overview · NNHelferlein.jl

Overview

This section provides a brief overview of the functionality provided by NNHelferlein. For more details, please see the API section.

Neural network definitions

The abstract type AbstractNN provides the following call signatures:

  • (m::AbstractNN)(x): evaluate x (sample or minibatch)
  • (m::AbstractNN)(x,y): evaluate x and calculate the loss
  • (m::AbstractNN)(d): return the mean loss for a dataset, if d is an iterator of type Knet.Data or NNHelferlein.DataLoader
  • (m::AbstractNN)((x,y)): return the mean loss for an (x,y) tuple.
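The call signatures listed above follow Julia's function-like objects (functors) pattern: methods are defined on the model type itself, and dispatch on the argument type selects the variant. The sketch below is a simplified illustration, not the package source; SketchNN and SketchChain are hypothetical names, and the squared-error loss stands in for whatever loss a concrete model type defines.

```julia
using Statistics: mean

abstract type SketchNN end

struct SketchChain <: SketchNN
    layers::Vector{Any}
end

# (m)(x): pass a sample or minibatch through all layers in order
(m::SketchChain)(x::AbstractArray{<:Number}) = foldl((h, layer) -> layer(h), m.layers; init=x)

# (m)(x, y): loss for one minibatch (samples as columns); squared error chosen for illustration
(m::SketchChain)(x, y) = sum(abs2, m(x) .- y) / size(x, 2)

# (m)((x, y)): loss for an (x, y) tuple
(m::SketchChain)(d::Tuple) = m(d...)

# (m)(d): mean loss over any other iterator of (x, y) minibatches
(m::SketchChain)(d) = mean(m(x, y) for (x, y) in d)
```

Because a numeric array, a tuple and a generic iterable dispatch to different methods, one model object can serve as predictor, minibatch loss and dataset loss.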

Explicit signatures exist for the types Classifier and Regressor, with negative log-likelihood and squared error as the loss, respectively. For variational autoencoders, the type VAE is provided.
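For reference, both losses can be written out directly. In this sketch the function names are hypothetical, and classes in rows, samples in columns and integer labels are assumed; the negative log-likelihood is computed from raw scores through a numerically stable log-sum-exp, and the squared error is averaged over all elements as one common variant.

```julia
# Negative log-likelihood of integer labels y given raw scores (classes × samples).
function nll_sketch(scores::AbstractMatrix, y::AbstractVector{<:Integer})
    total = 0.0
    for j in axes(scores, 2)
        col = scores[:, j]
        m = maximum(col)
        logsumexp = m + log(sum(exp.(col .- m)))   # stable log(sum(exp(col)))
        total += logsumexp - col[y[j]]             # equals -log(softmax(col)[y[j]])
    end
    return total / size(scores, 2)                 # mean over samples
end

# Mean squared error over all elements.
mse_sketch(ŷ, y) = sum(abs2, ŷ .- y) / length(y)
```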

The type Chain wraps a list of layers that are executed sequentially.

Types Transformer and TokenTransformer are provided to build Bert-like transformer networks from the respective TFEncoder and TFDecoder layers.

A network summary can be printed with summary(mdl::AbstractNN) and a more detailed list of all layers with print_network(mdl::AbstractNN).

Layer definitions

Several layers are predefined with executable signatures:

  • MLPs: different flavours of the simple layer: Dense: default layer for a vector (i.e. sample) or matrix (i.e. minibatch) as input, with logistic activation as default. Linear: TensorFlow-style layer to process high-dimensional arrays, with identity as default activation. Embed: embedding layer that adds a first dimension with the embeddings to the input.

  • Convolutional NNs: to build CNNs, Conv, DeConv, Pool, UnPool and Flat layers are provided with standard functionality. The utilities include methods for array manipulation, such as clipping arrays or adding dimensions.

  • Recurrent Layers: a Recurrent layer is defined as a wrapper around the basic Knet RNN type.

  • Others: additional layers include (please see the API-section for a complete list!): Softmax, Dropout, trainable BatchNorm, trainable LayerNorm.
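The three MLP flavours differ mainly in how they treat the input dimensions. A minimal sketch with hypothetical stand-in types (DenseSketch, LinearSketch, EmbedSketch) and an assumed small random initialisation; the real layers' constructors, initialisation and parameter types may differ:

```julia
logistic(x) = 1 / (1 + exp(-x))

# Dense: vector (sample) or matrix (minibatch) input, logistic activation by default.
struct DenseSketch
    w::Matrix{Float64}
    b::Vector{Float64}
    actf::Function
end
DenseSketch(i::Int, j::Int; actf=logistic) = DenseSketch(randn(j, i) .* 0.1, zeros(j), actf)
(l::DenseSketch)(x::AbstractVecOrMat) = l.actf.(l.w * x .+ l.b)

# Linear: any-rank input; trailing dimensions are flattened, then restored. Identity by default.
struct LinearSketch
    w::Matrix{Float64}
    b::Vector{Float64}
    actf::Function
end
LinearSketch(i::Int, j::Int; actf=identity) = LinearSketch(randn(j, i) .* 0.1, zeros(j), actf)
function (l::LinearSketch)(x::AbstractArray)
    rest = size(x)[2:end]                  # all dimensions after the feature dimension
    x2 = reshape(x, size(x, 1), :)         # features × (everything else)
    y2 = l.actf.(l.w * x2 .+ l.b)
    return reshape(y2, size(l.w, 1), rest...)
end

# Embed: looks up one embedding column per integer id, adding a first dimension.
struct EmbedSketch
    w::Matrix{Float64}                     # embedding depth × vocabulary size
end
EmbedSketch(vocab::Int, depth::Int) = EmbedSketch(randn(depth, vocab))
(l::EmbedSketch)(ids::AbstractArray{<:Integer}) =
    reshape(l.w[:, vec(ids)], size(l.w, 1), size(ids)...)
```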

Attention Mechanisms

Some attention mechanisms are implemented for use in sequence-to-sequence networks. Where possible, projections of values are precomputed to reduce computational cost:

  • AttnBahdanau: concat- or additive-style attention according to Bahdanau, 2015.
  • AttnLuong: multiplicative- or general-style attention according to Luong, 2015.
  • AttnDot: dot-product-style attention according to Luong, 2015.
  • AttnLocation: location-based attention according to Luong, 2015.
  • AttnInFeed: input-feeding attention according to Luong, 2015.

A generalised dot-product attention can be computed from (Query, Key, Value) tuple: dot_prod_attn(q, k, v).
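Generalised dot-product attention scores every key against every query, normalises the scores with a softmax over the keys and returns the weighted sum of the values. A sketch assuming a features × positions layout for q, k and v and scaling by √d as in Vaswani et al. (2017); dot_prod_attn_sketch and softmax_cols are hypothetical, and the argument layout expected by dot_prod_attn may differ:

```julia
# Column-wise softmax (each column sums to 1), shifted by the column maximum for stability.
function softmax_cols(a::AbstractMatrix)
    e = exp.(a .- maximum(a, dims=1))
    return e ./ sum(e, dims=1)
end

# q: d × n_queries, k: d × n_keys, v: d_v × n_keys
function dot_prod_attn_sketch(q, k, v)
    d = size(k, 1)                      # key depth
    scores = (k' * q) ./ sqrt(d)        # n_keys × n_queries
    α = softmax_cols(scores)            # attention weights per query
    return v * α, α                     # context (d_v × n_queries) and weights
end
```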

Helpers for transformer networks include functions for positional encoding, generating padding and peek-ahead masks, and computing scaled multi-headed attention, according to Vaswani, 2017.
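Two of these helpers are short enough to write out. This sketch uses the sinusoidal encoding from Vaswani et al. (2017) in a depth × positions layout, plus a Boolean mask in which true marks a key position that a query must not see. Both the layout and the masking convention are assumptions, and the function names are hypothetical:

```julia
# PE(pos, 2j) = sin(pos / 10000^(2j/depth)), PE(pos, 2j+1) = cos(pos / 10000^(2j/depth))
function positional_encoding_sketch(depth::Int, n_pos::Int)
    pe = zeros(depth, n_pos)
    for pos in 0:n_pos-1, i in 0:depth-1
        angle = pos / 10000^(2 * (i ÷ 2) / depth)
        pe[i+1, pos+1] = iseven(i) ? sin(angle) : cos(angle)
    end
    return pe
end

# mask[k, q] is true where query position q must not attend to key position k (k > q).
peek_ahead_mask_sketch(n::Int) = [k > q for k in 1:n, q in 1:n]
```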

Data provider

Image data

The function mk_image_minibatch() can be used to create an iterator over images, organised in directories, with the first directory-level as class labels.
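The directory convention can be illustrated without any image decoding. The hypothetical helper below only collects file paths and derives integer labels from the sorted names of the first-level subdirectories. The accepted file extensions are an assumption, and mk_image_minibatch itself goes further and builds minibatches of image data.

```julia
# Returns image paths, integer labels and the class names (first-level subdirectories).
function list_labelled_images(root::AbstractString; exts=(".png", ".jpg", ".jpeg"))
    classes = sort(filter(d -> isdir(joinpath(root, d)), readdir(root)))
    files, labels = String[], Int[]
    for (id, cls) in enumerate(classes)
        for (dir, _, fnames) in walkdir(joinpath(root, cls))   # includes nested subfolders
            for f in fnames
                if any(endswith(lowercase(f), e) for e in exts)
                    push!(files, joinpath(dir, f))
                    push!(labels, id)
                end
            end
        end
    end
    return files, labels, classes
end
```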

Helper functions (such as image2array(), array2image(), array2RGB()) can be used to transform image data to arrays and back. ImageNet-style preprocessing can be achieved with preproc_imagenet(), and readable ImageNet class labels of the top predictions are printed by predict_imagenet().
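As an illustration of what ImageNet-style preprocessing can mean: Keras uses "caffe" mode for VGG16 and the original ResNet50. In this mode, channels are reordered from RGB to BGR and the per-channel ImageNet means are subtracted, without further scaling (the ResNet V2 models use a different mode). The sketch assumes a height × width × 3 array of RGB values in 0..255. Whether preproc_imagenet() follows exactly this convention is an assumption, and caffe_preprocess_sketch is a hypothetical name.

```julia
const IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)   # per-channel means, BGR order

function caffe_preprocess_sketch(img::AbstractArray{<:Real,3})
    bgr = Float32.(img[:, :, [3, 2, 1]])        # reorder channels RGB -> BGR
    for c in 1:3
        bgr[:, :, c] .-= IMAGENET_MEAN_BGR[c]   # zero-centre each channel, no scaling
    end
    return bgr
end
```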

DataFrames

Helpers for tabular data include:

  • dataframe_read: read a csv-file and return a DataFrame
  • dataframe_split: split tabular data in a DataFrame into train and validation data; optionally with balancing.
  • dataframe_minibatch: data provider to turn tabular data from a DataFrame (with one sample per row) into a Knet-like iterator of minibatches of type Knet.Data.
  • mk_class_ids(labels): may be used to turn class label strings into class-IDs.
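The core of class-ID mapping and a train/validation split fits in a few lines of plain Julia. This sketch works without DataFrames: class_ids_sketch and split_indices_sketch are hypothetical, the sorted class order and the fr_val keyword are assumptions, and it does no balancing.

```julia
using Random

# Map label strings to 1-based integer IDs, using sorted class order.
function class_ids_sketch(labels::AbstractVector{<:AbstractString})
    classes = sort(unique(labels))
    lookup = Dict(c => i for (i, c) in enumerate(classes))
    return [lookup[l] for l in labels], classes
end

# Randomly split row indices 1:n into (train, validation); fr_val is the validation fraction.
function split_indices_sketch(n::Int; fr_val=0.2, rng=Random.default_rng())
    idx = randperm(rng, n)
    n_val = round(Int, n * fr_val)
    return idx[n_val+1:end], idx[1:n_val]
end
```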

Texts and NLP

Some utilities are provided for NLP data handling:

  • WordTokenizer: a simple tool to encode words as ids. The type comes with signatures to en- and decode in both directions.
  • get_tatoeba_corpus: download dual-language corpora and provide corresponding lists of sentences in two languages.
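A word tokenizer boils down to two lookup tables, one per direction. The sketch assumes lower-casing, whitespace splitting and a reserved ID 1 for unknown words; TokenizerSketch, encode and decode are hypothetical names, not the WordTokenizer API.

```julia
struct TokenizerSketch
    w2i::Dict{String,Int}      # word -> id
    i2w::Vector{String}        # id -> word
end

function TokenizerSketch(texts::AbstractVector{<:AbstractString})
    i2w = ["<unk>"]            # id 1 is reserved for unknown words
    w2i = Dict("<unk>" => 1)
    for t in texts, w in split(lowercase(t))
        word = String(w)
        if !haskey(w2i, word)
            push!(i2w, word)
            w2i[word] = length(i2w)
        end
    end
    return TokenizerSketch(w2i, i2w)
end

encode(t::TokenizerSketch, s::AbstractString) = [get(t.w2i, String(w), 1) for w in split(lowercase(s))]
decode(t::TokenizerSketch, ids::AbstractVector{<:Integer}) = join(t.i2w[ids], " ")
```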

The sequence_minibatch() function returns an iterator over sequence or sequence-to-sequence minibatches. Helpers for padding and truncating sequences are also provided.
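Padding and truncation turn sequences of different lengths into a rectangular minibatch. A sketch assuming padding and truncation at the end of each sequence and a padding token of 0; pad_sequences_sketch is a hypothetical helper.

```julia
# Stack sequences as columns of a len × n_seqs matrix, padding or truncating at the end.
function pad_sequences_sketch(seqs::AbstractVector{<:AbstractVector{<:Integer}};
                              len::Int=maximum(length, seqs), pad_token::Int=0)
    mb = fill(pad_token, len, length(seqs))
    for (j, s) in enumerate(seqs)
        n = min(length(s), len)
        mb[1:n, j] = s[1:n]
    end
    return mb
end
```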

Minibatch iteration utilities

A number of iterators are provided to wrap and manipulate minibatch iterators:

  • PartialIterator(it, states) returns an iterator that only iterates the given states of iterator it.
  • MBNoiser(it, σ) applies Gaussian noise to the x-values of minibatches, provided by iterator it.
  • MBMasquerade(it, ρ) applies a mask to the x-values of minibatches, provided by iterator it.
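Wrappers like these can be written by implementing Julia's iteration interface on a new type that delegates to the wrapped iterator. A sketch of the noise variant, assuming the wrapped iterator yields (x, y) tuples; NoiserSketch is hypothetical and not the MBNoiser source.

```julia
struct NoiserSketch{I}
    it::I            # wrapped iterator of (x, y) minibatches
    σ::Float64       # standard deviation of the Gaussian noise
end

function Base.iterate(n::NoiserSketch, state...)
    next = iterate(n.it, state...)          # delegate to the wrapped iterator
    next === nothing && return nothing
    (x, y), st = next
    return (x .+ n.σ .* randn(size(x)...), y), st   # noise on x only, y unchanged
end

# Do not promise a length, so any wrapped iterator works with collect.
Base.IteratorSize(::Type{<:NoiserSketch}) = Base.SizeUnknown()
```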

Working with pretrained networks

Layers of pre-trained models can be created from TensorFlow HDF5 parameter files. It is possible to build a network from any pretrained TensorFlow model by importing the parameters via the HDF5 constructors of the layers Dense and Conv. The flatten layer PyFlat performs Python-like row-major flattening, which is necessary to ensure that the parameters of an imported layer after flattening are in the correct order.
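Julia stores arrays column-major, while NumPy and TensorFlow flatten in row-major (C) order, so the same tensor flattened both ways yields differently ordered vectors. The sketch reverses the per-sample dimensions before flattening and keeps the last dimension as the minibatch. It illustrates the idea behind PyFlat under that layout assumption and is not the layer's code; pyflat_sketch is a hypothetical name.

```julia
# Row-major flatten of all dimensions except the last (minibatch) one.
function pyflat_sketch(x::AbstractArray)
    k = ndims(x) - 1                          # number of per-sample dimensions
    p = permutedims(x, (k:-1:1..., k + 1))    # reverse per-sample dims, keep batch last
    return reshape(p, :, size(x, ndims(x)))   # features × minibatch
end
```

For a single 2 × 3 sample [1 2 3; 4 5 6], vec gives the column-major order 1, 4, 2, 5, 3, 6, while pyflat_sketch gives the row-major order 1, 2, 3, 4, 5, 6.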

NNHelferlein provides an increasing number of pretrained models from the TensorFlow/Keras model zoo, such as vgg or resnet. Please see the reference section for an up-to-date list.

Training

Although the Knet style is to avoid heavyweight interfaces and to train networks with lightweight and flexible optimisers, a training interface is added that provides TensorBoard logs with online reporting of minibatch loss, training and validation loss, and accuracy.
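To show what such a training interface automates, here is a hand-written loop for a linear model with mean squared error and explicit gradients. It is a hypothetical sketch: it uses plain gradient descent rather than a Knet optimiser, records only the per-epoch training loss, and does not write TensorBoard logs.

```julia
using Statistics: mean

# Fit ŷ = w * x .+ b on an iterator of (x, y) minibatches (samples as columns).
function train_linear_sketch(batches; epochs=100, lr=0.1)
    x1, y1 = first(batches)
    w = zeros(size(y1, 1), size(x1, 1))
    b = zeros(size(y1, 1))
    losses = Float64[]
    for _ in 1:epochs
        for (x, y) in batches
            err = w * x .+ b .- y                      # residuals
            n = size(x, 2)
            # gradients of L = sum(abs2, err) / n
            w .-= lr .* (2 / n) .* (err * x')
            b .-= lr .* (2 / n) .* vec(sum(err, dims=2))
        end
        # training loss after this epoch, averaged over minibatches
        push!(losses, mean(sum(abs2, w * x .+ b .- y) / size(x, 2) for (x, y) in batches))
    end
    return w, b, losses
end
```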

Utilities

A number of additional utilities are included. Please have a look at the utilities section of the API documentation.

Bioinformatics

A number of utilities for bioinformatics are provided, including an amino acid tokenizer to convert amino acid sequences from String to vectors of integers and embedding of amino acids with BLOSUM62 or VHSE8 parameter sets.
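Tokenising amino acids is a character-level lookup. The sketch maps the 20 standard one-letter codes to IDs in alphabetical order and sends any other letter to an extra ID. The alphabet order and the unknown-ID convention are assumptions, and aa_tokenize_sketch is a hypothetical name.

```julia
const AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                  # 20 standard one-letter codes
const AA_IDS = Dict(c => i for (i, c) in enumerate(AMINO_ACIDS))

# Unknown letters map to one extra ID after the alphabet.
aa_tokenize_sketch(seq::AbstractString; unknown::Int=length(AMINO_ACIDS) + 1) =
    [get(AA_IDS, c, unknown) for c in uppercase(seq)]
```

The resulting integer vectors are the kind of input an embedding (for example with BLOSUM62 or VHSE8 parameters) would then index.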

Please have a look at the bioinformatics section of the API documentation.
