
Marian v1.6.0

@emjotde released this 08 Aug 23:35

[1.6.0] - 2018-08-08

Added

  • Faster training (20-30%) by optimizing gradient propagation of biases
  • Returning Moses-style hard alignments when decoding with single models, ensembles, and
    n-best lists
  • Hard-alignment extraction strategy that takes source words whose attention value
    exceeds a given threshold
  • Refactored synchronous SGD for easier communication and integration with NCCL
  • Smaller memory overhead for synchronous SGD
  • NCCL integration (version 2.2.13)
  • New binary format for saving/loading models, used with the *.bin extension (can be
    memory-mapped)
  • Memory-mapping of graphs for inference with the ExpressionGraph::mmap(const void* ptr) function (assumes the *.bin model is memory-mapped or already in a buffer)
  • Added SRU (--dec-cell sru) and ReLU (--dec-cell relu) cells to the inventory of
    RNN cells
  • RNN auto-regression layers in the transformer (--transformer-decoder-autoreg rnn); works with gru, lstm, tanh, relu, and sru cells
  • Recurrently stacked layers in the transformer (--transformer-tied-layers 1 1 1 2 2 2 means 6 layers where layers 1-3 and 4-6 share parameters, i.e. two groups of
    parameters)
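The threshold-based hard-alignment extraction above can be sketched as follows. This is a hypothetical Python illustration of the idea, not Marian's C++ implementation: given a soft attention matrix, keep source positions whose attention exceeds the threshold and emit Moses-style `source-target` index pairs.

```python
# Hypothetical sketch of threshold-based hard-alignment extraction.
# attn[t][s] is the attention weight of target position t over source position s.
def extract_hard_alignments(attn, threshold=0.5):
    pairs = []
    for t, row in enumerate(attn):
        for s, weight in enumerate(row):
            if weight > threshold:
                pairs.append((s, t))  # Moses convention: source-target
    return " ".join(f"{s}-{t}" for s, t in sorted(pairs))

attn = [
    [0.9, 0.05, 0.05],  # target word 0 attends mostly to source word 0
    [0.1, 0.2, 0.7],    # target word 1 attends mostly to source word 2
]
print(extract_hard_alignments(attn))  # "0-0 2-1"
```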
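The general idea behind memory-mapping the new *.bin format is that model parameters are paged in by the OS on demand instead of being copied into process memory. A minimal sketch of that mechanism, using Python's mmap module purely for illustration (Marian itself exposes this in C++ via ExpressionGraph::mmap; the file contents here are a stand-in, not the real format):

```python
import mmap
import os
import tempfile

# Create a stand-in "model.bin" file (placeholder header + payload, not Marian's format).
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    f.write(b"MARIAN" + bytes(1024))

# Map it read-only; bytes are read directly from the mapped pages, with no full copy.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = bytes(mm[:6])
    mm.close()

print(header)  # b'MARIAN'
```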
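The tied-layers specification can be read as a mapping from each decoder layer to a parameter group. A small sketch of that reading (group and layer numbering inferred from the example in the notes, so treat it as an assumption):

```python
# Hypothetical illustration of how a tied-layers spec groups layers.
# spec like [1, 1, 1, 2, 2, 2] means layer i uses the parameters of group spec[i-1].
def tied_layer_groups(spec):
    groups = {}
    for layer, group in enumerate(spec, start=1):
        groups.setdefault(group, []).append(layer)
    return groups

print(tied_layer_groups([1, 1, 1, 2, 2, 2]))  # {1: [1, 2, 3], 2: [4, 5, 6]}
```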

Fixed

  • A couple of bugs in the "selection" operators (transpose, shift, cols, rows) during
    back-propagation for a very specific case: when one of these operators is the first
    operator after a branch, gradient propagation could be interrupted. This did not affect
    any existing models, as such a case was not present, but it might have caused
    future models to not train properly.
  • Bug in mini-batch-fit: tied embeddings would result in identical embeddings in the fake
    source and target batches, causing under-estimation of memory usage and re-allocation.
  • Seamless training continuation with exponential smoothing