Faster training (20-30%) by optimizing gradient propagation of biases
Returning Moses-style hard alignments during decoding of single models, ensembles and n-best
lists
Hard alignment extraction strategy that takes source words with an attention
value greater than a given threshold (see the sketch after this item)
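For illustration only, a minimal C++ sketch of the thresholding idea above: every source position whose attention weight exceeds the threshold is emitted as a Moses-style "src-trg" pair. The matrix layout and output format here are assumptions, not Marian's internal types.

```cpp
// Sketch of threshold-based hard alignment extraction (illustration only,
// not Marian's internal code). Given a [target x source] attention matrix,
// emit Moses-style "srcPos-trgPos" pairs for every source word whose
// attention weight exceeds the threshold.
#include <iostream>
#include <string>
#include <vector>

std::string hardAlignments(const std::vector<std::vector<float>>& attention,
                           float threshold = 0.3f) {
  std::string out;
  for(size_t trg = 0; trg < attention.size(); ++trg) {
    for(size_t src = 0; src < attention[trg].size(); ++src) {
      if(attention[trg][src] > threshold) {
        if(!out.empty())
          out += " ";
        out += std::to_string(src) + "-" + std::to_string(trg);
      }
    }
  }
  return out;
}

int main() {
  // Toy 2-target x 3-source attention matrix.
  std::vector<std::vector<float>> att = {{0.7f, 0.2f, 0.1f},
                                         {0.1f, 0.5f, 0.4f}};
  std::cout << hardAlignments(att) << "\n";  // prints "0-0 1-1 2-1"
}
```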
Refactored synchronous SGD for easier communication and integration with NCCL
Smaller memory overhead for synchronous SGD
NCCL integration (version 2.2.13)
New binary format for saving/loading models; can be used with the *.bin
extension (can be memory-mapped)
Memory-mapping of graphs for inference with the ExpressionGraph::mmap(const void* ptr) function (assumes the *.bin model is mapped or in a buffer); see the sketch after this item
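A hedged sketch of how a *.bin model might be memory-mapped and handed to ExpressionGraph::mmap(). The POSIX calls are standard; the header path and the way the graph object is constructed are assumptions and may not match Marian's actual setup code.

```cpp
// Sketch only: memory-map a *.bin model file and pass the pointer to
// ExpressionGraph::mmap(). Error handling is minimal; graph construction
// below is assumed for illustration, not taken from Marian's docs.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// #include "graph/expression_graph.h"  // Marian header (path assumed)

int main() {
  int fd = open("model.bin", O_RDONLY);
  if(fd < 0)
    return 1;

  struct stat st;
  fstat(fd, &st);

  // Map the model read-only; the buffer stays valid as long as it is mapped.
  void* ptr = ::mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  if(ptr == MAP_FAILED)
    return 1;

  // auto graph = New<marian::ExpressionGraph>(/*inference=*/true);  // assumed
  // graph->mmap(ptr);  // use the mapped *.bin model without copying it

  munmap(ptr, st.st_size);
  close(fd);
}
```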
Added SRU (--dec-cell sru) and ReLU (--dec-cell relu) cells to the inventory of
RNN cells
RNN autoregression layers in the transformer (--transformer-decoder-autoreg rnn); work with gru, lstm, tanh, relu and sru cells
Recurrently stacked layers in the transformer (--transformer-tied-layers 1 1 1 2 2 2 means 6 layers with parameters tied across layers 1-3 and 4-6, i.e. two groups of
parameters); see the sketch after this item
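To make the tying notation concrete, a small hypothetical C++ sketch that maps a --transformer-tied-layers specification to parameter groups; it only illustrates the grouping semantics, not Marian's implementation.

```cpp
// Illustration of the --transformer-tied-layers semantics (not Marian code):
// the i-th value names the parameter group used by layer i, so
// "1 1 1 2 2 2" builds 6 layers that share only 2 sets of parameters.
#include <iostream>
#include <map>
#include <vector>

int main() {
  std::vector<int> tied = {1, 1, 1, 2, 2, 2};  // one entry per layer

  std::map<int, std::vector<int>> groups;  // group id -> layer indices (1-based)
  for(size_t layer = 0; layer < tied.size(); ++layer)
    groups[tied[layer]].push_back(static_cast<int>(layer) + 1);

  for(const auto& g : groups) {
    std::cout << "parameter group " << g.first << ": layers";
    for(int l : g.second)
      std::cout << " " << l;
    std::cout << "\n";  // group 1: layers 1 2 3; group 2: layers 4 5 6
  }
}
```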
Fixed
A couple of bugs in "selection" operators (transpose, shift, cols, rows) during
back-propagation for a very specific case: when one of these operators is the first
operator after a branch, gradient propagation might be interrupted. This did not affect
any of the existing models, as such a case was not present, but it might have caused
future models to not train properly.
Bug in mini-batch-fit: tied embeddings would result in identical embeddings in the fake
source and target batches, causing under-estimation of memory usage and re-allocation.
Seamless training continuation with exponential smoothing