
Marian v1.6.0

@emjotde released this on 08 Aug 23:24

[1.6.0] - 2018-08-08

Added

  • Faster training (20-30%) by optimizing gradient propagation of biases
  • Returning Moses-style hard alignments when decoding with single models, ensembles,
    and n-best lists
  • Hard alignment extraction strategy that selects every source word whose
    attention value is greater than the threshold
  • Refactored synchronous SGD for easier communication and integration with NCCL
  • Smaller memory overhead for synchronous SGD
  • NCCL integration (version 2.2.13)
  • New binary format for saving and loading models, used with the *.bin
    extension (can be memory-mapped)
  • Memory-mapping of graphs for inference with the ExpressionGraph::mmap(const void* ptr) function (assumes the *.bin model is memory-mapped or already in a buffer)
  • Added SRU (--dec-cell sru) and ReLU (--dec-cell relu) cells to the inventory of
    RNN cells.
  • RNN autoregressive layers in the transformer (--transformer-decoder-autreg rnn); these work with gru, lstm, tanh, relu, and sru cells.
  • Recurrently stacked layers in the transformer (--transformer-tied-layers 1 1 1 2 2 2 means 6 layers where layers 1-3 share one set of
    parameters and layers 4-6 share another, giving two groups of parameters)
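As background for the first item: for a layer y = xW + b whose bias is broadcast over every row of a batch, the bias gradient is the sum of the output gradient over those rows. The notes do not say how Marian optimized this step, so the C++ sketch below only shows the reduction being computed; the `biasGradient` helper and the row-major layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Marian's code). For y = xW + b, with b added to
// every row of the batch, dL/db[c] is the sum over rows r of dL/dy[r][c].
// outGrad holds dL/dy in row-major order: rows x cols.
std::vector<float> biasGradient(const std::vector<float>& outGrad,
                                std::size_t rows, std::size_t cols) {
  assert(outGrad.size() == rows * cols);
  std::vector<float> grad(cols, 0.f);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      grad[c] += outGrad[r * cols + c];  // accumulate column c across the batch
  return grad;
}
```

For a 2 x 3 output gradient {1, 2, 3, 4, 5, 6}, `biasGradient(g, 2, 3)` returns {5, 7, 9}.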
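The threshold strategy for hard alignments can be pictured with a small example. This is a minimal sketch, not Marian's implementation: it assumes an attention matrix indexed as attn[target][source], a hypothetical `hardAlignments` helper, and output in the Moses convention of space-separated, 0-based `source-target` pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// attn[t][s] is the attention weight target word t places on source word s.
using Matrix = std::vector<std::vector<float>>;

// Hypothetical helper: emit every (source, target) pair whose attention
// weight is strictly greater than the threshold, as Moses-style "s-t" links.
std::string hardAlignments(const Matrix& attn, float threshold) {
  assert(threshold >= 0.f && threshold <= 1.f);  // attention weights lie in [0, 1]
  std::ostringstream out;
  bool first = true;
  for (std::size_t t = 0; t < attn.size(); ++t) {
    for (std::size_t s = 0; s < attn[t].size(); ++s) {
      if (attn[t][s] > threshold) {
        if (!first) out << ' ';  // separate links with single spaces
        out << s << '-' << t;
        first = false;
      }
    }
  }
  return out.str();
}
```

With attention rows {0.9, 0.05, 0.05}, {0.1, 0.6, 0.3}, {0.0, 0.45, 0.55} and a threshold of 0.4, the result is "0-0 1-1 1-2 2-2". Unlike a pure argmax, this strategy lets one target word link to several source words, or to none when no weight passes the threshold.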
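One way to read a --transformer-tied-layers value is as a per-layer group id, where layers that carry the same id share one parameter set. The helper below is hypothetical and only illustrates that reading; it does not reproduce how Marian names or allocates the shared parameters, and the assumption that group ids are positive is taken from the example value only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: spec[i] is the parameter group of layer i + 1.
// Returns, for each group id (ordered by std::map), the 1-based layer
// numbers that share that group's parameters.
std::map<int, std::vector<int>> tiedGroups(const std::vector<int>& spec) {
  std::map<int, std::vector<int>> groups;
  for (std::size_t i = 0; i < spec.size(); ++i) {
    assert(spec[i] >= 1);  // assumption: group ids are positive, as in "1 1 1 2 2 2"
    groups[spec[i]].push_back(static_cast<int>(i) + 1);
  }
  return groups;
}
```

Under this reading, `tiedGroups({1, 1, 1, 2, 2, 2})` maps group 1 to layers {1, 2, 3} and group 2 to layers {4, 5, 6}: six layers backed by two parameter sets.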

Fixed

  • A couple of bugs in the "selection" operators (transpose, shift, cols, rows) during
    back-propagation in one very specific case: when one of these operators is the first
    operator after a branch, gradient propagation could be interrupted. This did not affect
    any existing models, since none contained such a case, but it might have caused
    future models to train improperly.
  • Bug in mini-batch-fit: with tied embeddings, the fake source and target batches ended up
    with identical embeddings, which caused under-estimation of memory usage and re-allocation.
  • Seamless training continuation with exponential smoothing
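The notes do not explain what previously broke when continuing training with exponential smoothing. The sketch below assumes the relevant point is that the smoothed copy of the parameters is training state that has to be saved and restored alongside the raw parameters. The update rule avg = (1 - decay) * avg + decay * params, the `SmoothedParams` and `Checkpoint` types, and the `save`/`restore` helpers are all illustrative assumptions, not Marian's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameter smoothing with an exponential moving average.
struct SmoothedParams {
  std::vector<float> params;  // raw parameters produced by the optimizer
  std::vector<float> avg;     // exponentially smoothed copy
  float decay;

  SmoothedParams(std::vector<float> init, float d)
      : params(init), avg(init), decay(d) {
    assert(d > 0.f && d <= 1.f);
  }

  // Accept new raw parameters and fold them into the moving average.
  void update(const std::vector<float>& newParams) {
    params = newParams;
    for (std::size_t i = 0; i < avg.size(); ++i)
      avg[i] = (1.f - decay) * avg[i] + decay * params[i];
  }
};

// A checkpoint stores both vectors so the average survives a restart.
struct Checkpoint {
  std::vector<float> params, avg;
};

Checkpoint save(const SmoothedParams& s) { return {s.params, s.avg}; }

SmoothedParams restore(const Checkpoint& c, float decay) {
  SmoothedParams s(c.params, decay);  // constructor sets avg = params ...
  s.avg = c.avg;                      // ... so overwrite it with the saved average
  return s;
}
```

Restoring `avg` from the checkpoint lets the average continue from its previous value; rebuilding it from `params` alone would restart the average at the latest raw weights.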