[Unreleased]

New features

  • Display sentence length distribution in preprocess
  • Support vectors as inputs using Kaldi input format
  • Support parallel file alignment by index in addition to line-by-line
  • Add script to convert and/or generate pretrained word embeddings

Fixes and improvements

  • Improve correctness of the DBiEncoder and PDBiEncoder implementations
  • Fix translation error with models profiled during training
  • Fix error when using a one-layer GRU
  • Do not allow duplicate command-line options and do not print help on errors

v0.6.0 (2017-04-07)

New features

  • Add new encoders: deep bidirectional and pyramidal deep bidirectional
  • Add attention variants: no attention, and dot, general, or concat global attention
  • Add alternative learning rate decay strategy for SGD training
  • Introduce dynamic parameter change for dropout and fixed word embeddings
  • Add length and coverage normalization during the beam search
  • Add translation option to dump input sentence encoding
  • Add TensorBoard metrics visualisation with Crayon
  • [experimental] Add sequence tagger model

Fixes and improvements

  • [Breaking, changed option] The -fix_word_vecs options now accept 0 and 1 for a better retraining experience (see the example after this list)
  • Check consistency of option settings when training from checkpoints
  • Save and restore random number generator states from checkpoints
  • Output more dataset metrics during preprocessing
  • Improve error message on invalid options
  • Fix missing n-best hypotheses list in the output file
  • Fix individual losses that were always computed when using random sampling
  • Fix duplicated logs in parallel mode
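
The retraining workflow behind the -fix_word_vecs change can be sketched as follows; only the -fix_word_vecs options and their 0/1 values come from this entry, while the script name, the other flags, the _enc suffix, and the file names are assumptions about a typical OpenNMT-lua invocation.

```bash
# Hypothetical retraining run: resume from a checkpoint while keeping the
# encoder word embeddings fixed by passing 1 to the corresponding option.
# Every flag except the -fix_word_vecs_* value is assumed, not taken from
# this changelog entry.
th train.lua -data data/demo-train.t7 -save_model demo-model \
             -train_from demo-model_epoch7_5.44.t7 -continue \
             -fix_word_vecs_enc 1
```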

v0.5.3 (2017-03-30)

Fixes and improvements

  • Fix data loading during training

v0.5.2 (2017-03-29)

Fixes and improvements

  • Improve compatibility with older Torch versions missing the fmod implementation

v0.5.1 (2017-03-28)

Fixes and improvements

  • Fix translation with FP16 precision
  • Fix regression that made tds mandatory for translation

v0.5.0 (2017-03-06)

New features

  • Training code is now part of the library
  • Add -fallback_to_cpu option to continue execution on CPU if the GPU can't be used (see the example after this list)
  • Add standalone script to generate vocabularies
  • Add script to extract word embeddings
  • Add option to prune vocabularies by minimum word frequency
  • New REST server
  • [experimental] Add data sampling during training
  • [experimental] Add half floating point (fp16) support (with cutorch@359ee80)
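
A minimal sketch of the new -fallback_to_cpu option is shown below; the translate.lua script name and the -model, -src, -output, and -gpuid flags are assumed standard OpenNMT-lua usage, and the file names are placeholders.

```bash
# Hypothetical translation run on GPU 1: if the GPU cannot be used,
# -fallback_to_cpu lets the process continue on CPU instead of aborting.
th translate.lua -model demo-model_final.t7 -src data/src-test.txt \
                 -output pred.txt -gpuid 1 -fallback_to_cpu
```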

Fixes and improvements

  • Make sure released models do not contain any serialized functions
  • Reduce size of released BRNN models (up to 2x smaller)
  • Reported metrics are no longer averaged over the entire epoch
  • Improve logging in asynchronous training
  • Allow fixing word embeddings without providing pre-trained embeddings
  • Fix pretrained word embeddings that were overridden by parameter initialization
  • Fix error when using translation server with GPU model
  • Fix gold data perplexity reporting during translation
  • Fix wrong number of attention vectors returned by the translator

v0.4.1 (2017-02-16)

Fixes and improvements

  • Fix translation server error when clients send escaped Unicode sequences
  • Fix compatibility issue with the :split() function

v0.4.0 (2017-02-10)

New features

  • Profiler option
  • Support hypotheses filtering during the beam search
  • Support individually setting features vocabulary and embedding size
  • [experimental] Scripts to interact with the benchmark platform
  • [experimental] Language modeling example

Fixes and improvements

  • [Breaking, new API] Improve translator API consistency
  • Improve beam search speed (up to 90% faster)
  • Reduce released model size (up to 2x smaller)
  • Fix tokenization of text containing the joiner marker character
  • Fix -joiner_new option when using BPE
  • Fix joiner marker generated without the option enabled
  • Fix translation server crash on Lua errors
  • Fix error when loading configuration files containing the gpuid option
  • Fix BLEU drop when applying beam search on some models
  • Fix error when using asynchronous parallel mode
  • Fix non-SGD model serialization after retraining
  • Fix error when using -replace_unk with empty sentences in the batch
  • Fix error when translating empty batch

v0.3.0 (2017-01-23)

New features

  • ZeroMQ translation server
  • Advanced log management
  • GRU cell
  • Tokenization option to make the token separator an independent token
  • Tokenization can run in parallel mode

Fixes and improvements

  • [Breaking, renamed option] Rename -epochs option to -end_epoch to clarify its behavior
  • [Breaking, removed option] Remove the -nparallel option; -gpuid now accepts a comma-separated list of device identifiers (see the example after this list)
  • [Breaking, changed behavior] The Zero-Width Joiner Unicode character (ZWJ) is now tokenizing, but as a joiner
  • Fix Hangul tokenization
  • Fix duplicated tokens in aggressive tokenization
  • Fix error when using BRNN and multiple source features
  • Fix error when preprocessing empty lines and using additional features
  • Fix error when translating empty sentences
  • Fix error when retraining a BRNN model on multiple GPUs
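
The -gpuid change above can be illustrated with a hedged sketch; the -data and -save_model flags and the file names are assumed typical OpenNMT-lua usage and are not specified by this entry.

```bash
# Before v0.3.0 (hypothetical): parallel training over 2 GPUs
#   th train.lua -data data/demo-train.t7 -save_model demo-model -gpuid 1 -nparallel 2
# From v0.3.0 on: list the device identifiers directly on -gpuid
th train.lua -data data/demo-train.t7 -save_model demo-model -gpuid 1,2
```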

v0.2.0 (2017-01-02)

New features

  • [Breaking, renamed option] Control maximum source and target length independently
  • Asynchronous SGD
  • Detokenization
  • BPE support in tokenization

Fixes and improvements

  • Smaller memory footprint during training
  • Smaller released model size after non-SGD training
  • Fix out of memory errors in preprocessing
  • Fix BRNN model serialization and release
  • Fix error when retraining a model
  • Fix error when using more than one feature

v0.1.0 (2016-12-19)

Initial release.