Node v0.8.0

@n1t0 released this 02 Sep 18:12

BREAKING CHANGES

  • Many improvements to the Trainer (#519).
    The files must now be passed first when calling tokenizer.train(files, trainer).
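
The argument order described above can be sketched as follows. This is only an illustration of the call shape: `TrainerLike`, `TokenizerLike`, `makeStubTokenizer`, the `vocabSize` option, and the file names are hypothetical stand-ins, not the real bindings, which expose richer objects.

```typescript
// Hypothetical stand-in for a trainer; not the library's actual type.
interface TrainerLike {
  vocabSize: number;
}

// Hypothetical stand-in for a tokenizer exposing only train().
interface TokenizerLike {
  // Files come first, the trainer second, as stated in the note above.
  train(files: string[], trainer: TrainerLike): void;
}

// Stub that records the files it receives and rejects a swapped argument order.
function makeStubTokenizer(): TokenizerLike & { calls: string[][] } {
  const calls: string[][] = [];
  return {
    calls,
    train(files: string[], trainer: TrainerLike): void {
      if (!Array.isArray(files)) {
        throw new TypeError("files must be passed first, as an array of paths");
      }
      if (typeof trainer !== "object" || trainer === null) {
        throw new TypeError("trainer must be passed second");
      }
      calls.push(files);
    },
  };
}

const tokenizer = makeStubTokenizer();
const trainer: TrainerLike = { vocabSize: 30000 }; // illustrative value only

// New call shape: files first, then the trainer.
tokenizer.train(["corpus-a.txt", "corpus-b.txt"], trainer);
```

When upgrading, any call site that passes its arguments in a different order needs to be updated to match this shape.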

Features

  • Adding the TemplateProcessing
  • Add WordLevel and Unigram models (#490)
  • Add nmtNormalizer and precompiledNormalizer normalizers (#490)
  • Add templateProcessing post-processor (#490)
  • Add digitsPreTokenizer pre-tokenizer (#490)
  • Add support for mapping to sequences (#506)
  • Add splitPreTokenizer pre-tokenizer (#542)
  • Add behavior option to the punctuationPreTokenizer (#657)
  • Add the ability to load tokenizers from the Hugging Face Hub using fromPretrained (#780)

Fixes

  • Fix a bug where long tokenizer.json files would be incorrectly deserialized (#459)
  • Fix RobertaProcessing deserialization in PostProcessorWrapper (#464)