Releases: kermitt2/delft
Version 0.3.4
- support multiple GPU training/inference (--multi-gpu parameter)
- support safetensors model weights format
- support private HuggingFace models
- in application scripts: add max-epoch parameter, learning rate parameter
- add grobid model for funding and acknowledgement information
- more parameter information printed when training
- some dependency updates
Version 0.3.3
with PyPI:
pip install delft==0.3.3
- support for incremental training
- fix SciBERT tokenizer initialization from HuggingFace model
- updated HuggingFace transformers library to 4.25.1 and tensorflow to 2.9.3
- review the support of BPE tokenizers in the case of pre-tokenized input with the updated transformers library for most transformer models using it (tested with Roberta/GPT2, CamemBERT, bart-base, albert-base-v2, and XLM)
- addition of some model variants for sequence labeling (BERT_FEATURES, BERT_ChainCRF_FEATURES)
Version 0.3.2
with PyPI:
pip install delft==0.3.2
- Print model parameters at creation and load time
- Dataset recognition
- Model updates
- Set feature channel embeddings trainable
Full Changelog: v0.3.1...v0.3.2
Version 0.3.1
- fix a problem with CRF tensorflow-addons when batch size is 1
Version 0.3.0
- Migration of DeLFT to TensorFlow 2.7
- Support of HuggingFace transformer library (auto* library)
- New architectures and updated models
- General usage of optimizer with learning rate decay
- Updated docs, now via Read the Docs
- Improved ELMo embeddings
- Transformers wrapper to limit usage of the Hugging Face Hub to when it is necessary; models with a transformer layer are fully portable without Hub access
Version 0.2.6
- add automatic download of embeddings if not locally available
- enable embedding preload script for docker image
Version 0.2.5
- fix serialization of models with feature preprocessor (PR #110)
- update grobid models with features
- some other models and score updates
- add "software was used" classification model for software citations
- update tensorflow dependency
Version 0.2.4
- generic support for feature channel in sequence labeling, tested with Grobid training data
- fix issues #40 #44 #48 #50 #52 #54 #56 #66 #69 #71 #94 #100 #103
- update eval (average field level n-fold cross-validation)
- dataseer and software use classification models
- review and improvement for BERT sequence labeling and classification (unicode, binary/multi-label, test SciBERT, bioBERT, ...)
- forced split for the lemonde corpus evaluation (for comparison with published results that use this split)
- fixing truncation in sequence labeling
- more documentation
- various bug fixes