
Release 2.1.0


2.1.0 (2025-01-07)

Feat

  • prepare: outputs additional column signature_morgans
  • learning: add transformer code
  • dataset: add code to compute model tokens
  • dataset: add code to download and prepare datasets
  • transformer/train: additional arg for setting source / target max length
  • transformer/train: implement gradient accumulation (see the training-loop sketch after this list)
  • transformer/train: define num of data loader workers from args
  • transformer/train: make model compilation by Torch optional (see the setup sketch after this list)
  • transformer/train: generalize mixed precision scaler usage
  • transformer/model: refine the Module's state_dict method
  • transformer/train: check for NaNs in loss
  • transformer/train: model dir output as arg
  • transformer/train: experimentation with mixed precision floats
  • transformer/train: use pin_memory=True in data loaders, expected to improve GPU transfer performance
  • transformer/train: first working version
  • transformer: in-development code
  • add code to download and use the signature code (#10)
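
Several of the training items above (gradient accumulation, the generalized mixed precision scaler, and the NaN check on the loss) interact in the same loop. The following is a minimal training-loop sketch of how these pieces typically fit together in PyTorch; the toy model, data, and the `accum_steps` argument are illustrative placeholders, not taken from the repository.

```python
import math
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative toy model and data; the real project trains a transformer.
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,))),
    batch_size=8,
)

accum_steps = 4  # hypothetical arg: one optimizer step every accum_steps batches

optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)

    # Mixed precision forward pass; disabled on plain fp32 CPU runs.
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = criterion(model(x), y)

    # Check for NaNs / Infs in the loss before backpropagating.
    if not math.isfinite(loss.item()):
        raise RuntimeError(f"non-finite loss at batch {i}: {loss.item()}")

    # Normalize exactly once (a duplicated normalization is one of the
    # fixes below) so accumulated gradients match a single full-batch step.
    scaler.scale(loss / accum_steps).backward()

    # Step on every accum_steps-th batch and on the last batch of the epoch,
    # so gradients from the trailing batches are not dropped.
    if (i + 1) % accum_steps == 0 or (i + 1) == len(loader):
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```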
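
And a setup sketch for the optional Torch compilation and the data-loader options (worker count from args, pinned memory); the function names and flags here are assumptions for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def build_model(compile_model: bool = False) -> nn.Module:
    """compile_model is a hypothetical flag mirroring the optional
    Torch compilation feature described above."""
    model = nn.Linear(16, 4)
    if compile_model and hasattr(torch, "compile"):
        model = torch.compile(model)  # optional; requires PyTorch >= 2.0
    return model

def build_loader(num_workers: int = 0, pin_memory: bool = False) -> DataLoader:
    """num_workers and pin_memory would come from CLI args; pinned host
    memory speeds up host-to-GPU transfers."""
    data = TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,)))
    return DataLoader(data, batch_size=8,
                      num_workers=num_workers, pin_memory=pin_memory)
```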

Fix

  • prepare: remove deprecated import
  • get_smiles: remove superfluous Hs
  • prepare: sanitize molecule after stereo-isomer enumeration
  • prepare: add missing header
  • update changelog on version bump
  • attempt to trigger GitHub Actions
  • use main instead of master as the branch name
  • dataset: remove unused code
  • transformer/train: load_checkpoint
  • transformer/train: effective batch indexes
  • transformer/train: duplicated loss normalization
  • transformer/train: wrong arg name
  • transformer/train: take into account remaining batches in the scheduler step counts (see the step-count sketch after this list)
  • transformer/train: propagate gradients for the last batches of an epoch
  • transformer/train: remove multiple calls to unscale_
  • transformer/train: use save_checkpoint
  • transformer/train: refine save and load methods (see the checkpoint sketch after this list)
  • transformer/train: correct seq length arg
  • transformer/train: stop sending to preset device
  • dataset/utils.py: forward the logger in recursive calls
  • tokenizer: allow additional depictions
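
The scheduler-count fix above concerns how many optimizer steps an epoch actually produces under gradient accumulation: once trailing batches trigger a flush, the count is a ceiling division, not a floor. A step-count sketch, with hypothetical names:

```python
import math

def optimizer_steps_per_epoch(num_batches: int, accum_steps: int) -> int:
    # With the end-of-epoch flush shown earlier, trailing batches that do
    # not fill a full accumulation window still trigger one optimizer step,
    # so the scheduler must be sized with a ceiling division.
    return math.ceil(num_batches / accum_steps)

# e.g. 10 batches with accum_steps=4 yields 3 steps (4 + 4 + 2), not 2.
assert optimizer_steps_per_epoch(10, 4) == 3
```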
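
The checkpoint fixes revolve around persisting and restoring training state. A minimal checkpoint sketch, assuming the checkpoint bundles model, optimizer, and epoch; the dictionary keys are illustrative, not the project's actual format.

```python
import torch
from torch import nn

def save_checkpoint(path: str, model: nn.Module,
                    optimizer: torch.optim.Optimizer, epoch: int) -> None:
    # state_dict() captures learnable parameters and optimizer state.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": epoch,
        },
        path,
    )

def load_checkpoint(path: str, model: nn.Module,
                    optimizer: torch.optim.Optimizer) -> int:
    # map_location lets a GPU-trained checkpoint load on any device.
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["epoch"]
```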

Refactor

  • remove old code
  • .env: ignore local env file
  • erase old code
  • transformer: sweep code
  • dataset: clean deprecated code
  • transformer: remove deprecated code
  • transformer/train: refine gradient accumulation
  • transformer/config: reduce learning rate to prevent NaN / Inf values
  • transformer/train: make GPU pinned memory an option
  • transformer/train: add a few debug messages
  • transformer/config: update
  • transformer/config: update
  • transformer/train: get the number of epochs from config (see the config sketch after this list)
  • transformer/train: better logging of NaN / Inf value issues
  • transformer/config: increase learning rate
  • transformer/config: increase learning rate
  • transformer/config: reduce learning rate
  • transformer/train: update default log level
  • transformer/train: better handle device arg
  • transformer/config.yaml: update training values
  • model: remove unnecessary code
  • dataset/utils.py: don't sort config keys
  • download: update paths
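
Several refactor items adjust transformer/config.yaml and read training values, such as the number of epochs, from it rather than hard-coding them. A config sketch using PyYAML; the keys shown are assumptions, not the project's actual schema, and sort_keys=False mirrors the "don't sort config keys" item above.

```python
import yaml  # PyYAML

def load_config(path: str = "transformer/config.yaml") -> dict:
    with open(path) as fh:
        return yaml.safe_load(fh)

def dump_config(config: dict, path: str) -> None:
    # sort_keys=False preserves the original key order on round-trip.
    with open(path, "w") as fh:
        yaml.safe_dump(config, fh, sort_keys=False)

# Hypothetical usage: epochs read from the config instead of hard-coded.
# config = load_config()
# num_epochs = config["training"]["epochs"]
```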

Perf

  • transformer/train: AdamW optimizer instead of Adam, with the OneCycleLR scheduler (sketch below)
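
A minimal sketch of the optimizer and scheduler switch; the toy model and hyperparameter values are illustrative only.

```python
import torch
from torch import nn

model = nn.Linear(16, 4)

# AdamW decouples weight decay from the gradient update, unlike Adam's
# L2 penalty, which generally trains transformers more stably.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

# OneCycleLR warms the learning rate up to max_lr and anneals it back
# down over the run; total_steps must count every optimizer step.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, total_steps=1000
)

for _ in range(1000):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 16)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()  # one scheduler step per optimizer step
```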