Releases: brsynth/molecule-signature-paper

Release 3.0.0

31 Jan 15:02

3.0.0 (2025-01-31)

Feat

  • predict: predict call from CLI now always outputs something
  • 2.enumeration_results: notebook to perform the molecule enumeration
  • 1.enumeration_create_alphabets: notebook to create alphabets
  • notebooks: add notebook
  • configure: add interface for simple configuration
  • predict: add interface for predicting (without evaluation)
  • model: fully batch-vectorized version for beam search
  • predict: test for equality of canonical SMILES
  • predict: decode beam in parallel

Fix

  • predict: convert python objects to strings
  • predict: dataclass attribute poorly tested
  • predict: remove pickle writing
  • evaluate: add output file handling to write results
  • evaluate: correctly deal with the max number of rows
  • evaluate: include chirality in ECFP
  • predict: stop crashing when beam size > vocab size
  • predict: allow selection of the accelerator device
  • predict: column indexes
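
The beam size fix listed above can be pictured with a minimal sketch. It assumes the crash came from requesting more top-scoring candidates than the vocabulary holds, which the release note does not state. `top_k_tokens` is a hypothetical helper written in plain Python, not a function from this repository.

```python
def top_k_tokens(log_probs, beam_size):
    """Return (token_id, log_prob) pairs for the best candidates.

    The effective beam width is clamped to the vocabulary size, so asking
    for more beams than there are tokens cannot fail.
    """
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    # Assumption: never request more candidates than the vocabulary holds.
    k = min(beam_size, len(log_probs))
    ranked = sorted(enumerate(log_probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# A 3-token vocabulary queried with a beam size of 10 yields 3 candidates.
candidates = top_k_tokens([-0.1, -2.3, -1.2], beam_size=10)
```

Tensor libraries often reject a top-k request larger than the dimension being ranked, so the same clamp is usually applied before such a call.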

Refactor

  • predict: update default arg values
  • predict: update result refinements
  • configure: set default output path to None
  • config: remove unused method
  • imports: refine
  • add empty data folder
  • notebooks: merge cells
  • notebooks: rename nb
  • notebooks: rename notebook for fig 2
  • utils: move utility functions
  • evaluate: remove file
  • predict: make col names more explicit
  • evaluate: sweep code
  • utils: sweep code
  • predict: refine outputs
  • predict: remove unused args
  • utils: additional shared functions
  • predict: delegate results refining to subsequent code
  • predict: allow calls from other script
  • predict: better print
  • predict: improve imports
  • predict: print result to stdout on request
  • predict: print default values

Release 2.1.0

07 Jan 12:30

2.1.0 (2025-01-07)

Feat

  • prepare: outputs additional column signature_morgans
  • learning: add transformer code
  • dataset: add code to compute model tokens
  • dataset: add code for download and prepare datasets
  • transformer/train: additional arg for setting source / target max length
  • transformer/train: implement gradient accumulation
  • transformer/train: define num of data loader workers from args
  • transformer/train: make model compilation by Torch optional
  • transformer/train: generalize mixed precision scaler usage
  • transformer/model: refine state_dict Module's method
  • transformer/train: check for NaNs in loss
  • transformer/train: model dir output as arg
  • transformer/train: experimentation with mixed precision floats
  • transformer/train: use pin_memory=True in dataloaders, which is expected to improve GPU performance
  • transformer/train: first working version
  • transformer: in dev code
  • new code to download and make use of the signature code (#10)

Fix

  • prepare: remove deprecated import
  • get_smiles: remove superfluous Hs
  • prepare: sanitize molecule after stereo-isomer enumeration
  • prepare: add missing header
  • update changelog on version bump
  • attempt to trigger GA
  • main instead of master branch name
  • dataset: remove unused code
  • transformer/train: load_checkpoint
  • transformer/train: effective batch indexes
  • transformer/train: duplicated loss normalization
  • transformer/train: wrong arg name
  • transformer/train: take remaining batches into account in the scheduler counts
  • transformer/train: propagate gradient for last batches of epoch
  • transformer/train: remove multiple calls to unscale_
  • transformer/train: use save_checkpoint
  • transformer/train: refine save and load methods
  • transformer/train: correct seq length arg
  • transformer/train: stop sending to preset device
  • dataset/utils.py: forward pass logger in recursive calls
  • tokenizer: allow additional depictions
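
Several of the training fixes above concern gradient accumulation: the last batches of an epoch must still propagate their gradient, and the scheduler must count those steps. The following is a minimal sketch of that bookkeeping, not the repository's training loop. Both function names are hypothetical, and it assumes one optimizer step per group of accumulated batches plus one for any leftover batches.

```python
import math


def optimizer_steps_per_epoch(num_batches, accumulation_steps):
    """Number of optimizer (and scheduler) steps in one epoch.

    The last group may hold fewer than `accumulation_steps` batches; it
    still triggers a step, hence the ceiling division.
    """
    return math.ceil(num_batches / accumulation_steps)


def step_batch_indexes(num_batches, accumulation_steps):
    """Zero-based batch indexes after which the optimizer should step."""
    return [
        i
        for i in range(num_batches)
        if (i + 1) % accumulation_steps == 0 or i + 1 == num_batches
    ]


# 10 batches accumulated by groups of 4: steps after batches 3, 7 and 9.
steps = optimizer_steps_per_epoch(num_batches=10, accumulation_steps=4)
indexes = step_batch_indexes(num_batches=10, accumulation_steps=4)
```

A scheduler sized from the total number of steps would use `steps` multiplied by the number of epochs. Using floor division here would undercount whenever the batch count is not a multiple of the accumulation factor.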

Refactor

  • remove old code
  • .env: ignore local env file
  • erase old code
  • transformer: sweep code
  • dataset: clean deprecated code
  • transformer: remove deprecated code
  • transformer/train: refine gradient accumulation
  • transformer/config: reduce learning rate to prevent NaN / Inf values
  • transformer/train: make GPU pinned memory an option
  • transformer/train: add few debug messages
  • transformer/config: update
  • transformer/config: update
  • transformer/train: get the number of epochs from config
  • transformer/train: better log NaN / Inf value issues
  • transformer/config: increase learning rate
  • transformer/config: increase learning rate
  • transformer/config: reduce learning rate
  • transformer/train: update default log level
  • transformer/train: better handle device arg
  • transformer/config.yaml: update training values
  • model: remove unnecessary code
  • dataset/utils.py: don't sort config keys
  • download: update paths

Perf

  • transformer/train: AdamW optimizer instead of Adam, OneCycleLR scheduler

Release 1.1.0

30 Oct 15:07

1.1.0 (2023-10-30)

Features

  • download_metanetx: generate sig alphabet with nbit and neighbors (8b749d6)
  • library: update to RevSig1.5 (8de9a0d)
  • paper: construct alphabet for sig-nbit (866437d)
  • paper: download, add emolecules (093fcfe)
  • paper: download, add FP count and extract test_small (9987ee8)
  • paper: download, enable formalCharge in sanitize (d4c66e3)
  • paper: enable sig-nbit (4f2c125)
  • paper: img, add (9469384)
  • paper: img, add degeneracy (d2e6730)
  • paper: tokenizer, use ECFP4_COUNT (9e31b56)
  • tokenize: write SIG-NEIGH-NBIT datasets (f84dc93)
  • tokenizer: increase script verbosity (0bfe1ab)
  • tokenizer: new arguments to select the tokenizer model, the depictions to process, and the pairs to build (19d03e3)
  • tokenizer: produce SIG-NEIGH-NBIT datasets (daf0454)
  • tokenizer: refactor and enable unigram model type (45400fe)
  • tokenizer: use all tokens available and support unigram model (de449e2)

Bug Fixes

  • download_metanetx: fix paths (0c44e9b)
  • download_metanetx: fix paths (1febe59)
  • paper: dataset, ecfp4 duplicate index number according to the count (d7abd1e)
  • paper: tokenizer, use the right function (73c9f21)
  • signature: use ECFP instead of FCFP (97972fb)
  • tokenizer: fix regular expression (7e9bcdc)
  • tokenizer: spelling in AROMATIC bond regex (624d678)
  • tokenizer: stop omitting bounds in regex (0398217)
  • tokenizer: stop splitting SIG bond tokens (0f004cd)
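
One plausible reading of the ECFP4 count fix above (d7abd1e) is that each fingerprint bit index is written out as many times as its count. The sketch below illustrates that reading only; `expand_counted_bits` and the sample indexes are hypothetical, and the actual dataset code may differ.

```python
def expand_counted_bits(bit_counts):
    """Repeat each fingerprint bit index as many times as its count.

    `bit_counts` maps a bit index to the number of times the matching
    substructure occurs in the molecule.
    """
    tokens = []
    for index in sorted(bit_counts):
        tokens.extend([index] * bit_counts[index])
    return tokens


# Bit 13 occurs three times, bit 80 twice, bit 1057 once.
tokens = expand_counted_bits({80: 2, 1057: 1, 13: 3})
```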

Code Refactoring

  • download_metanetx: print settings (3921c6c)
  • download_metanetx: progress bar and more logs (14fe1fe)
  • download_metanetx: store file paths in args.dict (a898d9c)

Build Systems

Documentation

Styles

  • tokenizer: black file (c33f3b4)
  • indent comments (782673d)
  • download_metanetx: add comments (095692a)
  • download_metanetx: more explicit argparse help (15aa802)
  • tokenizer: fix flake8 warnings (9c1c105)

Release 1.0.0

09 Aug 09:55

1.0.0 (2023-08-09)

⚠ BREAKING CHANGES

  • tokenizer:

Features

  • download: introduce default output dir (09765ec)
  • library: update with "RevSig1.2" (1608cd7)
  • paper: add tokenizer signature (545cfa6)
  • retrosig: add utils/cmd.py file (8f398db)
  • tokenizer: add sentencepiece tokenizer (3ee9f5c)
  • tokenizer: build vocabularies and dataset pairs (f4ae35d)
  • tokenizer: only output on-bits in ECFP4 (62f9dca)

Bug Fixes

  • download: create output dir if it does not exist (6679734)
  • download: fix argparse crash due to percent sign in help (#6) (2db597e)
  • download: prevent removing raw mnx file (f3c2d85)
  • download: put back right path for rdkit method (#7) (1402a45)
  • download: shuffle data only once (f51742f)
  • tokenizer: fingerprints name in upper case to match expectation (256021a)
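
The argparse fix above (#6) reflects a general property of the standard library: help strings are expanded with `%`-formatting, so a literal percent sign has to be written as `%%`. The sketch below shows the pattern; the `--test-fraction` option and its wording are hypothetical, not the repository's actual arguments.

```python
import argparse

parser = argparse.ArgumentParser(prog="download")
parser.add_argument(
    "--test-fraction",
    type=float,
    default=0.1,
    # A bare "%" in this string would break help rendering; "%%" yields "%".
    # "%(default)s" is a regular argparse placeholder for the default value.
    help="fraction held out for testing, e.g. 0.1 for 10%% (default: %(default)s)",
)

# Rendering the help text is where an unescaped percent sign would fail.
help_text = parser.format_help()
```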

Build Systems

Code Refactoring

  • download: change default value of test and valid datasets (6ceb79a)
  • download: disable shuffling before sanitizing (0934919)
  • download: pointing out unexpected filtered smiles (6c99f96)
  • download: update output name for the signature alphabet file (a2ef08c)
  • sweep imports (0a49399)
  • download: simplify args usage (90bf3f5)
  • tokenizer: change file pairs extension (576139c)

Styles

  • download: rename variables (bb88d89)
  • download: sweep imports (de34f44)
  • download: sweep imports (c6313dd)
  • download: update helps of arguments (82a5afb)
  • blacked files (856f0fc)

Documentation

  • download: make fingerprint size explicit (024281f)
  • README: update (4a31529)
  • README: update (fe73fc0)
  • README: update install instructions (389bfc4)