v1.13.0 release PR #482

mart-r · 2024-08-28T13:44:31Z

Preparing for next minor release.

* Small addition to contribution guidelines (#420)
* CU-8694cbcpu: Allow specifying an AU Snomed when preprocessing (#421)
* CU-8694dpy1c: Return empty generator upon empty stream (#423)
* CU-8694dpy1c: Return empty generator upon empty stream
* CU-8694dpy1c: Fix empty generator returns
* CU-8694dpy1c: Simplify empty generator returns
* Relation extraction (#173)
* Added files.
* More additions to rel extraction.
* Rel base.
* Update.
* Updates.
* Dependency parsing.
* Updates.
* Added pre-training steps.
* Added training & model utils.
* Cleanup & fixes.
* Update.
* Evaluation updates for pretraining.
* Removed duplicate relation storage.
* Moved RE model file location.
* Structure revisions.
* Added custom config for RE.
* Implemented custom dataset loader for RE.
* More changes.
* Small fix.
* Latest additions to RelCAT (pipe + predictions)
* Setup.py fix.
* RE utils update.
* rel model update.
* rel dataset + tokenizer improvements.
* RelCAT updates.
* RelCAT saving/loading improvements.
* RelCAT saving/loading improvements.
* RelCAT model fixes.
* Attempted gpu learning fix. Dataset label generation fixes.
* Minor train dataset gen fix.
* Minor train dataset gen fix No.2.
* Config updates.
* Gpu support fixes. Added label stats.
* Evaluation stat fixes.
* Cleaned stat output mode during training.
* Build fix.
* removed unused dependencies and fixed code formatting
* Mypy compliance.
* Fixed linting.
* More Gpu mode train fixes.
* Fixed model saving/loading issues when using other base models.
* More fixes to stat evaluation. Added proper CAT integration of RelCAT.
* Setup.py typo fix.
* RelCAT loading fix.
* RelCAT Config changes.
* Type fix. Minor additions to RelCAT model.
* Type fixes.
* Type corrections.
* RelCAT update.
* Type fixes.
* Fixed type issue.
* RelCATConfig: added seed param.
* Adaptations to the new codebase + type fixes.
* Doc/type fixes.
* Fixed input size issue for model.
* Fixed issue(s) with model size and config.
* RelCAT: updated configs to new style.
* RelCAT: removed old refs to logging.
* Fixed GPU training + added extra stat print for train set.
* Type fixes.
* Updated dev requirements.
* Linting.
* Fixed pin_memory issue when training on CPU.
* Updated RelCAT dataset get + default config.
* Updated RelDS generator + default config
* Linting.
* Updated RelDatset + config.
* Pushing updates to model. Made changes to:
  1) Extracting given number of context tokens left and right of the entities
  2) Extracting hidden state from bert for all the tokens of the entities and performing max pooling on them
* Fixing formatting
* Update rel_dataset.py
* Update rel_dataset.py
* Update rel_dataset.py
* RelCAT: added test resource files.
* RelCAT: Fixed model load/checkpointing.
* RelCAT: updated to pipe spacy doc call.
* RelCAT: added tests.
* Fixed lint/type issues & added rel tag to test DS.
* Fixed ann id to token issue.
* RelCAT: updated test dataset + tests.
* RelCAT: updates to requested changes + dataset improvements.
* RelCAT: updated docs/logs according to comments.
* RelCAT: type fix.
* RelCAT: mct export dataset updates.
* RelCAT: test updates + requested changes p2.
* RelCAT: log for MCT export train.
* Updated docs + split train_test & dataset for benchmarks.
* type fixes.
---------
Co-authored-by: Shubham Agarwal <66172189+shubham-s-agarwal@users.noreply.github.com>
Co-authored-by: mart-r <mart.ratas@gmail.com>
* CU-8694fae3r: Avoid publishing PyPI release when doing GH pre-releases (#424)
* CU-8694fae3r: Avoid publishing PyPI release when doing GH pre-releases
* CU-8694fae3r: Fix pre-releases tagging
* CU-8694fae3r: Allow actions to run on release edit
---------
Co-authored-by: Mart Ratas <mart.ratas@gmail.com>
Co-authored-by: Vlad Dinu <62345326+vladd-bit@users.noreply.github.com>

#433)
* CU-8694hukwm: Document the materialising of generator when multiprocessing and batching for docs
* CU-8694hukwm: Add TODO note for where the generator is materialised
* CU-8694hukwm: Add warning for when large amounts of generator data (10k items) are materialised by the docs size mp method
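The warning on materialising a large generator could look roughly like the sketch below. Only the 10k threshold comes from the commit message; the helper name `materialise_with_warning`, the logger name, and the warning text are hypothetical and do not reflect the actual implementation.

```python
import logging
from typing import Iterable, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger("example")  # hypothetical logger name

WARN_THRESHOLD = 10_000  # item count mentioned in the commit message


def materialise_with_warning(data: Iterable[T],
                             threshold: int = WARN_THRESHOLD) -> List[T]:
    """Turn an iterable (e.g. a generator) into a list, warning once if it is large."""
    items: List[T] = []
    warned = False
    for item in data:
        items.append(item)
        # Warn a single time, as soon as the threshold is reached
        if not warned and len(items) >= threshold:
            logger.warning(
                "Materialising %d or more items from a generator; "
                "memory usage may be high", threshold)
            warned = True
    return items
```

Warning during iteration rather than after it means the message appears before memory is exhausted on very large inputs.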

* CU-8694fk90r: Move backwards compatibility method from CDB to config utils
* CU-8694fk90r: Move weighted_average_function from config to CDB; create necessary backwards compatibility workarounds
* CU-8694fk90r: Move usage of weighted_average_function in tests
* CU-8694fk90r: Add JSON encoder and decoder for re.Pattern
* CU-8694fk90r: Rebuild custom decoder if needed
* CU-8694fk90r: Add method to detect old style config
* CU-8694fk90r: Use regular json serialisation for config; Retain option to read old jsonpickled config
* CU-8694fk90r: Add test for config serialisation
* CU-8694fk90r: Make sure to fix weighted_average_function upon setting it
* CU-8694fk90t: Add missing tests for config utils
* CU-8694fk90t: Add tests for better raised exception upon old way of using weighted_average_function
* CU-8694fk90t: Fix exception type in an added test
* CU-8694fk90t: Add further tests for exception payload
* CU-8694fk90t: Add improved exceptions when using old/unsupported value of weighted_average_function in config
* CU-8694fk90t: Add typing fix exceptions
* CU-8694fk90t: Make custom exception derive from AttributeError to correctly handle hasattr calls
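As an illustration of the re.Pattern encoder/decoder idea, here is a minimal sketch using only the standard `json` and `re` modules. The marker key `PATTERN_KEY`, the class and function names, and the `word_skipper` config key are assumptions made for this example; the actual implementation may tag and rebuild patterns differently.

```python
import json
import re
from typing import Any, Dict

PATTERN_KEY = "==re.Pattern=="  # hypothetical marker key, not taken from the source


class PatternEncoder(json.JSONEncoder):
    """Serialise compiled regular expressions as a small tagged dict."""

    def default(self, o: Any) -> Any:
        if isinstance(o, re.Pattern):
            return {PATTERN_KEY: o.pattern, "flags": o.flags}
        return super().default(o)


def pattern_hook(dct: Dict[str, Any]) -> Any:
    """object_hook that rebuilds tagged dicts into compiled patterns."""
    if PATTERN_KEY in dct:
        return re.compile(dct[PATTERN_KEY], dct.get("flags", 0))
    return dct


# Round trip: a config-like dict holding a compiled pattern
config = {"word_skipper": re.compile(r"^(and|or)$", re.IGNORECASE)}
text = json.dumps(config, cls=PatternEncoder)
restored = json.loads(text, object_hook=pattern_hook)
```

Plain JSON with a tagged dict keeps the saved config human-readable, which is one plausible motivation for moving away from jsonpickle.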

* CU-8694n493m: Add deprecation and removal versions to deprecation decorator
* CU-8694n493m: Add deprecation version to existing deprecated methods. The removal version is set 2 minor versions after the minor version in which the method was deprecated, or to the next minor version if the method had been deprecated for longer.
* CU-8694n4ff0: Raise exception upon deprecated method call at test time
* CU-8694n4ff0: Fix usage of deprecated method calls during test time
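A deprecation decorator that carries both a deprecation version and a removal version might be sketched as follows. The decorator signature, the message format, and the version numbers in the usage are illustrative assumptions, not the project's actual decorator.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated(message: str, depr_version: str,
               removal_version: str) -> Callable:
    """Mark a function as deprecated since `depr_version`,
    scheduled for removal in `removal_version`."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Emit the warning at the caller's location (stacklevel=2)
            warnings.warn(
                f"{func.__name__} is deprecated since v{depr_version} and "
                f"scheduled for removal in v{removal_version}. {message}",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Illustrative usage; the version numbers are examples only
@deprecated("Use new_method instead.", depr_version="1.12",
            removal_version="1.14")
def old_method(x: int) -> int:
    return x + 1
```

One way to get the "raise at test time" behaviour is to run the test suite with `warnings.simplefilter("error", DeprecationWarning)`, which turns every such warning into an exception.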

Fix issues with compute_class_weights JSON serialization and enforce fc2 usage when fc3 is enabled
* Resolved an issue where compute_class_weights returns a NumPy array, causing an error when saving the configuration as JSON (since JSON does not support NumPy arrays). The fix ensures compatibility by converting the NumPy array to a JSON-serializable format.
* Added a safeguard in the model_architecture_config for meta_cat_config. The current architecture assumes fc3 is only used when fc2 is enabled. If fc2 is set to False and fc3 is True, the model would fail due to a mismatch in hidden layer sizes. The fix automatically enables fc2 if fc3 is set to True, preventing potential errors.
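Both fixes can be illustrated with a small standard-library sketch. `ModelArchitectureConfig` and `to_json_safe` are hypothetical stand-ins, and `array.array` is used in place of a NumPy array so the example has no third-party dependency; a NumPy array exposes the same `.tolist()` method.

```python
import json
from array import array
from dataclasses import asdict, dataclass, field
from typing import Any, List


@dataclass
class ModelArchitectureConfig:  # hypothetical stand-in for the real config section
    fc2: bool = True
    fc3: bool = False
    class_weights: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # fc3 is only valid on top of fc2, so enable fc2 when fc3 is requested
        if self.fc3 and not self.fc2:
            self.fc2 = True


def to_json_safe(values: Any) -> List[float]:
    """Convert array-like values (anything with .tolist()) to a plain list."""
    if hasattr(values, "tolist"):
        return list(values.tolist())
    return list(values)


weights = array("d", [0.25, 0.75])  # stands in for a NumPy array of class weights
cfg = ModelArchitectureConfig(fc2=False, fc3=True,
                              class_weights=to_json_safe(weights))
serialised = json.dumps(asdict(cfg))  # succeeds because the weights are a list
```

Here `cfg.fc2` ends up `True` even though `False` was passed, mirroring the automatic enabling described above.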

…us version (#465)
* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version
* CU-86956duhb: Fix some doc string issues
* CU-86956duhb: Add deprecation decorator to old config-fix
* CU-86956duhb: Mark backporting method as deprecated and to be removed in 1.14

* CU-86956du3q: Move to placeholder-based replacement
* CU-86956du3q: Update regression tests to a more reasonable state. Make sure to compare the correct annotation, not just hoping for any CUI annotated to match the one we are looking for. Output the specifics of the type of match that was found:
  - Identical
  - Bigger / smaller span
  - Random overlap
  - Parents / grandparents, or children
  Add strictness options to summary (success / failure).
* CU-86956du3q: Further fixes for regression checking: Remove 'Failure reason' and 'Failure descriptor' - now using Finding instead. Remove simplified success/failure metrics wherever relevant. Fix tests that relied on old logic and fix test-time replacement/cui location.
* CU-86956du3q: Add documentation for new classes and methods
* CU-86956du3q: Rename enum constant (SPAN_OVERLAP -> PARTIAL_OVERLAP)
* CU-86956du3q: Add matching for partially overlapping children
* CU-86956du3q: Add tests for partially overlapping children
* CU-86956du3q: Update regression checking to generate multiple sub-cases for multiple placeholders
* CU-86956du3q: Update some tests for new format
* CU-86956du3q: Remove old / unused / irrelevant tests and test-code
* CU-86956du3q: Some renaming (filter -> placeholders)
* CU-86956du3q: Add some additional fail safes for option set
* CU-86956du3q: Fix option set for only 1 placeholder
* CU-86956du3q: Fix targeting
* CU-86956du3q: Add tests for targeting
* CU-86956du3q: Remove MCT export conversion (at least for now)
* CU-86956du3q: Remove MCT export conversion tests (at least for now)
* CU-86956du3q: Remove suite editing (at least for now)
* CU-86956du3q: Remove category separation (at least for now)
* CU-86956du3q: Remove unused regression utils (at least for now)
* CU-86956du3q: Remove serialisation tests (at least for now)
* CU-86956du3q: Improve quality of default regression test set
* CU-86956du3q: Improve exceptions in targeting
* CU-86956du3q: Fix docstring issue regarding exceptions
* CU-86956du3q: Update test with correct exceptions
* CU-86956du3q: Add utils for partial substitutions and corresponding tests
* CU-86956du3q: Allow multiple of the same placeholder in a phrase. And more specifically, treat each one as their own sub-case
* CU-86956du3q: Add relevant tests for multi-placeholder checking
* CU-86956du3q: Allow changing of multiple pre-processing placeholders
* CU-86956du3q: Fix 1-placeholder sub-case yielding
* CU-86956du3q: Remove debug output
* CU-86956du3q: Replace separator (~) with whitespace when checking
* CU-86956du3q: Add utility method to limit string length for output
* CU-86956du3q: Improve string length limiting method
* CU-86956du3q: Add a few tests for string length limiting method
* CU-86956du3q: Add an ANYTHING strictness (mostly for example disabling)
* CU-86956du3q: Add storage of examples (of a certain strictness) as well as relevant output
* CU-86956du3q: Fix type (missing ending bracket) in report output
* CU-86956du3q: Fix examples header appearing for every example
* CU-86956du3q: Print the same phrase fewer times for examples
* CU-86956du3q: Update fake CDB with (default) config
* CU-86956du3q: Add finding to examples and output
* CU-86956du3q: Add config to another fake CDB during test time
* CU-86956du3q: Allow strictness to propagate to parts when looking at examples
* CU-86956du3q: Add placeholder to examples output
* CU-86956du3q: Refactor report output generation slightly
* CU-86956du3q: Show all non-identical examples
* CU-86956du3q: Update example checking with strictness requirement (instead of simple boolean)
* CU-86956du3q: Simplify targeting somewhat (remove unnecessary method)
* CU-86956du3q: Allow changing of output phrase max length
* CU-86956du3q: Fix doc string for changed method
* CU-86956du3q: Small whitespace fix
* CU-86956du3q: Fix total-included checking iteration
* CU-86956du3q: Add strictness and max phrase length to CLI
* CU-86956du3q: Add example strictness to CLI
* CU-86956du3q: Fix default value for strictness in CLI
* CU-86956du3q: Update to use number of sub-cases for tqdm/progress bar
* CU-86956du3q: Remove option to set the total for progress bar (the automated one works fine now)
* CU-86956du3q: Simplify the progress bar by combining all cases
* CU-86956du3q: Split subcase iteration
* CU-86956du3q: Rename regression checker to regression suite
* CU-86956du3q: Streamline typing and the like by using intermediate data classes
* CU-86956du3q: Remove redundant method
* CU-86956du3q: Remove redundant method and accompanying test
* CU-86956du3q: Remove redundant class
* CU-86956du3q: Add another intermediate data class
* CU-86956du3q: Remove completed TODO notes and redundant method
* CU-86956du3q: Add documentation to new methods and classes. Simplify example keeping.
* CU-86956du3q: Small update for how default test suite is handled for CLI
* CU-86956du3q: Small update to report output format
* CU-86956du3q: Add easier to read exception when unable to load a placeholder
* CU-86956du3q: Update percentages output to avoid as many decimal places
* CU-86956du3q: Use preferred name for run-to-run consistency
* CU-86956du3q: Update test time fake CDBs
* CU-86956du3q: Update default regression tests with new extensive (yet simple) test case
* CU-86956du3q: Add initial README for regression stuff
* CU-86956du3q: Add option for failing with having found another concept. Added other incorrect cui that was found (if applicable). Fixed issue with finding grandparents.
* CU-86956du3q: Add tests for parent and grandparent finding; fix tests for new changes (with optionally found alternative CUI)
* CU-86956du3q: Add preferred name to wrong CUI found
* CU-86956du3q: Fix tests for new form of determine cui description; add test for exact span grandchild
* CU-86956du3q: Fix determining partial matches for grandchildren and beyond
* CU-86956du3q: Add test for partial matches of grandchildren
* Fixing bug for metacat. Fix issues with compute_class_weights JSON serialization and enforce fc2 usage when fc3 is enabled
  * Resolved an issue where compute_class_weights returns a NumPy array, causing an error when saving the configuration as JSON (since JSON does not support NumPy arrays). The fix ensures compatibility by converting the NumPy array to a JSON-serializable format.
  * Added a safeguard in the model_architecture_config for meta_cat_config. The current architecture assumes fc3 is only used when fc2 is enabled. If fc2 is set to False and fc3 is True, the model would fail due to a mismatch in hidden layer sizes. The fix automatically enables fc2 if fc3 is set to True, preventing potential errors.
* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version (#465)
* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version
* CU-86956duhb: Fix some doc string issues
* CU-86956duhb: Add deprecation decorator to old config-fix
* CU-86956duhb: Mark backporting method as deprecated and to be removed in 1.14
* CU-8694cd9t2: Allow merging config into model pack config before init (#462)
* CU-8694cd9t2: Allow merging config into model pack config before init
* CU-8694fwyje: Update all configs with pre-load parts documented (#473)
* CU-86956du3q: Add converter from MCT export
* CU-86956du3q: Add documentation to MCT export converter
* CU-86956du3q: Add option to create a regression suite from an MCT export
* CU-86956du3q: Add option to create a regression suite from an MCT export to CLI
* CU-86956du3q: Add a small note for converter placeholder
* CU-86956du3q: Add tests for MedCATtrainer export converter
* CU-86956du3q: Add tests for regression suite generation based on MCT export
* CU-86956du3q: Simplify regression case creation tests somewhat
* CU-86956du3q: Add option to create a regression suite YAML from MCT export
* CU-86956du3q: Add option to stop at MCT export conversion
* CU-86956du3q: Make use of only-prefnames option
* CU-86956du3q: Fix loading of only-prefnames option from yaml
* CU-86956du3q: Add comment for only using preferred names to the default regression suite yaml
* CU-86956du3q: Fix tests broken due to pref-name only change
* CU-86956du3q: Add utility method to set runtime doc strings for enum constants
* CU-86956du3q: Add tests for runtime doc string addition
* CU-86956du3q: Add more tests for runtime doc string addition (to make sure it fails without the change)
* CU-86956du3q: Make Finding enum have runtime doc strings
* CU-86956du3q: Add CLI option to show the various descriptions of the finding types (--only-describe)
* CU-86956du3q: Update dict and json methods for some results for JSON serialisation
* CU-86956du3q: Add a few json serialisation tests
* CU-86956du3q: Add json serialisation example strictness to CLI
* CU-86956du3q: Add a few more json serialisation tests
* CU-86956du3q: Add usage of regression suite name from the name of the file being read
* CU-86956du3q: Fix tests by adding the regression suite name where applicable
* CU-86956du3q: Avoid examples in ResultDescriptor
* CU-86956du3q: Make sure strictness propagates across all parts of a multi-result descriptor
* CU-86956du3q: Update tests: Use correct reporting for generating fake reports
* CU-86956du3q: Fix small test issue
* CU-86956du3q: Update tests for manual success/fail for results
* CU-86956du3q: Separate calculation section of report finding
* CU-86956du3q: Add a few more tests for report/results
* CU-86956du3q: Add option to force a non-0 exit status upon any regression test failure
* CU-86956du3q: Add files for regression model creation and checking
* CU-86956du3q: Add new part to main workflow to create and regression check a simple model pack
* CU-86956du3q: Update a mistyped comment
* CU-86956du3q: Make regression run at STRICTEST strictness at GHA workflow time
* CU-86956du3q: Fix strictness matrix for anything-typed strictness
* CU-86956du3q: Add strictness matrix information to --describe-only
* CU-86956du3q: Add python version to created model pack for test time
* CU-86956du3q: Use the python version of created model pack during test time to avoid conflicts with other python versions running in parallel
* CU-86956du3q: [TEMP] Remove tests from main workflow (for faster iteration) and add args to output upon regression checking
* Revert "CU-86956du3q: [TEMP] Remove tests from main workflow (for faster iteration) and add args to output upon regression checking". This reverts commit 4bf3089.
* CU-86956du3q: Make full model path the last line of the output upon creation model for regression
* CU-86956du3q: Move regression workflow logic to a separate bash script
* CU-86956du3q: Update comments in regression bash script
* CU-8694pz44d: Fix model cleanup during regression
* CU-86956du3q: Fix typos in utils
* CU-86956du3q: Fix a bunch of various typos in doc strings and comments
---------
Co-authored-by: shubham-s-agarwal <66172189+shubham-s-agarwal@users.noreply.github.com>
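The span-related match types mentioned in the regression changes (identical, bigger / smaller span, partial overlap) can be sketched as a simple classification over character offsets. `SpanFinding` and `compare_spans` are hypothetical, simplified stand-ins; the actual Finding enum also covers parent and child concepts, which are omitted here.

```python
from enum import Enum, auto


class SpanFinding(Enum):  # hypothetical, simplified stand-in for the Finding enum
    IDENTICAL = auto()
    BIGGER_SPAN = auto()
    SMALLER_SPAN = auto()
    PARTIAL_OVERLAP = auto()
    NO_OVERLAP = auto()


def compare_spans(exp_start: int, exp_end: int,
                  got_start: int, got_end: int) -> SpanFinding:
    """Classify a found span [got_start, got_end) against the expected one."""
    if (got_start, got_end) == (exp_start, exp_end):
        return SpanFinding.IDENTICAL
    if got_end <= exp_start or got_start >= exp_end:
        return SpanFinding.NO_OVERLAP
    if got_start <= exp_start and got_end >= exp_end:
        return SpanFinding.BIGGER_SPAN  # found span fully contains the expected one
    if got_start >= exp_start and got_end <= exp_end:
        return SpanFinding.SMALLER_SPAN  # found span lies inside the expected one
    return SpanFinding.PARTIAL_OVERLAP
```

A strictness setting could then be modelled as the set of findings that count as success, for example only `IDENTICAL` at the strictest level.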

shubham-s-agarwal and others added 30 commits April 19, 2024 11:53

Pushing changes for bert-style models for MetaCAT

4d36f8a

Pushing fix for LSTM

da9ab06

Pushing changes for flake8 and type fixes

cb65fc3

Pushing type fixes

869eeae

Fixing type issue

3e02eed

Pushing changes

c899c9c

1) Added model.zero_grad to clear accumulated gradients
2) Fixed config save issue
3) Re-structured data preparation for oversampled data

Pushing change and type fixes

d1321b8

Pushing ml_utils file which was missed in the last commit

Fixing flake8 issues

9091d9b

Pushing flake8 fixes

c57dcfe

Pushing fixes for flake8

364fdd4

Pushing flake8 fix

7272168

Adding peft to list of libraries

619c565

Pushing changes with load and train workflow and type fixes

2a546c3

The workflow for inference is: load() and then inference. For training: init() and then train(). train() never loads the model dict, except when phase_number is set to 2, i.e. for the second phase of 2 phase learning.
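The two workflows can be sketched with a toy class. `TinyMetaModel`, its attributes, and the placeholder "training" step are hypothetical and only mirror the rule stated above: loading happens on the inference path, and train() restores saved state only when `phase_number` is 2.

```python
from typing import Dict, Optional


class TinyMetaModel:  # hypothetical stand-in, not the real MetaCAT class
    def __init__(self, phase_number: int = 0) -> None:
        self.phase_number = phase_number
        self.state: Dict[str, float] = {}
        self.loaded_state = False

    @classmethod
    def load(cls, saved_state: Dict[str, float]) -> "TinyMetaModel":
        """Inference path: build the model and restore its saved state."""
        model = cls()
        model.state = dict(saved_state)
        model.loaded_state = True
        return model

    def train(self, saved_state: Optional[Dict[str, float]] = None) -> None:
        """Training path: start fresh unless this is phase 2 of 2 phase learning."""
        if self.phase_number == 2 and saved_state is not None:
            self.state = dict(saved_state)
            self.loaded_state = True
        # Placeholder "training" step: bump a single weight
        self.state["w"] = self.state.get("w", 0.0) + 1.0
```

With a saved state of `{"w": 5.0}`, a phase-0 model ignores it during training, while a phase-2 model continues from it.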

Pushing changes with type hints and new documentation

8efd2a9

Pushing type fix

aa5044e

Fixing type issue

fcdc867

Adding test case for BERT and reverting config changes

88ee8e7

BERT test cases: Testing for BERT model along with 2 phase learning

Merge branch 'master' into metacat_bert

563c3d4

Pushing changed tests and removing empty change

decfbfb

Pushing change for logging

fbcdb70

Revert "Pushing change for logging"

2657515

This reverts commit fbcdb70.

Merge pull request #419 from CogStack/metacat_bert

fbe9745

Adding Bert-style model for MetaCAT

CU-8694gza88: Create codeql.yml (#434)

8e7c77b

Run CodeQL to identify vulnerabilities. This will run on any push or pull request to `master`, but also runs once every day in case some new vulnerabilities are discovered (or something else changes).

CU-8694mbn03: Remove the web app (#441)

0c8f5a8

CU-8694pey4u: extract cdb load to cls method, to be used in trainer f…

9d6a4e0

…or model pack loading

CU-8694pey4u: extract meta cat loading also to a cls method

61b5979

shubham-s-agarwal and others added 29 commits August 6, 2024 10:45

Update config_meta_cat.py

1e087ef

Update config_meta_cat.py

5969e11

Pushing formatting changes

2c20864

Update meta_cat.py

7b01c1a

Update meta_cat.py

4b3c024

Update meta_cat.py

2541cae

Update meta_cat.py

9eb6376

Update meta_cat.py

2737ced

Revert "Update meta_cat.py"

f3289e1

This reverts commit 2737ced.

Update meta_cat.py

18fa925

Update config_meta_cat.py

9a9ca71

Update config_meta_cat.py

9e52002

Update config_meta_cat.py

71dfbae

Update config_meta_cat.py

d6a3bab

Update config_meta_cat.py

4d73b1a

Fixing flake8 issues

5328cba

Update config_meta_cat.py

289e68d

Merge pull request #472 from CogStack/metacat_documentation_upd

33e32fd

Changes to documentation for metacat

Merge pull request #474 from CogStack/metacat_bug_resolve

b7658ee

Fixing bug for metacat

CU-8694cd9t2: Allow merging config into model pack config before init (…

76c2fa2

…#462) * CU-8694cd9t2: Allow merging config into model pack config before init

CU-8694fwyje: Update all configs with pre-load parts documented (#473)

c82ad4b

Use the loaded model hash for usage monitor instead of recalculating it

62e603a

Merge pull request #477 from CogStack/usageMonitorHashRecalcFix

c907c97

Use the loaded model hash for usage monitor instead of recalculating it

fixed issue where the key name has not been declared in name2cuis2sta…

209c5e4

…tus (#479) Co-authored-by: adam-sutton-1992 <adam.sutton@kcl.ac.uk>

CU-8695hydt9: Fix various typos (#480)

6d1247a

CU-8695j1be2: Remove deprecated method on CDB (#481)

540224c

* CU-8695j1be2: Remove deprecated method on CDB * CU-8695j1be2: Remove unused import due to removal of deprecated method

mart-r merged commit 34e5cde into production Aug 28, 2024
8 checks passed
