v1.13.0 release PR #482

Merged
merged 100 commits into from
Aug 28, 2024

mart-r
Collaborator

mart-r commented on Aug 28, 2024

Preparing for next minor release.

shubham-s-agarwal and others added 30 commits April 19, 2024 11:53
1) Added model.zero_grad to clear accumulated gradients
2) Fixed config save issue
3) Re-structured data preparation for oversampled data
Pushing ml_utils file which was missed in the last commit
The workflow for inference is: load(), then inference.
For training: init(), then train().
train() never loads the model dict, except when phase_number is set to 2 (the second phase of 2-phase learning).
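The loading rule in the commit above can be sketched as follows. `TwoPhaseTrainer`, its `weights` dict and the single-number "update step" are hypothetical stand-ins, not MetaCAT's actual API; the sketch only shows when a saved model dict is reused during training.

```python
from typing import Dict, Optional


class TwoPhaseTrainer:
    """Hypothetical sketch of the load()/init() + train() workflow."""

    def __init__(self, phase_number: int = 0) -> None:
        # 0: regular training, 1: first phase, 2: second phase of 2-phase learning
        self.phase_number = phase_number
        self.weights: Dict[str, float] = {}

    def init(self) -> None:
        # Training path: start from freshly initialised weights.
        self.weights = {"w": 0.0}

    def load(self, saved: Dict[str, float]) -> None:
        # Inference path: restore the saved model dict.
        self.weights = dict(saved)

    def train(self, saved: Optional[Dict[str, float]] = None) -> Dict[str, float]:
        # Only phase 2 continues from the dict saved at the end of phase 1;
        # any other training run ignores `saved`.
        if self.phase_number == 2:
            if saved is None:
                raise ValueError("phase 2 requires the model dict saved in phase 1")
            self.weights = dict(saved)
        # Stand-in for a real optimisation step.
        self.weights["w"] = self.weights.get("w", 0.0) + 1.0
        return self.weights
```

Copying the saved dict (rather than aliasing it) keeps the phase 1 result intact if phase 2 is re-run.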
BERT test cases: Testing for BERT model along with 2 phase learning
* Small addition to contribution guidelines (#420)

* CU-8694cbcpu: Allow specifying an AU Snomed when preprocessing (#421)

* CU-8694dpy1c: Return empty generator upon empty stream (#423)

* CU-8694dpy1c: Return empty generator upon empty stream

* CU-8694dpy1c: Fix empty generator returns

* CU-8694dpy1c: Simplify empty generator returns

* Relation extraction (#173)

* Added files.

* More additions to rel extraction.

* Rel base.

* Update.

* Updates.

* Dependency parsing.

* Updates.

* Added pre-training steps.

* Added training & model utils.

* Cleanup & fixes.

* Update.

* Evaluation updates for pretraining.

* Removed duplicate relation storage.

* Moved RE model file location.

* Structure revisions.

* Added custom config for RE.

* Implemented custom dataset loader for RE.

* More changes.

* Small fix.

* Latest additions to RelCAT (pipe + predictions)

* Setup.py fix.

* RE utils update.

* rel model update.

* rel dataset + tokenizer improvements.

* RelCAT updates.

* RelCAT saving/loading improvements.

* RelCAT saving/loading improvements.

* RelCAT model fixes.

* Attempted gpu learning fix. Dataset label generation fixes.

* Minor train dataset gen fix.

* Minor train dataset gen fix No.2.

* Config updates.

* Gpu support fixes. Added label stats.

* Evaluation stat fixes.

* Cleaned stat output mode during training.

* Build fix.

* removed unused dependencies and fixed code formatting

* Mypy compliance.

* Fixed linting.

* More Gpu mode train fixes.

* Fixed model saving/loading issues when using other base models.

* More fixes to stat evaluation. Added proper CAT integration of RelCAT.

* Setup.py typo fix.

* RelCAT loading fix.

* RelCAT Config changes.

* Type fix. Minor additions to RelCAT model.

* Type fixes.

* Type corrections.

* RelCAT update.

* Type fixes.

* Fixed type issue.

* RelCATConfig: added seed param.

* Adaptations to the new codebase + type fixes.

* Doc/type fixes.

* Fixed input size issue for model.

* Fixed issue(s) with model size and config.

* RelCAT: updated configs to new style.

* RelCAT: removed old refs to logging.

* Fixed GPU training + added extra stat print for train set.

* Type fixes.

* Updated dev requirements.

* Linting.

* Fixed pin_memory issue when training on CPU.

* Updated RelCAT dataset get + default config.

* Updated RelDS generator + default config

* Linting.

* Updated RelDatset + config.

* Pushing updates to model

Made changes to:
1) Extracting given number of context tokens left and right of the entities
2) Extracting hidden state from bert for all the tokens of the entities and performing max pooling on them

* Fixing formatting

* Update rel_dataset.py

* Update rel_dataset.py

* Update rel_dataset.py

* RelCAT: added test resource files.

* RelCAT: Fixed model load/checkpointing.

* RelCAT: updated to pipe spacy doc call.

* RelCAT: added tests.

* Fixed lint/type issues & added rel tag to test DS.

* Fixed ann id to token issue.

* RelCAT: updated test dataset + tests.

* RelCAT: updates to requested changes + dataset improvements.

* RelCAT: updated docs/logs according to comments.

* RelCAT: type fix.

* RelCAT: mct export dataset updates.

* RelCAT: test updates + requested changes p2.

* RelCAT: log for MCT export train.

* Updated docs + split train_test & dataset for benchmarks.

* type fixes.

---------

Co-authored-by: Shubham Agarwal <66172189+shubham-s-agarwal@users.noreply.github.com>
Co-authored-by: mart-r <mart.ratas@gmail.com>
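The "Pushing updates to model" change in the RelCAT commit above (keeping a given number of context tokens around the entities, then max pooling BERT hidden states over the entity tokens) can be sketched like this. Plain Python lists stand in for the hidden-state tensor, and the function names are hypothetical; with a `torch.Tensor` of shape `(seq_len, hidden_size)`, `hidden[start:end].max(dim=0).values` computes the same pooled vector.

```python
from typing import List, Sequence, Tuple


def context_window(tokens: Sequence[str], start: int, end: int,
                   n_ctx: int) -> Tuple[List[str], int, int]:
    """Keep at most n_ctx tokens left and right of the span tokens[start:end].

    For an entity pair, pass the start of the first entity and the end of the
    second. Returns the window plus the span's new offsets inside it.
    """
    left = max(0, start - n_ctx)
    right = min(len(tokens), end + n_ctx)
    return list(tokens[left:right]), start - left, end - left


def max_pool(hidden: Sequence[Sequence[float]], start: int, end: int) -> List[float]:
    """Element-wise max over the hidden-state vectors of tokens[start:end]."""
    span = hidden[start:end]
    if not span:
        raise ValueError("empty entity span")
    # zip(*span) iterates over hidden dimensions, one column per dimension.
    return [max(column) for column in zip(*span)]
```

Max pooling gives one fixed-size vector per entity regardless of how many word pieces the entity was split into.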

* CU-8694fae3r: Avoid publishing PyPI release when doing GH pre-releases (#424)

* CU-8694fae3r: Avoid publishing PyPI release when doing GH pre-releases

* CU-8694fae3r: Fix pre-releases tagging

* CU-8694fae3r: Allow actions to run on release edit

---------

Co-authored-by: Mart Ratas <mart.ratas@gmail.com>
Co-authored-by: Vlad Dinu <62345326+vladd-bit@users.noreply.github.com>
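The pre-release gating from #424 lives in the GitHub Actions workflow; the decision it encodes can be sketched in Python as below. This is an illustration, not the workflow's actual code. It relies on two documented GitHub facts: Actions writes the triggering webhook payload to the file named by `GITHUB_EVENT_PATH`, and `release` events carry a boolean `release.prerelease`. The function names are hypothetical.

```python
import json
import os
from typing import Any, Dict, Optional


def should_publish_to_pypi(event: Dict[str, Any]) -> bool:
    """Publish only for full GitHub releases, never for pre-releases."""
    release = event.get("release") or {}
    return bool(release) and not release.get("prerelease", False)


def load_event(path: Optional[str] = None) -> Dict[str, Any]:
    """Read the webhook payload GitHub Actions stores at GITHUB_EVENT_PATH."""
    path = path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Running the check on release edits as well (e.g. a workflow triggered on `release` with `types: [published, edited]`) is what lets a pre-release that is later promoted to a full release still get published.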
Adding Bert-style model for MetaCAT
#433)

* CU-8694hukwm: Document the materialising of generator when multiprocessing and batching for docs

* CU-8694hukwm: Add TODO note for where the generator is materialised

* CU-8694hukwm: Add warning when a large amount of generator data (10k items) is materialised by the docs-size mp method
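The materialisation described in #433 can be sketched as follows: multiprocessing with size-based batching needs the whole input up front, so a generator is turned into a list, and a warning is emitted once that list gets large. The function name and threshold constant are hypothetical; the 10k figure is taken from the commit message.

```python
import warnings
from typing import Iterable, List, TypeVar

T = TypeVar("T")

# Mirrors the 10k items mentioned in the commit message.
LARGE_INPUT_THRESHOLD = 10_000


def materialise(data: Iterable[T], threshold: int = LARGE_INPUT_THRESHOLD) -> List[T]:
    """Turn a (possibly lazy) iterable into a list, warning when it is large."""
    # TODO-style note from the commit: this is where the generator is materialised.
    items = list(data)
    if len(items) >= threshold:
        warnings.warn(
            f"Materialised {len(items)} items from the input generator; "
            "all of them are now held in memory.",
            RuntimeWarning,
            stacklevel=2,
        )
    return items
```

Warning instead of refusing keeps existing callers working while making the memory cost visible.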
* CU-8694fk90r: Move backwards compatibility method from CDB to config utils

* CU-8694fk90r: Move weighted_average_function from config to CDB; create necessary backwards compatibility workarounds

* CU-8694fk90r: Move usage of weighted_average_function in tests

* CU-8694fk90r: Add JSON encoder and decoder for re.Pattern

* CU-8694fk90r: Rebuild custom decoder if needed

* CU-8694fk90r: Add method to detect old style config

* CU-8694fk90r: Use regular json serialisation for config; Retain option to read old jsonpickled config

* CU-8694fk90r: Add test for config serialisation

* CU-8694fk90r: Make sure to fix weighted_average_function upon setting it

* CU-8694fk90t: Add missing tests for config utils

* CU-8694fk90t: Add tests for better raised exception upon old way of using weighted_average_function

* CU-8694fk90t: Fix exception type in an added test

* CU-8694fk90t: Add further tests for exception payload

* CU-8694fk90t: Add improved exceptions when using old/unsupported value of weighted_average_function in config

* CU-8694fk90t: Add typing fixes for exceptions

* CU-8694fk90t: Make custom exception derive from AttributeError to correctly handle hasattr calls
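Serialising the config with regular JSON (instead of jsonpickle) needs a custom encoder/decoder pair for compiled regexes, as noted in the CU-8694fk90r commits above. A minimal sketch using the standard `json.JSONEncoder.default` and `object_hook` hooks follows; the marker key `__re_pattern__` is a hypothetical format, not necessarily the one MedCAT writes.

```python
import json
import re
from typing import Any, Dict

# Hypothetical marker key identifying a serialised regex.
_PATTERN_KEY = "__re_pattern__"


class PatternEncoder(json.JSONEncoder):
    """Serialise compiled regexes as {marker: pattern, "flags": flags}."""

    def default(self, o: Any) -> Any:
        if isinstance(o, re.Pattern):
            return {_PATTERN_KEY: o.pattern, "flags": o.flags}
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


def pattern_hook(d: Dict[str, Any]) -> Any:
    """object_hook that rebuilds regexes written by PatternEncoder."""
    if _PATTERN_KEY in d:
        return re.compile(d[_PATTERN_KEY], d.get("flags", 0))
    return d
```

Usage: `json.dumps(cfg, cls=PatternEncoder)` to write and `json.loads(text, object_hook=pattern_hook)` to read back.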
Run CodeQL to identify vulnerabilities.
This runs on any push or pull request to `master`, and also once a day in case new vulnerabilities are discovered (or something else changes).
* CU-8694n493m: Add deprecation and removal versions to deprecation decorator

* CU-8694n493m: Add deprecation version to existing deprecated methods.

Made the removal version 2 minor versions from the minor version
in which the method was deprecated, or the next minor version if
the method had been deprecated for longer.

* CU-8694n4ff0: Raise exception upon deprecated method call at test time

* CU-8694n4ff0: Fix deprecated method calls at test time
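A deprecation decorator that records both the deprecation and removal versions, and raises instead of warning at test time, might look like the sketch below. It is not MedCAT's actual decorator: the name `deprecated`, its parameters and the `RAISE_ON_DEPRECATED_CALL` environment switch are all hypothetical.

```python
import functools
import os
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

# Hypothetical opt-in switch for test runs; the real mechanism may differ.
RAISE_ENV_VAR = "RAISE_ON_DEPRECATED_CALL"


def deprecated(message: str, depr_version: str, removal_version: str) -> Callable[[F], F]:
    """Mark a function as deprecated in `depr_version`, removed in `removal_version`."""
    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            text = (f"{func.__qualname__} is deprecated since v{depr_version} and will be "
                    f"removed in v{removal_version}. {message}")
            if os.environ.get(RAISE_ENV_VAR) == "1":
                # Test runs fail loudly so lingering deprecated calls get fixed.
                raise DeprecationWarning(text)
            warnings.warn(text, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper  # type: ignore[return-value]
    return decorator


# Example: deprecated in 1.12, removed two minor versions later.
@deprecated("Use add() instead.", depr_version="1.12.0", removal_version="1.14.0")
def old_add(a: int, b: int) -> int:
    return a + b
```

Raising at test time turns any remaining call to a deprecated method into a test failure, which is what the follow-up commit then fixes.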
shubham-s-agarwal and others added 29 commits August 6, 2024 10:45
This reverts commit 2737ced.
Fix issues with compute_class_weights JSON serialization and enforce fc2 usage when fc3 is enabled

* Resolved an issue where compute_class_weights returns a NumPy array, causing an error when saving the configuration as JSON (since JSON does not support NumPy arrays). The fix ensures compatibility by converting the NumPy array to a JSON-serializable format.

* Added a safeguard in the model_architecture_config for meta_cat_config. The current architecture assumes fc3 is only used when fc2 is enabled. If fc2 is set to False and fc3 is True, the model would fail due to a mismatch in hidden layer sizes. The fix automatically enables fc2 if fc3 is set to True, preventing potential errors.
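Both MetaCAT fixes above are config sanitisation steps. The sketch below applies them to a plain dict; the key names `class_weights`, `fc2` and `fc3` are assumptions for illustration rather than the exact layout of `meta_cat_config`. NumPy arrays and NumPy scalars expose `.tolist()`, which is what makes the first fix work without importing NumPy here.

```python
import json
from typing import Any, Dict


def sanitise_model_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-save/pre-init fix-ups mirroring the two MetaCAT fixes."""
    out = dict(cfg)
    weights = out.get("class_weights")
    # JSON cannot encode NumPy arrays; .tolist() yields plain Python lists/floats.
    if hasattr(weights, "tolist"):
        out["class_weights"] = weights.tolist()
    # fc3 expects fc2's hidden size, so enabling fc3 without fc2 would mismatch.
    if out.get("fc3") and not out.get("fc2"):
        out["fc2"] = True
    return out


def dump_config(cfg: Dict[str, Any]) -> str:
    """Serialise the sanitised config as JSON."""
    return json.dumps(sanitise_model_config(cfg))
```

With NumPy available, `sanitise_model_config({"class_weights": numpy.array([0.5, 1.5])})` stores `[0.5, 1.5]`.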
…us version (#465)

* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version

* CU-86956duhb: Fix some doc string issues

* CU-86956duhb: Add deprecation decorator to old config-fix

* CU-86956duhb: Mark backporting method as deprecated and to be removed in 1.14
…#462)

* CU-8694cd9t2: Allow merging config into model pack config before init
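Merging a user-supplied config into the model pack's config before initialisation amounts to a recursive dict merge where user values win. A minimal sketch on plain dicts follows; `merge_config` and the example keys are hypothetical and not MedCAT's config classes.

```python
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `overrides` into a copy of `base` (override values win)."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Doing this before init means components are built with the final settings, instead of being patched after loading.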
Use the loaded model hash for usage monitor instead of recalculating it
…tus (#479)

Co-authored-by: adam-sutton-1992 <adam.sutton@kcl.ac.uk>
* CU-86956du3q: Move to placeholder-based replacement

* CU-86956du3q: Update regression tests to a more reasonable state.

Make sure to compare the correct annotation, rather than hoping that any annotated CUI matches the one we are looking for.
Output the specifics of the type of match that was found:
 - Identical
 - Bigger / smaller span
 - Random overlap
 - Parents / grandparents, or children
Add strictness options to summary (success / failure).

* CU-86956du3q: Further fixes for regression checking:

Remove 'Failure reason' and 'Failure descriptor' - now using Finding instead.
Remove simplified success/failure metrics wherever relevant.
Fix tests that relied on old logic and fix test-time replacement/cui location.

* CU-86956du3q: Add documentation for new classes and methods

* CU-86956du3q: Rename enum constant (SPAN_OVERLAP -> PARTIAL_OVERLAP)

* CU-86956du3q: Add matching for partially overlapping children

* CU-86956du3q: Add tests for partially overlapping children

* CU-86956du3q: Update regression checking to generate multiple sub-cases for multiple placeholders

* CU-86956du3q: Update some tests for new format

* CU-86956du3q: Remove old / unused / irrelevant tests and test-code

* CU-86956du3q: Some renaming (filter -> placeholders)

* CU-86956du3q: Add some additional fail safes for option set

* CU-86956du3q: Fix option set for only 1 placeholder

* CU-86956du3q: Fix targeting

* CU-86956du3q: Add tests for targeting

* CU-86956du3q: Remove MCT export conversion (at least for now)

* CU-86956du3q: Remove MCT export conversion tests (at least for now)

* CU-86956du3q: Remove suite editing (at least for now)

* CU-86956du3q: Remove category separation (at least for now)

* CU-86956du3q: Remove unused regression utils (at least for now)

* CU-86956du3q: Remove serialisation tests (at least for now)

* CU-86956du3q: Improve quality of default regression test set

* CU-86956du3q: Improve exceptions in targeting

* CU-86956du3q: Fix docstring issue regarding exceptions

* CU-86956du3q: Update test with correct exceptions

* CU-86956du3q: Add utils for partial substitutions and corresponding tests

* CU-86956du3q: Allow multiple instances of the same placeholder in a phrase.

More specifically, treat each one as its own sub-case

* CU-86956du3q: Add relevant tests for multi-placeholder checking

* CU-86956du3q: Allow changing of multiple pre-processing placeholders

* CU-86956du3q: Fix 1-placeholder sub-case yielding

* CU-86956du3q: Remove debug output

* CU-86956du3q: Replace separator (~) with whitespace when checking

* CU-86956du3q: Add utility method to limit string length for output

* CU-86956du3q: Improve string length limiting method

* CU-86956du3q: Add a few tests for string length limiting method

* CU-86956du3q: Add an ANYTHING strictness (mostly for disabling examples)

* CU-86956du3q: Add storage of examples (of a certain strictness) as well as relevant output

* CU-86956du3q: Fix typo (missing closing bracket) in report output

* CU-86956du3q: Fix examples header appearing for every example

* CU-86956du3q: Print the same phrase fewer times for examples

* CU-86956du3q: Update fake CDB with (default) config

* CU-86956du3q: Add finding to examples and output

* CU-86956du3q: Add config to another fake CDB during test time

* CU-86956du3q: Allow strictness to propagate to parts when looking at examples

* CU-86956du3q: Add placeholder to examples output

* CU-86956du3q: Refactor report output generation slightly

* CU-86956du3q: Show all non-identical examples

* CU-86956du3q: Update example checking with strictness requirement (instead of simple boolean)

* CU-86956du3q: Simplify targeting somewhat (remove unnecessary method)

* CU-86956du3q: Allow changing of output phrase max length

* CU-86956du3q: Fix doc string for changed method

* CU-86956du3q: Small whitespace fix

* CU-86956du3q: Fix total-included checking iteration

* CU-86956du3q: Add strictness and max phrase length to CLI

* CU-86956du3q: Add example strictness to CLI

* CU-86956du3q: Fix default value for strictness in CLI

* CU-86956du3q: Update to use number of sub-cases for tqdm/progress bar

* CU-86956du3q: Remove option to set the total for progress bar (the automated one works fine now)

* CU-86956du3q: Simplify the progress bar by combining all cases

* CU-86956du3q: Split subcase iteration

* CU-86956du3q: Rename regression checker to regression suite

* CU-86956du3q: Streamline typing and the like by using intermediate data classes

* CU-86956du3q: Remove redundant method

* CU-86956du3q: Remove redundant method and accompanying test

* CU-86956du3q: Remove redundant class

* CU-86956du3q: Add another intermediate data class

* CU-86956du3q: Remove completed TODO notes and redundant method

* CU-86956du3q: Add documentation to new methods and classes. Simplify example keeping.

* CU-86956du3q: Small update for how default test suite is handled for CLI

* CU-86956du3q: Small update to report output format

* CU-86956du3q: Add easier to read exception when unable to load a placeholder

* CU-86956du3q: Show fewer decimal places in percentage output

* CU-86956du3q: Use preferred name for run-to-run consistency

* CU-86956du3q: Update test time fake CDBs

* CU-86956du3q: Update default regression tests with new extensive (yet simple) test case

* CU-86956du3q: Add initial README for regression stuff

* CU-86956du3q: Add option to fail upon finding another concept.

Added other incorrect cui that was found (if applicable).
Fixed issue with finding grandparents.

* CU-86956du3q: Add tests for parent and grandparent finding; fix tests for new changes (with optionally found alternative CUI)

* CU-86956du3q: Add preferred name to wrong CUI found

* CU-86956du3q: Fix tests for new form of determine cui description; add test for exact span grandchild

* CU-86956du3q: Fix determining partial matches for grandchildren and beyond

* CU-86956du3q: Add test for partial matches of grandchildren

* Fixing bug for metacat

Fix issues with compute_class_weights JSON serialization and enforce fc2 usage when fc3 is enabled

* Resolved an issue where compute_class_weights returns a NumPy array, causing an error when saving the configuration as JSON (since JSON does not support NumPy arrays). The fix ensures compatibility by converting the NumPy array to a JSON-serializable format.

* Added a safeguard in the model_architecture_config for meta_cat_config. The current architecture assumes fc3 is only used when fc2 is enabled. If fc2 is set to False and fc3 is True, the model would fail due to a mismatch in hidden layer sizes. The fix automatically enables fc2 if fc3 is set to True, preventing potential errors.

* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version (#465)

* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version

* CU-86956duhb: Fix some doc string issues

* CU-86956duhb: Add deprecation decorator to old config-fix

* CU-86956duhb: Mark backporting method as deprecated and to be removed in 1.14

* CU-8694cd9t2: Allow merging config into model pack config before init (#462)

* CU-8694cd9t2: Allow merging config into model pack config before init

* CU-8694fwyje: Update all configs with pre-load parts documented (#473)

* CU-86956du3q: Add converter from MCT export

* CU-86956du3q: Add documentation to MCT export converter

* CU-86956du3q: Add option to create a regression suite from an MCT export

* CU-86956du3q: Add option to create a regression suite from an MCT export to CLI

* CU-86956du3q: Add a small note for converter placeholder

* CU-86956du3q: Add tests for MedCATtrainer export converter

* CU-86956du3q: Add tests for regression suite generation based on MCT export

* CU-86956du3q: Simplify regression case creation tests somewhat

* CU-86956du3q: Add option to create a regression suite YAML from MCT export

* CU-86956du3q: Add option to stop at MCT export conversion

* CU-86956du3q: Make use of only-prefnames option

* CU-86956du3q: Fix loading of only-prefnames option from yaml

* CU-86956du3q: Add comment for only using preferred names to the default regression suite yaml

* CU-86956du3q: Fix tests broken due to pref-name only change

* CU-86956du3q: Add utility method to set runtime doc strings for enum constants

* CU-86956du3q: Add tests for runtime doc string addition

* CU-86956du3q: Add more tests for runtime doc string addition (to make sure it fails without the change)

* CU-86956du3q: Make Finding enum have runtime doc strings

* CU-86956du3q: Add CLI option to show the various descriptions of the finding types (--only-describe)

* CU-86956du3q: Update dict and json methods for some results for JSON serialisation

* CU-86956du3q: Add a few json serialisation tests

* CU-86956du3q: Add json serialisation example strictness to CLI

* CU-86956du3q: Add a few more json serialisation tests

* CU-86956du3q: Add usage of regression suite name from the name of the file being read

* CU-86956du3q: Fix tests by adding the regression suite name where applicable

* CU-86956du3q: Avoid examples in ResultDescriptor

* CU-86956du3q: Make sure strictness propagates across all parts of a multi-result descriptor

* CU-86956du3q: Update tests: Use correct reporting for generating fake reports

* CU-86956du3q: Fix small test issue

* CU-86956du3q: Update tests for manual success/fail for results

* CU-86956du3q: Separate calculation section of report finding

* CU-86956du3q: Add a few more tests for report/results

* CU-86956du3q: Add option to force a non-0 exit status upon any regression test failure

* CU-86956du3q: Add files for regression model creation and checking

* CU-86956du3q: Add new part to main workflow to create and regression check a simple model pack

* CU-86956du3q: Update a mistyped comment

* CU-86956du3q: Make regression run at STRICTEST strictness at GHA workflow time

* CU-86956du3q: Fix strictness matrix for anything-typed strictness

* CU-86956du3q: Add strictness matrix information to --describe-only

* CU-86956du3q: Add python version to created model pack for test time

* CU-86956du3q: Use the Python version of the created model pack during test time to avoid conflicts with other Python versions running in parallel

* CU-86956du3q: [TEMP] Remove tests from main workflow (for faster iteration) and add args to output upon regression checking

* Revert "CU-86956du3q: [TEMP] Remove tests from main workflow (for faster iteration) and add args to output upon regression checking"

This reverts commit 4bf3089.

* CU-86956du3q: Make the full model path the last line of the output when creating the model for regression

* CU-86956du3q: Move regression workflow logic to a separate bash script

* CU-86956du3q: Update comments in regression bash script

* CU-8694pz44d: Fix model cleanup during regression

* CU-86956du3q: Fix typos in utils

* CU-86956du3q: Fix a bunch of various typos in doc strings and comments

---------

Co-authored-by: shubham-s-agarwal <66172189+shubham-s-agarwal@users.noreply.github.com>
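The regression suite commits above classify how a found annotation relates to the expected one (identical, bigger/smaller span, partial overlap, and CUI relationships such as parents or children). The span-only part of that classification can be sketched as below. `SpanFinding` and its constants (other than the `PARTIAL_OVERLAP` name taken from the commits) are hypothetical, and the parent/child checks are omitted because they need CDB lookups.

```python
from enum import Enum, auto
from typing import Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


class SpanFinding(Enum):
    """Hypothetical span-only subset of a regression 'Finding' classification."""
    IDENTICAL = auto()
    BIGGER_SPAN = auto()      # found span contains the expected one
    SMALLER_SPAN = auto()     # found span lies inside the expected one
    PARTIAL_OVERLAP = auto()  # spans intersect, neither contains the other
    FAIL = auto()             # no overlap at all


def classify_span(expected: Span, found: Span) -> SpanFinding:
    (es, ee), (fs, fe) = expected, found
    if (es, ee) == (fs, fe):
        return SpanFinding.IDENTICAL
    if fs <= es and fe >= ee:
        return SpanFinding.BIGGER_SPAN
    if fs >= es and fe <= ee:
        return SpanFinding.SMALLER_SPAN
    # Half-open intervals overlap iff each starts before the other ends.
    if fs < ee and es < fe:
        return SpanFinding.PARTIAL_OVERLAP
    return SpanFinding.FAIL
```

Ordering the checks from most to least specific means an identical span is never reported as "bigger" or "smaller".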
* CU-8695j1be2: Remove deprecated method on CDB

* CU-8695j1be2: Remove unused import due to removal of deprecated method
mart-r merged commit 34e5cde into production on Aug 28, 2024
8 checks passed
4 participants