-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CU-8693bpq82 fallback spacy model #384
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
tomolopolis
approved these changes
Dec 21, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
mart-r
added a commit
that referenced
this pull request
Jan 8, 2024
* Bump urllib3 from 1.26.5 to 1.26.17 in /webapp/webapp Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.5 to 1.26.17. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](urllib3/urllib3@1.26.5...1.26.17) --- updated-dependencies: - dependency-name: urllib3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Cu 8692wbcq5 docs builds (#359) * CU-8692wbcq5: Pin max version of numpy * CU-8692wbcq5: Pin max version of numpy in setup.py * CU-8692wbcq5: Bump python version for readthedocs workflow * CU-8692wbcq5: Pin all requirement versions in docs requirements * CU-8692wbcq5: Move docs requirements before setuptools * CU-8692wbcq5: Fix typo in docs requirements * CU-8692wbcq5: Remove some less relevant stuff from docs requirements * CU-8692wbcq5: Add back sphinx-based requirements to docs requirements * CU-8692wbcq5: Move back to python 3.9 on docs build workflow * CU-8692wbcq5: Bump sphinx-autoapi version * CU-8692wbcq5: Bump sphinx version * CU-8692wbcq5: Bump python version back to 3.10 for future-proofing * CU-8692wbcq5: Undo pinning numpy to max version in setup.py * CU-8692wbcq5: Remove docs-build specific dependencies in setup.py * Bump urllib3 from 1.26.17 to 1.26.18 in /webapp/webapp Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.17 to 1.26.18. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](urllib3/urllib3@1.26.17...1.26.18) --- updated-dependencies: - dependency-name: urllib3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * CU-8692uznvd: Allow empty-dict config.linking.filters.cuis and convert to set in memory (#352) * CU-8692uznvd: Allow empty-dict config.linking.filters.cuis and convert to set in memory * CU-8692uznvd: Move the empty-set detection and conversion from validator to init * CU-8692uznvd: Remove unused import * CU-8692t3fdf separate config on save (#350) * CU-8692t3fdf Move saving config outside of the cdb.dat; Add test to make sure the config does not get saved with the CDB; patch a few existing tests * CU-8692t3fdf Use class methods on class instead of instance in a few tests * CU-8692t3fdf Fix typing issue * CU-8692t3fdf Add additional tests for 2 configs and zero configs when loading model pack * CU-8692t3fdf: Make sure CDB is linked to the correct config; Treat incorrect configs as dirty CDBs and force a recalc of the hash * CU-2cdpd4t: Unify default addl_info in different methdos. (#363) * Bump django from 3.2.20 to 3.2.23 in /webapp/webapp Bumps [django](https://github.com/django/django) from 3.2.20 to 3.2.23. - [Commits](django/django@3.2.20...3.2.23) --- updated-dependencies: - dependency-name: django dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Changing cdb.add_concept to a protected method * Re-added deprecated method with deprecated flag and addtional comments * Initial commit for merge_cdb method * Added indentation to make merge_cdb a class method * fixed syntax issues * more lint fixes * more lint fixes * bug fixes of merge_cdb * removed print statements * CU-86931prq4: Update GHA versions (checkout and setup-python) to v4 (#368) * Cu 1yn0v9e duplicate multiprocessing methods (#364) * CU-1yn0v9e: Rename and deprecate one of the multiprocessing methods; Add docstring. Trying to be more explicit regarding usage and differences between different methods * CU-1yn0v9e: Rename and deprecate the multiprocessing_pipe method; Add docstring. Trying to be more explicit regarding usage and differences between different methods * CU-1yn0v9e: Fix typo in docstring; more consistent naming * 869377m3u: Add comment regarding demo link load times to README (#376) * intermediate changes of merge_cdb and testing function * Added README.md documentation for CPU only installations (#365) * changed README.md to reflect installation options. * added setup script to demonstrate how wrapper could look for CPU installations * removed setup.sh as unnessescary for cpu only builds * Initial commit for merge_cdb method * Added indentation to make merge_cdb a class method * fixed syntax issues * more lint fixes * more lint fixes * bug fixes of merge_cdb * removed print statements * Added commentary on disk space usage of pytorch-gpu * removed merge_cdb from branch --------- Co-authored-by: adam-sutton-1992 <adam.sutton@kcl.ac.uk> * Cu 8692zguyq no preferred name (#367) * CU-8692zguyq: Slight simplification of minimum-name-length logic * CU-8692zguyq: Add some tests for prepare_name preprocessor * CU-8692zguyq: Add warning if no preferred name was added along a new CUI * CU-8692zguyq: Add additional warning messages when adding/training a new CUI with no preferred name * CU-8692zguyq: Make no preferred name warnings only run if name status is preferred * CU-8692zguyq: Add tests for no-preferred name warnings * CU-8692zguyq: Add Vocab.make_unigram_table to CAT tests * CU-8692zguyq: Move to built in asserting for logging instead of patching the method * CU-8692zguyq: Add workaround for assertNoLogs on python 3.8 and 3.9 * Add trainer callbacks for Transformer NER (#377) CU-86938vf30 add trainer callbacks for Transformer NER * changes to merge_cdb and adding unit tests for method * fixing lint issues * fixing flake8 linting * bug fixes, additional tests, and more documentation * moved set up of cdbs to be merged to tests.helper * moved merge_cdb to utils and created test_cdb_utils * removed class wrapper in cdb utils and fixed class set up in tests * changed test object setup to class setup * removed erroneous new line * CU-2e77a31 improve print stats (#366) * Add base class for CAT * Add CDB base class * Some whitespace fixes for base modules * CU-2e77a31: Move print stats to their own module and class * CU-2e77a31: Fix issues introduced by moving print stats * CU-2e77a31: Rename print_stats to get_stats and add option to avoid printed output * CU-2e77a31: Add test for print_stats * CU-2e77a31: Remove unused import * CU-2e77a31: Add new package to setup.py * CU-2e77a31: Fix a bunch of typing issues * CU-2e77a31: Revert CAT and CDB abstraction * Load stopwords in Defaults before spacy model * CU-8693az82g Remove cdb tests side effects (#380) * 8693az82g: Add method to CDBMaker to reset the CDB * 8693az82g: Add test in CDB tests to ensure a new CDB is used for each test * 8693az82g: Reset CDB in CDB tests before each test to avoid side effects * Added tests * CU-8693bpq82 fallback spacy model (#384) * CU-8693bpq82: Add fallback spacy model along with test * CU-8693bpq82: Remove debug output * CU-8693bpq82: Add exception info to warning upon spacy model load failure and fallback * Remove tests of internals where possible * Add test for skipping of stopwords * Avoid supporting only English for stopwords * Remove debug output * Make sure stopwords language getter works for file-path spacy models * CU-8693cv3w0 Fix fallback spacy model existance on pip installs (#386) * CU-8693cv3w0: Add method to ensure spacy model and use it when falling back to default model * CU-8693cv3w0: Add logged output when installing/downloading spacy model * CU-8693b0a61 Add method to get spacy model version (#381) * CU-8693b0a61: Add method to find spacy folder in model pack along with some tests * CU-8693b0a61: Add test for spacy folder finding (full path) * CU-8693b0a61: Add method for finding spacy model in model pack along with tests * CU-8693b0a61: Add method for finding current spacy version * CU-8693b0a61: Add method for getting spacy model version installed * CU-8693b0a61: Fix getting spacy model folder return path * CU-8693b0a61: Add method to get name and meta of spacy model based on model pack * CU-8693b0a61: Add missing fake spacy model meta * CU-8693b0a61: Add missing docstrings * CU-8693b0a61: Change name of method for clarity * CU-8693b0a61: Add method to get spacy model name and version from model pack path * CU-8693b0a61: Fix a few typing issues * CU-8693b0a61: Add a missing docstring * CU-8693b0a61: Match folder name of fake spacy model to its name * CU-8693b0a61: Make the final method return true name of spacy model instead of folder name * Add additional output to method for getting spacy model version - the compatible spacy versions * CU-8693b0a61: Add method for querying whether the spacy version is compatible with a range * CU-8693b0a61: Add better abstraction for spacy version mocking in tests * CU-8693b0a61: Add some more abstraction for fake model pack in tests * CU-8693b0a61: Add method for checking whethera model pack has a spacy model compatible with installed spacy version * CU-8693b0a61: Improve abstraction within tests * CU-8693b0a61: Add method to check which of two versions is older * CU-8693b0a61: Fix fake spacy model versioning * CU-8693b0a61: Add method for determining whether a model pack has semi-compatible spacy model * CU-8693b0a61: Add missing word in docstring. * CU-8693b0a61: Change some method to protected ones --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: tomolopolis <tsearle88@gmail.com> Co-authored-by: adam-sutton-1992 <adam.sutton@kcl.ac.uk> Co-authored-by: adam-sutton-1992 <60137864+adam-sutton-1992@users.noreply.github.com> Co-authored-by: Xi Bai <82581439+baixiac@users.noreply.github.com> Co-authored-by: jenniferajiang <jennigato95@gmail.com> Co-authored-by: Jennifer Jiang <37081323+jenniferajiang@users.noreply.github.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There can be issues when a model doesn't ship with a working
spacy
model. So far, in those cases, thePipe
creation fails.What this PR does is allow a fallback to the per-requirement installed
en_core_web_md
spacy model if there's an issue loading the spacy model that is shipped with the model pack. The issue could be that the sapcy model wasn't packed with the model or that the spacy model isn't compatible with the currently installedspacy
version.Along with using a fallback, the PR also generates a warning for the library user to know that they'd be better off shipping with a working spacy model.