-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cu 1yn0v9e duplicate multiprocessing methods #364
Conversation
Add docstring. Trying to be more explicit regarding usage and differences between different methods
Add docstring. Trying to be more explicit regarding usage and differences between different methods
medcat/cat.py
Outdated
"""Run multiprocessing NOT FOR TRAINING | ||
"""Run multiprocessing NOT FOR TRAINING. | ||
|
||
Thios method batches the data based on the number of documents as specified by the user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
medcat/cat.py
Outdated
return_dict=return_dict, | ||
batch_factor=batch_factor) | ||
|
||
def multiprocessing_batch_nr_of_docs(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename to multiprocessing_batch_docs_size
for consistency with above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor changes - will approve once fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
* Bump urllib3 from 1.26.5 to 1.26.17 in /webapp/webapp Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.5 to 1.26.17. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](urllib3/urllib3@1.26.5...1.26.17) --- updated-dependencies: - dependency-name: urllib3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Cu 8692wbcq5 docs builds (#359) * CU-8692wbcq5: Pin max version of numpy * CU-8692wbcq5: Pin max version of numpy in setup.py * CU-8692wbcq5: Bump python version for readthedocs workflow * CU-8692wbcq5: Pin all requirement versions in docs requirements * CU-8692wbcq5: Move docs requirements before setuptools * CU-8692wbcq5: Fix typo in docs requirements * CU-8692wbcq5: Remove some less relevant stuff from docs requirements * CU-8692wbcq5: Add back sphinx-based requirements to docs requirements * CU-8692wbcq5: Move back to python 3.9 on docs build workflow * CU-8692wbcq5: Bump sphinx-autoapi version * CU-8692wbcq5: Bump sphinx version * CU-8692wbcq5: Bump python version back to 3.10 for future-proofing * CU-8692wbcq5: Undo pinning numpy to max version in setup.py * CU-8692wbcq5: Remove docs-build specific dependencies in setup.py * Bump urllib3 from 1.26.17 to 1.26.18 in /webapp/webapp Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.17 to 1.26.18. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](urllib3/urllib3@1.26.17...1.26.18) --- updated-dependencies: - dependency-name: urllib3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * CU-8692uznvd: Allow empty-dict config.linking.filters.cuis and convert to set in memory (#352) * CU-8692uznvd: Allow empty-dict config.linking.filters.cuis and convert to set in memory * CU-8692uznvd: Move the empty-set detection and conversion from validator to init * CU-8692uznvd: Remove unused import * CU-8692t3fdf separate config on save (#350) * CU-8692t3fdf Move saving config outside of the cdb.dat; Add test to make sure the config does not get saved with the CDB; patch a few existing tests * CU-8692t3fdf Use class methods on class instead of instance in a few tests * CU-8692t3fdf Fix typing issue * CU-8692t3fdf Add additional tests for 2 configs and zero configs when loading model pack * CU-8692t3fdf: Make sure CDB is linked to the correct config; Treat incorrect configs as dirty CDBs and force a recalc of the hash * CU-2cdpd4t: Unify default addl_info in different methdos. (#363) * Bump django from 3.2.20 to 3.2.23 in /webapp/webapp Bumps [django](https://github.com/django/django) from 3.2.20 to 3.2.23. - [Commits](django/django@3.2.20...3.2.23) --- updated-dependencies: - dependency-name: django dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Changing cdb.add_concept to a protected method * Re-added deprecated method with deprecated flag and addtional comments * Initial commit for merge_cdb method * Added indentation to make merge_cdb a class method * fixed syntax issues * more lint fixes * more lint fixes * bug fixes of merge_cdb * removed print statements * CU-86931prq4: Update GHA versions (checkout and setup-python) to v4 (#368) * Cu 1yn0v9e duplicate multiprocessing methods (#364) * CU-1yn0v9e: Rename and deprecate one of the multiprocessing methods; Add docstring. Trying to be more explicit regarding usage and differences between different methods * CU-1yn0v9e: Rename and deprecate the multiprocessing_pipe method; Add docstring. Trying to be more explicit regarding usage and differences between different methods * CU-1yn0v9e: Fix typo in docstring; more consistent naming * 869377m3u: Add comment regarding demo link load times to README (#376) * intermediate changes of merge_cdb and testing function * Added README.md documentation for CPU only installations (#365) * changed README.md to reflect installation options. * added setup script to demonstrate how wrapper could look for CPU installations * removed setup.sh as unnessescary for cpu only builds * Initial commit for merge_cdb method * Added indentation to make merge_cdb a class method * fixed syntax issues * more lint fixes * more lint fixes * bug fixes of merge_cdb * removed print statements * Added commentary on disk space usage of pytorch-gpu * removed merge_cdb from branch --------- Co-authored-by: adam-sutton-1992 <adam.sutton@kcl.ac.uk> * Cu 8692zguyq no preferred name (#367) * CU-8692zguyq: Slight simplification of minimum-name-length logic * CU-8692zguyq: Add some tests for prepare_name preprocessor * CU-8692zguyq: Add warning if no preferred name was added along a new CUI * CU-8692zguyq: Add additional warning messages when adding/training a new CUI with no preferred name * CU-8692zguyq: Make no preferred name warnings only run if name status is preferred * CU-8692zguyq: Add tests for no-preferred name warnings * CU-8692zguyq: Add Vocab.make_unigram_table to CAT tests * CU-8692zguyq: Move to built in asserting for logging instead of patching the method * CU-8692zguyq: Add workaround for assertNoLogs on python 3.8 and 3.9 * Add trainer callbacks for Transformer NER (#377) CU-86938vf30 add trainer callbacks for Transformer NER * changes to merge_cdb and adding unit tests for method * fixing lint issues * fixing flake8 linting * bug fixes, additional tests, and more documentation * moved set up of cdbs to be merged to tests.helper * moved merge_cdb to utils and created test_cdb_utils * removed class wrapper in cdb utils and fixed class set up in tests * changed test object setup to class setup * removed erroneous new line * CU-2e77a31 improve print stats (#366) * Add base class for CAT * Add CDB base class * Some whitespace fixes for base modules * CU-2e77a31: Move print stats to their own module and class * CU-2e77a31: Fix issues introduced by moving print stats * CU-2e77a31: Rename print_stats to get_stats and add option to avoid printed output * CU-2e77a31: Add test for print_stats * CU-2e77a31: Remove unused import * CU-2e77a31: Add new package to setup.py * CU-2e77a31: Fix a bunch of typing issues * CU-2e77a31: Revert CAT and CDB abstraction * Load stopwords in Defaults before spacy model * CU-8693az82g Remove cdb tests side effects (#380) * 8693az82g: Add method to CDBMaker to reset the CDB * 8693az82g: Add test in CDB tests to ensure a new CDB is used for each test * 8693az82g: Reset CDB in CDB tests before each test to avoid side effects * Added tests * CU-8693bpq82 fallback spacy model (#384) * CU-8693bpq82: Add fallback spacy model along with test * CU-8693bpq82: Remove debug output * CU-8693bpq82: Add exception info to warning upon spacy model load failure and fallback * Remove tests of internals where possible * Add test for skipping of stopwords * Avoid supporting only English for stopwords * Remove debug output * Make sure stopwords language getter works for file-path spacy models * CU-8693cv3w0 Fix fallback spacy model existance on pip installs (#386) * CU-8693cv3w0: Add method to ensure spacy model and use it when falling back to default model * CU-8693cv3w0: Add logged output when installing/downloading spacy model * CU-8693b0a61 Add method to get spacy model version (#381) * CU-8693b0a61: Add method to find spacy folder in model pack along with some tests * CU-8693b0a61: Add test for spacy folder finding (full path) * CU-8693b0a61: Add method for finding spacy model in model pack along with tests * CU-8693b0a61: Add method for finding current spacy version * CU-8693b0a61: Add method for getting spacy model version installed * CU-8693b0a61: Fix getting spacy model folder return path * CU-8693b0a61: Add method to get name and meta of spacy model based on model pack * CU-8693b0a61: Add missing fake spacy model meta * CU-8693b0a61: Add missing docstrings * CU-8693b0a61: Change name of method for clarity * CU-8693b0a61: Add method to get spacy model name and version from model pack path * CU-8693b0a61: Fix a few typing issues * CU-8693b0a61: Add a missing docstring * CU-8693b0a61: Match folder name of fake spacy model to its name * CU-8693b0a61: Make the final method return true name of spacy model instead of folder name * Add additional output to method for getting spacy model version - the compatible spacy versions * CU-8693b0a61: Add method for querying whether the spacy version is compatible with a range * CU-8693b0a61: Add better abstraction for spacy version mocking in tests * CU-8693b0a61: Add some more abstraction for fake model pack in tests * CU-8693b0a61: Add method for checking whethera model pack has a spacy model compatible with installed spacy version * CU-8693b0a61: Improve abstraction within tests * CU-8693b0a61: Add method to check which of two versions is older * CU-8693b0a61: Fix fake spacy model versioning * CU-8693b0a61: Add method for determining whether a model pack has semi-compatible spacy model * CU-8693b0a61: Add missing word in docstring. * CU-8693b0a61: Change some method to protected ones --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: tomolopolis <tsearle88@gmail.com> Co-authored-by: adam-sutton-1992 <adam.sutton@kcl.ac.uk> Co-authored-by: adam-sutton-1992 <60137864+adam-sutton-1992@users.noreply.github.com> Co-authored-by: Xi Bai <82581439+baixiac@users.noreply.github.com> Co-authored-by: jenniferajiang <jennigato95@gmail.com> Co-authored-by: Jennifer Jiang <37081323+jenniferajiang@users.noreply.github.com>
There are multiple methods for multiprocessing.
But it is not always clear why one should prefer one to another.
This PR attempts to fix that by deprecating the old methods and renaming them with a new, more descriptive name. It also adds some documentation regarding their differences to the docstrings.
From my research, here's what the differences of the existing methods are:
What I managed to dig up from code and tutorials:
multiprocessing
multiprocessing_pipe
get_entities_multi_texts
multiprocessing_pipe