MedCat [#101]: extract biomedical concepts/entities from (free) text #367

Merged
merged 23 commits into development from feature/medcat
May 11, 2022

Conversation

Imipenem
Collaborator

@Imipenem Imipenem commented Apr 9, 2022

PR Checklist

  • This comment contains a description of changes (with reason)
  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes

  • MedCat object now only serves for cdb and vocab (not for actual processing functions)

  • processing functions are now part of the public API rather than static object methods

  • text annotation returns a DataFrame as a base for further analysis

  • basic overview table exposed via API (with optional csv save)

  • prep for adding to anndata object

  • updated tsne, umap, pca and scatter plots to color by extracted entities

  • paga, draw_graph, ... WIP

- refactored existing code

- removed most unused functions

- MedCat object now only serves for cdb and vocab (not for actual processing functions)

- processing functions are now part of the public API rather than static object methods

- downstream analysis WIP (prepare annotation results etc, see issue)
@github-actions github-actions bot added the enhancement New feature or request label Apr 9, 2022
- flatten the annotated results dict to prepare it for creation of a pd.DataFrame
  so this could be used in further analysis
- the dataframe contains all extracted and annotated infos from the input data in adata.obs
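A minimal sketch of the flattening step described above, assuming the annotation results are a nested dict keyed by row number, with each row holding an `entities` mapping. The input shape, the `row_nr` key, the dotted column names, and the `CUI-1` identifier are illustrative assumptions, not the actual ehrapy or MedCAT structures.

```python
from typing import Any


def flatten_annotations(results: dict[int, dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn nested per-row annotation results into one flat record per entity."""
    rows = []
    for row_nr, annotated in results.items():
        for entity in annotated.get("entities", {}).values():
            record = {"row_nr": row_nr}
            for key, value in entity.items():
                if isinstance(value, dict):
                    # Lift one level of nesting into dotted column names.
                    for sub_key, sub_value in value.items():
                        record[f"{key}.{sub_key}"] = sub_value
                else:
                    record[key] = value
            rows.append(record)
    return rows


# Hypothetical input: row 0 has one entity, row 1 has none.
example = {
    0: {"entities": {0: {"pretty_name": "Fever", "cui": "CUI-1", "meta_anns": {"Status": "Affirmed"}}}},
    1: {"entities": {}},
}
rows = flatten_annotations(example)
```

A list of flat records like this can be passed directly to `pd.DataFrame(rows)`.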

- this will be the base for further analysis
- started overview plotting (simple dataframe or data table), bit harder than expected ;)

- bar plot top entities found (customizable)
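The counting behind a "top entities" bar plot could look roughly like this. The function name `top_entities` and the sample entity names are made up for illustration; the actual plotting code in this PR may work differently.

```python
from collections import Counter


def top_entities(entity_names, n=3):
    """Return the n most frequent entity names together with their counts."""
    return Counter(entity_names).most_common(n)


# Hypothetical list of entity names extracted from free text.
found = ["Fever", "Cough", "Fever", "Pain", "Fever", "Cough"]
top = top_entities(found, n=2)
```

The resulting names and counts can then serve as the labels and heights of a bar plot.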

- MedCAT object now stores annotation results in attribute rather than returning a bare dataframe
- no MultiIndex DataFrame anymore, just a simple single level one
- added an API function to show basic stats of the annotated results in a nice rich table

- duplicated rows are now removed from the annotated results (duplicates are entities from the same row with same meta_anns and cui)
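A sketch of the duplicate rule stated above (same row, same `meta_anns`, same `cui`), assuming the annotated results live in a pandas DataFrame. The column names and values are illustrative, and `meta_anns` is assumed to be stored as a hashable value such as a string.

```python
import pandas as pd

# Hypothetical annotated results; rows 0 and 1 of this frame are duplicates.
annotated = pd.DataFrame(
    {
        "row_nr": [0, 0, 0, 1],
        "cui": ["CUI-1", "CUI-1", "CUI-2", "CUI-1"],
        "meta_anns": ["Affirmed", "Affirmed", "Affirmed", "Affirmed"],
        "pretty_name": ["Fever", "Fever", "Cough", "Fever"],
    }
)

# Keep the first occurrence of each (row, cui, meta_anns) combination.
deduplicated = annotated.drop_duplicates(
    subset=["row_nr", "cui", "meta_anns"]
).reset_index(drop=True)
```

The same entity found in a different row is kept, since the row number is part of the key.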
…xt data

- added a function to indicate whether a specific entity has been found in that row or not; this is useful for downstream plotting such as coloring by this entity in a umap for example

- extracted freetext dataframe is now sorted by extracted row number per default
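The two changes above could be sketched as follows, assuming a DataFrame with one row per extracted entity. The helper `entity_found` and the column names `row_nr` and `pretty_name` are hypothetical, not the names used in ehrapy.

```python
import pandas as pd

# Hypothetical extraction results, deliberately out of row order.
annotated = pd.DataFrame(
    {"row_nr": [2, 0, 0], "pretty_name": ["Fever", "Cough", "Fever"]}
)


def entity_found(annotated: pd.DataFrame, entity_name: str, n_rows: int) -> pd.Series:
    """Flag, for each input row, whether the given entity was extracted from it."""
    matches = annotated.loc[annotated["pretty_name"] == entity_name, "row_nr"]
    rows_with_entity = set(matches.tolist())
    return pd.Series([row in rows_with_entity for row in range(n_rows)], name=entity_name)


# Sort by the row the entity was extracted from.
sorted_annotated = annotated.sort_values("row_nr", kind="stable").reset_index(drop=True)
fever_flag = entity_found(annotated, "Fever", n_rows=3)
```

A per-row flag like `fever_flag` is the kind of column a plot can color by.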
Member

@Zethson Zethson left a comment

I'll have to see it in action to make a better assessment.

Maybe we need a quick call. As mentioned, I would like to see all of these functions scoped into a class. IMO it makes sense to call these functions as ep.tl.mc.MEDCAT FUNCTION

The plots as well. ep.pl.mc.SOMETHING

ehrapy/plot/plot_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (resolved)
ehrapy/tools/nlp/_medcat.py (resolved)
ehrapy/tools/nlp/_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (resolved)
ehrapy/tools/nlp/_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (outdated, resolved)
@github-actions github-actions bot added the chore label Apr 24, 2022
@Zethson Zethson linked an issue Apr 25, 2022 that may be closed by this pull request
15 tasks
docs/usage/usage.md (outdated, resolved)
- because medcat is an extra dependency, it might not be installed in the user's local environment
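One common way to guard an optional dependency, shown here as a sketch rather than the approach taken in this PR. The helper names `module_available` and `require_medcat` and the error message are hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Check whether a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_medcat() -> None:
    """Raise a readable error if the optional medcat dependency is missing."""
    if not module_available("medcat"):
        raise ImportError(
            "This feature needs the optional 'medcat' package, which is not installed."
        )
```

Calling `require_medcat()` at the start of each MedCAT-dependent function keeps the rest of the package importable when the extra is absent.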
@Imipenem Imipenem marked this pull request as ready for review April 29, 2022 20:43
- updated draw_graph function to color by extracted medcat ents

- refactored plotting function calls into partial callables so most of the args are pre-initialized

- fixed a bug that caused columns to be deleted from obs when they were colored by

- fixed a bug that caused ehrapy to crash when trying to plot a column in var_names with a MedCat object

- fixed a docs rendering bug introduced earlier on in this PR
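The partial-callable refactoring mentioned above might look roughly like this. `scatter` is a hypothetical stand-in that only echoes its arguments, and `plot_with_optional_entity` is an invented wrapper name; neither is taken from ehrapy or scanpy.

```python
from functools import partial


def scatter(data, color=None, size=10, title=None):
    """Hypothetical stand-in for a plotting function; returns the arguments it received."""
    return {"data": data, "color": color, "size": size, "title": title}


def plot_with_optional_entity(data, color=None, entity_column=None, **kwargs):
    """Bind the shared arguments once, then vary only the color argument."""
    plot = partial(scatter, data, **kwargs)
    if entity_column is not None:
        # Color by the extracted entity column instead of the requested one.
        return plot(color=entity_column)
    return plot(color=color)


result = plot_with_optional_entity("adata", color="age", entity_column="Fever", size=5)
```

Binding the common arguments once avoids repeating a long argument list in every branch of the wrapper.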
Collaborator Author

Imipenem commented Apr 30, 2022

@Zethson I had this in mind as well, but I'm not sure whether it works, because our pre-commit CI will sort the rich import AFTER the medcat import, so I'm not sure whether rich is always available. Might just have to try.

Edit: Does not work, unfortunately

Imipenem added 3 commits May 1, 2022 21:21
…w of dtype cat

- most plotting functions got updated to use medcat entities if needed (clustermap, dendrogram, violin, stacked violin, ...)

- extracted entities from medcat in .obs are now of dtype categorical (was numerical)

- still missing: embedding, embedding_density and spatial
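A small illustration of the dtype change noted above, assuming the entity column in `.obs` behaves like a pandas DataFrame column. The column name `Fever` and its 0/1 values are made up for this sketch.

```python
import pandas as pd

# Hypothetical obs table with a numeric entity indicator column.
obs = pd.DataFrame({"Fever": [1, 0, 1]})

# Convert the numeric indicator to a categorical column.
obs["Fever"] = obs["Fever"].astype("category")
```

With a categorical dtype, the column holds a fixed set of discrete groups (here `0` and `1`) rather than a continuous numeric range.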
@Imipenem Imipenem requested a review from Zethson May 3, 2022 18:23
Member

@Zethson Zethson left a comment

Getting closer. I'll really need to see the adapted tutorial.

👍

ehrapy/plot/plot_medcat.py (outdated, resolved)
ehrapy/plot/plot_medcat.py (outdated, resolved)
ehrapy/plot/plot_medcat.py (outdated, resolved)
ehrapy/plot/plot_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (resolved)
ehrapy/tools/nlp/_medcat.py (resolved)
ehrapy/tools/nlp/_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (outdated, resolved)
ehrapy/tools/nlp/_medcat.py (resolved)
@Imipenem Imipenem merged commit 3841d5b into development May 11, 2022
@Zethson Zethson deleted the feature/medcat branch December 8, 2022 15:49
Labels
chore enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Integration of MedCAT
2 participants