MedCat [#101]: extract biomedical concepts/entities from (free) text #367

Imipenem · 2022-04-09T20:10:29Z

PR Checklist

This comment contains a description of changes (with reason)
Referenced issue is linked
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated

Description of changes

MedCat object now only holds the cdb and vocab (not the actual processing functions)
processing functions are now part of the public API rather than static object methods
text annotation returns a DataFrame as a base for further analysis
basic overview table exposed via API (with optional csv save)
prep for adding to anndata object
updated tsne, umap, pca and scatter plots to color by extracted entities
paga, draw_graph, ... WIP

- refactored existing code
- removed most unused functions
- MedCat object now only holds the cdb and vocab (not the actual processing functions)
- processing functions are now part of the public API rather than static object methods
- downstream analysis WIP (prepare annotation results etc., see issue)

- flatten the annotated results dict to prepare it for creation of a pd.DataFrame so this could be used in further analysis
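A minimal sketch of what such a flattening step could look like. The shape of the annotated results (row number mapped to an `entities` dict with `pretty_name`, `cui` and `meta_anns` keys), the function name `flatten_annotated_results` and the example values are assumptions for illustration, not the code in this PR.

```python
from typing import Any


def flatten_annotated_results(annotated: dict[int, dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten nested per-row annotation results into one flat record per entity."""
    records = []
    for row_nr, result in annotated.items():
        for entity in result.get("entities", {}).values():
            record = {"row_nr": row_nr}
            for key, value in entity.items():
                if key == "meta_anns" and isinstance(value, dict):
                    # Hoist nested meta annotations into top-level columns.
                    for meta_name, meta in value.items():
                        record[f"meta_{meta_name}"] = meta.get("value")
                else:
                    record[key] = value
            records.append(record)
    return records


# Made-up input: the CUI is a placeholder, not a real concept identifier.
example = {
    0: {"entities": {0: {"pretty_name": "Fever", "cui": "C0000001",
                         "meta_anns": {"Status": {"value": "Affirmed"}}}}},
    1: {"entities": {}},  # rows without any extracted entity yield no records
}
rows = flatten_annotated_results(example)
```

A list of flat records like `rows` can be passed directly to `pd.DataFrame(rows)`.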

- the dataframe contains all extracted and annotated infos from the input data in adata.obs - this will be the base for further analysis

- started overview plotting (simple dataframe or data table), bit harder than expected ;)
- bar plot of top entities found (customizable)
- MedCAT object now stores annotation results in an attribute rather than returning a bare dataframe

- no MultiIndex DataFrame anymore, just a simple single level one

- added an API function to show basic stats of the annotated results in a nice rich table
- duplicated rows are now removed from the annotated results (duplicates are entities from the same row with the same meta_anns and cui)
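The duplicate rule described here (same row, same meta_anns, same cui) could be sketched roughly as follows. The record layout, the flat `meta_anns` dict with hashable values and the name `drop_duplicate_entities` are assumptions for illustration only.

```python
def drop_duplicate_entities(records):
    """Keep the first record per (row_nr, cui, meta_anns) combination."""
    seen = set()
    unique = []
    for record in records:
        # meta_anns is a dict, so build a hashable, order-independent key from it.
        meta_key = tuple(sorted(record.get("meta_anns", {}).items()))
        key = (record["row_nr"], record["cui"], meta_key)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records with placeholder CUIs.
records = [
    {"row_nr": 0, "cui": "C0000001", "meta_anns": {"Status": "Affirmed"}},
    {"row_nr": 0, "cui": "C0000001", "meta_anns": {"Status": "Affirmed"}},  # duplicate
    {"row_nr": 0, "cui": "C0000001", "meta_anns": {"Status": "Other"}},     # differs in meta_anns
    {"row_nr": 1, "cui": "C0000001", "meta_anns": {"Status": "Affirmed"}},  # differs in row
]
deduplicated = drop_duplicate_entities(records)
```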

…xt data
- added a function to indicate whether a specific entity has been found in a given row or not; this is useful for downstream plotting, such as coloring by this entity in a umap
- extracted freetext dataframe is now sorted by extracted row number by default
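As a rough illustration of the "entity found in that row" indicator, assuming flat records with `row_nr` and `pretty_name` keys. The function name `entity_found_per_row` and the data are hypothetical, not the API added in this PR.

```python
def entity_found_per_row(records, entity_name, n_rows):
    """Return one boolean per row: True if entity_name was extracted from that row."""
    rows_with_entity = {r["row_nr"] for r in records if r["pretty_name"] == entity_name}
    return [row_nr in rows_with_entity for row_nr in range(n_rows)]


# Made-up extraction results for a dataset with four rows.
records = [
    {"row_nr": 0, "pretty_name": "Fever"},
    {"row_nr": 1, "pretty_name": "Cough"},
    {"row_nr": 2, "pretty_name": "Fever"},
]
flags = entity_found_per_row(records, "Fever", n_rows=4)
```

A per-row vector like `flags` is the kind of column a plot could color by.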

Zethson

I'll have to see it in action to make a better assessment.

Maybe we need a quick call. As mentioned, I would like to see all of these functions scoped into a class. IMO it makes sense to call these functions as ep.tl.mc.MEDCAT FUNCTION

The plots as well. ep.pl.mc.SOMETHING

ehrapy/plot/plot_medcat.py

ehrapy/tools/nlp/_medcat.py

docs/usage/usage.md

- because medcat is an extra dependency, it might not be installed locally on the user's machine/env
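One common way to handle an optional dependency like this is to catch `ModuleNotFoundError` and emit a warning instead of failing at import time. The sketch below uses only the standard library; the helper name `try_import` and the warning text are assumptions, not the implementation in this PR.

```python
import importlib
import warnings


def try_import(module_name: str):
    """Import an optional dependency; warn and return None if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        warnings.warn(
            f"Optional dependency '{module_name}' is not installed; "
            "features that rely on it are unavailable.",
            stacklevel=2,
        )
        return None


# None if medcat is not installed in the current environment.
medcat = try_import("medcat")
```

Callers then check for `None` before using any medcat-backed feature.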

- updated draw_graph function to color by extracted medcat ents
- refactored plotting function calls into partial callables so most of the args are pre-initialized
- fixed a bug that caused columns to be deleted from obs when they were colored by
- fixed a bug that caused ehrapy to crash when trying to plot a column in var_names with a MedCat object
- fixed a docs rendering bug introduced earlier on in this PR
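The "partial callables" idea can be illustrated with `functools.partial`. The `scatter` function below is a hypothetical stand-in that just returns its arguments; it is not the plotting function refactored in this PR.

```python
from functools import partial


def scatter(adata, basis, color=None, title=None, show=True):
    """Hypothetical stand-in for a generic embedding plot; returns its arguments."""
    return {"adata": adata, "basis": basis, "color": color, "title": title, "show": show}


# Pre-initialize the arguments shared by every call once ...
umap_plot = partial(scatter, basis="umap", show=False)

# ... so each call site only supplies what differs.
result = umap_plot("adata-placeholder", color="Fever")
```

Keyword arguments bound by `partial` can still be overridden at the call site, for example `umap_plot("adata-placeholder", show=True)`.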

Imipenem · 2022-04-30T21:01:26Z

@Zethson I had this in mind as well, but I'm not sure whether it's working because the rich import will be sorted by our pre-commit CI AFTER the medcat import, so I'm not sure whether rich is always available or not. Might just have to try.

Edit: Does not work, unfortunately


…w of dtype cat
- most plotting functions got updated to use medcat entities if needed (clustermap, dendrogram, violin, stacked violin, ...)
- extracted entities from medcat in .obs are now of dtype categorical (was numerical)
- still missing: embedding, embedding_density and spatial

Zethson

Getting closer. I'll really need to see the adapted tutorial.

👍

ehrapy/plot/plot_medcat.py

ehrapy/tools/nlp/_medcat.py


github-actions bot added the enhancement label Apr 9, 2022

Imipenem added 7 commits April 10, 2022 21:44

Implemented function to flatten annotated results dicts

df4161b

- flatten the annotated results dict to prepare it for creation of a pd.DataFrame so this could be used in further analysis

Updated annotate_text to return a prepared dataframe

e48826c

- the dataframe contains all extracted and annotated infos from the input data in adata.obs - this will be the base for further analysis

Refactored internal annotation result representation

220ff77

- no MultiIndex DataFrame anymore, just a simple single level one

Formatting and nox

1db413f

Zethson requested changes Apr 24, 2022


PR feedback + docs fix

defab58

github-actions bot added the chore label Apr 24, 2022

Zethson linked an issue Apr 25, 2022 that may be closed by this pull request

Integration of MedCAT #101

Closed

15 tasks

Imipenem mentioned this pull request Apr 25, 2022

Plot coloring by boolean variables #373

Closed

Imipenem added 2 commits April 25, 2022 21:30

Added freetext umap plotting and PR feedback

744af42

Nox

24774b7

Zethson reviewed Apr 27, 2022


docs/usage/usage.md (outdated, resolved)

Imipenem added 2 commits April 29, 2022 21:39

Updated pca, scatter and tsne plots for medcat entities

b018621

Medcat module_not_found_error now throws a warning + fix CI

a84d693

- because medcat is an extra dependency, it might not be installed locally on the user's machine/env

Imipenem marked this pull request as ready for review April 29, 2022 20:43

Imipenem added 2 commits April 30, 2022 22:53

Reformatting

114b1db

Imipenem added 3 commits May 1, 2022 21:21

Refactor partial in scapi pl + Fixed print/removed unused import in _…

6c30964

…medcat.py

Updated embedding plot for medcat

8062b8d

Imipenem requested a review from Zethson May 3, 2022 18:23

Zethson requested changes May 4, 2022


Zethson and others added 2 commits May 4, 2022 10:01

fix typo

bd2ed43

str matcher for plots and PR feedback

d1a0b34

Imipenem added 3 commits May 9, 2022 20:19

Merge branch 'feature/medcat' of https://github.com/theislab/ehrapy i…

5878c51

…nto feature/medcat

Updated plots for string matcher + fixed sequence bugs temporarily

24eaadb

PR feedback

8d2f3ad

Imipenem merged commit 3841d5b into development May 11, 2022

Zethson deleted the feature/medcat branch December 8, 2022 15:49
