-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Admin_mschuemi
authored and
Admin_mschuemi
committed
Aug 20, 2024
1 parent
535304b
commit 021d607
Showing
1 changed file
with
38 additions
and
44 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,75 +1,69 @@ | ||
OhdsiAuthorGraph | ||
================ | ||
# OhdsiAuthorGraph | ||
|
||
Code for visualizing the network of OHDSI authors. Nodes represent authors, links represent co-authorships. | ||
|
||
This repo contains the code for preparing the data for visualization. There are three ways to do the visualization: | ||
|
||
1. Using [Cytoscape](https://cytoscape.org/). Cytoscape is an open source software platform for visualizing complex networks. | ||
1. Using [Cytoscape](https://cytoscape.org/). Cytoscape is an open source software platform for visualizing complex networks. | ||
|
||
2. Using JavaScript embedded in a web page. This uses the [D3 JavaScript library](https://d3js.org/). | ||
2. Using JavaScript embedded in a web page. This uses the [D3 JavaScript library](https://d3js.org/). | ||
|
||
3. Using Python matplotlib (preferred). | ||
3. Using Python matplotlib (preferred). | ||
|
||
## How to use | ||
|
||
1. Get a list of PMIDs (PubMed IDs) of OHDSI papers. | ||
1. Get a list of PMIDs (PubMed IDs) of OHDSI papers. | ||
|
||
2. Run [PrepareGraphData.R](PrepareGraphData.R). | ||
|
||
3. Either view the results in the provided HTML page (see [docs](docs) folder), load the `.tsv` files in the [cytoscape](cytoscape) folder in Cytoscape, or proceed to instructions for matplotlib (below). | ||
2. Run [PrepareGraphData.R](PrepareGraphData.R). | ||
|
||
3. Either view the results in the provided HTML page (see [docs](docs) folder), load the `.tsv` files in the [cytoscape](cytoscape) folder in Cytoscape, or proceed to instructions for matplotlib (below). | ||
|
||
## Cytoscape instructions | ||
|
||
File --> Import --> Network from file --> Select `links.tsv` | ||
File --\> Import --\> Network from file --\> Select `links.tsv` | ||
|
||
File --> Import --> Table from file --> Select `authors.tsv` | ||
File --\> Import --\> Table from file --\> Select `authors.tsv` | ||
|
||
You probably want to select the 'Always Show Graphics Details' (looks like a pixelated diamond) in the bottom right of the graph pane. Else you won't see the labels etc. in the preview. | ||
|
||
In Style - Node (see bottom tab): | ||
|
||
- Fill color: | ||
- Column: firstYear | ||
- Mapping Type: Continuous mapping | ||
- Label Font Size: 15 | ||
- Label Position (Click Properties dropdown to show): | ||
- Node Anchor Points: East | ||
- Label Anchor Points: West | ||
- Label Justification: Left Justified | ||
- X Offset Value: 1 | ||
- Y Offset Value: 0 | ||
- Shape | ||
- Ellipse | ||
- Lock node width and height: check | ||
- Size: | ||
- Column: paperCount | ||
- Mapping Type: Continuous mapping | ||
- Fill color: | ||
- Column: firstYear | ||
- Mapping Type: Continuous mapping | ||
- Label Font Size: 15 | ||
- Label Position (Click Properties dropdown to show): | ||
- Node Anchor Points: East | ||
- Label Anchor Points: West | ||
- Label Justification: Left Justified | ||
- X Offset Value: 1 | ||
- Y Offset Value: 0 | ||
- Shape | ||
- Ellipse | ||
- Lock node width and height: check | ||
- Size: | ||
- Column: paperCount | ||
- Mapping Type: Continuous mapping | ||
|
||
In Style - Edge (see bottom tab): | ||
|
||
- Stroke color: RGB all at 150 | ||
- Transparency: | ||
- Column: paperCount | ||
- Mapping Type: Continuous mapping | ||
- Open mapping. Double click left box, set value to 40. Double click right box, set value to 100 | ||
|
||
|
||
Layout --> yFiles Organic Layout --> yFiles Remove Overlaps | ||
(Tip: temporarily change node shape to rectangle, uncheck lock node width and heigh, set height to 25 and width to 50, node anchor to west. This will cause layout to avoid (most) label overlap) | ||
- Stroke color: RGB all at 150 | ||
- Transparency: | ||
- Column: paperCount | ||
- Mapping Type: Continuous mapping | ||
- Open mapping. Double click left box, set value to 40. Double click right box, set value to 100 | ||
|
||
Next, move nodes manually to fill screen and avoid label overlap (may take a while) | ||
Layout --\> yFiles Organic Layout --\> yFiles Remove Overlaps (Tip: temporarily change node shape to rectangle, uncheck lock node width and heigh, set height to 25 and width to 50, node anchor to west. This will cause layout to avoid (most) label overlap) | ||
|
||
File --> Export --> Network to image | ||
Next, move nodes manually to fill screen and avoid label overlap (may take a while) | ||
|
||
File --\> Export --\> Network to image | ||
|
||
# Matplotlib instructions | ||
|
||
The Matplotlib code colors authors by the type of papers they publish. For this, we need to first classify their papers by type, for which we use LLMs: | ||
|
||
1. Run ExtractPubAbstractTitle.R. This will save the titles and abstracts as XML in the `intermediaryData` folder. | ||
2. Run PaperClassification.R. This requires access to an LLM like GPT-4. This will write the classifications to the `paperClassification` folder. | ||
3. Run matplotlib/PlotAuthorGraph.py. Make sure to run the pickle files (`positionsSpringForce.pkl` and `positionsNoOverlap.pkl`) first. These are caches from a previous run. | ||
4. In some image editor (e.g. Gimp), combine the plot (`matplotlib/plot.png`) with the legend (`matplotlib/legend.png`). | ||
|
||
1. Run ExtractPubAbstractTitle.R. This will save the titles and abstracts as XML in the `intermediaryData` folder. | ||
2. Run PaperClassification.R. This requires access to an LLM like GPT-4. This will write the classifications to the `paperClassification` folder. | ||
3. Run matplotlib/PlotAuthorGraph.py. Make sure to run the pickle files (`positionsSpringForce.pkl` and `positionsNoOverlap.pkl`) first. These are caches from a previous run. | ||
4. In some image editor (e.g. Gimp), combine the plot (`matplotlib/plot.png`) with the legend (`matplotlib/legend.png`). |