Added to the planning overview and updated links to new docs

ranking-agent · Oct 7, 2021 · 7c2c435
1 parent 4a0e465
commit 7c2c435

Showing 4 changed files with 87 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -37,7 +37,9 @@ You can also run tests and coverage reports withou the management script. Check
 
 The local development environment also includes a built-in profiler for debugging performance issues. To use this, set `PROFILER=true` in a `.env` file in the root of the repository. Once the application is running the profiler will automatically be run on all incoming requests. To view profiles you can visit [localhost:5781/profiles](http://localhost:5781/profiles), which will give you a list of the captured profiles. These captured profiles can be used with the [snakeviz](https://jiffyclub.github.io/snakeviz/) utility to easily diagnose performance issues.
 
-## [Testing](tests/README.md)
+## Testing
+
+Documentation for testing can be found in the [tests README](tests/README.md). A high-level overview of the testing architecture can be found in the [docs folder](docs/TESTING_INFRASTRUCTURE.md).
 
 ## Deployment
 
@@ -65,3 +67,7 @@ docker-compose -f docker-compose.yml -f docker-compose.prod.yml up --build
 ## Usage
 
 `http://<HOST>:5781/docs`
+
+## Documentation
+
+High-level documentation can be found in the [docs folder](docs/README.md).
diff --git a/docs/PLANNING_OVERVIEW.md b/docs/PLANNING_OVERVIEW.md
@@ -2,12 +2,21 @@
 
 This document provides an overview of how query planning works. Query planning is very complex. The best way to understand query planning in Strider is to use this document in conjunction with the [test_query_planner.py](tests/test_query_planner.py) file and the source code in [query_planner.py](strider/query_planner.py). 
 
+This document includes some graphs written with the [mermaid syntax](https://mermaid-js.github.io/mermaid/#/). For the best experience, please activate the Github+Mermaid extension for either [Chrome](https://chrome.google.com/webstore/detail/github-%20-mermaid/goiiopgdnkogdbjmncgedmgpoajilohe?hl=en) or [Firefox](https://addons.mozilla.org/en-US/firefox/addon/github-mermaid/?src=recommended).
 
 ## What is a plan?
 
 Before a query can be executed, a plan needs to be created. This plan describes which edges will be traversed in what order and what KPs will be contacted for each edge. Here is an example of a looped query graph and associated plan:
 
-![Example QG](example_qgraph.png)
+```mermaid
+graph LR
+        n0(( id <br/>MONDO:0008114 ))
+        n1(( category <br/>biolink:PhenotypicFeature ))
+        n2(( category <br/>biolink:ChemicalSubstance ))
+        n0-- biolink:has_phenotype -->n1
+        n2-- biolink:treats -->n0
+        n2-- biolink:treats -->n1
+```
 
 ```json
 {
@@ -40,13 +49,42 @@ Before a query can be executed, a plan needs to be created. This plan describes
 
 Things to notice about the plan:
 
-* Keys in the plan are subject, edge, object. This is necessary because we are allowed to traverse edges in either the forward or backwards direction. In this case, we could traverse edge n1n2 in the reverse direction by changing the predicate from `biolink:treats` to `biolink:treated_by`. 
-* The plan always starts from a pinned node (one with an ID). This is because we have to send an ID to the KPs.
+* Keys in the plan are (subject, edge, object) triples. This is necessary because we are allowed to traverse edges in either the forward or backward direction. In this case, we could traverse edge n1n2 in the reverse direction by changing the predicate from `biolink:treats` to `biolink:treated_by`. Note that in the above plan the keys have been converted to strings, but in the code the keys are stored as tuples.
+* The plan always starts from a pinned node (one with an ID). This is because we have to send an ID to the KPs. See the execution overview documentation for more details.
 * The KP list associated with each edge in the plan includes categories and predicates. This is because we are allowed to change the predicate when contacting a KP. For example, if there is a KP that accepts `positively_correlated_with` as the predicate we can convert `correlated_with` to `positively_correlated_with` when we contact it. 
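
The tuple-versus-string distinction in the first bullet can be sketched as follows. This is a minimal illustration, not Strider's actual code: the edge IDs, the KP names, and the `-` separator used to join the keys are all assumptions made for the example.

```python
import json

# Hypothetical plan keyed by (subject, edge, object) tuples.
# Edge IDs and KP names are invented for illustration.
plan = {
    ("n0", "n0n1", "n1"): ["kp_a"],
    ("n1", "n1n2", "n2"): ["kp_b"],
}


def stringify_keys(plan, sep="-"):
    """Return a copy of the plan whose tuple keys are joined into strings.

    JSON objects only allow string keys, so tuple keys must be converted
    before the plan can be serialized. The separator is an assumption.
    """
    return {sep.join(step): kps for step, kps in plan.items()}


printable = stringify_keys(plan)
print(json.dumps(printable, indent=2))
```

Keeping tuples in code makes it easy to unpack `subject, edge, obj = step`, while the string form exists only for display and serialization.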
 
 ## Plan creation
 
-To facilitate query planning the first thing we do is generate an operation graph. An operation is defined as a source, predicate, category triple. This isn't the same as an edge in the query graph because edges can be traversed in multiple directions and also sometimes with different predicates. A simple example would be an edge that has the predicate `biolink:treats` would be expanded to include an edge in the reverse direction with `biolink:treated_by`.
+To facilitate query planning, the first thing we do is generate an operation graph. An operation is defined as a (source, predicate, target) triple. This isn't the same as an edge in the query graph because edges can be traversed in multiple directions and sometimes with different predicates. For example, an edge with the predicate `biolink:treats` would be expanded to include an edge in the reverse direction with `biolink:treated_by`.
+
+Edges can also be traversed in the reverse direction. For this reason edges in the operation graph also specify the predicate direction using arrows. Here are examples:
+* `Disease<-treats-Drug` means that when solving, we can use a Disease identifier to find Drugs
+* `Drug-treats->Disease` means that when solving, we can use a Drug identifier to find Diseases
+
+We call edges traversed in this way *reverse*. This is distinct from *symmetric* predicates, such as `biolink:related_to`, which can be looked up in either direction. As a result, a single query graph edge can have up to four associated operations:
+
+Query Graph:
+
+```mermaid
+graph LR
+        n0(( category <br/>biolink:Disease ))
+        n0-- biolink:related_to -->n1
+        n1(( category <br/>biolink:ChemicalSubstance ))
+```
+
+Operation Graph:
+
+```mermaid
+graph LR
+        n0(( category <br/>biolink:Disease ))
+        n1(( category <br/>biolink:ChemicalSubstance ))
+        n0-- n0n1<br/>-biolink:related_to-> -->n1
+        n1-- n0n1.reverse<br/><-biolink:related_to- -->n0
+        n0-- n0n1.symmetric<br/>-biolink:related_to-> -->n1
+        n1-- n0n1.reverse.symmetric<br/><-biolink:related_to- -->n0
+```
+
+Building the operation graph this way facilitates planning: when looking up the KPs that can solve this edge, we want to include any of those four possibilities.
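
The expansion of one query graph edge into up to four operations can be sketched as below. This is a hedged illustration that simply mirrors the labels in the operation graph diagram above; the function name `expand_edge`, the dictionary layout, and the hard-coded `SYMMETRIC_PREDICATES` set are assumptions, not the code in `query_planner.py`.

```python
# Assumption: a hand-written set of symmetric predicates. A real
# implementation would derive this from the Biolink model.
SYMMETRIC_PREDICATES = {"biolink:related_to"}


def expand_edge(edge_id, subject, predicate, obj):
    """Expand one query graph edge into its candidate operations."""
    forward = f"-{predicate}->"
    backward = f"<-{predicate}-"
    operations = [
        # Traverse the edge as written.
        {"id": edge_id, "source": subject, "predicate": forward, "target": obj},
        # Traverse the edge starting from its object.
        {"id": f"{edge_id}.reverse", "source": obj, "predicate": backward, "target": subject},
    ]
    if predicate in SYMMETRIC_PREDICATES:
        # Symmetric predicates add two more operations, labelled as in the diagram.
        operations += [
            {"id": f"{edge_id}.symmetric", "source": subject, "predicate": forward, "target": obj},
            {"id": f"{edge_id}.reverse.symmetric", "source": obj, "predicate": backward, "target": subject},
        ]
    return operations


ops = expand_edge("n0n1", "n0", "biolink:related_to", "n1")
for op in ops:
    print(op["id"], op["source"], op["predicate"], op["target"])
```

A non-symmetric predicate such as `biolink:treats` yields only the first two operations under this sketch.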
 
 The next step is adding descendants to the operation graph. When we receive a query graph with a category `biolink:MolecularEntity` we assume that this node is also allowed to be any subclass of `biolink:MolecularEntity` including `biolink:ChemicalSubstance` and `biolink:Protein`. 
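
Descendant expansion can be sketched with a small lookup table. The hierarchy below contains only the categories named in the paragraph above and is hard-coded as an assumption; the function name `with_descendants` is hypothetical, and a real implementation would consult the Biolink model instead.

```python
# Assumption: a tiny hand-written slice of the category hierarchy,
# limited to the subclasses mentioned in the text.
CHILDREN = {
    "biolink:MolecularEntity": ["biolink:ChemicalSubstance", "biolink:Protein"],
}


def with_descendants(category, children=CHILDREN):
    """Return the category followed by all of its descendants."""
    result = [category]
    for child in children.get(category, []):
        result.extend(with_descendants(child, children))
    return result


expanded = with_descendants("biolink:MolecularEntity")
print(expanded)
```

A leaf category such as `biolink:Protein` expands to a list containing only itself.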
 
@@ -105,6 +143,39 @@ After creating the operation graph we use the KP registry to look for KPs that s
 }
 ```
 
-We then copy these KPs back over to the query graph. At this point, we may have incompatible KPs attached to edges. For example, in the annotated operation graph we have `kp0` attached to `n1n0` which has the signature `Disease-treated_by->Drug`. This KP is incompatible with kp2 which has the signature `MolecularEntity-decreases_abundance_of->GeneOrGeneProduct` because a MolecularEntity is not necessarily a Drug. To solve this issue we generate permutations of the query graph. This involves simply testing every combination of KPs for validitiy. In the small example above we would test kp0 with kp2, and kp3 with kp2.
+We then copy these KPs back over to the query graph. At this point, we may have incompatible KPs attached to edges. For example, in the annotated operation graph we have `kp0` attached to `n1n0`, which has the signature `Disease-treated_by->Drug`. This KP is incompatible with `kp2`, which has the signature `MolecularEntity-decreases_abundance_of->GeneOrGeneProduct`, because a MolecularEntity is not necessarily a Drug. To solve this issue we generate permutations of the query graph. This involves simply testing every combination of KPs for validity:
+
+#### Annotated Query Graph with KPs:
+
+```mermaid
+graph TD
+        n0(( category biolink:MolecularEntity ))
+        n1(( id MONDO:0005148 ))
+        n2(( category biolink:GeneOrGeneProduct ))
+        n1-- biolink:treated_by<br/>kp0 = Disease-treated_by->Drug<br/>kp3 = Disease-treated_by->MolecularEntity -->n0
+        n0-- biolink:affects_abundance_of<br/>kp2 = MolecularEntity-decreases_abundance_of->GeneOrGeneProduct -->n2
+```
+
+#### Permutation 1 (Invalid because Drug != MolecularEntity)
+
+```mermaid
+graph TD
+        n0(( category biolink:MolecularEntity ))
+        n1(( id MONDO:0005148 ))
+        n2(( category biolink:GeneOrGeneProduct ))
+        n1-- biolink:treated_by<br/>kp0 = Disease-treated_by->Drug-->n0
+        n0-- biolink:affects_abundance_of<br/>kp2 = MolecularEntity-decreases_abundance_of->GeneOrGeneProduct -->n2
+```
+
+#### Permutation 2 (Valid because MolecularEntity == MolecularEntity)
+
+```mermaid
+graph TD
+        n0(( category biolink:MolecularEntity ))
+        n1(( id MONDO:0005148 ))
+        n2(( category biolink:GeneOrGeneProduct ))
+        n1-- biolink:treated_by<br/>kp3 = Disease-treated_by->MolecularEntity -->n0
+        n0-- biolink:affects_abundance_of<br/>kp2 = MolecularEntity-decreases_abundance_of->GeneOrGeneProduct -->n2
+```
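
The permutation test illustrated above can be sketched with `itertools.product`. This is a simplified illustration under stated assumptions: the edge ID `n0n2`, the data layout, and the validity rule (every KP touching a node must agree exactly on that node's category, as in the two permutation headings) are chosen for the example and may differ from the checks in `query_planner.py`.

```python
from itertools import product

# KP signatures per edge, taken from the annotated query graph above.
# The edge ID "n0n2" is an assumption; the text only names "n1n0".
edge_kps = {
    "n1n0": {
        "kp0": {"source": "Disease", "target": "Drug"},
        "kp3": {"source": "Disease", "target": "MolecularEntity"},
    },
    "n0n2": {
        "kp2": {"source": "MolecularEntity", "target": "GeneOrGeneProduct"},
    },
}
# Which query graph nodes each edge connects, as (source node, target node).
edge_nodes = {"n1n0": ("n1", "n0"), "n0n2": ("n0", "n2")}


def kp_permutations():
    """Yield every way of choosing one KP per edge."""
    edge_ids = list(edge_kps)
    for combo in product(*(edge_kps[e] for e in edge_ids)):
        yield dict(zip(edge_ids, combo))


def is_valid(choice):
    """A choice is valid if all KPs agree on the category of each shared node."""
    seen = {}
    for edge_id, kp_id in choice.items():
        kp = edge_kps[edge_id][kp_id]
        src_node, tgt_node = edge_nodes[edge_id]
        for node, category in ((src_node, kp["source"]), (tgt_node, kp["target"])):
            if seen.setdefault(node, category) != category:
                return False
    return True


valid = [choice for choice in kp_permutations() if is_valid(choice)]
print(valid)
```

With the two edges above, only the combination of `kp3` and `kp2` survives, matching Permutation 2.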
 
-TODO finish this description
+TODO finish
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,3 @@
+# Documentation
+
+This folder contains high-level documentation for Strider. The focus is mostly on architecture. Documentation for specific Python objects and functions can be found in the code itself.
diff --git a/docs/example_qgraph.png b/docs/example_qgraph.png