[DevDoc] Notes on the API implementation
Some rough, outdated, to-be-reviewed notes of mine (MB) regarding the way the KnetMiner API is implemented. You can get a decent dev-level intro of our code from here, especially if you open the mentioned components.
- Request hub, gets DS, "mode" (ie, name of API call) and general params
- And then dispatches to handleRaw()
- TODO
- Invokes `DS.$method`, having got the method from mode
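The mode-to-method dispatch above could be sketched with plain Java reflection. This is only an illustration: `DemoDataSource`, `ModeDispatcher` and the `Map<String, String>` parameter type are hypothetical names made up for the example, not the real KnetMiner classes or signatures.

```java
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical stand-in for the data source (DS); not the real KnetMiner type. */
class DemoDataSource {
    public String countHits(Map<String, String> params) {
        return "countHits for keyword=" + params.get("keyword");
    }

    public String genome(Map<String, String> params) {
        return "genome for keyword=" + params.get("keyword");
    }
}

final class ModeDispatcher {
    /** Finds the public method named after the mode and invokes it on the data source. */
    static Object dispatch(Object dataSource, String mode, Map<String, String> params) {
        try {
            // ASSUMPTION: every API method takes a single Map of request parameters
            Method method = dataSource.getClass().getMethod(mode, Map.class);
            return method.invoke(dataSource, params);
        } catch (NoSuchMethodException ex) {
            throw new IllegalArgumentException("Unknown API mode: " + mode, ex);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Error while running API mode: " + mode, ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new DemoDataSource(), "countHits", Map.of("keyword", "drought")));
    }
}
```

Rejecting unknown modes explicitly keeps a mistyped API call from surfacing as an obscure reflection error.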
- Searches synonyms, using UIService.renderSynonymTable()
- Uses searchService.searchTopConceptsByName() to get relevant concepts
  - Uses `luceneMgr.searchTopConceptsByIdxField()`
- Prepares a table where, for each keyword, there is an entry conceptName, conceptType, conceptId
@param keyword
- `DS.countHits()`
- `new SemanticMotifSearchMgr ( keyword )`, assuming `keyword && ! geneList`
  - `luceneConcepts: Map<Concept -> Score>`: `SearchService.searchGeneRelatedConcepts ()`
    - Split keyword into list, get 'not query' `notList = this.searchTopConceptsByName()` if necessary
    - Populates `hit2score` (Concept->Score) with a series of Lucene searches, involving `keyword` (search string) and `notList`
  - `countLinkGenes()`
    - Uses `luceneConcepts` and `SM.concepts2Genes` to count SM-linked concepts (`luceneDocumentsLinked`) and matched unique genes (`numConnectedGenes`)
- Puts SMSearchMgr counts into the response
- `DataService.getLociGeneCount()` to count the loci in the request's QTL
  - Used in the genome regions input
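The counting step in `countLinkGenes()` can be pictured as below. This is a minimal sketch under assumptions: concepts and genes are represented by plain integer IDs rather than ONDEX concepts, and `LinkedGeneCounter` and `Counts` are hypothetical names, not KnetMiner classes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LinkedGeneCounter {
    /** Hypothetical holder for the two counts named in the notes. */
    static final class Counts {
        final int luceneDocumentsLinked;
        final int numConnectedGenes;

        Counts(int luceneDocumentsLinked, int numConnectedGenes) {
            this.luceneDocumentsLinked = luceneDocumentsLinked;
            this.numConnectedGenes = numConnectedGenes;
        }
    }

    /**
     * Counts the Lucene-hit concepts that have at least one gene linked by a
     * semantic motif, and the distinct genes reached through those concepts.
     */
    static Counts countLinkedGenes(Map<Integer, Float> luceneConcepts,
                                   Map<Integer, Set<Integer>> concepts2Genes) {
        int linkedConcepts = 0;
        Set<Integer> uniqueGenes = new HashSet<>();
        for (Integer conceptId : luceneConcepts.keySet()) {
            Set<Integer> genes = concepts2Genes.get(conceptId);
            if (genes == null || genes.isEmpty()) continue; // hit concept with no linked gene
            linkedConcepts++;
            uniqueGenes.addAll(genes); // the set removes genes shared by several concepts
        }
        return new Counts(linkedConcepts, uniqueGenes.size());
    }
}
```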
@param keyword, list, listMode, qtl
- `DS.genome()`, prepares `GenomeResponse`, calls `DS._keyword()`
- Extracts the `userGenes`, using `KGUtils.filterGenesByAccessionKeywords()`
  - This turns the list into genes, using 1) searches over accessions and names and 2) filter on taxId
  - Probably not to be filtered with user taxId (check it's valid and configured)
- Adds `qtl` to `userGenes`, using genome regions, via `KGUtils.fetchQTLs ( ONDEXGraph graph, List<String> taxIds, List<String> qtlsStr )`
  - `QTL.fromStringList ( qtlsStr )` to build QTL region structures
  - Then double loop over all regions and all genes in the graph
- `smSearchMgr = new SemanticMotifSearchMgr ( searchString, genes )`
  - As said above, searches concepts based on keywords and scores them
- `candidateGenesMap = smSearchMgr.getSortedGeneCandidates() # Map<Concept->Score>`. This is based on `SemanticMotifsSearchResult.getScoredGenes ( Lucene-scored concepts )`, which works like:
  - From Lucene-hit concepts, compute gene2HitConcepts, ie, a subfilter over the gene->concepts map (coming from sem motifs)
  - use gene2HitConcepts to compute knet scores for each gene => `scoredGeneCandidates: Map<Gene -> KnetScore>`
  - return gene -> score result, ranked by score and with a filter over (unlikely) duplicated genes
- Then, this is (possibly) filtered using user genes + QTL genes
- Finally, we have `genesMap` and `genes`
- Next is the chromosome view
  - what to do with the multi-species case?
- Next is `exportService.exportGeneTable()`
- Next is `exportService.exportEvidenceTable()`
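The double loop over QTL regions and genes mentioned for `fetchQTLs()` might look like the following. It is a sketch only: `Region` and `Gene` are hypothetical, simplified types, and the rule that a gene must lie entirely inside a region is an assumption, since the notes don't state the exact overlap criterion.

```java
import java.util.ArrayList;
import java.util.List;

final class QtlGeneFilter {
    /** Hypothetical, simplified QTL region: chromosome plus inclusive start/end. */
    static final class Region {
        final String chromosome;
        final long start;
        final long end;

        Region(String chromosome, long start, long end) {
            this.chromosome = chromosome;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical, simplified gene: accession plus its position on a chromosome. */
    static final class Gene {
        final String accession;
        final String chromosome;
        final long begin;
        final long end;

        Gene(String accession, String chromosome, long begin, long end) {
            this.accession = accession;
            this.chromosome = chromosome;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Returns the genes that fall inside at least one region, in input order. */
    static List<Gene> genesInRegions(List<Region> regions, List<Gene> genes) {
        List<Gene> result = new ArrayList<>();
        for (Gene gene : genes) {
            for (Region region : regions) {
                // ASSUMPTION: a gene matches only when fully contained in the region
                boolean inside = region.chromosome.equals(gene.chromosome)
                    && gene.begin >= region.start
                    && gene.end <= region.end;
                if (inside) {
                    result.add(gene);
                    break; // don't add the same gene twice when regions overlap
                }
            }
        }
        return result;
    }
}
```

The cost grows with regions × genes, which is acceptable for a handful of user-supplied regions but is the reason an index over gene positions would help on large graphs.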
- Does the same gene filtering as _keyword()
- `ondexServiceProvider.getSemanticMotifService ().findSemanticMotifs( keyword, seed (genes) )`
  - `Map<ONDEXConcept, Float> luceneResults = searchService.searchGeneRelatedConcepts ( keyword, seed, false )`
  - Then, semanticMotifDataService.getGraphTraverser () with the seed genes => `Map<ONDEXConcept, List<EvidencePathNode>> results`
  - Splits the search string into actual keywords (`SearchUtils.getSearchWords()`)
  - gets a colour map for them (`UIUtils.createHilightColorMap()`)
  - Uses the found paths to create the network view graph
  - highlights paths and node labels based on the search keywords
- General info on the current dataset
- Served by `DatasetInfo DatasetInfoService.datasetInfo()`
- Mostly based on the dataset section in the config YAML
- Gets per-type topological information. Used by the 'Release notes' button
- Served by `DatasetInfoService.networkStats()`
- Based on the JSON file produced by `KnetMinerInitializer.exportGraphStats()`
- which mostly gets its data from the Semantic Motif summary data
- Served by `DatasetInfoService.knetSpaceURL()`
- Using a dedicated config variable
- @param keyword, used to extract an `evidenceOndexId`
- list: usual gene list (except QTL)
- Similar to /network, see #631
- No longer used, removed
- Replaced by `/dataset-info/network-stats`, see #657
- Fetches stats on the whole dataset,
  - which were computed by `ExportService.exportGraphStats()`
  - which was invoked by `OSP.initData()`
- Searches genes based on user input (uses `KGUtils.filterGenesByAccessionKeywords()` as above)
- Adds genes in QTL regions, as above
- Finds sem motifs and builds the subgraph
- exports the subgraph to JSON
- puts counts into the response
- WTH?!?!?!?
- No longer used, removed
- Prepares data to perform a network view request
- Then forwards to genepage.jsp (via MVC)
- which will know how to invoke /network
- We moved it to the client, where it belongs
- Works similarly to genepage above
- Replaced by `/dataset-info/knetspace-url`.
- returns the KnetSpace host, set in the config.
- Replaced by `/dataset-info`.
- Some general info. Very rubbish format, it puts JSON into a string, instead of the usual fields in the response class. The taxIds overwrite each other:
```java
summaryJSON.put("dbVersion", dataService.getDatasetVersion () );
summaryJSON.put("sourceOrganization", dataService.getDatasetOrganization ());
dataService.getTaxIds ().forEach( taxID -> {
  summaryJSON.put("speciesTaxid", taxID);
});
summaryJSON.put("speciesName", dataService.getSpecies());

// TODO: in future, this might come from OXL metadata (the graph descriptor)
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm");
var timestampStr = formatter.format ( oxlFile.lastModified () );
summaryJSON.put("dbDateCreated", timestampStr);
summaryJSON.put("provider", dataService.getDatasetProvider () );

String jsonString = summaryJSON.toString();
// Removing the pesky double quotes
jsonString = jsonString.substring(1, jsonString.length() - 1);
log.info("response.dataSource= " + jsonString);
response.dataSource = jsonString;
```
- It's used by `save-knet.js`, for `exportAsJson()`. This is very messy
- It's also used in `showNetworkStats.js::fetchStats()`, but only `dbVersion` is fetched from the API output
- `Map<ONDEXConcept, Float> scoredConcepts`: the keyword-related concepts, got from Lucene
  - Based on `SearchService.searchGeneRelatedConcepts()` (see below)
- `SemanticMotifsSearchResult searchResult`
  - Uses `SearchService.getScoredGenes ( scoredConcepts, this.taxId )` (see below)
- Counts concepts in `scoredConcepts`, just using its size
- Counts the genes linked to `scoredConcepts`
  - For each concept:
    - Get genes in concept2Genes.get ( concept )
    - Filter by taxId
  - Eventually, count
Case there is only a gene list:
(gene list is normalised)
for each gene in gene list: add genes2Concepts ( gene ) to the result, with score = 1
Case with keyword
- get the notQuery expression from keywords
- Search concepts via Lucene, using keywords
Map<Integer, Set<Integer>> gene2HitConcepts
- For each concept in scoredConcepts:
- add concept2Genes.get ( concept ) to result
- possibly, filter by taxId
- Then, group by gene
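The inversion from scored concepts to a gene-keyed map could be sketched as follows. Integer IDs stand in for ONDEX concepts and genes, and the taxId filter is modelled as a generic `Predicate`; `GeneHitIndex` and its parameter names are assumptions made for the example, not the real implementation.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class GeneHitIndex {
    /**
     * For every scored concept, looks up the genes linked to it and groups the
     * concept IDs by gene, skipping genes rejected by the filter.
     */
    static Map<Integer, Set<Integer>> gene2HitConcepts(
            Collection<Integer> scoredConceptIds,
            Map<Integer, Set<Integer>> concepts2Genes,
            Predicate<Integer> geneFilter) {
        Map<Integer, Set<Integer>> result = new HashMap<>();
        for (Integer conceptId : scoredConceptIds) {
            for (Integer geneId : concepts2Genes.getOrDefault(conceptId, Set.of())) {
                // ASSUMPTION: the taxId check is reduced to a yes/no test on the gene ID
                if (!geneFilter.test(geneId)) continue;
                result.computeIfAbsent(geneId, k -> new HashSet<>()).add(conceptId);
            }
        }
        return result;
    }
}
```

Passing `gene -> true` as the filter corresponds to the case where no taxId restriction applies.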
Map<ONDEXConcept, Double> scoredGeneCandidates
- for each gene in `gene2HitConcepts`:
  - for concept in `gene2HitConcepts.get ( gene )`
    - `luceneScore = scoredEvidenceConcepts.get ( concept )`
    - `igf = log ( genesCount / concepts2Gene.get ( concept ).size () )`
    - `invGraphDist = 1 / genes2PathLens.get ( gene, concept )`
    - `knetScore` = the three above combined
  - Sum of `knetScore` for each concept is `knetScore ( gene )`
- `scoredGeneCandidates` are sorted
- The final `SemanticMotifsSearchResult` result contains:
  - `geneId2RelatedConceptIds = gene2HitConcepts`
  - `gene2Score = sorted scoredGeneCandidates`
- `genesCount` is the total no of genes in the traverser seed, which belong to one of the configured species. In Neo4j: needs to be stored?
- `concepts2Gene.get ( concept ).size ()`, needs to be stored in Neo4j?
- `genes2PathLens.get ( gene, concept )` in Neo4j, is in the gene/concept link
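A rough sketch of the per-gene scoring described above. The notes only say the three terms are "combined", so multiplying them is an assumption of this example, as are the integer IDs, the nested path-length map and the `KnetScoreSketch` name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class KnetScoreSketch {
    /** Score contribution of one evidence concept for one gene. */
    static double conceptScore(double luceneScore, int genesCount, int conceptGenesCount, int pathLength) {
        double igf = Math.log((double) genesCount / conceptGenesCount); // inverse gene frequency
        double invGraphDist = 1.0 / pathLength;
        // ASSUMPTION: the three terms are combined as a plain product
        return luceneScore * igf * invGraphDist;
    }

    /** Sums the concept scores for each gene and returns the genes ranked by descending score. */
    static Map<Integer, Double> scoreGenes(
            Map<Integer, Set<Integer>> gene2HitConcepts,
            Map<Integer, Double> luceneScores,                // concept -> Lucene score
            Map<Integer, Integer> conceptGeneCounts,          // concept -> no of linked genes
            Map<Integer, Map<Integer, Integer>> pathLengths,  // gene -> concept -> path length
            int genesCount) {
        Map<Integer, Double> scores = new HashMap<>();
        gene2HitConcepts.forEach((gene, concepts) -> {
            double sum = 0;
            for (Integer concept : concepts) {
                sum += conceptScore(
                    luceneScores.get(concept), genesCount,
                    conceptGeneCounts.get(concept), pathLengths.get(gene).get(concept));
            }
            scores.put(gene, sum);
        });
        // LinkedHashMap keeps the insertion order, ie, the ranking
        Map<Integer, Double> sorted = new LinkedHashMap<>();
        scores.entrySet().stream()
            .sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
            .forEach(e -> sorted.put(e.getKey(), e.getValue()));
        return sorted;
    }
}
```

Note that a concept linked to every gene gets `igf = log(1) = 0`, so it contributes nothing to the score regardless of its Lucene score.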
Params:
* List<ONDEXConcept> candidateGenes
* Set<ONDEXConcept> userGenes
* List<String> userQtlsStr
* String listMode
* SemanticMotifsSearchResult searchResult
- Best name function in ondex
- The gene's evidences are got from `searchResult.getGeneId2RelatedConceptIds()`
- The gene score is got from searchResult.getGene2Score ()
- The graph distances are got from `genes2PathLengths` (SemMotif summaries)
  - In Neo4j, gene/concept links
Params:
* String keywords // To be removed, not used
* Map<ONDEXConcept, Float> foundConcepts
* Set<ONDEXConcept> userGenes
* List<String> userQtlsStr
* boolean doSortResult
- score is summed up for each evidence concept using `foundConcepts`
- For each concept, `conceptGenes` are fetched from `concepts2Genes`
  - `startGenesSize` = `conceptGenes.size()`
- For each gene in `conceptGenes`:
  - `matchedInGeneList`++ if the gene is in `userGenes`
- At the end:
  - `notMatchedInGeneList = userGenes.size - matchedInGeneList`
  - `matchedNotInGeneList = startGenesSize - matchedInGeneList`
  - `notMatchedNotInGeneList = genes2Concepts.size - matchedNotInGeneList - matchedInGeneList - notMatchedInGeneList`
  - These are used for Fisher test, from which pvalue is computed
- returns the found concept
- returns the concept score as Lucene score
- returns pvalue as computed above (ie, Fisher test)
- returns `startGenesSize` (the no of SM genes associated to the concept)
- returns the matching user genes
- Sorts by pvalue, score and others
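The contingency counts above, and a p-value derived from them, can be sketched as follows. The notes don't say which variant of the Fisher test is used, so the right-tailed exact test here is an assumption, and `FisherSketch` with its integer gene IDs is a hypothetical, dependency-free illustration rather than the code KnetMiner runs.

```java
import java.util.Set;

final class FisherSketch {
    /** Returns {matchedInGeneList, matchedNotInGeneList, notMatchedInGeneList, notMatchedNotInGeneList}. */
    static int[] contingency(Set<Integer> conceptGenes, Set<Integer> userGenes, int totalGenes) {
        int matchedInGeneList = 0;
        for (Integer gene : conceptGenes) {
            if (userGenes.contains(gene)) matchedInGeneList++;
        }
        int notMatchedInGeneList = userGenes.size() - matchedInGeneList;
        int matchedNotInGeneList = conceptGenes.size() - matchedInGeneList;
        int notMatchedNotInGeneList =
            totalGenes - matchedNotInGeneList - matchedInGeneList - notMatchedInGeneList;
        return new int[] {
            matchedInGeneList, matchedNotInGeneList, notMatchedInGeneList, notMatchedNotInGeneList };
    }

    private static double logFactorial(int n) {
        double sum = 0;
        for (int i = 2; i <= n; i++) sum += Math.log(i);
        return sum;
    }

    /** Log-probability of one 2x2 table under the hypergeometric distribution. */
    private static double logTableProbability(int a, int b, int c, int d) {
        return logFactorial(a + b) + logFactorial(c + d) + logFactorial(a + c) + logFactorial(b + d)
            - logFactorial(a) - logFactorial(b) - logFactorial(c) - logFactorial(d)
            - logFactorial(a + b + c + d);
    }

    /** Right-tailed Fisher exact test: probability of a table at least as extreme in cell a. */
    static double rightTailPValue(int a, int b, int c, int d) {
        double p = 0;
        // Move one unit at a time into cell a, keeping row and column totals fixed
        while (b >= 0 && c >= 0) {
            p += Math.exp(logTableProbability(a, b, c, d));
            a++; b--; c--; d++;
        }
        return Math.min(1.0, p);
    }
}
```

Working with log-factorials avoids overflow for graphs with many thousands of genes, at the cost of a little floating-point error.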