This repository has been archived by the owner on May 23, 2024. It is now read-only.

Fix incorrect limit issue #509

Closed
gaurav opened this issue May 4, 2022 · 3 comments

@gaurav
Member

gaurav commented May 4, 2022

The limit parameter currently doesn't work correctly. The SPARQL query (where the limit is applied) selects distinct node IDs (?n0, ?n1) in addition to the node types (?n0_type, ?n1_type), but only the node type information is reported back in TRAPI. So if we get the same type-to-type relation from multiple sources, those rows are collapsed into a single type-to-type relation, and we end up sending back fewer results than the SPARQL query returned. This is misleading for users: someone who asks for 100 results and gets back 20 would incorrectly conclude that there are only 20 relevant results in CAM-KP.
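
To make the failure mode concrete, here is a minimal Python sketch of it. It is not CAM-KP code: the `Row` type, the `limit_then_collapse` helper, and the sample identifiers are all hypothetical and only mimic the shape of the SPARQL bindings described above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One hypothetical SPARQL result row: node IDs plus their types."""
    n0: str
    n0_type: str
    e: str
    n1: str
    n1_type: str


def limit_then_collapse(rows, limit):
    """Apply the limit to the raw rows first, then keep only the
    (n0_type, e, n1_type) triples, dropping repeats.
    This reproduces the buggy ordering described in the issue."""
    collapsed = []
    for row in rows[:limit]:
        triple = (row.n0_type, row.e, row.n1_type)
        if triple not in collapsed:
            collapsed.append(triple)
    return collapsed


# Made-up data: the same type-to-type relation appears in three sources.
rows = [
    Row("g1:a", "T:A", "rel", "g1:b", "T:B"),
    Row("g2:a", "T:A", "rel", "g2:b", "T:B"),
    Row("g3:a", "T:A", "rel", "g3:b", "T:B"),
    Row("g1:c", "T:C", "rel", "g1:d", "T:D"),
    Row("g2:e", "T:E", "rel", "g2:f", "T:F"),
]

results = limit_then_collapse(rows, limit=3)
print(len(results))  # 1, even though 3 distinct relations exist in the data
```

With a limit of 3, the caller sees a single result and has no way to tell that two more distinct relations were available.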

Currently in progress in PR #499.

@gaurav
Member Author

gaurav commented May 4, 2022

I'm currently working on ways of tweaking the SPARQL query so that it returns distinct type-to-type relations, which would allow the SPARQL LIMIT to provide the correct number of results. These are the SPARQL queries I've looked at so far:

  • As a baseline for comparison, here is a slightly optimized version of the query from PR #499 ("Add Limit test to document how the "limit" parameter is supposed to work"). It takes ~24 seconds to return 311 results.
  • I can use SPARQL GROUP BY to look for a distinct (?n0_type ?e ?n1_type) set, and include both the graph URL and the prov:wasDerivedFrom graph URL in the search results using GROUP_CONCAT. This takes ~26 seconds to return 311 results, but does not suffer from the limit bug.
  • I then looked into whether I can make a simple, efficient query to get only the (?n0_type ?e0 ?n1_type) information, so that we could fetch the provenance information in follow-up queries. However, this query still takes ~14 seconds (and returns 319 results for some reason), so I think any time saved would be more than offset by the time taken to make the further queries.
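
The GROUP BY idea from the second bullet can be sketched in Python as follows. This only illustrates the ordering (group first, limit second); `group_then_limit`, the tuple layout, and the sample values are assumptions for illustration, not the actual query or CAM-KP code.

```python
def group_then_limit(rows, limit, separator="|"):
    """Group rows by (n0_type, e, n1_type), concatenate the distinct graph
    names per group (similar in spirit to GROUP_CONCAT(DISTINCT ...)),
    and only then apply the limit."""
    grouped = {}
    for n0_type, e, n1_type, graph in rows:
        grouped.setdefault((n0_type, e, n1_type), []).append(graph)

    results = []
    for triple, graphs in list(grouped.items())[:limit]:
        # Sort for a stable output; SPARQL itself does not guarantee
        # the order of concatenated values.
        distinct_graphs = sorted(set(graphs))
        results.append((*triple, separator.join(distinct_graphs)))
    return results


# Made-up rows: (n0_type, e, n1_type, graph)
rows = [
    ("T:A", "rel", "T:B", "g1"),
    ("T:A", "rel", "T:B", "g2"),
    ("T:A", "rel", "T:B", "g1"),
    ("T:C", "rel", "T:D", "g1"),
    ("T:E", "rel", "T:F", "g2"),
]

for result in group_then_limit(rows, limit=2):
    print(result)
# ('T:A', 'rel', 'T:B', 'g1|g2')
# ('T:C', 'rel', 'T:D', 'g1')
```

Because the limit is applied to groups rather than raw rows, a limit of 2 yields two distinct relations whenever at least two exist, and each one carries all of its supporting graphs.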

I'm currently working on implementing the SPARQL GROUP BY solution (with an eye on whether we actually need the node IDs for any reason), while cleaning up and documenting QuerySolution as I go.

I mentioned this to Chris Bizon yesterday, and he said that generating a separate edge for each source would be a bad idea, since TRAPI is heading towards assuming that edges will be unique within a particular KP. Instead, he pointed me to Abrar Mesbah's proposal to include multiple pieces of provenance information in the same TRAPI response; that way, we could indicate when an edge is supported by multiple sources. His full proposal is available at NCATSTranslator/TranslatorArchitecture#70.

gaurav self-assigned this May 5, 2022

@balhoff
Contributor

balhoff commented May 5, 2022

Here is an optimized query using a subselect and a Blazegraph hint:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?n0_type ?e0 ?n1_type (GROUP_CONCAT(DISTINCT ?g; SEPARATOR="|") AS ?groups) (GROUP_CONCAT(DISTINCT ?other; SEPARATOR="|") AS ?others)
WHERE {
    VALUES ?n0_class { <http://purl.obolibrary.org/obo/CHEBI_15361> }
    ?n0 <http://www.openrdf.org/schema/sesame#directType> ?n0_type .
    ?n0_type <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?n0_class .

    ?n1 <http://www.openrdf.org/schema/sesame#directType> ?n1_type .
    ?n1_type <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?n1_class .

    OPTIONAL { ?g <http://www.w3.org/ns/prov#wasDerivedFrom> ?other }
    {
        SELECT ?n0 ?e0 ?n1 ?g
        WHERE {
            # ?e0 <http://cam.renci.org/biolink_slot> <https://w3id.org/biolink/vocab/participates_in> .
            # hint:Prior hint:runFirst true .
            VALUES ?n0_class { <http://purl.obolibrary.org/obo/CHEBI_15361> }
            ?n0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?n0_class .
            # VALUES ?n1_class { <https://w3id.org/biolink/vocab/NamedThing> }
            ?n1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?n1_class .
            GRAPH ?g {
                FILTER( ?e0 IN ( <http://purl.obolibrary.org/obo/RO_0002565>, <http://purl.obolibrary.org/obo/RO_0000057>, <http://purl.obolibrary.org/obo/RO_0002087>, <http://purl.obolibrary.org/obo/RO_0002500>, <http://www.w3.org/2004/02/skos/core#narrowMatch>, <http://purl.obolibrary.org/obo/RO_0004007>, <http://purl.obolibrary.org/obo/RO_0002131>, <http://purl.obolibrary.org/obo/RO_0002206>, <http://purl.obolibrary.org/obo/RO_0002436>, <http://purl.obolibrary.org/obo/RO_0002160>, <http://purl.obolibrary.org/obo/RO_0001025>, <http://purl.obolibrary.org/obo/RO_0002093>, <http://purl.obolibrary.org/obo/RO_0002356>, <http://www.w3.org/2004/02/skos/core#relatedMatch>, <http://purl.obolibrary.org/obo/RO_0002216>, <http://purl.obolibrary.org/obo/BFO_0000063>, <http://purl.obolibrary.org/obo/RO_0002432>, <http://purl.obolibrary.org/obo/BFO_0000051>, <http://purl.obolibrary.org/obo/RO_0002349>, <http://purl.obolibrary.org/obo/RO_0002204>, <http://purl.obolibrary.org/obo/RO_0002488>, <http://purl.obolibrary.org/obo/RO_0004034>, <http://purl.obolibrary.org/obo/RO_0002497>, <http://purl.obolibrary.org/obo/RO_0002292>, <http://purl.obolibrary.org/obo/RO_0004033>, <http://purl.obolibrary.org/obo/RO_0001015>, <http://purl.obolibrary.org/obo/RO_0002434>, <http://purl.obolibrary.org/obo/RO_0002263>, <http://purl.obolibrary.org/obo/RO_0002212>, <http://purl.obolibrary.org/obo/RO_0002496>, <http://purl.obolibrary.org/obo/RO_0002331>, <http://purl.obolibrary.org/obo/RO_0002608>, <http://purl.obolibrary.org/obo/RO_0002314>, <http://purl.obolibrary.org/obo/RO_0002449>, <http://purl.obolibrary.org/obo/RO_0002333>, <http://purl.obolibrary.org/obo/RO_0002298>, <http://purl.obolibrary.org/obo/RO_0000056>, <http://purl.obolibrary.org/obo/RO_0002229>, <http://purl.obolibrary.org/obo/RO_0004032>, <http://translator.renci.org/ubergraph-axioms.ofn#acts_upstream_of_o_enabled_by>, <http://purl.obolibrary.org/obo/RO_0000052>, <http://purl.obolibrary.org/obo/BFO_0000066>, 
<http://www.w3.org/2004/02/skos/core#exactMatch>, <http://www.w3.org/2000/01/rdf-schema#subClassOf>, <http://purl.obolibrary.org/obo/RO_0002588>, <http://www.w3.org/2000/01/rdf-schema#subPropertyOf>, <http://purl.obolibrary.org/obo/RO_0002348>, <http://purl.obolibrary.org/obo/BFO_0000062>, <http://purl.obolibrary.org/obo/RO_0002211>, <http://www.w3.org/2004/02/skos/core#closeMatch>, <http://purl.obolibrary.org/obo/RO_0002224>, <http://purl.obolibrary.org/obo/RO_0002450>, <http://purl.obolibrary.org/obo/RO_0002205>, <http://purl.obolibrary.org/obo/RO_0002604>, <http://purl.obolibrary.org/obo/RO_0002448>, <http://purl.obolibrary.org/obo/RO_0002344>, <http://purl.obolibrary.org/obo/RO_0002232>, <http://purl.obolibrary.org/obo/RO_0002338>, <http://purl.obolibrary.org/obo/RO_0002592>, <http://purl.obolibrary.org/obo/RO_0002315>, <http://purl.obolibrary.org/obo/RO_0002234>, <http://purl.obolibrary.org/obo/RO_0002327>, <http://purl.obolibrary.org/obo/RO_0002492>, <http://purl.obolibrary.org/obo/RO_0002090>, <http://purl.obolibrary.org/obo/RO_0002412>, <http://purl.obolibrary.org/obo/RO_0002296>, <http://purl.obolibrary.org/obo/GOREL_0001006>, <http://purl.obolibrary.org/obo/RO_0002230>, <http://purl.obolibrary.org/obo/RO_0002299>, <http://purl.obolibrary.org/obo/RO_0002264>, <http://purl.obolibrary.org/obo/RO_0002092>, <http://purl.obolibrary.org/obo/RO_0002084>, <http://purl.obolibrary.org/obo/RO_0002297>, <http://purl.obolibrary.org/obo/RO_0004009>, <http://purl.obolibrary.org/obo/RO_0002223>, <http://purl.obolibrary.org/obo/RO_0002339>, <http://www.w3.org/2004/02/skos/core#broadMatch>, <http://purl.obolibrary.org/obo/UPHENO_0000001>, <http://purl.obolibrary.org/obo/RO_0002313>, <http://purl.obolibrary.org/obo/RO_0004008>, <http://purl.obolibrary.org/obo/RO_0002326>, <http://purl.obolibrary.org/obo/BFO_0000050>, <http://purl.obolibrary.org/obo/RO_0002220>, <http://purl.obolibrary.org/obo/RO_0001019>, <http://purl.obolibrary.org/obo/RO_0002215>, 
<http://purl.obolibrary.org/obo/RO_0002231>, <http://purl.obolibrary.org/obo/RO_0000053>, <http://purl.obolibrary.org/obo/RO_0002411>, <http://purl.obolibrary.org/obo/RO_0004035>, <http://purl.obolibrary.org/obo/RO_0002213>, <http://purl.obolibrary.org/obo/RO_0002590>, <http://purl.obolibrary.org/obo/RO_0002328>, <http://purl.obolibrary.org/obo/RO_0002233> ) )     
                ?n0 ?e0 ?n1 .
            }
        }
    }
    hint:Prior hint:runFirst true .

}
GROUP BY ?n0_type ?e0 ?n1_type

Blazegraph query hints: https://github.com/blazegraph/database/wiki/QueryHints
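
The ?groups and ?others columns in the query above come back as single strings joined with "|". A client would need to split them again before reporting provenance. Here is a minimal Python sketch of that step; `split_concat` and the example.org IRIs are hypothetical, and it assumes that graph IRIs never contain the separator character and that a group with no bound values yields an empty string.

```python
def split_concat(value, separator="|"):
    """Split a GROUP_CONCAT-style string back into a list of values.
    An empty or missing value is treated as 'no values' rather than
    as a list containing one empty string."""
    if not value:
        return []
    return value.split(separator)


# Made-up binding, shaped like one row of the grouped query's output.
binding = {
    "groups": "http://example.org/graph1|http://example.org/graph2",
    "others": "",
}

print(split_concat(binding["groups"]))
# ['http://example.org/graph1', 'http://example.org/graph2']
print(split_concat(binding["others"]))
# []
```

Handling the empty case explicitly matters because `"".split("|")` in Python returns `[""]`, which would otherwise show up as a spurious blank source.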

gaurav added a commit that referenced this issue May 10, 2022
gaurav added a commit that referenced this issue Jun 21, 2022
gaurav added a commit that referenced this issue Jun 21, 2022
@gaurav
Member Author

gaurav commented Jul 20, 2022

Closed by PR #499

gaurav closed this as completed Jul 20, 2022