Skip to content
This repository has been archived by the owner on May 23, 2024. It is now read-only.

Add Limit test to document how the "limit" parameter is supposed to work #499

Merged
merged 43 commits into from
Jul 7, 2022

Conversation

gaurav
Copy link
Member

@gaurav gaurav commented Mar 16, 2022

I noticed some odd behavior with the way in which the limit parameter works, which can be best summarized by listing the results of running the same query against the SPARQL endpoint using the new test I've written in this PR.

Retrieved 14 results when limit=0
Retrieved 1 results when limit=1
Retrieved 2 results when limit=2
Retrieved 2 results when limit=3
Retrieved 2 results when limit=4
Retrieved 3 results when limit=5
Retrieved 6 results when limit=10
Retrieved 13 results when limit=20
Retrieved 14 results when limit=30
Retrieved 14 results when limit=40
Retrieved 14 results when limit=50
Retrieved 14 results when limit=60
Retrieved 14 results when limit=70
Retrieved 14 results when limit=80
Retrieved 14 results when limit=90
Retrieved 14 results when limit=100

This PR updates the SPARQL query generation code (in consultation with Jim) to correct this bug so we return the correct number of results. It retains the previously used code since there is functionality in there that we would probably like reuse in the future.

This PR previously introduced two new bugs (#517, #518) but then closes them in d4ccd3a.

@gaurav
Copy link
Member Author

gaurav commented Apr 7, 2022

It looks like the problem is happening here:
https://github.com/NCATS-Tangerine/cam-kp-api/blob/46cf7fb272b705c23e1ecc5fd70e923cfb3e8bb7/src/main/scala/org/renci/cam/QueryService.scala#L69-L70

findInitialQuerySolutions() seems to apply the limit, returning the number of triples that we expect (up to 25 in the example included in this PR), but then extractCoreTriples() removes n0_type and n1_type from the initial query solutions, returning only the n0, e0 and n1 entries. This reduces the number of distinct triples to the number we get in the output, e.g. Retrieved 6 results when limit=10.

Interestingly, solutionTriples is only used for getting provenance information -- as far as I can tell, the rest of the code uses the initial query solutions only. So maybe this isn't where the limit is being applied? Hmm.

All triples returned when limit=10

Initial query solutions:

n1_type e0 n1 n0_type n0
http://purl.obolibrary.org/obo/GO_0006094 http://purl.obolibrary.org/obo/RO_0000056 http://model.geneontology.org/R-HSA-70263/R-HSA-70263 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-70501
http://purl.obolibrary.org/obo/GO_0046034 http://purl.obolibrary.org/obo/RO_0000056 http://model.geneontology.org/R-HSA-70263/R-HSA-70263 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-70501
http://purl.obolibrary.org/obo/GO_0046034 http://purl.obolibrary.org/obo/RO_0000056 http://model.geneontology.org/R-HSA-73621/R-HSA-73621 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-909776
http://purl.obolibrary.org/obo/GO_0046034 http://purl.obolibrary.org/obo/RO_0000056 http://model.geneontology.org/R-HSA-73621/R-HSA-73621 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-909780
http://purl.obolibrary.org/obo/GO_0019674 http://purl.obolibrary.org/obo/RO_0000056 http://model.geneontology.org/R-HSA-70263/R-HSA-70263 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-70501
http://purl.obolibrary.org/obo/GO_0019674 http://purl.obolibrary.org/obo/RO_0000056 http://model.geneontology.org/R-HSA-73621/R-HSA-73621 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-909776
http://purl.obolibrary.org/obo/GO_0019674 http://purl.obolibrary.org/obo/RO_0000056 http://model.geneontology.org/R-HSA-73621/R-HSA-73621 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-909780
http://purl.obolibrary.org/obo/GO_0016223 http://purl.obolibrary.org/obo/RO_0000056 http://model.geneontology.org/R-HSA-909776 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-909776
http://purl.obolibrary.org/obo/GO_0016223 http://purl.obolibrary.org/obo/RO_0002328 http://model.geneontology.org/R-HSA-909776 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-909776
http://purl.obolibrary.org/obo/GO_0047305 http://purl.obolibrary.org/obo/RO_0000056 http://model.geneontology.org/R-HSA-909780 http://purl.obolibrary.org/obo/CHEBI_15361 http://model.geneontology.org/R-ALL-113557_R-HSA-909780

Solution triples is a HashSet() consisting of:

@gaurav
Copy link
Member Author

gaurav commented Apr 7, 2022

Jim is concerned that this code might also misbehave if:

s p o stype ptype
1 p1 2 A B
1 p1 2 C B
3 p1 4 A B

Note that in this case there are two different individuals that indicate the same relationship (A p1 B). They could be from different sources.

The way to find this would be a unit test that provides tests the difference pieces of the run() workflow, e.g. getTRAPIEdgeBindingsMany().

@gaurav gaurav mentioned this pull request May 4, 2022
@gaurav gaurav marked this pull request as ready for review June 28, 2022 05:50
@gaurav gaurav requested a review from balhoff June 28, 2022 05:50
Copy link
Contributor

@balhoff balhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks, good, just a few possible changes.

src/main/scala/org/renci/cam/QueryService.scala Outdated Show resolved Hide resolved
src/main/scala/org/renci/cam/QueryService.scala Outdated Show resolved Hide resolved
src/main/scala/org/renci/cam/QueryService.scala Outdated Show resolved Hide resolved
src/main/scala/org/renci/cam/QueryService.scala Outdated Show resolved Hide resolved
@balhoff balhoff merged commit 789066e into master Jul 7, 2022
@balhoff balhoff deleted the add-limit-test branch July 7, 2022 17:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants