Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize alphabet letter query: use GRAPH instead of FROM #1098

Merged
merged 2 commits into from
Dec 7, 2020

Conversation

osma
Copy link
Member

@osma osma commented Dec 1, 2020

It turns out that the existing SPARQL query to generate the letters of the alphabet for the alphabetical index is quite slow when using Fuseki TDB2. Switching to GRAPH instead of FROM makes it a lot faster.

See this thread on the Jena users list for some more background.

@osma osma added this to the 2.9 milestone Dec 1, 2020
@osma osma self-assigned this Dec 1, 2020
@osma
Copy link
Member Author

osma commented Dec 1, 2020

Some benchmarking of the query time for the KANTO/finaf vocabulary, using the dev.finto.fi SPARQL endpoint.

Before this PR: 16 seconds
After this PR: 1.7 seconds

The new query is almost 10x faster.

@osma osma requested a review from kouralex December 1, 2020 15:04
@codecov
Copy link

codecov bot commented Dec 1, 2020

Codecov Report

Merging #1098 (77f4b41) into master (91a30e5) will decrease coverage by 0.01%.
The diff coverage is 57.14%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1098      +/-   ##
============================================
- Coverage     60.32%   60.31%   -0.02%     
  Complexity     1592     1592              
============================================
  Files            32       32              
  Lines          4429     4430       +1     
============================================
  Hits           2672     2672              
- Misses         1757     1758       +1     
Impacted Files Coverage Δ Complexity Δ
model/sparql/GenericSparql.php 67.98% <57.14%> (-0.06%) 318.00 <0.00> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91a30e5...6db0e1e. Read the comment docs.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 1, 2020

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

Copy link
Contributor

@kouralex kouralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and works, which is the most important part. I had trouble with this PR at first since there was no similar treatment for single alphabetical letter query so I added one. Feel free to re-review, git diff went bad since there are similar blocks.

One note worth thinking now that this is faster, I think #729 and #751 could be fixed with a simple skos:prefLabel|skos:altLabel change in the appropriate spot which you have already identified in the comment section of the latter. I leave this for you to decide if you want to add that to this PR given that I presume the ten-fold increase in performance could handle that/those use cases with no problems, now.

Lastly, I checked the code for JenaTextSparql and there was one place in code where the original issue was already fixed in the exact same way it is in this PR, so no further modifications are necessary.

@osma
Copy link
Member Author

osma commented Dec 3, 2020

Looks good and works, which is the most important part. I had trouble with this PR at first since there was no similar treatment for single alphabetical letter query so I added one. Feel free to re-review, git diff went bad since there are similar blocks.

I benchmarked this change using KANTO and the alphabetical index page for the letter A. Before the PR, the query took 48s. After the change it was 12s. So a 4x improvement.

But the change you made is only relevant when using the Generic dialect. In the JenaText version of the alphabetical letter query, GRAPH is already used. I suspect you are using the Generic dialect, which is maybe not a great idea with large vocabularies such as KANTO/finaf.

One note worth thinking now that this is faster, I think #729 and #751 could be fixed with a simple skos:prefLabel|skos:altLabel change in the appropriate spot which you have already identified in the comment section of the latter. I leave this for you to decide if you want to add that to this PR given that I presume the ten-fold increase in performance could handle that/those use cases with no problems, now.

I leave that to another PR which must be separately benchmarked.

Lastly, I checked the code for JenaTextSparql and there was one place in code where the original issue was already fixed in the exact same way it is in this PR, so no further modifications are necessary.

Right. I think there is some room for refactoring now that the Generic and JenaText variants of the query are closer aligned.

@osma osma merged commit 254c2c6 into master Dec 7, 2020
@osma osma deleted the fix-first-character-query-graph branch December 7, 2020 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants