-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize alphabet letter query: use GRAPH instead of FROM #1098
Conversation
Some benchmarking of the query time for the KANTO/finaf vocabulary, using the dev.finto.fi SPARQL endpoint. Before this PR: 16 seconds The new query is almost 10x faster. |
Codecov Report
@@ Coverage Diff @@
## master #1098 +/- ##
============================================
- Coverage 60.32% 60.31% -0.02%
Complexity 1592 1592
============================================
Files 32 32
Lines 4429 4430 +1
============================================
Hits 2672 2672
- Misses 1757 1758 +1
Continue to review full report at Codecov.
|
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and works, which is the most important part. I had trouble with this PR at first since there was no similar treatment for single alphabetical letter query so I added one. Feel free to re-review, git diff went bad since there are similar blocks.
One note worth thinking now that this is faster, I think #729 and #751 could be fixed with a simple skos:prefLabel|skos:altLabel change in the appropriate spot which you have already identified in the comment section of the latter. I leave this for you to decide if you want to add that to this PR given that I presume the ten-fold increase in performance could handle that/those use cases with no problems, now.
Lastly, I checked the code for JenaTextSparql and there was one place in code where the original issue was already fixed in the exact same way it is in this PR, so no further modifications are necessary.
I benchmarked this change using KANTO and the alphabetical index page for the letter A. Before the PR, the query took 48s. After the change it was 12s. So a 4x improvement. But the change you made is only relevant when using the Generic dialect. In the JenaText version of the alphabetical letter query, GRAPH is already used. I suspect you are using the Generic dialect, which is maybe not a great idea with large vocabularies such as KANTO/finaf.
I leave that to another PR which must be separately benchmarked.
Right. I think there is some room for refactoring now that the Generic and JenaText variants of the query are closer aligned. |
It turns out that the existing SPARQL query to generate the letters of the alphabet for the alphabetical index is quite slow when using Fuseki TDB2. Switching to GRAPH instead of FROM makes it a lot faster.
See this thread on the Jena users list for some more background.