feat: do a single query to get all resource ids instead of a recursiv… #1132

Merged


@gabrielfs7 gabrielfs7 commented Sep 25, 2024

https://oat-sa.atlassian.net/browse/ADF-1686
https://oat-sa.atlassian.net/browse/ADF-1805

Goal

The current getInstances() was executing several queries behind the scenes and loading objects into memory just to build the predicate/object comparison shown below. For instance, for a class with 100 subclasses, it executed 100 queries and loaded 100 objects into memory. It also uses grouping and HAVING, so it brings back more results than necessary.

This is what the query looked like on the server side:

SELECT "subject"
FROM ( (SELECT DISTINCT "subject"
        FROM "statements"
        WHERE "predicate" = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
          AND ("object" IN ('...n', 'http://www.tao.lu/Ontologies/TAOItem.rdf#Item') AND "modelid" IN (1, 2)))) AS unionq
GROUP BY "subject"
HAVING COUNT(*) >= 1;
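
To make the cost of the old approach concrete, here is a minimal sketch of the per-class lookup pattern described above, run against an in-memory SQLite database. It is not the actual getInstances() code: the `legacy_get_instances` and `build_demo_db` helpers, the `ex:` URIs, and the sample rows are hypothetical, and the `modelid` filter and the GROUP BY/HAVING wrapper are left out for brevity.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ROOT = "http://www.tao.lu/Ontologies/TAOItem.rdf#Item"


def build_demo_db():
    """Create a tiny, made-up statements table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statements "
        "(modelid INTEGER, subject TEXT, predicate TEXT, object TEXT)"
    )
    rows = [
        (1, "ex:ClassA", RDFS_SUBCLASS_OF, ROOT),
        (1, "ex:ClassB", RDFS_SUBCLASS_OF, "ex:ClassA"),
        (1, "ex:item1", RDF_TYPE, ROOT),
        (1, "ex:item2", RDF_TYPE, "ex:ClassA"),
        (1, "ex:item3", RDF_TYPE, "ex:ClassB"),
        (1, "ex:other", RDF_TYPE, "ex:Unrelated"),
    ]
    conn.executemany("INSERT INTO statements VALUES (?, ?, ?, ?)", rows)
    return conn


def legacy_get_instances(conn, root):
    """Walk the class tree with one query per class, then fetch instances.

    Returns (sorted instance URIs, number of queries executed).
    """
    queries = 0
    classes = [root]
    pending = [root]
    while pending:
        current = pending.pop()
        queries += 1  # one round trip for every class visited
        children = [
            row[0]
            for row in conn.execute(
                "SELECT subject FROM statements "
                "WHERE predicate = ? AND object = ?",
                (RDFS_SUBCLASS_OF, current),
            )
        ]
        classes.extend(children)
        pending.extend(children)

    # Final lookup: the IN list grows with the number of classes found.
    placeholders = ", ".join("?" for _ in classes)
    queries += 1
    instances = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT subject FROM statements "
            f"WHERE predicate = ? AND object IN ({placeholders})",
            [RDF_TYPE, *classes],
        )
    ]
    return sorted(instances), queries


if __name__ == "__main__":
    print(legacy_get_instances(build_demo_db(), ROOT))
```

In this sketch the number of queries grows linearly with the number of classes in the tree, which is the behaviour this PR removes.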

Solution

Use a single recursive query, avoiding hundreds of extra queries and objects being loaded into memory. After this new query was introduced, we saw a 200% improvement in RAM usage on the DB servers.

The new query is more expensive if the selected class has few nested classes, but for huge databases it is more efficient.

  • It takes 10 seconds to go over a database containing 2,610,116 records in the statements table, from the root Item class down to the last class level, for all levels.
  • In this database there are 15,200+ nested classes, so without this query we would have executed about 15.2k queries and loaded about 15.2k class objects into memory.
  • The query then returns 86k results
EXPLAIN ANALYZE WITH RECURSIVE statements_tree AS (
    SELECT
        r.subject,
        r.predicate
    FROM statements r
    WHERE r.subject = 'http://www.tao.lu/Ontologies/TAOItem.rdf#Item'
      AND r.predicate IN ('http://www.w3.org/2000/01/rdf-schema#subClassOf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type')
    UNION ALL
    SELECT
        s.subject,
        s.predicate
    FROM statements s
        JOIN statements_tree st
            ON s.object = st.subject
    WHERE s.predicate IN ('http://www.w3.org/2000/01/rdf-schema#subClassOf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type')
) SELECT subject FROM statements_tree WHERE predicate = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
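
As a rough illustration of how this recursive query walks the class tree, the sketch below runs the same query shape against an in-memory SQLite database with a handful of made-up statements. The `get_instances_single_query` and `build_demo_db` helpers, the `ex:` URIs, and the `ex:TAOObject` parent are assumptions for the example only, the production database engine may differ, and an explicit column list was added to the CTE for portability.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ROOT = "http://www.tao.lu/Ontologies/TAOItem.rdf#Item"

RECURSIVE_QUERY = """
WITH RECURSIVE statements_tree(subject, predicate) AS (
    -- Anchor: the root class's own subClassOf/type statements.
    SELECT r.subject, r.predicate
    FROM statements r
    WHERE r.subject = :root
      AND r.predicate IN (:subclass_of, :type)
    UNION ALL
    -- Recursive step: anything pointing at a subject already in the tree.
    SELECT s.subject, s.predicate
    FROM statements s
        JOIN statements_tree st
            ON s.object = st.subject
    WHERE s.predicate IN (:subclass_of, :type)
)
SELECT subject FROM statements_tree WHERE predicate = :type
"""


def build_demo_db():
    """Create a tiny, made-up statements table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statements "
        "(modelid INTEGER, subject TEXT, predicate TEXT, object TEXT)"
    )
    rows = [
        (1, ROOT, RDFS_SUBCLASS_OF, "ex:TAOObject"),
        (1, "ex:ClassA", RDFS_SUBCLASS_OF, ROOT),
        (1, "ex:ClassB", RDFS_SUBCLASS_OF, "ex:ClassA"),
        (1, "ex:item1", RDF_TYPE, ROOT),
        (1, "ex:item2", RDF_TYPE, "ex:ClassA"),
        (1, "ex:item3", RDF_TYPE, "ex:ClassB"),
        (1, "ex:other", RDF_TYPE, "ex:Unrelated"),
    ]
    conn.executemany("INSERT INTO statements VALUES (?, ?, ?, ?)", rows)
    return conn


def get_instances_single_query(conn, root):
    """Return all instance URIs under root using one recursive query."""
    params = {"root": root, "subclass_of": RDFS_SUBCLASS_OF, "type": RDF_TYPE}
    return sorted(row[0] for row in conn.execute(RECURSIVE_QUERY, params))


if __name__ == "__main__":
    print(get_instances_single_query(build_demo_db(), ROOT))
```

With this sample data, a single round trip returns the instances of the root class and of both nested classes, while the unrelated resource is left out.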

Totals:

[Screenshots: totals, captured 2024-10-01 at 16:21:20 and 16:16:50]

Query analysis

[Screenshots: query analysis, captured 2024-10-01 at 16:14:46 and 16:15:58]

Related PRs

@gabrielfs7 gabrielfs7 marked this pull request as draft September 27, 2024 06:57
@gabrielfs7 gabrielfs7 marked this pull request as ready for review October 1, 2024 11:53

github-actions bot commented Oct 1, 2024

Version

Target Version 15.38.0
Last version 15.37.0

There are 0 BREAKING CHANGE, 1 feature, 0 fix

@gabrielfs7 gabrielfs7 merged commit e5480a8 into develop Oct 4, 2024
5 checks passed
@gabrielfs7 gabrielfs7 deleted the feat/ADF-1686/improve-resources-query-performance branch October 4, 2024 08:23