Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop MySQL query to Improve performance for reporting #33

Open
kylehuynh205 opened this issue Feb 22, 2024 · 0 comments
Open

Develop MySQL query to Improve performance for reporting #33

kylehuynh205 opened this issue Feb 22, 2024 · 0 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@kylehuynh205
Copy link
Contributor

kylehuynh205 commented Feb 22, 2024

About

  • Brief information about ark-services project
  • The database's design as Key and Value in Mysql has performance issue when the database is growing. Basically, the issue is in order to pull the report of Ark ID with metatdata, we have to do mysql JOINs, but they take so long to get the results.

How to reproduce

SELECT arks.* 
      FROM `dsu_ark`
      AS arks 
      JOIN ( 
        SELECT bound.id, 
        COALESCE(redirected._value, 0) 
        AS _value 
        FROM ( 
          SELECT DISTINCT REGEXP_SUBSTR(_key, '^([^\\s]+)') AS id 
          FROM `dsu_ark`
          WHERE _key LIKE '61220/utsc%' AND _key NOT REGEXP '(\\s:\/c|\\sREDIRECT|\\sPID|\\sLOCAL_ID|\\sCOLLECTION)$' AND (_key LIKE '%%' OR _value LIKE '%%')
        ) AS bound 
        LEFT JOIN ( 
          SELECT REGEXP_SUBSTR(_key, '^([^\\s]+)') AS id, _value 
          FROM `dsu_ark`
          WHERE _key LIKE '61220/utsc%' AND _key REGEXP '\\sREDIRECT$'
        ) AS redirected ON bound.id = redirected.id 
        ORDER BY _value ASC 
        LIMIT 10 
        OFFSET 0 
      ) AS subquery 
      ON arks._key LIKE CONCAT(subquery.id, '%')
      AND arks._key NOT LIKE '%:\/c'
      ORDER BY arks._key ASC;
 SELECT arks.* 
      FROM `dsu_ark`
      AS arks 
      JOIN ( 
        SELECT * FROM (
          SELECT DISTINCT REGEXP_SUBSTR(_key, '^([^\\s]+)') AS id
		      FROM `dsu_ark`
          WHERE _key LIKE '61220/utsc%' AND _key NOT REGEXP '(\\s:\/c|\\sREDIRECT|\\sPID|\\sLOCAL_ID|\\sCOLLECTION)$' 
          INTERSECT
          SELECT DISTINCT REGEXP_SUBSTR(_key, '^([^\\s]+)') AS id
          FROM `dsu_ark`
          WHERE _key LIKE '61220/utsc%' AND _key NOT REGEXP '\\s:\/c' AND (_key LIKE '%%' OR _value LIKE '%%')
        ) AS target
        ORDER BY id ASC 
        LIMIT 10
        OFFSET 0
      ) AS subquery 
      ON arks._key LIKE CONCAT(subquery.id, '%') 
      AND arks._key NOT LIKE '%:\/c' 
      ORDER BY arks._key ASC;

  • These query is querying the ARKs IDs with metadata which is bound to it. As LIMIT 10 which is querying 10 at a time, but it's take at least 15 secs to load which is quite slow for just 10 items.

Goals

  • Develop MySQL query which can pull Arks IDs and their metadata more sufficiently in a acceptable loading time
    • as 10, 25, 50 at a time (from the UI)
    • All items (exporting to CSV feature)

Stretch Goals

  • If the Goal above can't be achieved, we may look into restructure the database and look for alternative database platform to use
  • Scope the migration and development, and testing performance in the new database platform, for example:

Related tickets

@kylehuynh205 kylehuynh205 added enhancement New feature or request good first issue Good for newcomers labels Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
Status: No status
Development

No branches or pull requests

1 participant