-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache terms returned by IslandoraUtils::getTermForUri() (Islandora#2272) #997
base: 2.x
Are you sure you want to change the base?
Conversation
The tests failing seem to be getting a 500 response when generating derivatives along with some deprecations. I'll try figure out if these are related to the changes I've proposed |
We've deployed this change to a site receiving a decent amount of traffic. Prior to the change our slow query log (analysed using https://docs.percona.com/percona-toolkit/pt-query-digest.html) showed a total exec time of 229853s for slow queries, with the top offender being the lookup for the "http://pcdm.org/use#ServiceFile" term. After deploying the change these queries disappear from our slow query log and the total exec time of slow queries drops to 2536s, a 98.9% drop in time spent executing queries on the db server. |
I find myself wondering: Could this also be addressed by introducing a better DB index? As in, Drupal by default does truncated index things, using the first 30 characters, likely not expecting a significant number of lookups on fields of this type (instead, expecting "links" to be hyperlinks to be rendered, not so much a store of identifiers to be looked up): https://git.drupalcode.org/project/drupal/-/blob/10.2.x/core/modules/link/src/Plugin/Field/FieldType/LinkItem.php?ref_type=heads#L79-81 If URLs end up all having similar prefixes, resulting in constant table scans which might be expected to cause slow downs. Could potentially add in another index that makes use of the full URL value? ... at the same time, I start thinking: If we want caching on one method, why not cache as many as possible? Somewhat of mad science, but something of a shot at things in: 2.x...adam-vessey:islandora:madscience Somewhat of a mess decorating with a caching proxy, but, should handle things, in theory? |
5dfc8dc
to
14a6715
Compare
Discussed at the Committers Call. This sounds like a real problem that's related to our unconventional use of the Link field type. There was a strong lean toward looking into just improving database indexing. I'll test and review. |
This PR dramatically improves the speed. Made a simple drush script
Then ran it and it took 17s to complete
Applied this PR and ran again and execution dropped to 1s
It's definitely the query here causing the slowdown, as if I replace the query with Here's the query that entityTypeManager is running:
I think due to the orGroup this is slowing down these queries. It'd actually appears be more efficient to run several queries. diff --git a/src/IslandoraUtils.php b/src/IslandoraUtils.php
index a2df7589..954f768d 100644
--- a/src/IslandoraUtils.php
+++ b/src/IslandoraUtils.php
@@ -246,24 +246,19 @@ class IslandoraUtils {
// Add field_external_uri.
$fields[] = self::EXTERNAL_URI_FIELD;
- $query = $this->entityTypeManager->getStorage('taxonomy_term')->getQuery();
-
- $orGroup = $query->orConditionGroup();
+ $storage = $this->entityTypeManager->getStorage('taxonomy_term');
foreach ($fields as $field) {
- $orGroup->condition("$field.uri", $uri);
- }
-
- $results = $query
- ->accessCheck(TRUE)
- ->condition($orGroup)
- ->execute();
-
- if (empty($results)) {
- return NULL;
+ $query = $storage->getQuery();
+ $results = $query
+ ->accessCheck(TRUE)
+ ->condition("$field.uri", $uri)
+ ->execute();
+ if (!empty($results)) {
+ return $storage->load(reset($results));
+ }
}
- return $this->entityTypeManager->getStorage('taxonomy_term')
- ->load(reset($results));
+ return NULL;
}
/**
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should simplify the query for speed improvements instead of adding a cache layer. See #997 (comment) for more details
14a6715
to
dcbddf3
Compare
Hi @joecorall, I agree with simplifying the query rather than using a cache layer here- I will create a separate pull request with the suggestions you've made and test it out with some heavy traffic. |
GitHub Issue: (link)
Release pull requests, etc.)
What does this Pull Request do?
Caches the term found when calling IslandoraUtils::getTermForUri().
Cache ID is based on user id because the db query has an access check.
What's new?
How should this be tested?
A description of what steps someone could take to:
Documentation Status
Additional Notes:
I'm fairly new to using the Cache API so would appreciate any advice you have. In the meantime I'll try find example test cases so that this can be added to the test suite.
Interested parties
@Islandora/committers