Improve performance of working with large numbers of memberships #555
I have been profiling this change and the results are not good. I am calling […]. On a warm cache I get an average time of ~125ms with the original approach, and ~160ms after this change. So this PR actually makes it run SLOWER. I am removing the Ready For Review label since this clearly needs more work, and I will do more in-depth profiling to see what is going on. My gut tells me that the references to the OgRole entities are probably already preloaded on the OgMembership entities, and that our more complex code path just makes it slower because the roles are still being loaded anyway. Or possibly the heavy reliance on regular expressions is somehow slower than getting the results from the database.
Some deeper profiling results on a warm cache:
Common functionality
Original approach
New approach
So it seems that […] I am very curious to see why it takes 100ms to run […]
Results are in for […]
So it turns out that the vast majority of the time is spent retrieving field data through […] The good news is that this will be pretty easy to avoid. We do not need to call the expensive […]
I have refactored […] I also found out why the time spent inside […] So the total time spent is now only 0.69ms, where before it was 122.13ms, or 177x faster!!
Good catch!
175 test failures! 😅 It appears I made a mistake in the type casting.
When a field value is populated on an entity from the database, the property value is stored as a flat scalar value, while if the value is set through the Field API, the value is an array. See SqlContentEntityStorage::mapFromStorageRecords() vs FieldItemList::setValue().
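To make the two shapes concrete, here is a minimal PHP sketch of a read that tolerates both. The helper name `extract_main_property()` and the exact array layouts it accepts (a list of items such as `[0 => ['target_id' => 5]]`, or a single item keyed by property) are assumptions for illustration; they are not taken from this PR, from OG, or from Drupal core.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Drupal core or OG): returns the main
 * property of a single-value field regardless of the shape of the raw value.
 */
function extract_main_property($raw, string $property = 'target_id')
{
    // Loaded from the database: a flat scalar (or NULL when empty).
    if (!is_array($raw)) {
        return $raw;
    }
    // Empty array: no value.
    if ($raw === []) {
        return null;
    }
    // Single item keyed by property name, e.g. ['target_id' => 5].
    if (array_key_exists($property, $raw)) {
        return $raw[$property];
    }
    // List of items keyed by delta, e.g. [0 => ['target_id' => 5]]:
    // take the first delta.
    $first = reset($raw);
    return is_array($first) ? ($first[$property] ?? null) : $first;
}
```

Branching on the actual shape, rather than casting one assumed shape, is what keeps a fast read path correct whether the entity came from storage or had its value set in code.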
…alue array will be empty but the field item list will be populated.
… if they are populated.
I made a long comment here with some details about the changes that are made: #555 (comment)

Here is a summary of the impact of this change:

This is a hack

This approach relies on the internal "original values" of content entities, which are only rudimentarily supported by the Entity API, so this is technically a hack. Instead of relying on the Field API, which does a slow discovery of the features of every field by loading the field definitions and field storage definitions, we read the raw data directly from the location where the database layer sets it. We know how our fields are defined, so we do not need the field definitions to be discovered.

Benefit

This PR results in a pretty dramatic performance gain for membership entities that are read from the database and have their properties read but not set. This is the most common case, since GET requests are much more common than POST requests. In testing the processing of data of 1000+ memberships in […]

Risk

There is a risk that the underlying implementation will change in core in the future. In that case we will be forced to rework this. I consider this pretty low risk, since it would mean that the field storage and the instantiation of entities would have to be rewritten. This is not likely to happen in the lifetime of Drupal 8 and Drupal 9.

Risk mitigation

I have taken care to fall back to the official Field API when the field values have been populated by the calling code, so the "hacky" code is only executed when there is an actual speed gain to be realized. In practice this means that as soon as a value on the entity is set, the Field API will be used. Values are set when memberships are created or updated, so the business-critical action of writing membership data to the database is still handled in the traditional way. Only the reading of unchanged memberships is sped up.

Alternative approach

So what would be a "clean" way to achieve a similar speedup?
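The fallback described above (read the raw values unless calling code has set the field) can be modelled in plain PHP. This is a simplified sketch, not the OgMembership code and not Drupal's ContentEntityBase: the class `MembershipModel`, its properties, and the use of a plain list of role IDs as the "raw values" are all hypothetical. It needs PHP 7.4+ for typed properties.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified model: NOT Drupal's ContentEntityBase and NOT the
// OgMembership implementation.
final class MembershipModel
{
    /** @var array<string, mixed> Raw values as handed over by the storage layer. */
    private array $rawValues;

    /** @var array<string, string[]> Values set by calling code. */
    private array $setValues = [];

    /** Number of times the slow path was taken (for illustration only). */
    public int $slowPathCalls = 0;

    public function __construct(array $rawValues)
    {
        $this->rawValues = $rawValues;
    }

    // Writes always go through the regular (slow but authoritative) path.
    public function setRoleIds(array $roleIds): void
    {
        $this->setValues['roles'] = array_values($roleIds);
    }

    public function getRoleIds(): array
    {
        // Calling code has set the field: respect the unsaved value and use
        // the regular path.
        if (array_key_exists('roles', $this->setValues)) {
            return $this->getRoleIdsThroughFieldApi();
        }
        // Fast path: read the untouched raw values directly.
        return array_map('strval', $this->rawValues['roles'] ?? []);
    }

    // Stand-in for the expensive Field API code path.
    private function getRoleIdsThroughFieldApi(): array
    {
        $this->slowPathCalls++;
        return $this->setValues['roles'];
    }
}
```

Counting slow-path calls makes the trade-off visible: memberships that are only read never touch the expensive path, while any membership whose roles were set in code goes through it.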
I think we could implement something similar to the […] We could provide an […]
I ran this patch on my "large" project with ~12,000 Behat test steps and it came back green: https://app.continuousphp.com/git-hub/ec-europa/joinup-dev/build/2970f118-98f9-478a-ae7f-b6e11856f95f
I posted this on Twitter yesterday and @Berdir mentioned that it looked risky to him at first glance. He also linked to an existing issue that uses a similar approach: https://www.drupal.org/project/drupal/issues/2580551. I still have to check that patch in detail, but it is failing on Postgres. Our tests are currently passing, but we are not testing with Postgres, so I would like to hold off on this until we have more information on the potential risk and some test results on Postgres.
I addressed the remarks. Unfortunately I didn't see the commit suggestions from @MPParsley.
Sorry, this might be a dumb question, but do we test this somewhere?
In the […] Maybe I missed it, but I did not find any other call to the […]
I mean, if we are going to go with direct queries, we should verify the outcome of all of these.
There are no queries being done in this PR. Is this comment perhaps meant for #559?
If we don't have a test for it then yes, we should add one, but it's not really in the scope of this PR. This is strictly refactoring; we are not introducing new features or public methods here. In test-driven development you're not supposed to change existing tests when refactoring, because then it becomes unclear whether the refactoring has changed the existing functionality :)
OK, the test for […] We should merge the latest 8.x-1.x in here and check that everything still passes.
It looks like all the remarks have been addressed. @idimopoulos or @MPParsley, care for a fresh review?
Thanks! This now unblocks #559.
This solves 2 points of #487
If a user has many groups, there is a performance hit when the user's roles are discovered in their memberships. Currently, when role IDs are requested in OgMembership::getRolesIds(), a call is made to OgMembership::getRoles(), which loads the full OgRole entities. For a user with many memberships this might cause hundreds or thousands of OgRole entities to be loaded. This is unneeded, since the role name can be derived from the role ID, which is present in the membership entity.