Skip to content
This repository has been archived by the owner on Aug 18, 2024. It is now read-only.

Improve performance of working with large numbers of memberships #555

Merged
merged 13 commits into from
Aug 8, 2019

Conversation

pfrenssen
Copy link
Contributor

@pfrenssen pfrenssen commented Jul 23, 2019

This solves 2 points of #487

If a user has many groups there is a performance hit taking place when the user's roles are being discovered in their memberships. Currently when role IDs are requested in OgMembership::getRolesIds() a call is made to OgMembership::getRoles() which loads the full OgRole entities. For a user with many memberships this might cause hundreds or thousands of OgRole entities to be loaded. This is unneeded since the role name can be derived from the role ID which is present in the membership entity.

@pfrenssen
Copy link
Contributor Author

I have been profiling this change and the results are not good. I am calling OgRoleCacheContext::getContext() for a user with 1000+ memberships and am profiling the time taken inside the foreach() that loops over the memberships.

On a warm cache I am getting an average time of ~125ms on the original approach, while getting ~160ms after this change. So this PR is actually making it run SLOWER.

I am removing the Ready For Review label since this clearly needs more work. I will do more in depth profiling to see what is going on. My gut tells me that probably the references to the OgRole entities are already preloaded on the OgMembership entities, and that our more complex code path is just making it slower because the loading of the roles is still taking place. Or possibly the heavy reliance on regular expressions is somehow slower than getting results from the database.

@pfrenssen
Copy link
Contributor Author

pfrenssen commented Jul 24, 2019

Some deeper profiling results on warm cache:

Common functionality:

  • $this->membershipManager->getMemberships($this->user->id()) - 72ms (of which 6ms querying, and 66ms loading membership entities from cache).
  • Sorting the results + calculating the key: 0.61ms

Original approach

  • $membership->getRoles(): 121.20ms
  • Time spent inside array_map(): 0.93ms

New approach

  • $membership->getRolesIds(): 99.05ms
  • Time spent inside array_map(): 66.01ms

So it seems that getRolesIds() is only marginally quicker, and the processing inside the array_map() is a lot slower, causing the total result to be worse than before.

I am very curious to see why it takes 100ms to run ::getRolesIds() - this method is called around 3000x but it should just shuffle some data around, I don't understand why this would take 0.03ms per call. I will profile inside ::getRolesIds() now.

@pfrenssen
Copy link
Contributor Author

Results are in for OgMembership::getRolesIds(). Here are the cumulative values for ~3000 calls:

  • $this->get('roles')->getValue(): 39.24ms
  • array_map(): 0.05ms
  • $this->hasGroup(): 35.28ms
  • Assembling the ID: 25.11ms

So it turns out that the vast majority of the time is spent by retrieving field data through $this->get() calls, and more specifically inside ContentEntityBase::getTranslatedField() - this method will load the field definition and field storage definition to check if the field is translatable. These definitions are then cached inside the entity but they are never reused since our code is just looping over 1000+ memberships 1 single time.

The good news is that this will be pretty easy to avoid. We do not need to call the expensive ::getTranslatedField() method since we already know that these values are not translatable. We can get the values directly from the $this->values array.

@pfrenssen
Copy link
Contributor Author

I have refactored OgMembership to avoid calling $this->get() on non-translatable fields, and now the call to $membership->getRolesIds() went down to only 0.26ms (where before it was 99.05ms!)

I also found out why the time spent inside array_map() was so high: regexes make heavy use of assert() in PHP, and by turning off assertions the time went down from 66.01ms to only 0.43ms.

So the total time spent is now only 0.69ms, where before it was 122.13ms, or 177x faster!!

@MPParsley
Copy link
Collaborator

Good catch!

@pfrenssen
Copy link
Contributor Author

175 test failures! 😅 It appears I made a mistake in the type casting.

When a field value is populated on an entity from the database the property
value is stored as a flat scalar value, while if the value is set through
the field API the value is an array.

See SqlContentEntityStorage::mapFromStorageRecords() vs
FieldItemList::setValue().
…alue array will be empty but the field item list will be populated.
@pfrenssen pfrenssen changed the title Do not needlessly load OgRole objects when only the role name is required Improve performance of working with large numbers of memberships Jul 25, 2019
@pfrenssen
Copy link
Contributor Author

pfrenssen commented Jul 25, 2019

I made a long comment here with some details about the changes that are made: #555 (comment)

Here is a summary of the impact of this change:

This is a hack

This approach is relying on the internal "original values" of content entities which only has rudimentary support in the Entity API. So this is technically a hack.

Instead of relying on the Field API which does a slow discovery of the features of every field by loading the field definitions and field storage definitions, we are reading the raw data directly from the location where it is set by the database layer. We have knowledge about how our fields are defined so we do not require the field definitions to be discovered.

Benefit

This PR results in a pretty dramatic performance gain for membership entities that are being read from the database and have their properties read but not set. This is the most common case since GET requests are much more common than POST requests. In testing the processing of data of 1000+ memberships in OgRoleCacheContext goes down from 99ms to only 0.26ms, or 38077% faster.

Risk

There is a risk that the underlying implementation will change in core in the future. In that case we will be forced to rework this. I consider this pretty low risk since it would mean that the field storage and instantiation of entities has to be rewritten. This is not likely to happen in the lifetime of Drupal 8 and Drupal 9.

Risk mitigation

I have taken care to fall back to the official Field API when the field values have been populated by the calling code. So the "hacky code" is only executed when there is an actual speed gain to be realized.

In practice this means that as soon as a value on the entity is set the Field API will be used. Setting values is done when memberships are being created or updated. So the business critical action of writing membership data to the database is still handled in the traditional way. Only reading of unchanged memberships is now sped up.

Alternative approach

So what would be a "clean" way to achieve a similar speed up? I think we could implement something similar to the AccountProxy object that core provides. This is a fast placeholder for the real User entity that represents the currently logged in user.

We could provide an OgMembershipProxy object that implements OgMembershipInterface and is prepopulated with the untranslatable values for fast reading. Once one of the setter methods on the interface are called, it will load the full OgMembership entity and wrap it, from that moment on transparently passing through all get and set operations to the real entity. We can then rework the MembershipManager to return OgMembershipProxy objects by default. It would be a significant effort though and would break any existing code that assumes that full entities are returned.

@pfrenssen
Copy link
Contributor Author

I ran this patch on my "large" project with ~12000 behat test steps and it came back green: https://app.continuousphp.com/git-hub/ec-europa/joinup-dev/build/2970f118-98f9-478a-ae7f-b6e11856f95f

@pfrenssen
Copy link
Contributor Author

pfrenssen commented Jul 26, 2019

I posted this on Twitter yesterday and @Berdir mentioned that it looked risky to him at first glance. He also linked to an existing issue that is using a similar approach: https://www.drupal.org/project/drupal/issues/2580551

I still have to check that patch in detail but it is failing on Postgres. Our tests are currently passing but we are not testing with Postgres, so I would like to wait with this until we have more information on the potential risk and some test results on Postgres.

src/Entity/OgMembership.php Outdated Show resolved Hide resolved
src/Entity/OgMembership.php Outdated Show resolved Hide resolved
src/Entity/OgMembership.php Outdated Show resolved Hide resolved
@pfrenssen
Copy link
Contributor Author

I addressed the remarks. I didn't see the commit suggestions from @MPParsley unfortunately.

@idimopoulos
Copy link
Contributor

idimopoulos commented Aug 1, 2019

Sorry, this might be a dump question but, do we test this somewhere?
Apart from the prophesies in your test where you define what will be returned, there is only the following case

// Remove a role by ID.
    $membership->revokeRoleById($group_member->id());
    $roles_ids = $membership->getRolesIds();
    $this->assertFalse(in_array($group_member->id(), $roles_ids), 'The group member role has been revoked.');

in the \Drupal\Tests\og\Kernel\Entity\OgMembershipRoleReferenceTest::testRoleCreate where the ::getRolesIds is called, and here, if the method fails to create a proper role ID, the ::assertFalse will anyway succeed.
Shouldn't we have a test for getRolesIds?

Maybe I did not find it but I did not find any other call for the ::getRolesIds.

@idimopoulos
Copy link
Contributor

idimopoulos commented Aug 1, 2019

I mean, if we are going to go with direct queries, we should ensure the outcome of all of these.

@pfrenssen
Copy link
Contributor Author

I mean, if we are going to go with direct queries, we should ensure the outcome of all of these.

There are no queries being done in this PR, is this comment perhaps meant for #559 ?

@pfrenssen
Copy link
Contributor Author

pfrenssen commented Aug 2, 2019

Shouldn't we have a test for getRolesIds?

If we don't have a test for it then yes we should add one, but it's not really in scope of this PR. This is strictly refactoring, we are not introducing new features or public methods here.

In test driven development you're not supposed to change existing tests when refactoring, because it makes it unclear if the refactoring has not changed the existing functionality :)

@pfrenssen
Copy link
Contributor Author

OK the test for ::getRolesIds() that was requested by @idimopoulos has been merged in: #561.

We should merge in the latest 8.x-1.x in here and check that everything still passes.

@pfrenssen
Copy link
Contributor Author

It looks like all the remarks have been addressed. @idimopoulos or @MPParsley care for a fresh review?

@pfrenssen
Copy link
Contributor Author

Thanks! This now unblocks #559

@pfrenssen pfrenssen merged commit c884815 into 8.x-1.x Aug 8, 2019
@pfrenssen pfrenssen deleted the do-not-load-role-objects branch August 8, 2019 07:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Drupal 8 Performance Performance issues or improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants