-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Object Hydration performance on large result sets #10035
Conversation
Hey @tucksaun, this looks like fantastic work! Thank you! It will take some time to review the PR through as its touching essentially everything about hydrators, so I am not sure when we will get to it at this time. |
I found this PR in relation to an issue we've been having when hydrating a graph with a couple left joins. Specifically that Doctrine calls convertToPHPValue() on all the NULL values for each column in any empty left join records, prior to hydrating the entities and then discarding those empty records. We noticed this because we have a custom data type that throws an exception if it encounters a NULL (the fix for now was to eliminate the check). But I was wondering whether it is necessary for the calls to convertToPHPValue() up front when ultimately the record ID might be NULL and the empty data set gets thrown in the trash, or could the code could be reworked to only convert the values for columns belonging to entities that will actually be hydrated? |
@beberlei sure, I understand as this is quite a change. I will try to maintain this PR as up-to-date as possible in the meantime. |
@sips-richard I would say that the hydrator has to maintain the call to |
bd6be8e
to
898cf1e
Compare
898cf1e
to
bd8be73
Compare
Hi! |
3ee9ab1
to
423b645
Compare
@greg0ire I though I already fixed them all and only casually rebased this PR today so thank you for the heads up about he CS issues. |
423b645
to
95213ea
Compare
$newObjectsData[$objIndex]['args'][$argIndex] = $value; | ||
} | ||
|
||
/** @psalm-suppress LessSpecificReturnStatement Psalm does not detect that args keys is always defined */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't suppress Psalm errors via comments. We usually configure this in psalm.xml.dist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@derrabus I don't manage to suppress it for this specific method inside psalm.xml
.
And to suppress it for the whole file looks a bit too wide to me.
Do you have any recommendations?
Or should I add this to the baseline?
Or does it mean this is a legitimate use case to you?
$resultKey = isset($resultSetMapping->indexByMap['scalars']) | ||
? $row[$resultSetMapping->indexByMap['scalars']] | ||
: $this->resultCounter - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$resultKey = isset($resultSetMapping->indexByMap['scalars']) | |
? $row[$resultSetMapping->indexByMap['scalars']] | |
: $this->resultCounter - 1; | |
$resultKey = $resultSetMapping->indexByMap['scalars'] ?? $this->resultCounter - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hum does not look correct to me as we are accessing $row
, not only $resultSetMapping->indexByMap
? 🤔
* | ||
* @psalm-return array<array-key, array<array-key, mixed>> An array with all the ids (name => value) of the row. | ||
*/ | ||
protected function gatherRowIDs(array $data, array &$id, array &$nonemptyComponents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected function gatherRowIDs(array $data, array &$id, array &$nonemptyComponents) | |
private function gatherRowIDs(array $data, array &$id, array &$nonemptyComponents): array |
* | ||
* @psalm-return array<array-key, mixed> | ||
*/ | ||
protected function gatherRowDataFromScalarsMapping(array $data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected function gatherRowDataFromScalarsMapping(array $data) | |
private function gatherRowDataFromScalarsMapping(array $data): array |
* | ||
* @psalm-suppress MoreSpecificReturnType Psalm does not detect that args keys is always defined | ||
*/ | ||
protected function gatherRowDataFromNewObjectsMapping(array $data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protected function gatherRowDataFromNewObjectsMapping(array $data) | |
private function gatherRowDataFromNewObjectsMapping(array $data): array |
@tucksaun Absolutely this will speed up our code, but I will try to clarify on our problem scenario because I think a change in the internals might be beneficial here. Our ORM query included several left joins. Internally Doctrine was passing all the null values for any non-existent left join entities to the convertToPHPValue() method and only determining afterward that there was no relationship and then throwing these values away. The My question therefore is: can the logic be reworked in such a way that Doctrine could determine whether or not the ID columns represent a joined entity or an empty left join up front, prior to any hydration. It would save a lot of function calls where there are empty left joins for a start. |
95213ea
to
890d7d7
Compare
Do not store $resultPointers in a property as this is slower to access than a local variable
Cache the result of $this->resultSetMapping() to avoid repeating null checks as well as method calls over and over again
Rework how the hydrator works for better performance by moving most of the column information gathering before the result iteration so that the gathering is not done X times. Additionally, this allows to group column info by dql alias which in turns allows to get easily the column list associated with a given dql alias. This allows to conver the column data lazily, only when needed. When iterating over a result set representing a high cardinality relation, this avoid using and converting over and over again each column for a given entity. The only data we systematically need is the identifiers columns in order to determine if the entity is already present in the identity map. Lowering the impact of hydration for such high cardinality relation lower the thresold where a 2 step hydration process is required.
PHP Reflection API is known to not be the fastest. This is OK when only a couple of callls are made but here it is used in a tight loop whereas references to values fetched via Reflection are already available or not even used. Moving the use as late as possible allows to optimize performance a bit.
The current implementation uses collections object being hydrated as array, this introduces several calls to ArrayAccess implementation causing a noticeable overhead especially on big collections. While those checks would seem necessary in a regular use of collections they seem a bit too much in the hydration process where we can check differently the integrity of the collection.
890d7d7
to
b7aaf10
Compare
It looks very close to the issue I've described here #8390 Does my workaround make sense ? Should it be implemented into ORM ? |
There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days. |
This pull request was closed due to inactivity. |
Hi folks!
While assessing a client's project performance I found some optimizations regarding the ORM hydration process that improves speed when working with large result sets and collections.
Expand to get some context on how we reached this.
The context is about online carts on a B2B e-commerce platform with a high number of items and complex personalization logic. We reached the point in the optimization process where most of the time was spent in the ORM (after serialization) when fetching a cart.This makes sense because of the volume and the logic at stake.
Denormalization or caching does not make sense for us because of the low read/write ratio and the business logic involved so we first went with the 2-step hydration technic.
While analyzing profiling results we more or less came to the same conclusion as #8390: the hydration process converts data over and over again even if the entity is already hydrated>
But after trying to work around performance for specific DBAL types that were costly to convert (namely Symfony's UUID and Dunglas's JSON document) and digging deeper into profiles I concluded something could be done directly in the Hydrator for the benefit of everyone instead of writing our specific optimized hydrator:
This PR acts according to one consideration: most parts of the hydration process are not slow per se but are repeated for every row and column. By applying some small optimizations, moving some code before the first iteration, or some code at the very last moment, we can improve the overall process.
The more rows and columns, the more the process is accelerated.
For examples:
With this patch, I manage to reach the following results:
->getQuery()->getResult()
Doctrine\ORM\PersistentCollection::*
TypedNoDefaultReflectionProperty::getValue
Doctrine\ORM\Internal\Hydration\AbstractHydrator::resultSetMapping
Doctrine\ORM\Internal\Hydration\AbstractHydrator::hydrateColumnInfo
Doctrine\DBAL\Types\IntegerType::convertToPHPValue
Doctrine\DBAL\Types\StringType::convertToPHPValue
Doctrine\DBAL\Types\BooleanType::convertToPHPValue
Doctrine\DBAL\Types\DateTimeImmutableType::convertToPHPValue
App\DBAL\Types\UuidType::convertToPHPValue
I hope we can manage to get this work so that everyone enjoys better performance 🤗
If you are willing to move in this direction, we can probably adapt ArrayHydrator the same way.