The well-known performance issue with fetch joins on collection-like relations (*-to-many) is that the result set contains repeated data. In the simplest case, an entity fetched together with one of its collection relations has its (root) entity data repeated as many times as there are items in the collection.
It obviously gets exponentially worse when nested or sibling collections are also fetched. Incidentally, this issue could be avoided with a feature like #1569 (not merged yet).
That being said, I discovered another performance issue last week, one which does not need thousands of rows to be noticeable. It occurs when the repeated data are mapped to a type with a costly conversion. In my case this is a `json_document` column, whose type is provided by https://github.com/dunglas/doctrine-json-odm .
More concretely, my query targets table A and fetches a collection of table B records. A contains a `json_document` column, and for each A record there are ~66 B records.
The result set therefore contains 526 rows, representing 526 distinct B records but only 8 distinct A records.
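For illustration, the query has roughly the following shape (the entity, relation and alias names below are made up for the example, not the actual ones from my project):

```php
<?php
// Illustrative only: "A", "bs" and "B" stand in for the real entities/relations,
// and $entityManager is assumed to be an already-configured EntityManager.
$query = $entityManager->createQuery(
    'SELECT a, b
     FROM App\Entity\A a
     JOIN a.bs b' // fetch join on a *-to-many relation
);

// Every row of the result set repeats the A columns (including the costly
// json_document one) once per joined B row: 8 A records × ~66 B records ≈ 526 rows.
$results = $query->getResult();
```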
The query was too slow to meet my requirements, and after profiling it (in tracing mode to get actual call counts and in sampling mode to get accurate timings) I discovered that 70% of the hydration time is spent in `JsonDocumentType::convertToPHPValue()`, which is called 526 times (instead of the 8 logically required).
This slowness obviously has more to do with `JsonDocumentType::convertToPHPValue()` being called 526 times where 8 calls would suffice than with the intrinsic cost of `JsonDocumentType::convertToPHPValue()` itself, which is somewhat expected and probably leaves little room for optimization.
The problem lies in the fact that `AbstractHydrator::gatherRowData()`, which is, among other things, responsible for calling the proper type conversion function for each column of the given row, has no caching logic to avoid repeating this work for the same logical set of column values (the root entity or a fetched relation, i.e. a DQL alias).
So I wrote my own hydrator, which extends `ObjectHydrator`; after copy/pasting the `gatherRowData()` function, I added this caching logic.
With it, I observed a 78% decrease in the `hydrateAllData()` cost. Here is a table with more details:
| Metric | Original hydrator | Custom hydrator | Decrease (%) |
| --- | --- | --- | --- |
| `hydrateAllData()` wall time (ms) | 232 | 52 | -78% |
| `gatherRowData()` call count | 526 | 526 | - |
| `gatherRowData()` wall time (ms) | 200 | 26 | -87% |
| `JsonDocumentType::convertToPHPValue()` call count | 526 | 8 | - |
| `JsonDocumentType::convertToPHPValue()` wall time (ms) | 162 | 5 | -97% |
My questions:

- Am I doing something wrong here?
- If not, shouldn't this caching logic be added to the original `gatherRowData()` implementation (considering its simplicity and the fact that this use case may not be so uncommon)?

The custom hydrator:
```php
<?php

namespace App\Framework\DoctrineExtensions\ORM;

use Doctrine\ORM\Internal\Hydration\ObjectHydrator;

class OptimizedObjectHydrator extends ObjectHydrator
{
    protected function gatherRowData(array $data, array &$id, array &$nonemptyComponents)
    {
        $rowData = ['data' => []];

        // this first $data traversal is required in order to have the ids resolved ahead of the
        // rest of the processing since we need them in order to compute the cache keys
        foreach ($data as $key => $value) {
            if (($cacheKeyInfo = $this->hydrateColumnInfo($key)) === null) {
                continue;
            }

            switch (true) {
                case isset($cacheKeyInfo['isNewObjectParameter']):
                case isset($cacheKeyInfo['isScalar']):
                    break;

                default:
                    $dqlAlias = $cacheKeyInfo['dqlAlias'];

                    if ($cacheKeyInfo['isIdentifier'] && $value !== null) {
                        $id[$dqlAlias] .= '|' . $value;
                        $nonemptyComponents[$dqlAlias] = true;
                    }
                    break;
            }
        }

        // this flag serves as a caching toggle
        $cachingEnabled = true;

        if ($cachingEnabled) {
            // building the cache keys
            $processedRowCacheKeys = [];

            foreach ($id as $dqlAlias => $entityId) {
                $processedRowCacheKeys[$dqlAlias] = \sprintf(
                    '%s::%s::%s::%s',
                    self::class,
                    'processedRows',
                    $dqlAlias,
                    $entityId
                );
            }

            // pre-fill $rowData['data'] from cache if HIT for each DQL alias
            foreach ($id as $dqlAlias => $entityId) {
                if (isset($this->_cache[$processedRowCacheKeys[$dqlAlias]])) {
                    $rowData['data'][$dqlAlias] = $this->_cache[$processedRowCacheKeys[$dqlAlias]];
                }
            }
        }

        foreach ($data as $key => $value) {
            if (($cacheKeyInfo = $this->hydrateColumnInfo($key)) === null) {
                continue;
            }

            $fieldName = $cacheKeyInfo['fieldName'];

            switch (true) {
                case isset($cacheKeyInfo['isNewObjectParameter']):
                    $argIndex = $cacheKeyInfo['argIndex'];
                    $objIndex = $cacheKeyInfo['objIndex'];
                    $type     = $cacheKeyInfo['type'];
                    $value    = $type->convertToPHPValue($value, $this->_platform);

                    $rowData['newObjects'][$objIndex]['class']           = $cacheKeyInfo['class'];
                    $rowData['newObjects'][$objIndex]['args'][$argIndex] = $value;
                    break;

                case isset($cacheKeyInfo['isScalar']):
                    $type  = $cacheKeyInfo['type'];
                    $value = $type->convertToPHPValue($value, $this->_platform);

                    $rowData['scalars'][$fieldName] = $value;
                    break;

                //case (isset($cacheKeyInfo['isMetaColumn'])):
                default:
                    $dqlAlias = $cacheKeyInfo['dqlAlias'];
                    $type     = $cacheKeyInfo['type'];

                    // If there are field name collisions in the child class, then we need
                    // to only hydrate if we are looking at the correct discriminator value
                    if (isset($cacheKeyInfo['discriminatorColumn'], $data[$cacheKeyInfo['discriminatorColumn']])
                        && !\in_array((string) $data[$cacheKeyInfo['discriminatorColumn']], $cacheKeyInfo['discriminatorValues'], true)
                    ) {
                        break;
                    }

                    // in an inheritance hierarchy the same field could be defined several times.
                    // We overwrite this value so long we don't have a non-null value, that value we keep.
                    // Per definition it cannot be that a field is defined several times and has several values.
                    // /!\ This test is also required to take advantage of the processed row cache.
                    if (isset($rowData['data'][$dqlAlias][$fieldName])) {
                        break;
                    }

                    $rowData['data'][$dqlAlias][$fieldName] = $type
                        ? $type->convertToPHPValue($value, $this->_platform)
                        : $value;
                    break;
            }
        }

        if ($cachingEnabled) {
            // save $rowData['data'] to cache if MISS for each DQL alias
            foreach ($id as $dqlAlias => $entityId) {
                if (!isset($this->_cache[$processedRowCacheKeys[$dqlAlias]])) {
                    $this->_cache[$processedRowCacheKeys[$dqlAlias]] = $rowData['data'][$dqlAlias];
                }
            }
        }

        return $rowData;
    }
}
```
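For completeness, here is a minimal sketch of how such a custom hydrator can be registered and used when querying; the hydration mode name `optimized_object` is illustrative, and with Symfony's DoctrineBundle the same registration can also be declared through the `hydrators` option of the ORM configuration rather than in code.

```php
<?php
// Minimal sketch: the "optimized_object" mode name is arbitrary, and
// $entityManager is assumed to be an already-configured EntityManager.
use App\Framework\DoctrineExtensions\ORM\OptimizedObjectHydrator;

// Register the custom hydration mode on the EntityManager configuration.
$entityManager->getConfiguration()->addCustomHydrationMode(
    'optimized_object',
    OptimizedObjectHydrator::class
);

// When querying, request the custom hydration mode explicitly.
$results = $entityManager
    ->createQuery('SELECT a, b FROM App\Entity\A a JOIN a.bs b')
    ->getResult('optimized_object');
```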
Yes, in both cases a costly conversion is involved, but in my case the conversion cost on its own is not noticeable; it becomes noticeable only because it is unnecessarily repeated.