Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doctrine NOT throwing an exception when attempting to re-persist a detached entity; instead creating duplicate record. #8007

Open
BusterNeece opened this issue Jan 31, 2020 · 23 comments

Comments

@BusterNeece
Copy link

Bug Report

Q A
BC Break unknown
Version 2.7.0

Summary

After implementing the doctrine-batch-utils iterator to process records in one section of my application, I'm discovering that the em->clear() called during that process detaches entities across the system. If those entities are then re-persisted, they are treated as new entities and given new IDs in the database. There is no exception thrown indicating that these are detached entities, or indicating that this problem might happen. The end result is the unintentional cloning of one record x number of times, where x is the number of times this portion of code has run.

The current documentation explicitly states that if you are attempting to persist an entity that is detached, that an exception will be thrown preventing this operation.

My experiences have shown that this exception is not thrown, and that instead a new record is created. The disagreement between the actual behavior and the documentation has caused me an immense deal of unnecessary pain today. The decision on Doctrine's part to deprecate entity-specific clear() calls in favor of a system-wide clear only makes this situation that much more likely to happen in the future.

Current behavior

When attempting to persist an entity that was made detached by an earlier call to em->clear(), rather than throwing an exception upon the next flush() call, it allows it and treats it as if it's a new record being added to the database, giving it a new auto-assigned ID but otherwise being a complete duplicate.

How to reproduce

Here's an example snippet of a very easy way that I have elicited this problem in my application. It's this exact kind of logic loop that led to the problem in question:

<?php
        $stations = $em->getRepository(\App\Entity\Station::class)->findAll();

        foreach ($stations as $station) {
            $station->setName($station->getName() . ' Modified');
            $em->persist($station);

            $em->flush();

        // Other batch processing operations...

            $em->clear();
        }
?>

The issue is completely resolved by reloading the $station variable at the start of each foreach (i.e. $station = $em->find(Station::class, $station->getId())), but the application currently offers no warning whatsoever that, because the loop variable has become detached, it's about to be recreated as a new entity instead.

Expected behavior

As suggested in the documentation, this should instead trigger an exception which can easily be handled by the developer.

@BusterNeece BusterNeece changed the title Doctrine NOT throwing an exception when attempting to re-persist a detached entity. Doctrine NOT throwing an exception when attempting to re-persist a detached entity; instead creating duplicate record. Jan 31, 2020
@Ocramius
Copy link
Member

Unlikely adapting ORM to docs here: this behaviour has been expected/stable for a long time, and detached entities are not recognizable anyway.

Docs will need adjustments.

@BusterNeece
Copy link
Author

@Ocramius So if you're saying that this behavior (simply creating a duplicate record, despite the existing one already having an ID assigned to it, with no exception or notice or anything else) is expected...

...is there any way to not make it the expected behavior?

Honestly, I would greatly prefer the documented behavior (an exception being thrown). If you have a situation where a routine task is re-persisting detached entities, you can quickly go from 2 to 4 to 8 to hundreds of those duplicated entities, resulting in a great deal of damage to have to clean up.

It should not be this easy to write something that's as straightforward code-wise as my example loop, and yet have it be subject to an error condition that could cause such huge (and hard to figure out) problems.

As I mentioned, because clearing the EM for specific entities is being deprecated in 3.0 in favor of clearing the entire memory only increases the likelihood that this kind of error will be run into. In fact, it's exactly because of this deprecation that I switched to the batch-utils function in the first place, which also resulted in me switching from a much more selective EM clear to clearing all EM-managed records.

The combination of the two (forcing the EM clear() to be broader than desired, and then no option whatsoever to throw an exception when accidentally re-persisting detached entities) seems like a recipe for disaster.

@Ocramius
Copy link
Member

...is there any way to not make it the expected behavior?

Nope.

Even if there was, docs follow behaviour, not the way around: changing observed behaviour would be a major BC break

@BusterNeece
Copy link
Author

@Ocramius Okay...how about in later versions? i.e. 2.8.x or 3.0.x?

I appreciate brevity in many cases, but I feel like "Nope" doesn't do much to address the notion that what I'm describing is an extremely easy trap for a developer (even one who's used Doctrine for years, as I have) to fall into, that it is reasonable to hope that an ORM would at least warn about this (if not stopping it altogether), and that the likelihood of this is increased the more users are encouraged to clear the entire EntityManager rather than specific entity types, as will soon be the case.

@Ocramius
Copy link
Member

It is not possible to track detached entities: they are detached, unknown to the ORM.

In UnitOfWork, we do indeed state this:


                case self::STATE_DETACHED:
                    // Can actually not happen right now as we assume STATE_NEW,
                    // so the exception will be raised from the DBAL layer (constraint violation).
                    throw ORMInvalidArgumentException::detachedEntityFoundThroughRelationship($assoc, $entry);
                    break;

case self::STATE_DETACHED:
// Can actually not happen right now since we assume STATE_NEW.
throw ORMInvalidArgumentException::detachedEntityCannot($entity, 'persisted');

@Ocramius
Copy link
Member

Re-opening this: it is indeed requiring documentation adjustments

@Ocramius Ocramius reopened this Jan 31, 2020
@BusterNeece
Copy link
Author

@Ocramius The message there says the DBAL should be throwing an exception instead...so why isn't it in my example? The record it's trying to persist already has an assigned ID so presumably it's trying to execute an insert with that ID already explicitly set, which should be failing somewhere, even if not in the ORM.

@Ocramius
Copy link
Member

Not sure tbh - docs may as well be unclear there. I suggest digging in the test suite, to see if this behavior has been already checked there. We can probably drop that case condition overall, after that.

@BusterNeece
Copy link
Author

BusterNeece commented Jan 31, 2020

@Ocramius Just ran it again while having it echo out the SQL queries it's actually executing...for the INSERT query it's stripping out the ID entirely as a set value (it's basically just executing a standard INSERT INTO table (other, fields, here) but with ID missing entirely), so the fact that it's already set on the entity is not only not caught by the ORM, but there's no way the DBAL or the DB itself could catch it either, as it doesn't even see it's in that state by that point.

In other words, nothing in this stack could've protected me from this situation today.

With all due respect, I'm not sure how this is acceptable.

@Ocramius
Copy link
Member

It may indeed not be OK from your point of view, but this behavior is currently being relied upon:

$thing = new Thing();

$em->persist($thing);
$em->flush();
$em->clear();

$em->persist($thing);
$em->flush();
$em->clear();

self::assertCount($em->getRepository(Thing::class)->findAll());

Changing that is indeed a BC break.

@BusterNeece
Copy link
Author

@Ocramius I understand that it is considered a BC break, but clearly many other changes, including changes that relate to this, are intended for the EntityManager in 2.8.x and 3.0. Semver suggests that these releases (especially 3.x) can include BC breaks.

Considering that if the EntityManager itself doesn't detect this condition, its removal of auto-assigned IDs will ensure that neither the DBAL nor DB itself can alert for it...does that not suggest that the ORM is the only place where a change could be made at this point to help protect against this happening where it's really not expected?

Again, even an option to warn on this condition instead of just letting it happen silently would be greatly preferred.

@Ocramius
Copy link
Member

As mentioned above:

                // Can actually not happen right now since we assume STATE_NEW.

An un-managed entity that is persisted is assumed to be in STATE_NEW, unless we keep a map of detached objects (not feasible)

@BusterNeece
Copy link
Author

@Ocramius How is it not possible to detect that the primary key identifier for the entity is already set and has a non-null value on the object being persisted?

It appears this was discussed something like 9 years ago, and this exact same suggestion came up and was shot down for reasons that are unclear. Yes, it would require reflection, but the entire persistence mechanism already reflects on the other properties of the object. In fact, to know to strip that ID from the query it already has to know what those identifiers are to exclude them from the properties being set, so what is the difficulty in checking that value by reflection to see if it's set?

@DGCarramona
Copy link

Hello ! Would it be possible to generate a warning if persisting entity, which id should be automatically generated but is yet assigned ? It would resolve the situation and being a warning it's not a BC break.

@Ocramius
Copy link
Member

How is it not possible to detect that the primary key identifier for the entity is already set and has a non-null value on the object being persisted?

An entity with an assigned PK is the default case for all entities I work with (since I also don't use generated identifiers) 🤷‍♀️

@DGCarramona a warning is a BC break, as discussed in doctrine/coding-standard#150

@BusterNeece
Copy link
Author

BusterNeece commented Jan 31, 2020

@Ocramius Okay, but let's say that you do use generated identifiers. This is specified in the metadata for that primary key, that it has a GeneratedValue strategy.

If you're persisting an entity that has a GeneratedValue identifier, and that property already has a value, then you can safely assume you're playing with a detached entity that previously had that ID generated by the entity manager. At least, safely enough to warn on that exact condition.

What's more, it's specifically because these entities are using GeneratedValue identifiers that their ID was stripped out before the DBAL/DB layer (as the ORM is assuming it's going to get a value back for last inserted ID, not send one) that this is an otherwise unprotected situation.

@Ocramius
Copy link
Member

That's an interesting use-case, indeed I hacked together a test that verifies that generated identifiers are ignored during persistence:

<?php

declare(strict_types=1);

namespace Doctrine\Tests\ORM\Functional\Ticket;

use Doctrine\ORM\Annotation as ORM;
use Doctrine\Tests\OrmFunctionalTestCase;
use function random_int;

/**
 * @group GH-8007
 */
class GH8007Test extends OrmFunctionalTestCase
{
    protected function setUp() : void
    {
        parent::setUp();

        $this->schemaTool->createSchema([$this->em->getClassMetadata(GH8007EntityWithAutoIncrementId::class)]);
    }

    /**
     * Verifies that joined subclasses can contain non-ORM properties.
     */
    public function testPersistingAnEntityWithGeneratedIdentifierThatIsGivenAnIdentifierLeadsToAPersistedRecord()
    {
        $id = random_int(1000, 100000);

        $entity = new GH8007EntityWithAutoIncrementId($id);

        $this->em->persist($entity);
        $this->em->flush();
        $this->em->clear();

        self::assertNotNull($this->em->find(GH8007EntityWithAutoIncrementId::class, 1));
        self::assertNull($this->em->find(GH8007EntityWithAutoIncrementId::class, $id));
    }
}

/** @ORM\Entity */
class GH8007EntityWithAutoIncrementId
{
    /**
     * @ORM\Id()
     * @ORM\GeneratedValue(strategy="AUTO")
     * @ORM\Column(type="integer")
     */
    public $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }
}

Guess not using them (and will likely continue avoiding them :P ) has shadowed this obscure behavior

@beberlei
Copy link
Member

@SlvrEagle23 what database and Id generator are you using? For MySQL with auto increment I believe we get an exception here from flush as documented. You must be using Postgresql with a Sequence/Serial?

@BusterNeece
Copy link
Author

@beberlei MariaDB 10.2 with Auto-increment, but as mentioned in the above descriptions, the fact that the entity identifier is set as an auto-generated identifier seems to be making the ORM strip it from the subsequent INSERT that happens when flushing the detached entity, meaning there's no way for the DB to know it's a duplicate or to throw an error.

@beberlei
Copy link
Member

@SlvrEagle23 you mention a discussion 9 years ago, can you link it or did i just not see it here?

@BusterNeece
Copy link
Author

BusterNeece commented Jun 26, 2020

@beberlei @Ocramius Hello all, following up on this issue since I notice it's still open and doesn't appear to have any resolution to date.

I'm preparing my applications to be able to migrate more easily to Doctrine 3.0, and part of that is removing em->flush(record) and em->clear(record) calls and replacing them with the corresponding calls but for the entire managed entity graph all at once.

As a result of these changes, there are now far more instances in my code where one function clears the global EntityManager and other functionality isn't aware of this happening, and doesn't know that it needs to "re-fetch" every entity that it's currently playing with. Because there is no warning or other notice when attempting to persist an entity that already has a generated ID, as in the examples above, this means that any time em->clear() is called, a huge swath of code must be inspected and modified to react to this to avoid accidentally creating clones of records the next time things are persisted, and if I miss a single instance then I wind up with unexpected results and there's nothing I can do about it.

I understand the reasoning behind removing entity-specific flushes and clears, but it appears that there was no attention paid to the impact this would have on applications when clearing the entire EntityManager becomes the only way of achieving that particular optimization, and there is no documentation anywhere that suggests a way to make this less painful or more intuitive for developers.

@BusterNeece
Copy link
Author

BusterNeece commented Jun 26, 2020

Basically, the tl;dr here is that this comment in this line of code appears to be incorrect in the case of AUTO GeneratedValues:

// We assume NEW, so DETACHED entities result in an exception on flush (constraint violation).
// If we would detect DETACHED here we would throw an exception anyway with the same
// consequences (not recoverable/programming error), so just assuming NEW here
// lets us avoid some database lookups for entities with natural identifiers.
$entityState = $this->getEntityState($entity, self::STATE_NEW);

The line where it assumes the state is STATE_NEW would actually work as previously documented, and would correctly prevent persisting existing entities silently resulting in an insert, if only the $assume parameter wasn't set.

As far as I can gather as an outside developer, though, the UnitOfWork class is absolutely not meant to be overridden; it is explicitly specified by name in the constructor of the EntityManager which itself can only be decorated and not extended.

Basically, I could fix this on my end with a single change in a single line of code (specified above) and yet as a consequence of the library's design, I can't, so really I have no option but to continue to follow up here and ask for help.

@ikulis
Copy link

ikulis commented Dec 3, 2020

@SlvrEagle23 as workaround EntityManager can be decorated with this class:

<?php

namespace App\Orm;
use Doctrine\ORM\ORMInvalidArgumentException;

class EntityManagerDecorator extends \Doctrine\ORM\Decorator\EntityManagerDecorator
{
    /**
     * preventing double rows, the id might change if detached object is persisted
     * @see https://github.com/doctrine/orm/issues/8007
     */
    public function persist($object) {
        if( is_callable([$object,'getId'])){
            $oldId = $object->getId();
            $this->wrapped->persist($object);

            if( !is_null($oldId) && $oldId !== $object->getId()){
                throw ORMInvalidArgumentException::detachedEntityCannot($object, 'persisted - id changed by Doctrine');
            }
        } else {
            // not an id based entity
            $this->wrapped->persist($object);
        }
    }
}

Adding to symfony configuration:

services:
    App\Orm\EntityManagerDecorator:
        decorates: Doctrine\ORM\EntityManagerInterface

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants