Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider undeprecating EntityManager::merge #10209

Open
flack opened this issue Nov 7, 2022 · 10 comments
Open

Consider undeprecating EntityManager::merge #10209

flack opened this issue Nov 7, 2022 · 10 comments
Labels

Comments

@flack
Copy link
Contributor

flack commented Nov 7, 2022

Feature Request

Q A
New Feature no
RFC no
BC Break no

Summary

I am maintaining a Doctrine-based ActiveRecord implementation (yes, I know that's probably considered an antipattern, but there's a ton of code built on that, so what are you going to do?). One of the features I need to provide is that you can have two different copies of an entity, i.e.:

$object1 = new my_class($db_id);
$old_value = $object1->property;
$object1->property = 'some value';

$object2 = new my_class($db_id);
if ($object2->property == $old_value) {
    echo "GOOD";
}

$object1->save(); // 'some value' is now saved in the DB
$object2->save(); // $old_value is now saved in the DB, 'some value' was overwritten

The way I implement that is by detaching the entity once a property changes, and when save gets called, I merge it back in (there's a bunch of other things going on, too, but that is the gist of it).

With orm 3.0, that won't work any longer, because merge is scheduled to be removed (even though detach apparently stays). So I was wondering: How big of a burden would it be to continue supporting merge? Because as far as I can tell, my only alternative would be to clear the entire EntityManager, then load the entity again from the DB, copy over the properties from the detached copy and then calling persist/flush, which ofc totally kills performance. Which reminds me: flush($entity) is also deprecated. So I guess undeprecating that would be part of my feature request, too :-)

If you say it's impossible to support these features, I understand. But I thought it can't hurt to at least ask. Also, if someone has an idea how else I might implement the behaviour I need, I would be very interested to hear that!

@derrabus
Copy link
Member

Hello and sorry that it took us so long to give you an answer.

I am maintaining a Doctrine-based ActiveRecord implementation (yes, I know that's probably considered an antipattern, but there's a ton of code built on that, so what are you going to do?).

Doctrine ORM deliberately departed from the active record pattern. And as much as I understand your dilemma, I doubt that the Doctrine project has an interest in supporting AR or maintaining features that are needed to build an AR implementation on top of the ORM. In fact, we've sunset the last remaining AR features from the persistence library recently.

The functionality that you're describing sounds so very different from how the ORM is designed to be used, so you might even be better off by building your library on top of the DBAL directly.

We believe that the merge functionality in its current form is error-prone, hard to maintain and should not be needed in the first place. And your description of how you use merge somewhat proves my point. Merge is not what you needed, but you somehow could make it work for you.

We are happy to discuss a replacement for merge and you are invited to propose and contribute features that you need to build your library. But right now, I'm not convinced that we should keep merge.

@flack
Copy link
Contributor Author

flack commented Jan 2, 2023

The functionality that you're describing sounds so very different from how the ORM is designed to be used, so you might even be better off by building your library on top of the DBAL directly.

Yes, I've looked into that, too. The thing is, I want to use most parts of ORM, mostly all the ClassMetaData/mapping functionality, and QueryBuilder. The only thing that really doesn't fit into my usage model is EntityManager, or more precisely its insistence on having a singleton-like representation of entities. Have you thought about splitting up the ORM package into smaller parts? Because AFAICT EntityManager/UnitofWork is the only "opinionated" part that collides with what I want to do. Realistically, I will probably fork orm2 once it reaches end of life instead of writing something completely from scratch, but ofc I'm trying to avoid that for as long as possible.

Regarding a replacement for merge: I could try to come up with something, but I'm not sure I understand how exactly it is causing problems. Is there any writeup you could point me to?

@derrabus
Copy link
Member

derrabus commented Jan 2, 2023

Well, the thing is: You don't need merge actually. You just use it to work around the fact that the ORM's UoW cannot manage two entity representations of the same record. Merge is not a feature you would've requested if it had not existed back then. I don't think it's reasonable to keep that method only to keep that workaround alive.

My understanding is that you use the data mapping and change detection of the ORM and that the latter is hardwired into the UoW. If you can manage to make the functionality that you actually need reusable for your purpose, I'm happy to discuss a PR that does this.

Realistically, I will probably fork orm2 once it reaches end of life instead of writing something completely from scratch, but ofc I'm trying to avoid that for as long as possible.

That could also be an option for you. Specifically, it would allow you to overcome the single entity per record limitation because once you've forked the UoW, you can do what ever you want with it without caring about the ORM.

@flack
Copy link
Contributor Author

flack commented Jan 2, 2023

Well, the thing is: You don't need merge actually. You just use it to work around the fact that the ORM's UoW cannot manage two entity representations of the same record. Merge is not a feature you would've requested if it had not existed back then. I don't think it's reasonable to keep that method only to keep that workaround alive.

I mean, as long as a detach method exists, it would be nice to have a way to revert detaching, because detaching being a one-way operation kind of limits its usefulness (e.g. you can detach and modify the entity, but there is no way to persist your changes).

My understanding is that you use the data mapping and change detection of the ORM and that the latter is hardwired into the UoW. If you can manage to make the functionality that you actually need reusable for your purpose, I'm happy to discuss a PR that does this.

Alright, I'll have a look at this (at some point in the future)

@derrabus
Copy link
Member

derrabus commented Jan 2, 2023

detaching being a one-way operation kind of limits its usefulness (e.g. you can detach and modify the entity, but there is no way to persist your changes).

No, why? First of all, how would we clear the EM if we could not detach entities? And I detach an entity because I don't want changes to be tracked anymore. If I want changes to be persisted, why would I opt out of the change tracking just to opt-in later on again?

And for detach in your scenario, the same applies as for merge: You're detaching an entity in which you still want to track changes because you work around the fact that the UoW can only track a single entity per record.

Again: You use detach/merge to work around a limitation which is a deliberate design decision of this library.

@Kern046
Copy link
Contributor

Kern046 commented Jan 12, 2023

I have a different use-case I think more common: long-running commands. For example, in our international website, we have a command that loop through all the Website entities and then process a lot of entities for each Website. We did have a huge performance issue in CPU time and memory. So we came to use EntityManager::clear.

foreach ($websites as $website) {
    $currentPage = 1;
    $itemPerPage = 50;

    do {
        $pagination = $repository->getAllByWebsite($website, $currentPage++, $itemPerPage);

        // ... Do stuff with our paginated entities and then flush

        $this->entityManager->clear();
    } while ($pagination->hasNextPage());
}

We did get a huge gain in CPU time and RAM, yet it didn't completely work, because Website got cleared as well, and when the flush() occurs, we got the following exception:

A new entity was found through the relationship 'WonderfulEntity#website' that was not configured to cascade persist operations for entity

Several ways would allow us to get through this problem, but all are deprecated and will be removed in Doctrine ORM 3.0:

Use EntityManager::clear on all loaded entities classes but Website

Calling EntityManager::clear with arguments is deprecated. Now we can only do a "clear all" call.

Use EntityManager::merge on the Website entity before we use it again after a clear

The whole method is deprecated.

Use persist on the Website entity to put it back in Unit of Work:

In EntityManager::persist we read the following comment:

// We assume NEW, so DETACHED entities result in an exception on flush (constraint violation).
// If we would detect DETACHED here we would throw an exception anyway with the same
// consequences (not recoverable/programming error), so just assuming NEW here
// lets us avoid some database lookups for entities with natural identifiers.
$entityState = $this->getEntityState($entity, self::STATE_NEW);

The UnitOfWork::STATE_DETACHED case is not even taken care of in the switch that follows this statement, assuming that this state is "NEW".


The last way I see is to use our repository to perform a find() call to retrieve the same website, but as the existing Website entity wouldn't be found in the Unit of Work, it would really perform a SQL query and go through all hydrate process to get a new entity. When looking for better performance, it hurts a little :P.

Is there a solution I'm not aware of for dealing with this case in Doctrine 3.0 ?

I really think that this is a use-case that many developers will encounter when dealing with long-running commands.

@derrabus
Copy link
Member

The last way I see is to use our repository to perform a find() call to retrieve the same website

If you really only need the website entity for relations, you can call EntityManager::getReference() to get a proxy object instead of loading the whole thing.

@Kern046
Copy link
Contributor

Kern046 commented Jan 27, 2023

The last way I see is to use our repository to perform a find() call to retrieve the same website

If you really only need the website entity for relations, you can call EntityManager::getReference() to get a proxy object instead of loading the whole thing.

Thanks for your answer ! We tried that and it has indeed solved our issue.

But we also realized that we took the wrong way for our current need. I assumed that EntityManager::clear was the method we must use to clear the Unit of Work, but thanks to one of my teammates suggestion, we got interested in EntityManager::detach and it was the final remedy to our problem :).

If I take my previous code example and I update it:

foreach ($websites as $website) {
    $currentPage = 1;
    $itemPerPage = 50;

    do {
        $pagination = $repository->getAllByWebsite($website, $currentPage++, $itemPerPage);

        // ... Do stuff with our paginated entities and then flush

		// we need to execute that part after having persisted and flushed our stuff above
		foreach ($pagination->getItems() as $item) {
			$this->entityManager->detach($item);
		}
    } while ($pagination->hasNextPage());
}

Thanks to cascade={"detach"} we are able to choose which associated entities must remain in the UoW and which can be cleared.

In some cases we have multiple loops to do after our flush calls before all entities are not always related to perform cascade detach, but according to Blackfire, this is still more performant that querying again the entity with EntityManager::getReference.

All is well 😄 !

@flack
Copy link
Contributor Author

flack commented Jun 25, 2023

We believe that the merge functionality in its current form is error-prone, hard to maintain and should not be needed in the first place. And your description of how you use merge somewhat proves my point. Merge is not what you needed, but you somehow could make it work for you.

Just a small note: I think the recently proposed #10791 should resolve the errors/weirdness that are often observed in conjunction with using merge

@withinboredom
Copy link

Just to throw in an issue encountered when upgrading: value-object rows.

Consider a "Translatable String" where the hash("key + context + locale") can be considered the id of the string. If any of those values change, it is a different string and not the same string.

This works except when you 'accidentally' change the values to an "already existing string". In the "value object" paradigm, $string1 and $string2 are the same, yet two different instances according to Doctrine. There's now no way to tell Doctrine these two are the same (thus resulting in insertion errors) and requires some gymnastics. Previously, a ->merge() was all that was required. Now, there needs to be some creative walking of loading instances, detaching them from the entity manager, etc.

Basically, a huge performance hit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants