DDC-601: UnitOfWork commit order problem #5109

Open
doctrinebot opened this issue May 19, 2010 · 22 comments

@doctrinebot

Jira issue originally created by user exxbrain:

The following critical case doesn't work because of a "Duplicate entry" error:

$obj = $repository->findOneByUniqueKey($key);

if ($obj != null) {
     $em->remove($obj);
}
$obj = new Object();
$obj->setUniqueKey($key);
$em->persist($obj);

$em->flush();

Instead of that, I have to use the following construction:

$obj = $repository->findOneByUniqueKey($key);

if ($obj != null) {
     $em->remove($obj);
     $em->flush();      // a possible cause of data inconsistency
}
$obj = new Object();
$obj->setUniqueKey($key);
$em->persist($obj);

$em->flush();
@doctrinebot

Comment created by romanb:

Hi. The order of operations is mainly designed for a) efficiency and b) preserving foreign key constraints, but not for such kinds of unique constraints.

The order is:

  1. all entity inserts (topological order)
  2. all entity updates (topological order)
  3. all collection deletions
  4. all collection updates
  5. all entity deletions (reverse topological order)

Where the topological order is dictated by the foreign key constraints.
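
To make the consequence for the reported snippet concrete, this is roughly how it plays out, assuming a table objects with a unique index on unique_key and with $old as the fetched entity and $new as its replacement (all names are illustrative):

$em->remove($old);   // scheduled as an entity deletion  -> executed in step 5
$em->persist($new);  // scheduled as an entity insertion -> executed in step 1
$em->flush();
// Effective statement order:
//   INSERT INTO objects (unique_key) VALUES ('key');  -- step 1, fails with "Duplicate entry"
//   DELETE FROM objects WHERE id = ...;               -- step 5, never reached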

The unique constraint is not the primary key, is it? In either case, I think this is a strange thing to do, even more if this is the primary key, as a primary key should never be updated or reused by other objects.

You can still easily do it, just use an extra flush like you did. If you want both flushes to execute in a single transaction, simply use your own transaction demarcation:

$em->getConnection()->beginTransaction();
try {
    if ($obj != null) {
        $em->remove($obj);
        $em->flush();
    }
    $obj = new Object();
    $obj->setUniqueKey($key);
    $em->persist($obj);
    $em->flush();
    $em->getConnection()->commit();
} catch (Exception $e) {
    $em->getConnection()->rollback();
    $em->close();
    throw $e;
}

See: http://www.doctrine-project.org/projects/orm/2.0/docs/reference/transactions-and-concurrency/en#transaction-demarcation

I don't think we can change the order by putting deletions first without breaking other things. At least Hibernate has a similar order (inserts first) and probably for a reason.

Here are some related discussions:

http://opensource.atlassian.com/projects/hibernate/browse/HHH-2801
http://blog.krecan.net/2008/01/05/flushing-hibernate/
https://forum.hibernate.org/viewtopic.php?p=2421199

@doctrinebot

Comment created by romanb:

I tried this with EclipseLink (the JPA 2 reference implementation, http://www.eclipse.org/eclipselink/) as well: same unique constraint violation (inserts first).

That makes me not very optimistic that we can do any better.

@doctrinebot

Comment created by romanb:

Some more references on why deletes are performed last:

http://old.nabble.com/EntitEntityManager.remove%3A-Why-does-it-care-about-nullable%3Dfalse--td22942681.html
http://forums.oracle.com/forums/thread.jspa?messageID=3805444

We may consider doing the deletes first, since we always have a topological order and most of the mentioned issues might not apply to our architecture or might not be common cases, but I am not certain yet.

In any case, like I mentioned in my first comment, you can just flush() after remove() to force the deletion first.

@doctrinebot

Issue was closed with resolution "Won't Fix"

@doctrinebot

Comment created by jean-gui:

Calling flush() after remove() is not always possible when using Doctrine with Symfony, because bindRequest does everything under the hood.
A function "setShouldPerformDeletesFirst" as described in http://docs.oracle.com/cd/B31017_01/web.1013/b28218/uowadv.htm#i1137484 would be very useful in those cases.
Without it, I need to mess with detach() and merge() to keep only the entities I want to remove() attached, flush(), and then merge() everything back before a second flush().
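
For readers hitting the same wall, a rough sketch of the detach()/merge() juggling described above; $toDelete and $others are hypothetical placeholders for the entity to remove and the other managed entities:

foreach ($others as $entity) {
    $em->detach($entity);              // keep only the entity to be removed managed
}
$em->remove($toDelete);
$em->flush();                          // only the DELETE is executed here

foreach ($others as $i => $entity) {
    $others[$i] = $em->merge($entity); // merge() returns the managed copy
}
$em->flush();                          // remaining inserts/updates go out now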

@doctrinebot

Comment created by magnetik:

I'm facing the exact same issue 4 years later. Is there anything new to help with this situation?

@doctrinebot

Comment created by @Ocramius:

[~magnetik] the issue is marked as won't fix

@doctrinebot

Comment created by deatheriam:

When there is a complicated system of flush event handlers that involves sending requests to independent systems, having multiple flushes to solve this particular issue creates a lot of other issues of its own. It also goes against the Doctrine philosophy of issuing as few flushes as possible.

I checked one of the two links posted above that supposedly explain why deletes should go last in a transaction (the other one is dead) and did not find any meaningful explanation. A quick Google search also yielded a few discussions without a clear answer, for example this one: https://forum.hibernate.org/viewtopic.php?t=934483.

@marco, could you re-state why this cannot be achieved without sending any future readers of this ticket to some articles?

@doctrinebot

Comment created by deatheriam:

At least give us users an option to use this approach:

Using the Unit of Work setShouldPerformDeletesFirst Method

By default, TopLink does insert and update operations first, before delete operations, to ensure that referential integrity is maintained. This is the preferred approach.

If you are forced to replace an object with unique constraints by deleting it and inserting a replacement, you may cause a constraint violation if the insert operation occurs before the delete operation. In this case, call setShouldPerformDeletesFirst to perform the delete operation before the insert operation.

@BenMorel
Contributor

I'd like to emphasize that this really is a pain when you deal with collections that you want to clear() and add() to again. You usually cannot flush() in between, as the whole operation happens in your model, with no knowledge of the EntityManager. So if your table contains unique constraints, you're screwed, because the inserts are performed before the deletes. I've had this problem on pretty much every project I've used Doctrine on.
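
A minimal sketch of that collection scenario, assuming a hypothetical Tag entity with a unique constraint on (post_id, name) and orphanRemoval=true on the collection:

$post->getTags()->clear();                        // old rows are scheduled for deletion

foreach (['php', 'orm'] as $name) {
    $post->getTags()->add(new Tag($post, $name)); // same unique values as before
}

$em->flush(); // fails: the INSERTs for the new tags run before the DELETEs of the old ones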

The explanation of the current order of operations, from one of the links above:

Deletes are generally performed last, and an update may be required in order to allow deletion because of constraints.

One could argue that you could then do UPDATE, DELETE, then INSERT. But if an UPDATE needs an INSERT, you're screwed.

So there is no one-size-fits-all solution to this problem.

What would be good is, indeed, a configuration option to force DELETE before INSERT, at least in a collection.

@ureimers
Contributor

ureimers commented Dec 2, 2016

Even though this whole issue relates to the edge case of replacing entities with unique constraints with one another, "won't fix" won't make it go away. The current workarounds look horrifying, to say the least, and there really should be a better way to solve this problem.

Theoretically:
As Doctrine knows about the unique constraints of entities, it could automatically check whether the current unit of work contains an entity (A) with unique constraints that gets replaced by another entity (B) of the same class and with the same values in the constrained fields. In that case it needs to delete A before inserting B, and everything should be fine. Or am I missing something?
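
Illustrative pseudocode of that check, not actual UnitOfWork internals; the uniqueKeyValues() helper and the reordering call are hypothetical, while getScheduledEntityInsertions()/getScheduledEntityDeletions() are existing UnitOfWork accessors:

foreach ($uow->getScheduledEntityInsertions() as $new) {
    foreach ($uow->getScheduledEntityDeletions() as $old) {
        if (get_class($old) === get_class($new)
            && uniqueKeyValues($old) === uniqueKeyValues($new)) {
            // B reuses A's unique key: A's DELETE must be executed before B's INSERT.
            scheduleDeleteBeforeInsert($old, $new);
        }
    }
}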

@mastir

mastir commented Feb 6, 2020

Three years have gone by and this is still not solved. To "replace" an entity in my project I run a DBAL delete query before persist (in my case there is no need to load the entity); the ORM is simply unable to handle it. This is fine for simple entities, but when it comes from cascade persist combined with orphan removal it gets messy and hard to debug.

So there is no one-size-fits-all solution to this problem.

Actually there is one real, if not so simple, solution: do not split the INSERT/UPDATE/DELETE operations into blocks, but build the operation order based on requirements. First run the queries that do not depend on any other pending operation. Then, in a loop, run the queries whose requirements have all been satisfied, and continue until all work is done. And of course circular dependencies can be detected and resolved by splitting an operation into two INSERT/UPDATE queries.
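
A rough sketch of that requirement-driven loop, assuming hypothetical Operation objects that expose getUnresolvedRequirements() and execute():

$pending = $operations;
while ($pending !== []) {
    $runnable = array_filter(
        $pending,
        fn ($op) => $op->getUnresolvedRequirements() === []
    );

    if ($runnable === []) {
        // Circular requirements: would have to be broken up, e.g. by splitting
        // one operation into an INSERT now and an UPDATE later.
        throw new \RuntimeException('Circular requirements detected');
    }

    foreach ($runnable as $op) {
        $op->execute(); // executing an operation may satisfy other operations' requirements
    }

    $pending = array_filter($pending, fn ($op) => !in_array($op, $runnable, true));
}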

@fredgnf

fredgnf commented Oct 28, 2020

I'm facing a situation where I need to do updates interleaved with inserts, because if I don't, the database throws a unique constraint exception, and if I flush after each one, I can get an exception in the middle. In that case I have to catch it and issue deletes interleaved with updates to roll back the earlier flushes, and that can throw exceptions too.
I will have to handle this using PDO.
It would be very helpful if Doctrine gave us the option to run the operations in the order we need.

@beberlei beberlei reopened this Jul 4, 2021
@beberlei beberlei self-assigned this Jul 4, 2021
@beberlei beberlei added this to the 3.0.0 milestone Jul 4, 2021
@beberlei
Member

beberlei commented Jul 4, 2021

Reopening for research only for now; 3.0 could be a point where we rethink whether this might be fixable.

@sabat24

sabat24 commented Aug 24, 2021

Another use case where I need an UPDATE after a DELETE is updating the sort-order field of the remaining entities after deleting another entity. For example, I've got:

ID   sort_order
A    1
B    2
C    3

Let's say I want to remove entity B and then run something like this:

SET @cnt = 0;
UPDATE table AS w
   SET w.sort_order = @cnt := @cnt + 1
 ORDER BY w.sort_order ASC
;

It won't work because my UPDATE will be performed before the DELETE. I can't just flush after the DELETE and then run the UPDATE, because the removal is part of a deeper process and I can't just break up the transaction. The UPDATE query is also more complicated and depends on some higher-level relations.

The safest solution would be a sequence of DELETE, UPDATE, DELETE, UPDATE queries wrapped in a single transaction. However, that's not possible right now.

@michaeldnelson

michaeldnelson commented Oct 4, 2021

Adding a use case for consideration. I have users, and users may have 1 or 0 credentials (login, password). Logins need to be unique and I would like credentials to be replaceable. Ideally, credentials would be a read-only unidirectional object owned by users; if I replace them, I would like that to happen in a single transaction, or I risk a state where someone loses their credentials between the remove and the replace. Obviously it can be worked around, for example by making credentials mutable or just shoving everything as nullable onto the user object, but it feels like a limitation of the ORM.

@Nek-
Contributor

Nek- commented Oct 15, 2021

I was looking for a way to fix this issue because I think it really is an issue. (I'm new to this thread, and the old reasons why it will not be fixed are not accessible anymore.)

I think the best way to handle this is to change how entities are staged in the UoW. They are currently simple arrays, but to support ordering on commit we need to add a new notion of order. I have two ideas, each with pros and cons:

  1. Using a new set of objects like StagedItemCollection and Removal, Update, Insert
    pro: clean API
    con: memory consumption (not sure it's relevant with recent PHP changes), more changes
  2. Changing the way the array is composed to something like [$stagedOrder, $entity] instead of just the entity
    pro: small change, no impact on performance
    con: will obviously lead to more complex code inside the UoW

In any case, it will lead to major changes deep inside the UoW, in many, many functions.

I'm interested in trying something. I'd like to know if somebody would be able to follow me until it's merged, or if this looks more like a real "won't fix" (as I said, I cannot really read the previous arguments for why it's not a good idea to fix it; I'm open to anything).

Thanks!

@linuxprocess

Sometimes complex transactions need to follow a specific order of operations (SELECT, DELETE, UPDATE, INSERT, ...) within a single transaction (COMMIT all or ROLLBACK all). I thought the ORM just followed the natural order of operations (I mean the order in which the developer coded them).
It seems to me that this implies the ORM cannot deal with such complex situations...
(Yes, the bad "primary key" usage is understandable, but in a relational database there can be a lot of other constraints that imply a specific order.)

As an idea, why not retry the transaction in the developer's order if the transaction with the INSERT/UPDATE/DELETE reordering failed and was rolled back? (Yes, it's not an efficient idea.)

mpdude added a commit to mpdude/doctrine2 that referenced this issue Jun 28, 2023
There are a few requests (doctrine#5742, doctrine#5368, doctrine#5109, doctrine#6776) that ask to change the order of operations in the UnitOfWork to perform "deletes before inserts", or where such a switch appears to solve a reported problem.

I don't want to say that this is not doable. But this PR at least adds two tricky examples where INSERTs need to be done before an UPDATE can refer to new database rows; and where the UPDATE needs to happen to release foreign key references to other entities before those can be DELETEd.

So, at least as long as all operations of a certain type are to be executed in blocks, this example allows no other order of operations than the current one.
@mpdude
Contributor

mpdude commented Jun 28, 2023

#10809 adds an example that shows why we cannot (at least not in general) easily change the current order of operations in the UoW. The example requires an INSERT to be done so that an UPDATE can refer to the new entity, and the UPDATE is necessary to release a foreign key reference so that a DELETE can take place.
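
A condensed sketch of that kind of dependency chain (hypothetical entities, not the exact test case from #10809), where the account holds a NOT NULL foreign key to its current card, so the old card cannot be deleted while it is still referenced:

$newCard = new Card();
$em->persist($newCard);             // INSERT must come first so the new row exists

$oldCard = $account->getCurrentCard();
$account->setCurrentCard($newCard); // UPDATE must follow the INSERT, and it releases
                                    // the foreign key reference to the old card

$em->remove($oldCard);              // DELETE must come last, after the UPDATE
$em->flush();                       // a fixed "deletes first" order would break this case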

mpdude added a commit to mpdude/doctrine2 that referenced this issue Jun 29, 2023
@mpdude
Contributor

mpdude commented Jun 30, 2023

Regarding @mastir's idea from #5109 (comment):

It's not only that one would have to consider all SQL operations at the individual level and understand when, e.g., an UPDATE requires another INSERT to go first.

It would also need to be smart enough to see when an UPDATE replaces an old value, and that old value is what is blocking a DELETE through a foreign key reference.

@mastir

mastir commented Jun 30, 2023

Yes, it can be done with the correct structure. Here is the way I see it. Example: we have a Dog with 4 Legs in a Cage. Our models are Cage, Dog and Leg. Cage-Dog is a OneToOne relation with orphanRemoval=true. Dog-Leg is a OneToMany relation with orphanRemoval=true. The scenario: a customer comes to the store, asks to change socks_color for every leg of $liked_dog and takes that dog, so a worker goes to the basement and puts another dog into the cage:

$cage = $liked_dog->getCage();
foreach ($liked_dog->getLegs() as $leg) {
    $leg->setSockColor('red');
    $em->persist($leg);
}
$another_dog = $basement->getNextDog();
$cage->setDog($another_dog);
$em->persist($cage); // cascade persists the dog, which cascade persists its legs
$em->flush();

So we have these operations, in the order they were persisted:

  1. UPDATE leg x4 - no requirements, runs in stage 1
  2. UPDATE $cage - requires $another_dog.id (3), runs in stage 2
  3. INSERT $another_dog - no requirements, runs in stage 1
  4. INSERT leg x4 - requires $another_dog.id (3), runs in stage 2
  5. DELETE $liked_dog - requires the legs to be deleted (6), runs in stage 3
  6. DELETE $liked_dog.legs - requires (1), as it works with the same entities, runs in stage 2
  7. UPDATE $cage - requires $another_dog.id (2), runs in stage 2

So the resulting operation order is: 1,3 / 2,4,6,7 / 5

The hardest part is the requirement detection and checks, but nothing unrealistic. The first, simplest idea was to use something like a simple object with a boolean property and change it from the UoW, but this would make the UoW carry all of the possible requirements logic. So another way is to have dedicated classes that run the checks, for example a check for pending operations on the same entity before we run a delete:

class DeleteOperationRequirementsCheck extends RequirementsCheck {

  public function doCheck($uow, $operation_queue): array {
     $requirements = [];
     foreach ($operation_queue as $operation) {
       if ($operation === $this) continue;
       if ($operation->getTable() === $this->table && $this->criteria->match($operation->getPrimaryKey())) {
         $requirements[] = $operation;
       }
     }
     return $requirements;
  }
}

In this example check we wait for all of the queued operations before the delete, but we can use the same logic to wait only for the operations added before ours, by simply replacing continue with break.

We would also add some interface like Resolvable to the operations, and maybe other new Resolvable requirement types, like EntityPrimaryKeyRequirement and so on.

@mpdude you're absolutely right; we can even have a RelatedEntityKeyRequirement, and it can be triggered not only by an entity removal but also by a value update.

We can apply new strategies like a CircularRequirementsSolvingStrategy to split circular requirements into insert/update operations, an OptimizeGroupInsertStrategy to run multiple inserts in one query, or even a PreparedQueryAnalyticStrategy to handle analytics and have prepared statements for most of our common operations.

Detecting recursion in the requirements is as simple as: count($operations_queue) > 0 && count($next_stage_operations) === 0

@greg0ire greg0ire removed this from the 3.0.0 milestone Oct 10, 2023
@berkut1

berkut1 commented Oct 25, 2023

Since there have been attempts to fix this, I'll add my case.
Code example:

    public function handle(Command $command): void
    {
        // get entities from repositories

        // remove the old item from the installed socket
        $device->removeInstalledItem(new Id($command->id_item_installed));

        $socket = $command->getSocket();
        // add a new item to the socket that we freed earlier
        $device->installItem($item, $socket);

        $this->flusher->flush();
    }

How it should work:

  1. DELETE $removeInstalledItem
  2. INSERT $installItem

How it actually tries to work:

  1. INSERT $installItem -> throws an error
  2. DELETE $removeInstalledItem

removeInstalledItem() is a large function that uses recursion and deeply nested loops to check whether all relationships associated with an item have been removed, but at the plain database level they mostly do not trigger.

Perhaps my attempt to explain this failed; I will try to answer questions if something is not clear.
