Doctrine UnitOfWork misleading explanation #4337


Closed
marcj opened this issue Oct 18, 2014 · 2 comments

Comments


marcj commented Oct 18, 2014

At https://github.com/symfony/symfony-docs/blob/master/book/doctrine.rst, we find:

In fact, since Doctrine is aware of all your managed entities, when you call the flush() method, it calculates an overall changeset and executes the most efficient query/queries possible. For example, if you persist a total of 100 Product objects and then subsequently call flush(), Doctrine will create a single prepared statement and re-use it for each insert. This pattern is called Unit of Work, and it's used because it's fast and efficient.

This is, unfortunately, not completely true.

  1. Doctrine does not execute the most efficient query/queries possible. It is far away from this bold claim.
  2. This pattern is not called Unit of Work.

Explanation:

  1. Doctrine's UnitOfWork doesn't do anything related to making queries more efficient or faster. The most efficient query possible would be a bulk insert, aka a multi-valued INSERT. This doesn't work because Doctrine does not maintain an internal dependency graph per entity instance, but only determines the dependency ordering using a simple topological sort of the class-mapping information, which results in an INSERT-per-row strategy.

  2. Unit of Work is defined as:

    Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

It is also (and this is one of the most important differences from ActiveRecord with regard to saving and its performance):

You can change the database with each change to your object model, but this can lead to lots of very small database calls, which ends up being very slow. Furthermore it requires you to have a transaction open for the whole interaction, which is impractical if you have a business transaction that spans multiple requests. The situation is even worse if you need to keep track of the objects you've read so you can avoid inconsistent reads.

http://martinfowler.com/eaaCatalog/unitOfWork.html

The important sentence is:

but this can lead to lots of very small database calls, which ends up being very slow.

Well, what Martin Fowler explains here is that Unit of Work exists, among other things, to solve the problem of lots of very small database calls. Doctrine, on the other hand, does exactly that: it fires lots of very small database calls, contrary to Martin's definition of Unit of Work. This leads to the conclusion that Doctrine neither implements Unit of Work completely, nor is its UnitOfWork related to any sort of performance or efficiency topic.
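For illustration, Fowler's definition above can be sketched roughly as follows. This is a minimal Python sketch with hypothetical names, not Doctrine's actual API: changes are only registered in memory and written out in one batch, inside one short transaction, at commit time, which is what avoids the "lots of very small database calls".

```python
# Minimal Unit of Work sketch (hypothetical names, not Doctrine's API).
# `db` is assumed to be a DB-API connection, e.g. sqlite3.
class UnitOfWork:
    def __init__(self, db):
        self.db = db
        self.new = []      # objects to INSERT at commit time
        self.dirty = []    # objects to UPDATE at commit time

    def register_new(self, obj):
        self.new.append(obj)

    def register_dirty(self, obj):
        if obj not in self.dirty:
            self.dirty.append(obj)

    def commit(self):
        # One transaction, one batch of writes -- the point of the
        # pattern is that nothing hits the database before this call.
        with self.db:
            self.db.executemany(
                "INSERT INTO product (name) VALUES (?)",
                [(o["name"],) for o in self.new],
            )
            for o in self.dirty:
                self.db.execute(
                    "UPDATE product SET name = ? WHERE id = ?",
                    (o["name"], o["id"]),
                )
        self.new.clear()
        self.dirty.clear()
```

With this shape, a business transaction can touch many objects in memory and still produce a single short-lived database transaction at the end.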

Which means, Doctrine's UnitOfWork class only handles dependency resolution, so that objects are inserted/updated in the correct order. That is all the paragraph above should highlight.
Since the symfony book explanation above does not mention this dependency resolution, it suggests that Doctrine has a facility to improve performance using UnitOfWork - it doesn't. At the moment it is only a big bag of objects that knows which object needs to be saved first so that no foreign-key constraint failures come up. It does not implement any performance improvement compared to other ORMs that use a different persisting strategy.
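To make the dependency-resolution point concrete, here is a rough Python sketch (illustrative class names and assumed foreign-key relations, not Doctrine internals) of sorting entity classes topologically so that referenced rows are inserted before the rows that reference them:

```python
# Sketch: determine a safe insert order from class-level FK dependencies.
from graphlib import TopologicalSorter

# entity class -> set of classes it references via a foreign key
# (hypothetical mapping, for illustration only)
depends_on = {
    "Order": {"User", "Product"},
    "Product": {"Category"},
    "User": set(),
    "Category": set(),
}

# static_order() yields each class only after all its dependencies,
# so inserting in this order cannot violate FK constraints.
insert_order = list(TopologicalSorter(depends_on).static_order())
```

Note this sort runs on class-mapping information, not on individual entity instances, which is why it yields an order of INSERT-per-row statements rather than enabling bulk inserts.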

Side-fact:

Doctrine's UnitOfWork actually suffers from its current implementation. Although the UnitOfWork might allow batch inserts, since it already knows everything about the entities that need to be persisted, Doctrine doesn't utilize this. In fact, the implementation is remarkably inefficient: it always uses single INSERTs for every row that needs to be persisted, and circular dependencies are even resolved with an INSERT-without-fk-then-UPDATE-fk strategy, which leads to even worse performance. Because of this, a non-UnitOfWork-pattern ORM like Propel is actually faster at storage than Doctrine.
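The contrast being described can be illustrated by the shape of the SQL itself. A hypothetical product table is assumed; this only shows the two statement shapes, not Doctrine's code:

```python
# Three rows to persist (illustrative data).
rows = [("Product %d" % i,) for i in range(3)]

# INSERT-per-row strategy: one prepared statement, executed once per row,
# i.e. one database round trip per entity.
per_row = ["INSERT INTO product (name) VALUES (?)"] * len(rows)

# Multi-valued (bulk) INSERT: a single statement carrying all rows,
# i.e. one round trip total.
placeholders = ", ".join(["(?)"] * len(rows))
bulk = "INSERT INTO product (name) VALUES %s" % placeholders
```

For 100 entities the first shape means 100 executions of the statement, while the second would be a single execution; the per-statement round trips are exactly the "lots of very small database calls" Fowler warns about.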

So, I recommend changing the misleading paragraph above to:

In fact, since Doctrine is aware of all your managed entities, when you call the flush() method, it calculates an overall changeset and executes the queries in the correct order. It utilizes a cached prepared statement to slightly improve performance. For example, if you persist a total of 100 Product objects and then subsequently call flush(), Doctrine will execute 100 INSERT queries using a single prepared statement object.

@javiereguiluz
Member

@marcj thanks for your very detailed and well explained issue report. I agree that the original explanation is misleading and we should reword it. I like your proposed text and I've submitted a PR with it: #4342

weaverryan added a commit that referenced this issue Nov 5, 2014
This PR was merged into the 2.3 branch.

Discussion
----------

Reworded a misleading Doctrine explanation

| Q             | A
| ------------- | ---
| Doc fix?      | yes
| New docs?     | no
| Applies to    | all
| Fixed tickets | #4337

Commits
-------

ef86b52 Fixed typo
fef57d5 Reworded a misleading Doctrine explanation
@weaverryan
Member

Wow, brilliant! Thanks @marcj - I've just merged a PR with your new language (which I didn't need to change at all).

Thanks!
