Doctrine UnitOfWork misleading explanation #4337


Closed
marcj opened this issue Oct 18, 2014 · 2 comments

Comments


marcj commented Oct 18, 2014

At https://github.com/symfony/symfony-docs/blob/master/book/doctrine.rst, we find:

In fact, since Doctrine is aware of all your managed entities, when you call the flush() method, it calculates an overall changeset and executes the most efficient query/queries possible. For example, if you persist a total of 100 Product objects and then subsequently call flush(), Doctrine will create a single prepared statement and re-use it for each insert. This pattern is called Unit of Work, and it's used because it's fast and efficient.

This is, unfortunately, not completely true.

  1. Doctrine does not execute the most efficient query/queries possible. It is far away from this bold claim.
  2. This pattern is not called Unit of Work.

Explanation:

  1. Doctrine's UnitOfWork doesn't do anything related to making queries more efficient or faster. The most efficient query possible would be a bulk insert, aka a multi-valued INSERT. This doesn't work because Doctrine does not maintain an internal dependency graph per entity instance, but only determines the dependency ordering using a simple topological sort of the class-mapping information, which results in an INSERT-per-row strategy.

  2. Unit of Work is defined as:

    Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

It is also (and this is one of the most important differences from ActiveRecord with regard to saving and its performance):

You can change the database with each change to your object model, but this can lead to lots of very small database calls, which ends up being very slow. Furthermore it requires you to have a transaction open for the whole interaction, which is impractical if you have a business transaction that spans multiple requests. The situation is even worse if you need to keep track of the objects you've read so you can avoid inconsistent reads.

http://martinfowler.com/eaaCatalog/unitOfWork.html

The important sentence is:

but this can lead to lots of very small database calls, which ends up being very slow.

Well, what Martin Fowler explains here is that Unit of Work exists, among other things, to solve the problem of lots of very small database calls. Doctrine, on the other hand, does exactly that: it fires lots of very small database calls, contrary to Martin's definition of Unit of Work. This leads to the conclusion that Doctrine neither implements Unit of Work completely, nor is its UnitOfWork related to any sort of performance or efficiency topic.
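For illustration, Fowler's definition above can be sketched roughly as follows. This is a minimal Python sketch with hypothetical names, not Doctrine's actual API: changes are only registered in memory and written out in one batch, inside one short transaction, at commit time, which is what avoids the "lots of very small database calls".

```python
# Minimal Unit of Work sketch (hypothetical names, not Doctrine's API).
# `db` is assumed to be a DB-API connection, e.g. sqlite3.
class UnitOfWork:
    def __init__(self, db):
        self.db = db
        self.new = []      # objects to INSERT at commit time
        self.dirty = []    # objects to UPDATE at commit time

    def register_new(self, obj):
        self.new.append(obj)

    def register_dirty(self, obj):
        if obj not in self.dirty:
            self.dirty.append(obj)

    def commit(self):
        # One transaction, one batch of writes -- the point of the
        # pattern is that nothing hits the database before this call.
        with self.db:
            self.db.executemany(
                "INSERT INTO product (name) VALUES (?)",
                [(o["name"],) for o in self.new],
            )
            for o in self.dirty:
                self.db.execute(
                    "UPDATE product SET name = ? WHERE id = ?",
                    (o["name"], o["id"]),
                )
        self.new.clear()
        self.dirty.clear()
```

With this shape, a business transaction can touch many objects in memory and still produce a single short-lived database transaction at the end.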

Which means, Doctrine's UnitOfWork class only handles dependency resolution, so that objects are inserted/updated in the correct order. That is all the paragraph above should highlight.
Since the symfony book explanation above does not mention this dependency resolution, it suggests that Doctrine has a facility to improve performance using UnitOfWork - it doesn't. At the moment it is only a big bag of objects that knows which object needs to be saved first so that no foreign-key constraint failures come up. It does not implement any performance improvement compared to other ORMs that use a different persisting strategy.
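To make the dependency-resolution point concrete, here is a rough Python sketch (illustrative class names and assumed foreign-key relations, not Doctrine internals) of sorting entity classes topologically so that referenced rows are inserted before the rows that reference them:

```python
# Sketch: determine a safe insert order from class-level FK dependencies.
from graphlib import TopologicalSorter

# entity class -> set of classes it references via a foreign key
# (hypothetical mapping, for illustration only)
depends_on = {
    "Order": {"User", "Product"},
    "Product": {"Category"},
    "User": set(),
    "Category": set(),
}

# static_order() yields each class only after all its dependencies,
# so inserting in this order cannot violate FK constraints.
insert_order = list(TopologicalSorter(depends_on).static_order())
```

Note this sort runs on class-mapping information, not on individual entity instances, which is why it yields an order of INSERT-per-row statements rather than enabling bulk inserts.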

Side-fact:

Doctrine's UnitOfWork actually suffers from its current implementation. Although the UnitOfWork might allow batch inserts, since it already knows everything about the entities that need to be persisted, Doctrine doesn't utilize this. In fact, the implementation is remarkably inefficient: it always uses single INSERTs for every row that needs to be persisted, and circular dependencies are even resolved with an INSERT-without-fk-then-UPDATE-fk strategy, which leads to even worse performance. Because of this, a non-UnitOfWork-pattern ORM like Propel is actually faster at storage than Doctrine.
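The contrast being described can be illustrated by the shape of the SQL itself. A hypothetical product table is assumed; this only shows the two statement shapes, not Doctrine's code:

```python
# Three rows to persist (illustrative data).
rows = [("Product %d" % i,) for i in range(3)]

# INSERT-per-row strategy: one prepared statement, executed once per row,
# i.e. one database round trip per entity.
per_row = ["INSERT INTO product (name) VALUES (?)"] * len(rows)

# Multi-valued (bulk) INSERT: a single statement carrying all rows,
# i.e. one round trip total.
placeholders = ", ".join(["(?)"] * len(rows))
bulk = "INSERT INTO product (name) VALUES %s" % placeholders
```

For 100 entities the first shape means 100 executions of the statement, while the second would be a single execution; the per-statement round trips are exactly the "lots of very small database calls" Fowler warns about.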

So, I recommend changing the misleading paragraph above to:

In fact, since Doctrine is aware of all your managed entities, when you call the flush() method, it calculates an overall changeset and executes the queries in the correct order. It utilizes a cached prepared statement to slightly improve performance. For example, if you persist a total of 100 Product objects and then subsequently call flush(), Doctrine will execute 100 INSERT queries using a single prepared statement object.

@javiereguiluz
Member

@marcj thanks for your very detailed and well explained issue report. I agree that the original explanation is misleading and we should reword it. I like your proposed text and I've submitted a PR with it: #4342

weaverryan added a commit that referenced this issue Nov 5, 2014
This PR was merged into the 2.3 branch.

Discussion
----------

Reworded a misleading Doctrine explanation

| Q             | A
| ------------- | ---
| Doc fix?      | yes
| New docs?     | no
| Applies to    | all
| Fixed tickets | #4337

Commits
-------

ef86b52 Fixed typo
fef57d5 Reworded a misleading Doctrine explanation
@weaverryan
Member

Wow, brilliant! Thanks @marcj - I've just merged a PR with your new language (which I didn't need to change at all).

Thanks!
