[SIP-99B] Proposal for (re)defining a "unit of work" #25108
Comments
This is great! Have you thought about introducing an external (to SQLA) "manager" of sorts that tracks transaction lifecycles and is responsible for proxying/postponing Adding such a construct could also allow us to build unit tests that "automatically" rollback after running against their target DB, which would remove the need to perform cleanup after each test. (Ex: https://docs.spring.io/spring-framework/reference/testing/annotations/integration-spring/annotation-rollback.html) One other potential case that I've seen in the past is when you may actually want to nest transactions (i.e. you need to create / commit some object within the context of a unit of work that doesn't necessarily depend on the outer scope/transaction) Lastly, support for "read-only" transactions would be awesome. This would remove the need to even |
Thanks for writing this SIP @john-bodley. I already reviewed it internally, so I only have one more addition: it's important that we are able to track the SIP's progress given that it will generate a series of PRs. For SIP-61, I created a SIP-61 milestones project and we could do something similar here. Planning the SIP execution also has the additional benefit of enabling contributions from the community. |
Thanks @craig-rueda for your feedback. In answer to your questions:
|
cc @dpgaspar as this SIP references FAB. |
@john-bodley great work! |
Approved! |
If anyone can give us all an update on implementation status or plans, it would be appreciated :D |
[SIP-99B] Proposal for (re)defining a "unit of work"
This SIP is part of the [SIP-99] Proposal for correctly handling business logic series. Specifically, it proposes formalizing the "unit of work" construct. The topics outlined here are highly coupled with [SIP-99A] Primer for managing SQLAlchemy sessions as they both relate to managing database transactions.
Unit of Work
The "unit of work" is used to group multiple service operations into a single atomic unit. It materializes as a database transaction which—with the support of a SQLAlchemy session—allows us to create a block of code within which database atomicity is guaranteed. If the block of code is successfully completed, the changes are committed to the database. If there is an exception, the changes are rolled back.
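The commit-on-success, rollback-on-exception behavior described above can be sketched with nothing but the standard library. This is an illustration only: it uses `sqlite3` rather than a SQLAlchemy session, and the `dashboards` table and `create_two_dashboards` function are hypothetical names invented for the example.

```python
import sqlite3

# In-memory database standing in for the metadata database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dashboards (title TEXT)")
conn.commit()


def create_two_dashboards(fail: bool) -> None:
    """Insert two rows as one atomic unit."""
    # `with conn:` commits if the block completes and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO dashboards VALUES ('first')")
        if fail:
            raise ValueError("simulated failure mid-operation")
        conn.execute("INSERT INTO dashboards VALUES ('second')")


def count_dashboards() -> int:
    return conn.execute("SELECT COUNT(*) FROM dashboards").fetchone()[0]


try:
    create_two_dashboards(fail=True)
except ValueError:
    pass
rows_after_failure = count_dashboards()  # the partial insert was rolled back

create_two_dashboards(fail=False)
rows_after_success = count_dashboards()  # both inserts were committed together
```

The failed call leaves no trace of its first insert, which is the atomicity guarantee the "unit of work" is meant to provide.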
SQLAlchemy also supports this feature:
Historically, Superset’s "unit of work" has been ill-defined, mismanaged, and/or misconstrued. It has resulted in us over (partial) committing*—a somewhat typical SQLAlchemy mistake (see SIP-99A for more details)—which violates the atomicity of the operation and leads to unnecessary complexity.
* The atomic unit vagueness has led to code inconsistencies which have resulted in us often adopting the “when in doubt, commit” mentality.
[SIP-35] Proposal for Improving Superset’s Python Code Organization introduced the concepts of Commands and Data Access Objects (DAOs) and (clearly and rightfully) stated,
however, sadly the code examples (where the database logic is defined entirely within the DAO which—via the commit and rollback operations—ends the transaction) likely led people astray—violating the "unit of work" concept.
Examples
The following examples illustrate where the atomic business operation has been violated:
- The BaseDAO.create() method auto commits by default, which means that the DAO, as opposed to the Command, acts as the “unit of work”.
- The QueryDAO.stop_query() method over commits.
- The ObjectUpdater.after_insert() SQLAlchemy event handler instantiates a session (and thus transaction) outside of the Flask-SQLAlchemy session and explicitly commits*.
- The UpdateDashboardCommand.run() method prevents command chaining due to the explicit commit.
- The EmbeddedDashboardDAO.upsert() method explicitly commits.
* The reason for this behavior is, per the SQLAlchemy event documentation,
The following examples illustrate where preserving the atomic business operation has added complexity or inconsistency which is difficult to grok:
- The DatasetDao.update() method has complex commit chaining.
- The DashboardDAO.update_chart_owners() and DashboardDAO.set_dash_metadata() methods are inconsistent, i.e., (by default) the former commits and the latter does not, which leads to ambiguity within the same DAO in terms of the meaning of set and update.
Proposed Change
Unit of Work
In the context of Flask, a typical "unit of work" is a request. However, in addition to the RESTful API, Superset provides a CLI, and thus not all operations occur within the confines of a Flask request.
Per SIP-35 Commands really are the best representation of a “unit of work” as they provide a single cohesive business operation. Note not all RESTful API endpoints interface with a Command and thus, under these circumstances, the RESTful API can serve as the de facto atomic unit.
This proposal is in line with other recommendations that state one should
and
Nested Transactions
Using a Command (as opposed to a DAO) as the "unit of work" does not mitigate the over commit problem completely as it is conceivable that a business transaction may not be encapsulated by a single Command.
To address this without adding code complexity, we recommend leveraging SQLAlchemy's nested transaction, which is both viable (all our metadata engines, MySQL, PostgreSQL, and SQLite, support the SAVEPOINT construct) and conducive to our design pattern. Used in conjunction with a context manager, upon exit, the outermost returned transaction object is either committed or rolled back to the SAVEPOINT. See here for the pseudo implementation.
The merits of this approach are:
Session.flush(), Session.commit(), or Session.rollback() as this is provided by the context manager.
Event Handlers
Apart from asynchronous Celery tasks, only the Flask-SQLAlchemy singleton session should be used. Event handlers should share the same session of the mapped instance being persisted, i.e.,
SQLAlchemy event handlers typically either call Connection.execute() (which auto-commits) or instantiate a new session (and corresponding transaction), both of which violate the atomic unit construct. Invoking additional database operations can be problematic, i.e., the transaction may be closed.
In general we should discourage the use of the SQLAlchemy event handlers due to their complexity. Furthermore, SQLAlchemy's bulk operations do not dispatch the corresponding ORM event callbacks, which historically are required to augment additional records loosely defined via implicit relationships.
The proposed resolution would be to migrate the various operations (which typically represent business logic) to either the database (if appropriate) or the Command level.
Testing
Tests should leverage the nested transaction (with a twist). Typically tests should upon startup:
And upon teardown:
A pytest fixture similar to this (with function scope) achieves the desired functionality. Note the fixture uses the nested transaction construct sans context manager to ensure that the transaction is never explicitly committed.
Examples
The following code illustrates how the DAO and Command should be constructed, where it is evident that the Command (business layer) has control of the transaction.
Flask-AppBuilder
Superset relies heavily on Flask-AppBuilder (FAB), which has a tendency to explicitly commit, e.g., in SecurityManager.add_user(), thus violating our definition of "unit of work", i.e., the following,
will not roll back to the savepoint at the beginning of the nested transaction, as a rollback or commit updates the savepoint.
There are a couple of options available to address this issue:
SecurityManager.get_session() property by providing a monkey patched session (associated with a nested transaction) where the commit operation merely flushes* and the rollback operation is a no-op.
Though (2) is preferable, (1) can be implemented as follows:
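One possible shape for such a session wrapper, as a sketch only and not FAB's or Superset's actual code: `FlushOnCommitSession` and `RecordingSession` are hypothetical names, the recording class merely stands in for a real session, and wiring the proxy into SecurityManager.get_session() is not shown.

```python
class FlushOnCommitSession:
    """Hypothetical proxy: `commit` only flushes and `rollback` does nothing."""

    def __init__(self, session):
        self._session = session

    def commit(self) -> None:
        # Send pending changes to the database without ending the transaction.
        self._session.flush()

    def rollback(self) -> None:
        # Leave the decision to the enclosing unit of work.
        pass

    def __getattr__(self, name):
        # Everything else (add, flush, ...) is delegated unchanged.
        return getattr(self._session, name)


class RecordingSession:
    """Stand-in for a real session that records the calls it receives."""

    def __init__(self) -> None:
        self.calls = []

    def add(self, obj) -> None:
        self.calls.append(("add", obj))

    def flush(self) -> None:
        self.calls.append(("flush",))

    def commit(self) -> None:
        self.calls.append(("commit",))

    def rollback(self) -> None:
        self.calls.append(("rollback",))


inner = RecordingSession()
proxied = FlushOnCommitSession(inner)
proxied.add("user")
proxied.commit()    # reaches the inner session as a flush
proxied.rollback()  # swallowed by the proxy
recorded = inner.calls
```

Code that calls `commit()` on the proxy therefore never ends the surrounding transaction, which stays under the control of the outer unit of work.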
* The reason the commit flushes as opposed to being a no-op (which SQLAlchemy invokes unconditionally prior to a commit) is that this helps preserve existing commit workflows where a series of operations (insert, update, delete) are communicated to the database.
New or Changed Public Interfaces
None.
New Dependencies
May require an update to FAB to ensure that the atomic unit remains intact.
Migration Plan and Compatibility
Rejected Alternatives
None.