SQL's WITH RECURSIVE
is a pretty beefy feature, but it can be mysterious when or why you'd even want to use it.
If you read around a bit, you might see that "graph reachability" or "transitive closure" are the most common motivating examples.
Do these examples motivate you?
They don't really motivate me.
Let's talk through some plausibly more motivating examples of why you might want to write recursive queries.
SQL's WITH RECURSIVE
allows you to write WITH
blocks whose bindings refer to each other.
I'm going to use Materialize's WITH MUTUALLY RECURSIVE
for my examples, so don't get stressed out if you see that, nor if you see something you think doesn't actually work with WITH RECURSIVE
.
All of these examples work in Materialize.
Here's the classing "graph reachability" example in WITH RECURSIVE
:
WITH MUTUALLY RECURSIVE
-- A path of edges leads from `a` to `b`.
reach (a int, b int) AS (
SELECT * FROM edges
UNION
SELECT edges.a, reach.b
FROM edges, reach
WHERE egdes.b = reach.a
)
SELECT * FROM reach;
You get to write a binding reach
that can refer to itself.
The collection reach
is initially empty, but you repeatedly set it equal to its definition.
Each time it changes, its output might change, and you should apply the rule again.
Better, think of the result as a fixed point, which results from the unlimited application of the binding rule.
At the end, you get to SELECT
out whatever you like.
We've already posited that reach
is not what you'd like, because so rarely do you want to march around on graphs.
So, let's instead look through some other use cases where recursion proves invaluable.
Often you have a rule to apply some logic and get some new state out, but you want to repeat it.
The best personal example I have of this is a Minesweeper clone I have in SQL. You can describe the rules in SQL (count adjacent mines, locations where all mines can be seen). But the most satisfying part of Minesweeper is having it open up a region of zero-count mines for you. To get this to happen you need to repeatedly apply the update rules, and it just isn't all that satisfying if you have to run each step yourself.
To my great delight, you can take the block of views and table inserts that form "one step" and wrap them with WITH MUTUALLY RECURSIVE
.
The result just works.
Let's take a closer-to-home example for Materialize. Materialize maintains views over data as they change, but at any moment each view is only caught up through some input timestamp. This timestamp is probably less than the timestamps through which the raw input data on which it relies is available, because things take time. This gap in timestamps is called the "propagation delay", and we'd like to understand where it comes from when it is large. If a view isn't caught up, it could be because the view definition is expensive, or it could be that some of its inputs is not caught up. Those inputs might themselves be slow views, or again blocked on slow inputs. Although Materialize surfaces the introspection data that would let you know which it is, the process of chasing down the dependencies, an iterative process, requires a lot of human effort. A recursive query could get you a full report at once, with all of the raw input timestamps and the propagation delay for each intermediate view.
You could run these queries with a human (or script) in the loop, rather than use a recursive query, but you risk inconsistency. The any round of work you do may no longer be valid if the input have changed since you ran your first query. These queries can take variable numbers of rounds, and odds are by the time you've finished them all something has changed.
Atomicity is a great reason to want recursive queries in the language. You really want that work to get applied atomically, as otherwise you haven't a clue what the results actually mean.
Folks love organizing their data into folders. Folders full of folders, full of folders, with various bits of data sprinkled througout. You often need to roll up your aggregates by these folders, so you can make sense of the volume of data you are storing.
This sort of data-defined roll-up is a cinch with a recursive query:
WITH MUTUALLY RECURSIVE
-- Sum of all file sizes in the folder, even transitively.
totals(folder text, total int) AS (
SELECT folder, sum(size)
FROM (
-- Files may exist in folders.
SELECT folder, size FROM files
UNION ALL
-- Folders may contain other folders.
SELECT child_of.parent, totals.total
FROM totals, child_of
WHERE child_of.folder = totals.folder
)
GROUP BY folder
)
SELECT * FROM totals;
Of course, it doesn't have to be folders.
The classic SQL example for recursive queries is the "bill of materials" query, where one wants to determine the raw parts needed to assemble some complex object. Of course, the complex object doesn't just tell you; you just know for each part which other parts are required to build it. These other parts need their own parts, all the way down to whatever the "raw parts" are (those that don't need other parts). A recursive query is just what you need if you want to make a shopping list for your factory.
Data organized by data are especially interesting for Materialize, where the changing input data can change not only the totals but even the structure of aggregation. Or not change the totals, as when moving subfolders around: the higher-ups shouldn't even notice when you pivot around the contents of your home directory. In either case, you want to move from one correct answer to another, without transient errors as you change the data or its structure.
You'll need to use recursion at some point, it's just a question of whether you do it in your database or not.
Materialize has a hierarchical dataflow visualizer based of off timely dataflow reporting.
Timely dataflow reports elapsed times, record counts, arranged data volumes, etc., as well as its hierarchical dataflow structure.
We didn't have WITH MUTUALLY RECURSIVE
when we did the visualizer, so we do that aggregation in React.
The React code wasn't maintained as the underlying system evolved, and glitches ensued.
Moving that logic into the database means that much less React code to support.
If you are rendering a web page you'll need to first acquire various assets, which may reference other assets. You can write this yourself, chasing down dependencies until you have everything you need, one database transaction at a time. Or, if you are lucky, you can use recursive queries to get your database to pre-chase the results for you, and come back with everything it can give you in one go. You probably get better results, probably better performance, and you get to delete some code.
Recursion is fundamental to computer science. You, or folks around you, will need to use it.
Moreover, you want your database to support recursion directly. You want this for all the same sorts of reasons you want your database to support SQL.
-
Isolation: You want results that reflect an view over consistent input data.
You don't want to order parts from a continually changing list of requirements for intermediate parts. You don't want to chase down performance phantoms when nothing is actually wrong. The database can ensure that you see only results that correspond to the atomic evaluation of your query, isolated from other changes.
-
Optimization: You want the system to be able to pick the best implementation.
If you want to look up a specific
folder
intotals
, you probably don't want to have to write a new query each time. Buttotals
is a view; you can justSELECT .. WHERE folder = ..
from it, and the optimizer will ensure only the relevant state is explored. The optimizer will also ensure thattotals
is indexed byfolder
, which is what you need to quickly roll up these aggregates. -
Reactivity: You want to subscribe to the change stream of a recursive query.
Materialize lets you subscribe to the change stream of views, including recursive views. Let's say you want to get notified when a
folder
has a total that exceeds some quota. Are you going to write correct theLISTEN/NOTIFY
logic yourself? Of course not; you'd much rather justSUBSCRIBE
to a join betweentotals
andquota
.
Databases do lots of great things for you. The more you can tell them about what you need, the more great things they can do for you.