Materialize Unbundled: Consistency at Scale

Materialize allows you to frame SQL queries against continually changing data, and it will compute, maintain, and serve the answers even as the underlying data change.

Consistency is a watchword at Materialize. We are able to maintain query outputs that at all times correspond exactly to their inputs, at least as observed by Materialize. This effectively overcomes the cache invalidation problem, one of the core hard problems in computer science.

That sounds like a bold claim, so there is probably a catch. For sure, you could reasonably imagine that the problem can be solved "in the small": one carefully written process or tightly coupled system could work hard to keep everything in check. The issues arise when your system needs to grow, to involve more and varied resources. The complexity of coordinating the behavior of all these parts causes systems (or their properties) to collapse.

As it turns out, the mechanisms Materialize uses for consistency do scale to large systems. In this post we'll explain those mechanisms, and outline our plans for scaling out Materialize to a platform for consistent, continually changing data.

Some context

Materialize is undergoing a fairly dramatic internal architectural shift. It has historically been a single binary, with some scale-out aspirations, that handles data ingestion, incremental view maintenance, and query serving, all in one. This design has changed, to one with separated storage, compute, and serving planes, so that each plane can operate and scale independently. You can ingest arbitrary volumes of data to elastic storage (think S3), you can spin up unlimited numbers of compute instances to read from and write back to this data, and you can serve results to as many concurrent connections as you like.

With all of these ambitions, how do we avoid racing forward with tangled shoelaces and landing immediately and forcefully on our face?

Materialize's consistency mechanism

Materialize uses virtual time as the basis of its consistency.

Virtual time is a technique for distributed systems that says events should be timestamped prescriptively rather than descriptively. The recorded time says when an event should happen, rather than when it did happen. Virtual time is definitely not for all systems. It is however a great fit for systems tasked with maintaining views over data that undergo specific, externally driven changes.

Materialize records, transforms, and reports explicitly timestamped histories of collections of data. These explicit histories promptly and unambiguously communicate the exact contents of a collection at each of an ever-growing set of times. If we are doing our job well, these times are pretty close to "right now", though it is fine if they lag behind a bit.

Explicit histories are in contrast to implicit histories, which is how most systems work. A system with implicit histories has some state at some point in time, and for whatever reason it chooses to change its state to something else. Anyone watching would notice the state transition, but you would have to be watching, or often actively participating in the system, to know it happened. The implicitness of these systems forces other participants to couple their behavior, often in ways that frustrate scaling and robust, distributed operation.

Once input data are recorded as explicit histories, the potential confusion of concurrency is largely removed. Problems of behavioral coordination are reduced to "just computation": components must produce the correct timestamped output from their timestamped input, as if the input changed at the recorded times. Much of Materialize's machinery is then about efficiently computing, maintaining, and returning the specific correct answers at specific virtual times.
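
To make "explicit history" concrete, here is a minimal Rust sketch (illustrative types of our own, not Materialize's actual implementation) of a collection history as timestamped updates, and of reading off the collection's exact contents at any virtual time:

```rust
use std::collections::HashMap;

/// An update in an explicit history: which record changed, at which virtual
/// time the change takes effect, and by how much (+1 insert, -1 delete).
#[derive(Clone, Debug)]
struct Update<D> {
    data: D,
    time: u64, // prescriptive: when the change should happen
    diff: i64, // +1 for an insertion, -1 for a deletion
}

/// The contents of the collection as of virtual time `t`: accumulate every
/// update whose time is <= t.
fn contents_at<D: Clone + Eq + std::hash::Hash>(history: &[Update<D>], t: u64) -> HashMap<D, i64> {
    let mut counts = HashMap::new();
    for update in history.iter().filter(|u| u.time <= t) {
        *counts.entry(update.data.clone()).or_insert(0) += update.diff;
    }
    counts.retain(|_, count| *count != 0);
    counts
}

fn main() {
    // "alice" is inserted at time 0 and removed at time 2; "bob" arrives at time 1.
    let history = vec![
        Update { data: "alice", time: 0, diff: 1 },
        Update { data: "bob",   time: 1, diff: 1 },
        Update { data: "alice", time: 2, diff: -1 },
    ];
    // The history itself answers "what were the contents at time t?" for every
    // time, with no need to have watched the changes as they happened.
    for t in 0..3 {
        println!("time {}: {:?}", t, contents_at(&history, t));
    }
}
```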


Virtual time is related to multiversioning, used by traditional databases for concurrency control. These systems maintain recent historical values of data, potentially several, to decouple the apparent and actual changes to the data. However, these multiple versions are usually cleaned up as soon as possible, and ideally not exposed to the user. Multiple versions are first-class citizens in Materialize's data model, rather than an internal mechanism for optimizing performance.


Materialize's Unbundled Architecture

Materialize is architected in three layers: Storage, Compute, and Adapter. Virtual times are the decoupling mechanism for these layers.

  • Storage ensures that input data are durably transcribed as explicit histories, and provides access to snapshots at a virtual time and subscriptions to changes from that time onward.
  • Compute transforms explicit input histories into the corresponding explicit output histories, for views it has been tasked to compute and maintain.
  • Adapter maps user actions (e.g. INSERT, SELECT) to virtual times, to present users with the experience of a transactional system that applies operations in sequence.

The three layers do not need to have their executions coupled. Their behavior is only indirectly synchronized through the availability of virtually timestamped results.

Importantly, each of these layers can be designed independently, and their operation scaled independently. As we'll see, these designs will follow different principles, and avoid scaling bottlenecks with different techniques.
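
One way to picture that independence, with hypothetical interface names rather than Materialize's actual APIs: each layer traffics only in explicitly timestamped updates, so none of them needs to know when, or where, the others run.

```rust
// Hypothetical interfaces, for illustration only: the layers exchange nothing
// but explicitly timestamped updates, keyed by virtual time.
type VirtualTime = u64;
type Diff = i64;
type Update<D> = (D, VirtualTime, Diff);

/// Storage: durably records histories, and hands out snapshots and subscriptions.
trait Storage<D> {
    fn append(&mut self, updates: Vec<Update<D>>);
    fn snapshot(&self, as_of: VirtualTime) -> Vec<Update<D>>;
    fn subscribe(&self, from: VirtualTime) -> Box<dyn Iterator<Item = Update<D>>>;
}

/// Compute: turns timestamped input histories into timestamped output histories.
trait Compute<In, Out> {
    fn transform(&mut self, input: Vec<Update<In>>) -> Vec<Update<Out>>;
}

/// Adapter: picks the virtual time at which each user command takes effect.
trait Adapter {
    fn timestamp_write(&mut self) -> VirtualTime;
    fn timestamp_read(&mut self) -> VirtualTime;
}
```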

Storage: Writing things down

The Storage layer is tasked with durably maintaining explicitly timestamped histories of data collections.

Storage is driven primarily by requests to create and then continually ingest "sources" of data. There are various types of sources, detailing where the data come from, their format, and how to interpret each new utterance about the data. However, all sources have the property that once recorded they present to the rest of Materialize as explicitly timestamped histories of relational data. Storage captures this representation, maintains it durably, and presents it promptly and consistently.

Storage is the place we pre-resolve questions of concurrency. The virtual time an update is assigned becomes the truth about when that update happens. These times must reflect constraints on the input: updates in the same input transaction must be given the same virtual time, and updates that are ordered in the input must be given virtual times that respect that order. The recorded, explicitly timestamped history is now unambiguous on matters of concurrency.
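
To sketch that constraint (hypothetical code, not Materialize's timestamp assignment logic): a per-source timestamper can give every update in an upstream transaction one shared virtual time, and give later transactions strictly later times.

```rust
/// A hypothetical timestamper for one input source: it assigns a single,
/// never-decreasing virtual time to each upstream transaction, so the recorded
/// history respects both transaction boundaries and the upstream order.
struct Timestamper {
    next_time: u64,
}

impl Timestamper {
    /// Stamp every update in one upstream transaction with the same virtual time.
    fn stamp_transaction<D>(&mut self, updates: Vec<(D, i64)>) -> Vec<(D, u64, i64)> {
        let time = self.next_time;
        self.next_time += 1; // later transactions get strictly later times
        updates
            .into_iter()
            .map(|(data, diff)| (data, time, diff))
            .collect()
    }
}

fn main() {
    let mut ts = Timestamper { next_time: 0 };
    // Two upstream transactions: all updates within one share a virtual time,
    // and the second transaction's time is greater than the first's.
    let t1 = ts.stamp_transaction(vec![("insert alice", 1), ("insert bob", 1)]);
    let t2 = ts.stamp_transaction(vec![("delete alice", -1)]);
    println!("{:?}\n{:?}", t1, t2);
}
```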

Storage is also the place where we record the output of the Compute layer, and make it available for other compute instances as input. These outputs exactly correspond to the virtual times in their inputs, and other compute instances using any number of inputs and outputs will see exactly consistent views, without further coordination.

Compute: Transforming data

The Compute layer is tasked with efficiently computing and maintaining views over explicitly timestamped histories of data collections.

In Materialize, the Compute layer is implemented by differential dataflow atop timely dataflow. These are high-performance, scale-out dataflow systems, designed exactly for the task of maintaining consistent views over changing data with high throughput and low latency. Importantly, each instance is also able to share its maintained data among the dataflows it hosts, which can result in a substantial reduction in required resources and an improvement in performance. The trade-off that comes with this sharing is a lack of isolation: overwhelming an instance overwhelms all of the co-located view maintenance tasks.

For this reason, Compute is keen to spin up independent instances when sharing is not valuable, or when isolation is paramount. Crucially, independent instances do not mean inconsistent instances. Materialize's use of virtual time ensures that independent instances still provide consistent results, with no coordination other than the explicitly timestamped histories provided by Storage.
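
For a flavor of the building blocks (this uses the open-source timely and differential-dataflow crates directly, as dependencies, and is not Materialize's internal wiring), here is a small program that maintains a count over an explicitly timestamped input and reports its output as timestamped updates:

```rust
// Maintain a per-key count as the input history advances through virtual times.
use differential_dataflow::input::Input;
use differential_dataflow::operators::Count;

fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        let mut input = worker.dataflow::<u64, _, _>(|scope| {
            let (handle, words) = scope.new_collection::<String, isize>();
            words
                .count()
                .inspect(|(record, time, diff)| {
                    // Each output is itself a timestamped update: (data, time, diff).
                    println!("{:?} at time {} changed by {}", record, time, diff);
                });
            handle
        });

        // Changes at virtual time 0.
        input.insert("hello".to_string());
        input.insert("hello".to_string());
        // Advance to virtual time 1 and change the input again; the maintained
        // count is updated for exactly that time.
        input.advance_to(1);
        input.insert("world".to_string());
        input.remove("hello".to_string());
        input.advance_to(2);
        input.flush();
    })
    .expect("timely computation failed");
}
```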

Consistency has other amazing consequences: each compute instance can be actively replicated, as independently operating replicas will produce identical results. Active replication is an excellent tool for masking downtime due to failures, rescaling, reconfiguration, and version upgrades. Here strong consistency has provided us with a tool for greater availability, which---while not strictly a CAP theorem violation---isn't what you would normally expect.

Adapter: Serving results

The Adapter layer is tasked with assigning timestamps to user actions, to present the experience of a system that moves forward consistently through time.

Users come to Materialize looking for the experience of a SQL database and strong consistency guarantees. However, they likely do not know about virtual time, and their SQL queries certainly do not. The users hope to type various SELECT and INSERT flavored commands, perhaps surrounded by BEGIN and COMMIT, and would like the experience of a system that applies the commands of all users in one global sequence.

This does not mean that Materialize must actually apply these operations in a sequence, only that it must appear to do so.

At its core, the thing Materialize must do is assign a virtual timestamp to each user command, which determines the intended order. Once this has been done, the behavior of the rest of the system, from the updates applied to managed tables to the query results returned, is entirely "determined". Materialize still has some work to do to actually return the results, but the coordination problem has been reduced to producing the correct answer.
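
As a deliberately simplified sketch of that assignment (hypothetical code; Materialize's real timestamp selection has more constraints than this): an oracle can hand out virtual times so that every read observes all previously acknowledged writes, and every later write lands after the reads that preceded it.

```rust
/// A deliberately simplified timestamp oracle (hypothetical). It hands out
/// virtual times so that operations appear to execute in one global sequence.
struct TimestampOracle {
    /// The largest virtual time assigned to any write so far.
    write_ts: u64,
    /// The largest virtual time assigned to any read so far.
    read_ts: u64,
}

impl TimestampOracle {
    /// Assign a virtual time to a write (e.g. an INSERT): strictly after every
    /// prior read, so the write cannot change answers already returned.
    fn write(&mut self) -> u64 {
        self.write_ts = self.write_ts.max(self.read_ts) + 1;
        self.write_ts
    }

    /// Assign a virtual time to a read (e.g. a SELECT): at least the latest
    /// write, so the read reflects everything already acknowledged.
    fn read(&mut self) -> u64 {
        self.read_ts = self.read_ts.max(self.write_ts);
        self.read_ts
    }
}

fn main() {
    let mut oracle = TimestampOracle { write_ts: 0, read_ts: 0 };
    let w1 = oracle.write(); // an INSERT lands at time 1
    let r1 = oracle.read();  // a subsequent SELECT reads at time 1, seeing the INSERT
    let w2 = oracle.write(); // a later INSERT lands strictly after that SELECT
    println!("write {}, read {}, write {}", w1, r1, w2);
}
```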

This is not to trivialize the timestamp assignment process. Although workloads of independent reads and writes work out well, transactions throw a spanner in the works. Transactions that mix reads and writes must span multiple virtual times, and handling several such transactions at once risks serializability violations (for reasons you might imagine, and just as many you might not).

Putting the pieces back together

Virtual time underlies Materialize's consistency guarantees, and its decoupled architecture. Independent components coordinate indirectly through virtual time, allowing their actual implementations to operate as efficiently as they know how.

This decoupling allows scalable, robust, distributed implementations of low-latency systems, which ... is just really exciting.