---
title: "Features"
layout: features
---
Flink's data streaming runtime achieves high throughput rates and low latency with little configuration. Benchmarks of a distributed item-counting task, which requires streaming data shuffles, illustrate this performance.
Flink supports stream processing and windowing with Event Time semantics.
Event time makes it easy to compute over streams where events arrive out of order, and where events may arrive delayed.
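The core idea can be sketched in a few lines of plain Scala (this is a conceptual illustration, not the Flink API): because windows are defined by the timestamp an event carries, not by when it arrives, out-of-order arrival does not change the result.

```scala
// Conceptual sketch: events carry their own timestamps, so windows are
// defined by when events happened, not by when they arrive.
case class Event(timestampMs: Long, value: String)

// Assign each event to a 5-second tumbling window by its event timestamp.
def windowStart(e: Event, sizeMs: Long = 5000L): Long =
  e.timestampMs - (e.timestampMs % sizeMs)

def countsPerWindow(events: Seq[Event]): Map[Long, Int] =
  events.groupBy(windowStart(_)).map { case (w, es) => (w, es.size) }

// Two arrival orders of the same events...
val inOrder    = Seq(Event(1000, "a"), Event(2000, "b"), Event(6000, "c"))
val outOfOrder = Seq(Event(6000, "c"), Event(1000, "a"), Event(2000, "b"))

// ...produce identical per-window counts under event-time semantics.
val same = countsPerWindow(inOrder) == countsPerWindow(outOfOrder)  // true
```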
Streaming applications can maintain custom state during their computation.
Flink's checkpointing mechanism ensures exactly-once semantics for the state in the presence of failures.
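A minimal plain-Scala sketch (not Flink's actual checkpointing implementation) shows why snapshotting state yields exactly-once semantics for that state: after a failure, the operator restores the last snapshot and replays the input from that point, so no input affects the state twice.

```scala
// User-defined keyed state: a count per key.
var state = Map.empty[String, Long]
def process(key: String): Unit =
  state = state.updated(key, state.getOrElse(key, 0L) + 1)

val input = Seq("a", "b", "a", "a", "b")

input.take(3).foreach(process)   // process a prefix of the stream
val snapshot = state             // checkpoint: Map(a -> 2, b -> 1)

input.drop(3).foreach(process)   // more processing...
state = snapshot                 // ...then a failure: restore the snapshot
input.drop(3).foreach(process)   // replay from the checkpoint

// Each input element affected the state exactly once despite the failure.
val exactlyOnce = state == Map("a" -> 3L, "b" -> 2L)
```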
Flink supports windows over time, count, or sessions, as well as data-driven windows.
Windows can be customized with flexible triggering conditions, to support sophisticated streaming patterns.
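As a sketch of the sliding-window semantics (plain Scala, assuming the usual `timeWindow(size, slide)` behavior), each event is assigned to `size / slide` overlapping windows:

```scala
// Compute the start timestamps of all sliding windows containing an event.
// A window [start, start + size) contains the event iff start <= ts < start + size.
def slidingWindows(tsMs: Long, sizeMs: Long, slideMs: Long): Seq[Long] = {
  val lastStart = tsMs - (tsMs % slideMs)
  (lastStart until (tsMs - sizeMs) by -slideMs).reverse
}

// An event at t = 7200 ms with size 5000 ms and slide 1000 ms falls into
// 5 windows, starting at 3000, 4000, 5000, 6000, and 7000 ms.
val windows = slidingWindows(7200L, 5000L, 1000L)
```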
Data streaming applications are executed with continuous (long lived) operators.
Flink's streaming runtime has natural flow control: Slow data sinks backpressure faster sources.
Flink's fault tolerance mechanism is based on Chandy-Lamport distributed snapshots.
The mechanism is lightweight, allowing the system to maintain high throughput rates and provide strong consistency guarantees at the same time.
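A toy sketch of the barrier idea (a simplification of the Chandy-Lamport approach, not Flink's implementation): an operator reading two input channels snapshots its state only once the checkpoint barrier has arrived on both channels, so the snapshot reflects exactly the records that precede the barriers, while later records count toward the next checkpoint.

```scala
// Each channel carries data records interleaved with checkpoint barriers.
sealed trait Msg
case class Record(n: Long) extends Msg
case object Barrier extends Msg

val ch1: List[Msg] = List(Record(1), Record(2), Barrier, Record(10))
val ch2: List[Msg] = List(Record(3), Barrier, Record(20))

// State contributed by records before the barrier (in a real system, records
// arriving behind an early barrier are buffered until the channels align).
def pre(ch: List[Msg]): Long =
  ch.takeWhile(_ != Barrier).collect { case Record(n) => n }.sum
def all(ch: List[Msg]): Long =
  ch.collect { case Record(n) => n }.sum

val snapshot   = pre(ch1) + pre(ch2) // 6: exactly the pre-barrier records
val finalState = all(ch1) + all(ch2) // 36: later records belong to the next checkpoint
```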
Flink uses one common runtime for data streaming applications and batch processing applications.
Batch processing applications run efficiently as special cases of stream processing applications.
Flink implements its own memory management inside the JVM.
Applications scale to data sizes beyond main memory and experience less garbage collection overhead.
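The idea behind managed memory can be sketched as follows (this is an illustration of the principle, not Flink's actual MemoryManager): records are serialized into a few preallocated byte segments instead of being kept as JVM objects, so the heap holds long-lived buffers rather than millions of short-lived objects for the garbage collector to trace.

```scala
import java.nio.ByteBuffer

// One fixed-size, preallocated memory segment.
val segment = ByteBuffer.allocate(1024)

// Serialize records (here: Longs) into the segment...
val records = 1L to 100L
records.foreach(segment.putLong)

// ...and process them straight from the binary representation,
// without materializing 100 boxed objects.
segment.flip()
var sum = 0L
while (segment.hasRemaining) sum += segment.getLong()
```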
Flink has dedicated support for iterative computations (as in machine learning and graph analysis).
Delta iterations can exploit computational dependencies for faster convergence.
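A hedged plain-Scala sketch of a delta iteration: only the vertices whose value changed in the previous step (the workset) propagate updates, so the work per step shrinks as the computation converges. Here the minimum label is propagated along a small chain graph (a toy connected-components example, not Flink's DataSet API):

```scala
// Undirected chain graph 1 - 2 - 3 - 4, as an adjacency map.
val edges = Map(1 -> Seq(2), 2 -> Seq(1, 3), 3 -> Seq(2, 4), 4 -> Seq(3))

var solution = Map(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4) // vertex -> component label
var workset  = solution.keySet                      // initially: every vertex changed
var steps    = 0

while (workset.nonEmpty) {
  steps += 1
  // Candidate updates come only from the neighbors of workset vertices.
  val updates = for {
    v <- workset.toSeq
    n <- edges(v)
    if solution(v) < solution(n)
  } yield n -> solution(v)
  // Keep the smallest proposed label per vertex, if it improves the solution.
  val applied = updates.groupBy(_._1)
    .map { case (v, us) => v -> us.map(_._2).min }
    .filter { case (v, label) => label < solution(v) }
  solution = solution ++ applied
  workset = applied.keySet // next workset: only the vertices that changed
}
// All vertices converge to label 1; the workset shrinks each step.
```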
Batch programs are automatically optimized to exploit situations where expensive operations (like shuffles and sorts) can be avoided, and when intermediate data should be cached.
The DataStream API supports functional transformations on data streams, with user-defined state, and flexible windows.
The example shows how to compute a sliding histogram of word occurrences over a data stream of texts.
WindowWordCount in Flink's DataStream API
{% highlight scala %}
case class Word(word: String, freq: Long)

val texts: DataStream[String] = ...

val counts = texts
  .flatMap { line => line.split("\\W+") }
  .map { token => Word(token, 1) }
  .keyBy("word")
  .timeWindow(Time.seconds(5), Time.seconds(1))
  .sum("freq")
{% endhighlight %}
Flink's DataSet API lets you write beautiful type-safe and maintainable code in Java or Scala. It supports a wide range of data types beyond key/value pairs, and a wealth of operators.
The example shows the core loop of the PageRank algorithm for graphs.
{% highlight scala %}
val result = initialRanks.iterate(30) { pages =>
  pages.join(adjacency).where("pageId").equalTo("pageId") {
    (page, adj, out: Collector[Page]) => {
      out.collect(Page(page.pageId, 0.15 / numPages))
      for (n <- adj.neighbors) {
        out.collect(Page(n, 0.85 * page.rank / adj.neighbors.length))
      }
    }
  }
  .groupBy("pageId").sum("rank")
}
{% endhighlight %}
Flink's stack offers libraries with high-level APIs for different use cases: Machine Learning, Graph Analytics, and Relational Data Processing.
The libraries are currently in beta status and under heavy development.