Skip to content

Latest commit

 

History

History
34 lines (25 loc) · 2.07 KB

README.md

File metadata and controls

34 lines (25 loc) · 2.07 KB

flink-pi

Simple Scala examples for using Apache Flink's DataStream and SQL API for the same problem, that is simple enough not to involve other libraries.

The example approximates Pi with the Monte Carlo method using Flink's

All three examples have the same functionality, by iteratively approaching better and better estimate for the value of Pi. All write the actual best estimate in the given intervals (3 sec)
till an explicit user interruption.

How it Works

As a continuous DataSource, (random/sequnetial) IDs are generated either by IdGenerator or by DataGen SQL Connector.

The next step is the creation is random point in a 2x2 box , centered in the origin, for each generated ID. The ratio of withinCircle points in the sample estimates Pi/4. Rationale : The area of the 2x2 box is 4, the area of the 1 unit radius circle inside is 1*1*Pi = Pi by definition. As a consequence, the ratio of these two areas are : Pi/4 = P(isWithinCircle)/1=> Pi = 4 * P(isWithinCircle) The data from the millions of trials (resulting 4.0's and 0.0's) are aggregated in two phases. The first phase is on each thread (the bulk of the workload), while the second is a global aggregation.

The output is collected in Print SQL Sink or Flink's DataStreamSink.