- Apache Spark is a unified analytics engine for large-scale data processing.
- It provides high-level APIs in Java, Scala, Python and R,
- and an optimized engine that supports general execution graphs.
- It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for incremental computation and stream processing.
-
+ Apache Spark is a unified analytics engine for large-scale data processing.
+ It provides high-level APIs in Java, Scala, Python and R,
+ and an optimized engine that supports general execution graphs.
+ It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for incremental computation and stream processing.