update ballista

apache · May 21, 2021 · 103c87c · 103c87c
1 parent 1fecbdf
commit 103c87c
Show file tree

Hide file tree

Showing 10 changed files with 44 additions and 34 deletions.
diff --git a/ballista/README.md b/ballista/README.md
@@ -19,14 +19,14 @@
 
 # Ballista: Distributed Compute with Apache Arrow
 
-Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is built 
-on an architecture that allows other programming languages (such as Python, C++, and Java) to be supported as 
+Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is built
+on an architecture that allows other programming languages (such as Python, C++, and Java) to be supported as
 first-class citizens without paying a penalty for serialization costs.
 
 The foundational technologies in Ballista are:
 
 - [Apache Arrow](https://arrow.apache.org/) memory model and compute kernels for efficient processing of data.
-- [Apache Arrow Flight Protocol](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for efficient 
+- [Apache Arrow Flight Protocol](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for efficient
   data transfer between processes.
 - [Google Protocol Buffers](https://developers.google.com/protocol-buffers) for serializing query plans.
 - [Docker](https://www.docker.com/) for packaging up executors along with user-defined code.
@@ -57,7 +57,6 @@ April 2021 and should be considered experimental.
 
 ## Getting Started
 
-The [Ballista Developer Documentation](docs/README.md) and the 
-[DataFusion User Guide](https://github.com/apache/arrow-datafusion/tree/master/docs/user-guide) are currently the 
+The [Ballista Developer Documentation](docs/README.md) and the
+[DataFusion User Guide](https://github.com/apache/arrow-datafusion/tree/master/docs/user-guide) are currently the
 best sources of information for getting started with Ballista.
-
diff --git a/ballista/docs/README.md b/ballista/docs/README.md
@@ -16,19 +16,19 @@
   specific language governing permissions and limitations
   under the License.
 -->
+
 # Ballista Developer Documentation
 
-This directory contains documentation for developers that are contributing to Ballista. If you are looking for 
-end-user documentation for a published release, please start with the 
+This directory contains documentation for developers that are contributing to Ballista. If you are looking for
+end-user documentation for a published release, please start with the
 [DataFusion User Guide](../../docs/user-guide) instead.
 
 ## Architecture & Design
 
-- Read the [Architecture Overview](architecture.md) to get an understanding of the scheduler and executor 
+- Read the [Architecture Overview](architecture.md) to get an understanding of the scheduler and executor
   processes and how distributed query execution works.
 
 ## Build, Test, Release
 
 - Setting up a [development environment](dev-env.md).
 - [Integration Testing](integration-testing.md)
-
diff --git a/ballista/docs/architecture.md b/ballista/docs/architecture.md
@@ -16,37 +16,38 @@
   specific language governing permissions and limitations
   under the License.
 -->
+
 # Ballista Architecture
 
 ## Overview
 
-Ballista allows queries to be executed in a distributed cluster. A cluster consists of one or 
+Ballista allows queries to be executed in a distributed cluster. A cluster consists of one or
 more scheduler processes and one or more executor processes. See the following sections in this document for more
 details about these components.
 
-The scheduler accepts logical query plans and translates them into physical query plans using DataFusion and then 
-runs a secondary planning/optimization process to translate the physical query plan into a distributed physical 
-query plan. 
+The scheduler accepts logical query plans and translates them into physical query plans using DataFusion and then
+runs a secondary planning/optimization process to translate the physical query plan into a distributed physical
+query plan.
 
-This process breaks a query down into a number of query stages that can be executed independently. There are 
-dependencies between query stages and these dependencies form a directionally-acyclic graph (DAG) because a query 
+This process breaks a query down into a number of query stages that can be executed independently. There are
+dependencies between query stages and these dependencies form a directionally-acyclic graph (DAG) because a query
 stage cannot start until its child query stages have completed.
 
-Each query stage has one or more partitions that can be processed in parallel by the available 
+Each query stage has one or more partitions that can be processed in parallel by the available
 executors in the cluster. This is the basic unit of scalability in Ballista.
 
-The following diagram shows the flow of requests and responses between the client, scheduler, and executor 
-processes. 
+The following diagram shows the flow of requests and responses between the client, scheduler, and executor
+processes.
 
 ![Query Execution Flow](images/query-execution.png)
 
 ## Scheduler Process
 
-The scheduler process implements a gRPC interface (defined in 
+The scheduler process implements a gRPC interface (defined in
 [ballista.proto](../rust/ballista/proto/ballista.proto)). The interface provides the following methods:
 
 | Method               | Description                                                          |
-|----------------------|----------------------------------------------------------------------|
+| -------------------- | -------------------------------------------------------------------- |
 | ExecuteQuery         | Submit a logical query plan or SQL query for execution               |
 | GetExecutorsMetadata | Retrieves a list of executors that have registered with a scheduler  |
 | GetFileMetadata      | Retrieve metadata about files available in the cluster file system   |
@@ -60,7 +61,7 @@ The scheduler can run in standalone mode, or can be run in clustered mode using
 The executor process implements the Apache Arrow Flight gRPC interface and is responsible for:
 
 - Executing query stages and persisting the results to disk in Apache Arrow IPC Format
-- Making query stage results available as Flights so that they can be retrieved by other executors as well as by 
+- Making query stage results available as Flights so that they can be retrieved by other executors as well as by
   clients
 
 ## Rust Client
@@ -69,7 +70,6 @@ The Rust client provides a DataFrame API that is a thin wrapper around the DataF
 the means for a client to build a query plan for execution.
 
 The client executes the query plan by submitting an `ExecuteLogicalPlan` request to the scheduler and then calls
-`GetJobStatus` to check for completion. On completion, the client receives a list of locations for the Flights 
-containing the results for the query and will then connect to the appropriate executor processes to retrieve 
+`GetJobStatus` to check for completion. On completion, the client receives a list of locations for the Flights
+containing the results for the query and will then connect to the appropriate executor processes to retrieve
 those results.
-
diff --git a/ballista/docs/dev-env.md b/ballista/docs/dev-env.md
@@ -16,13 +16,14 @@
   specific language governing permissions and limitations
   under the License.
 -->
+
 # Setting up a Rust development environment
 
 You will need a standard Rust development environment. The easiest way to achieve this is by using rustup: https://rustup.rs/
 
 ## Install OpenSSL
 
-Follow instructions for [setting up OpenSSL](https://docs.rs/openssl/0.10.28/openssl/). For Ubuntu users, the following 
+Follow instructions for [setting up OpenSSL](https://docs.rs/openssl/0.10.28/openssl/). For Ubuntu users, the following
 command works.
 
 ```bash
@@ -35,4 +36,4 @@ You'll need cmake in order to compile some of ballista's dependencies. Ubuntu us
 
 ```bash
 sudo apt-get install cmake
-```
+```
diff --git a/ballista/docs/integration-testing.md b/ballista/docs/integration-testing.md
@@ -16,10 +16,11 @@
   specific language governing permissions and limitations
   under the License.
 -->
+
 # Integration Testing
 
-We use the [DataFusion Benchmarks](https://github.com/apache/arrow-datafusion/tree/master/benchmarks) for integration 
-testing. 
+We use the [DataFusion Benchmarks](https://github.com/apache/arrow-datafusion/tree/master/benchmarks) for integration
+testing.
 
 The integration tests can be executed by running the following command from the root of the DataFusion repository.
 

diff --git a/ballista/rust/client/README.md b/ballista/rust/client/README.md
@@ -18,5 +18,5 @@
 -->
 
 # Ballista - Rust
-This crate contains the Ballista client library. For an example usage, please refer [here](../benchmarks/tpch/README.md).
 
+This crate contains the Ballista client library. For an example usage, please refer [here](../benchmarks/tpch/README.md).
diff --git a/ballista/rust/core/README.md b/ballista/rust/core/README.md
@@ -18,4 +18,5 @@
 -->
 
 # Ballista - Rust
+
 This crate contains the core Ballista types.
diff --git a/ballista/rust/executor/README.md b/ballista/rust/executor/README.md
@@ -18,6 +18,7 @@
 -->
 
 # Ballista Executor - Rust
+
 This crate contains the Ballista Executor. It can be used both as a library or as a binary.
 
 ## Run
@@ -28,4 +29,4 @@ RUST_LOG=info cargo run --release
 [2021-02-11T05:30:13Z INFO  executor] Running with config: ExecutorConfig { host: "localhost", port: 50051, work_dir: "/var/folders/y8/fc61kyjd4n53tn444n72rjrm0000gn/T/.tmpv1LjN0", concurrent_tasks: 4 }
 ```
 
-By default, the executor will bind to `localhost` and listen on port `50051`.
+By default, the executor will bind to `localhost` and listen on port `50051`.
diff --git a/ballista/rust/scheduler/README.md b/ballista/rust/scheduler/README.md
@@ -18,6 +18,7 @@
 -->
 
 # Ballista Scheduler
+
 This crate contains the Ballista Scheduler. It can be used both as a library or as a binary.
 
 ## Run
@@ -32,8 +33,9 @@ $ RUST_LOG=info cargo run --release
 By default, the scheduler will bind to `localhost` and listen on port `50051`.
 
 ## Connecting to Scheduler
-Scheduler supports REST model also using content negotiation. 
-For e.x if you want to get list of executors connected to the scheduler, 
+
+Scheduler supports REST model also using content negotiation.
+For e.x if you want to get list of executors connected to the scheduler,
 you can do (assuming you use default config)
 
 ```bash
@@ -43,7 +45,8 @@ curl --request GET \
 ```
 
 ## Scheduler UI
-A basic ui for the scheduler is in `ui/scheduler` of the ballista repo. 
+
+A basic ui for the scheduler is in `ui/scheduler` of the ballista repo.
 It can be started using the following [yarn](https://yarnpkg.com/) command
 
 ```bash

diff --git a/ballista/ui/scheduler/README.md b/ballista/ui/scheduler/README.md
@@ -22,7 +22,9 @@
 ## Start project from source
 
 ### Run scheduler/executor
+
 First, run scheduler from project:
+
 ```shell
 $ cd rust/scheduler
 $ RUST_LOG=info cargo run --release
@@ -34,6 +36,7 @@ $ RUST_LOG=info cargo run --release
 ```
 
 and run executor in new terminal:
+
 ```shell
 $ cd rust/executor
 $ RUST_LOG=info cargo run --release
@@ -44,6 +47,7 @@ $ RUST_LOG=info cargo run --release
 ```
 
 ### Run Client project
+
 ```shell
 $ cd ui/scheduler
 $ yarn
Original file line number	Diff line number	Diff line change
Expand Up		@@ -18,4 +18,5 @@
		-->

		# Ballista - Rust

		This crate contains the core Ballista types.