PoC of distributed compute platform using Rust, Apache Arrow, and Kubernetes!

franchb/ballista

Ballista

Ballista will be a proof-of-concept distributed compute platform based on Kubernetes and the Rust implementation of Apache Arrow.

This is not my first attempt at building something like this. I originally wanted DataFusion to be a distributed compute platform but this was overly ambitious at the time, and it ended up becoming an in-memory query execution engine for the Rust implementation of Apache Arrow. However, DataFusion now provides a good foundation to have another attempt at building a modern distributed compute platform in Rust.

My goal is to use this repo to move fast and try out ideas that can eventually be contributed back to Apache Arrow, and to help drive requirements for Apache Arrow and DataFusion.

I will be working on this project in my spare time, which is limited, so progress will likely be slow.

PoC Status

  • README describing project
  • Define service and minimal query plan in protobuf file
  • Generate code from protobuf file
  • Implement skeleton gRPC server
  • Implement skeleton gRPC client
  • Client can send query plan
  • Server can receive query plan
  • CLI to create cluster using Kubernetes
  • Server can translate protobuf query plan to DataFusion query plan
  • Server can execute query plan using DataFusion
  • Server can write results to CSV files
  • Server can stream Arrow data back to client
  • Benchmarks
  • Implement Flight protocol

Building

The build currently depends on https://github.com/tower-rs/tower-grpc/tree/master/tower-grpc being cloned in a parallel directory.

Run Example

Open two terminal sessions. In the first session, run:

cargo run --bin server

In the second session, run:

cargo run --example client

So far, this just sends a logical query plan from the client to the server.
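
To make the idea of shipping a logical query plan from a client to a server more concrete, here is a minimal sketch using only the Rust standard library. It is not the code in this repo: the LogicalPlan enum, its byte encoding, and the round_trip helper are all hypothetical, and an in-process channel stands in for the gRPC and protobuf transport that the project actually uses.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical, heavily simplified logical plan (not the project's protobuf schema).
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    /// Read all rows from a named file.
    Scan { path: String },
    /// Keep only the listed column indices of the input plan.
    Projection { columns: Vec<usize>, input: Box<LogicalPlan> },
}

impl LogicalPlan {
    /// Append an ad hoc binary encoding of this plan to `out`.
    pub fn encode(&self, out: &mut Vec<u8>) {
        match self {
            LogicalPlan::Scan { path } => {
                out.push(0); // tag byte for Scan
                out.extend_from_slice(&(path.len() as u32).to_be_bytes());
                out.extend_from_slice(path.as_bytes());
            }
            LogicalPlan::Projection { columns, input } => {
                out.push(1); // tag byte for Projection
                out.extend_from_slice(&(columns.len() as u32).to_be_bytes());
                for c in columns {
                    out.extend_from_slice(&(*c as u32).to_be_bytes());
                }
                input.encode(out); // child plan follows its parent
            }
        }
    }

    /// Decode one plan from the front of `buf`, advancing it past the bytes read.
    /// Returns None on truncated or unrecognized input.
    pub fn decode(buf: &mut &[u8]) -> Option<LogicalPlan> {
        let (&tag, rest) = buf.split_first()?;
        *buf = rest;
        match tag {
            0 => {
                let len = read_u32(buf)? as usize;
                if buf.len() < len {
                    return None;
                }
                let (bytes, rest) = buf.split_at(len);
                *buf = rest;
                let path = String::from_utf8(bytes.to_vec()).ok()?;
                Some(LogicalPlan::Scan { path })
            }
            1 => {
                let n = read_u32(buf)? as usize;
                let mut columns = Vec::new();
                for _ in 0..n {
                    columns.push(read_u32(buf)? as usize);
                }
                let input = Box::new(LogicalPlan::decode(buf)?);
                Some(LogicalPlan::Projection { columns, input })
            }
            _ => None,
        }
    }
}

/// Read a big-endian u32 from the front of `buf`, advancing it by four bytes.
fn read_u32(buf: &mut &[u8]) -> Option<u32> {
    if buf.len() < 4 {
        return None;
    }
    let (head, rest) = buf.split_at(4);
    *buf = rest;
    Some(u32::from_be_bytes([head[0], head[1], head[2], head[3]]))
}

/// The "client" encodes the plan and sends the bytes; a "server" thread
/// receives and decodes them. The channel is a stand-in for a network call.
pub fn round_trip(plan: &LogicalPlan) -> Option<LogicalPlan> {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let server = thread::spawn(move || -> Option<LogicalPlan> {
        let bytes = rx.recv().ok()?;
        let mut cursor = bytes.as_slice();
        LogicalPlan::decode(&mut cursor)
    });
    let mut bytes = Vec::new();
    plan.encode(&mut bytes);
    tx.send(bytes).ok()?;
    server.join().ok()?
}
```

Calling round_trip with a Projection over a Scan should return an equal plan on the receiving side. In the real project, a generated protobuf message would take the place of the hand-written encoding, which is used here only to keep the sketch free of external crates.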
