These experiments were conducted with the following goals in mind:
- Familiarization with the Arrow C++ code base
- Evaluation of the effort required to build a query engine on top of it
- Deployment of a C++ app with a complex dependency tree on AWS Lambda
- Experimentation with AWS Lambda raw performance (S3 bandwidth, memory bandwidth, CPU instruction sets...)
The Cloudfuse query engine is split into two components:
- 🐝 bees: the cloud function workers (AWS Lambda) that load and pre-aggregate the S3 data
- 🍯 hives: the containers (AWS Fargate) that collect and reduce the aggregates sent by bees.
Deployments and runs are managed by the Makefile. Commands are detailed in the HOWTO section below.
- The `code/` directory contains the C++ code base. It holds a set of experiments that can be run individually in `code/playground/`. The rest of the code is made of helpers common to multiple experiments.
- The `data/` directory is the one mounted by MinIO to simulate S3 locally. Parquet files for experiments go here with the following path: `data/bucketname/keyname` (see the example after this list).
- The `docker/` directory contains build images and scripts for the various dependencies of the project.
- The `infra/` directory contains Terraform scripts to deploy the experiments in the AWS cloud. This infra deploys a "generic" Lambda where you can load any of the experiments from `code/playground/`, Lambdas with bandwidth tests automatically triggered to gather statistics over time, and an ECS cluster with the hive config.
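For example, to make a Parquet file visible to the local MinIO-backed setup, you could lay it out as follows (bucket and file names are purely illustrative):

```bash
# "nyc-taxi" is the bucket name and "trips.parquet" the key, both placeholders.
mkdir -p data/nyc-taxi
cp /path/to/trips.parquet data/nyc-taxi/trips.parquet
# MinIO, which mounts data/, then serves this file as s3://nyc-taxi/trips.parquet
```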
- docker
- AWS account(s) + AWS CLI
- bash + make
- Working knowledge of Terraform, C++, CMake, and Docker
- `make test`: run the C++ tests locally
- `make run-local-XXX`: run that experiment locally, where XXX should be replaced by the experiment file name
- Note: for `make run-local-flight-server`, the dependency on Abseil seems broken
- Note: use `make bash-inside-emulator` to explore the Lambda runtime emulator interactively
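For illustration, a typical local session might look like this (`query-bandwidth` is a placeholder for an experiment file name in `code/playground/`):

```bash
# Run the C++ test suite locally
make test
# Run a single experiment locally; "query-bandwidth" is a placeholder name
make run-local-query-bandwidth
```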
You need an AWS account with the AWS CLI and `.aws/credentials` properly configured. If you use S3 as a backend for Terraform, you can use a bucket in a different account from your deployment (that account is determined by the `profile=xxx` config below). You first need to init your Terraform remote backend:
cd infra
terraform init \
-backend-config="bucket=s3-tf-backend-bucket" \
-backend-config="key=cloudfuse-labs-cpp" \
-backend-config="region=eu-west-1" \
-backend-config="profile=s3-tf-backend-profile"
terraform workspace new dev
cd ..

- `make init`: run terraform init in the current workspace
- `GEN_PLAY_FILE=XXX make deploy-bee`: deploy the experiment file XXX to the "generic" Lambda function. For Lambdas that need to access an object from S3, you can configure it in `infra/playgroun-generic.tf`
- `make run-bee`: run the experiment deployed in the "generic" Lambda function
- `GEN_PLAY_FILE=XXX make deploy-run-bee`: run both of the above
- `make deploy-bench-XXX`: run the functions multiple times, forcing cold starts by changing the function between runs
- `make docker-login`: required to log in to your ECR repository if deploying hive components to the cloud. You will be prompted for the AWS profile you want to use.
- `make force-deploy`: deploy the "generic" Lambda for individual tests, the bandwidth tests triggered by crons to gather Lambda performance statistics over time, and the hive infra (you should configure a valid object in `infra/playgroun-collect.tf`). Needs `make docker-login` to deploy the hive. You will be prompted for the AWS profile you want to use.
- `make destroy`: remove the resources. You will be prompted for the AWS profile that has access to the resources you want to destroy.
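As an illustration, a full cycle against the generic Lambda could look like this (the experiment name is a placeholder; pick any file from `code/playground/`):

```bash
# Set up the Terraform workspace (after the backend init above)
make init
# Deploy a playground experiment to the "generic" Lambda and run it
# ("query-bandwidth" is a placeholder experiment file name)
GEN_PLAY_FILE=query-bandwidth make deploy-run-bee
# Remove the cloud resources when done
make destroy
```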
In the file where you want to print the backtrace:
#define BOOST_STACKTRACE_USE_ADDR2LINE
#include <boost/stacktrace.hpp>
#include <iostream>

// Print the current call stack wherever it is needed, e.g. in an error path:
std::cout << boost::stacktrace::stacktrace();

In the ThirdpartyToolchain.cmake file of Arrow, remove the slim Boost source URLs:
set_urls(
BOOST_SOURCE_URL
# These are trimmed boost bundles we maintain.
# See cpp/build_support/trim-boost.sh
# "https://dl.bintray.com/ursalabs/arrow-boost/boost_${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}.tar.gz"
"https://dl.bintray.com/boostorg/release/${ARROW_BOOST_BUILD_VERSION}/source/boost_${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}.tar.gz"
"https://github.com/boostorg/boost/archive/boost-${ARROW_BOOST_BUILD_VERSION}.tar.gz"
# FIXME(ARROW-6407) automate uploading this archive to ensure it reflects
# our currently used packages and doesn't fall out of sync with
# ${ARROW_BOOST_BUILD_VERSION_UNDERSCORES}
#
)
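On the compile side, the addr2line backend of Boost.Stacktrace only produces readable frames when debug symbols are present and the `addr2line` tool (binutils) is installed. A minimal sketch of a standalone build, assuming a Linux toolchain (the file name is a placeholder and the exact flags may differ in this project's CMake setup):

```bash
# "trace_demo.cpp" is a placeholder file containing the snippet above.
# -g keeps debug info so addr2line can resolve file:line; the addr2line
# backend also needs libdl on Linux.
g++ -g trace_demo.cpp -o trace_demo -ldl
./trace_demo
```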