Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Flink Operator Bakery #21

Merged
merged 44 commits into from
Sep 22, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
af21b72
Add very basic flink runner
yuvipanda Sep 1, 2022
010f1f3
Create FlinkCluster & port-forward it appropriately
yuvipanda Sep 2, 2022
a5e5731
Make sure kubectl is installed
yuvipanda Sep 2, 2022
225cde3
Rename the Bakery to better reflect what it is
yuvipanda Sep 2, 2022
6cd5c5a
Parameterize flink configuration
yuvipanda Sep 2, 2022
f062a19
Setup k3s + flink operator in CI
yuvipanda Sep 2, 2022
e1327e3
Move flink tests to their own workflow
yuvipanda Sep 2, 2022
73ab352
Setup cert-manager in GHA
yuvipanda Sep 2, 2022
1ade0a8
Try waiting for cert-manager to be setup
yuvipanda Sep 2, 2022
5dba943
Wait for cert-manager to be ready before installing flink
yuvipanda Sep 2, 2022
621f0c7
Wait for cert-manager correctly
yuvipanda Sep 2, 2022
ae8ce86
Setup Python on flink test
yuvipanda Sep 2, 2022
4d6a1be
Setup minio
yuvipanda Sep 2, 2022
1ba8b70
Try debug why minio isn't working
yuvipanda Sep 2, 2022
07b5eaa
See why minio is failing
yuvipanda Sep 2, 2022
1d59508
Set some minio config
yuvipanda Sep 2, 2022
4ed0347
Speed up readiness test for sdk harnass
yuvipanda Sep 2, 2022
9828cc9
Set rootPassword to be at least 8 chars long
yuvipanda Sep 2, 2022
c8a94aa
Wait for minio install to finish
yuvipanda Sep 2, 2022
b775532
Try using minio from local machine
yuvipanda Sep 2, 2022
8053015
Don't set up minio in GHA inside k8s
yuvipanda Sep 2, 2022
a17de4a
Install minio + run flink test
yuvipanda Sep 2, 2022
fc9b924
Actually add flink test
yuvipanda Sep 2, 2022
8b87481
Fix test fixtures
yuvipanda Sep 2, 2022
1a39464
Try fix kubectl wait
yuvipanda Sep 2, 2022
6c125fb
Temporarily add a time.sleep to get kubectl wait to work
yuvipanda Sep 2, 2022
952f182
Show what various pods are doing after test fails
yuvipanda Sep 2, 2022
4148251
Mark flink bakery as blocking
yuvipanda Sep 2, 2022
8cf2951
Install socat explicitly in GHA
yuvipanda Sep 2, 2022
315627c
Add sudo to apt commands
yuvipanda Sep 2, 2022
95e8ab6
Specify full name of kubectl
yuvipanda Sep 2, 2022
bcc5ed6
Just wait for 5 mins for flink test to complete
yuvipanda Sep 2, 2022
bdff932
Output more diagnostics
yuvipanda Sep 2, 2022
398b1ff
Bump down default requiremebts
yuvipanda Sep 2, 2022
b3b94a0
Reduce wait in flink test
yuvipanda Sep 2, 2022
f987668
Bump up flink test time
yuvipanda Sep 2, 2022
6414dc1
Don't run flink tests on unit test job
yuvipanda Sep 7, 2022
3e53d27
Actually test the flink test
yuvipanda Sep 7, 2022
136abdf
Upload cov to codecov for flink test too
yuvipanda Sep 7, 2022
317e35e
Make resources of flink deployment configurable
yuvipanda Sep 16, 2022
b8e3a64
Set default memory for both job & taskmanager
yuvipanda Sep 17, 2022
9ca9004
Add a tutorial on running on AWS
yuvipanda Sep 22, 2022
6d2530e
Explicitly specify CPU, or flink asks for 2
yuvipanda Sep 22, 2022
8b90539
Add a little more documentation for flink operator
yuvipanda Sep 22, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 91 additions & 0 deletions .github/workflows/flink.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
name: Run Flink Test

on:
push:
branches: [ "main" ]
pull_request:
branches: [ "main" ]

jobs:
build:

runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3

# Starts a k8s cluster with NetworkPolicy enforcement and installs both
# kubectl and helm
#
# ref: https://github.com/jupyterhub/action-k3s-helm/
- uses: jupyterhub/action-k3s-helm@v3
with:
metrics-enabled: false
traefik-enabled: false
docker-enabled: true

- name: Setup CertManager
run: |
# Setup cert-manager, required by Flink Operator
CERT_MANAGER_VERSION=1.9.1
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v${CERT_MANAGER_VERSION}/cert-manager.yaml

- name: Wait for CertManager to be ready
uses: jupyterhub/action-k8s-await-workloads@v1
with:
timeout: 150
max-restarts: 1
namespace: cert-manager

- name: Setup FlinkOperator
run: |
FLINK_OPERATOR_VERSION=1.1.0
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-${FLINK_OPERATOR_VERSION}
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator --wait

kubectl get pod -A
kubectl get crd -A

- name: 'Set up Cloud SDK'
uses: 'google-github-actions/setup-gcloud@v0'


- name: Set up Python 3.9
uses: actions/setup-python@v3
with:
python-version: 3.9

- name: 'Setup minio + mc'
run: |
wget --quiet https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
mv minio /usr/local/bin/minio

wget --quiet https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
mv mc /usr/local/bin/mc

minio --version
mc --version


- name: Install dependencies & our package
run: |
python -m pip install --upgrade pip
python -m pip install -r dev-requirements.txt
python -m pip install -e .


- name: Install socat so kubectl port-forward will work
run: |
# Not sure if this is why kubectl proxy isn't working, but let's try
sudo apt update --yes && sudo apt install --yes socat

- name: Test with pytest
run: |
pytest -vvv -s --cov=pangeo_forge_runner tests/test_flink.py
kubectl get pod -A
kubectl describe pod

- name: Upload Coverage to Codecov
uses: codecov/codecov-action@v2
5 changes: 4 additions & 1 deletion .github/workflows/unit-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,8 +41,11 @@ jobs:
python -m pip install --upgrade pip
python -m pip install -r dev-requirements.txt
python -m pip install -e .

- name: Test with pytest
run: |
pytest -vvv -s --cov=pangeo_forge_runner tests/
# Test everything except the flink tests
pytest -vvv -s --cov=pangeo_forge_runner --ignore=tests/test_flink.py tests/

- name: Upload Coverage to Codecov
uses: codecov/codecov-action@v2
8 changes: 8 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,14 @@
Commandline tool to manage [pangeo-forge](https://pangeo-forge.readthedocs.io/en/latest/)
feedstocks

## Tutorials

```{toctree}
:maxdepth: 1

tutorial/flink
```

## Contents

```{toctree}
Expand Down
90 changes: 90 additions & 0 deletions docs/tutorial/flink.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# Run a recipe on a Flink cluster on AWS

`pangeo-forge-runner` supports baking your recipes on Apache Flink using
the [Apache Flink Runner](https://beam.apache.org/documentation/runners/flink/)
for Beam. After looking at various options, we have settled on supporting
Flink on Kubernetes using Apache's [Flink Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/).
This would allow baking recipes on *any* Kubernetes cluster!

In this tutorial, we'll bake a recipe on a Amazon [EKS](https://aws.amazon.com/eks/)
kubernetes cluster!

## Setting up the cluster

You need an EKS cluster with [Apache Flink Operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/)
installed. Setting that up is out of the scope for this tutorial, but you can find some
useful terraform scripts for that [here](https://github.com/yuvipanda/pangeo-forge-cloud-federation/)
if you wish.

## Setting up your local machine

1. Install required tools on your machine.
1. [aws](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
2. [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl)

2. Authenticate to `aws` by running `aws configure`. If you don't already have the
AWS Access Keys, you might need to [create one](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey).

3. Get credentials access to the kubernetes cluster by running
`aws eks update-kubeconfig --name=<cluster-name> --region=<region>`.

You can verify this works by running `kubectl get pod` - it should succeed,
and show you that at least a `flink-kubernetes-operator` pod is running.

## Setting up configuration

Construct a `pangeo_forge_runner.py` that will have configuration on *where*
the data should *end up in*. In our case, we will use S3. You must already have
created an S3 bucket for this to work.

```python
# Let's put all our data into s3, partitioned by the id of each job we run
BUCKET_PREFIX = "s3://<bucket-name>/<some-prefix>/{job_id}"

c.TargetStorage.fsspec_class = "s3fs.S3FileSystem"
c.TargetStorage.root_path = f"{BUCKET_PREFIX}/output"
c.TargetStorage.fsspec_args = {
"key": "<your-aws-access-key>",
"secret": "<your-aws-access-secret>",

}

c.InputCacheStorage.fsspec_class = c.TargetStorage.fsspec_class
c.InputCacheStorage.fsspec_args = c.TargetStorage.fsspec_args
c.InputCacheStorage.root_path = f"{BUCKET_PREFIX}/cache/input"

c.MetadataCacheStorage.fsspec_class = c.TargetStorage.fsspec_class
c.MetadataCacheStorage.fsspec_args = c.TargetStorage.fsspec_args
c.MetadataCacheStorage.root_path = f"{BUCKET_PREFIX}/cache/metadata"
```


## Running the recipe

Now run a recipe!

```bash
pangeo-forge-runner bake --repo <url-to-github-repo> --ref <name-of-branch-or-commit-hash>
```

You can add `--prune` if you want to only test the recipe and run just the first
few steps.

This might take a minute to submit.

## Access the Flink Dashboard

After you run the `pangeo-forge-runner` command, amongst the many lines of output,
you should see something that looks like:

`You can run 'kubectl port-forward --pod-running-timeout=2m0s --address 127.0.0.1 <some-name> 0:8081' to make the Flink Dashboard available!`

If you copy the command provided in the message and run it, it should provide you
with a local address where the Flink Dashboard will be available!

```
$ kubectl port-forward --pod-running-timeout=2m0s --address 127.0.0.1 <some-name> 0:8081
Forwarding from 127.0.0.1:<some-number> -> 8081
```

Copy the `127.0.0.1:<some-number>` URL to your browser, and tada!
Loading