feat: adds k8s config options to Bytewax materialization engine #3518

Merged on Mar 4, 2023 (1 commit)
25 changes: 24 additions & 1 deletion docs/reference/batch-materialization/bytewax.md
@@ -23,6 +23,8 @@ To configure secrets, first create them using `kubectl`:

```
kubectl create secret generic -n bytewax aws-credentials --from-literal=aws-access-key-id='<access key id>' --from-literal=aws-secret-access-key='<secret access key>'
```

If your Docker registry requires authentication to push or pull images, you can use the same approach to store your registry credentials as a secret and reference that secret when running the materialization engine.

Then configure them in the batch_engine section of `feature_store.yaml`:

``` yaml
# @@ -40,6 +42,8 @@ batch_engine:
        secretKeyRef:
          name: aws-credentials
          key: aws-secret-access-key
  image_pull_secrets:
    - docker-repository-access-secret
```

#### Configuration
@@ -51,9 +55,28 @@ batch_engine:

``` yaml
  type: bytewax
  namespace: bytewax
  image: bytewax/bytewax-feast:latest
  image_pull_secrets:
    - my_container_secret
  service_account_name: my-k8s-service-account
  annotations:
    # example annotation you might include if running on AWS EKS
    iam.amazonaws.com/role: arn:aws:iam::<account number>:role/MyBytewaxPlatformRole
  resources:
    limits:
      cpu: 1000m
      memory: 2048Mi
    requests:
      cpu: 500m
      memory: 1024Mi
```

**Notes:**

* The `namespace` configuration directive specifies the Kubernetes [namespace](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/) in which jobs, services and configuration maps are created.
* The `image_pull_secrets` configuration directive lists the pre-configured secrets to use when pulling the container image from your registry.
* The `service_account_name` configuration directive specifies the Kubernetes service account the job runs under.
* The `annotations` configuration directive adds Kubernetes annotations to the job's pods. This is useful, for example, for IAM roles that grant the running pod access to cloud platform resources.
* The `resources` configuration directive sets the standard Kubernetes [resource requests and limits](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) for the job containers to use when materializing data.
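
Kubernetes expects each resource request to be no larger than the matching limit, so it can help to check a `resources` block before submitting a job. The following is a minimal sketch of such a check, not part of Feast: `parse_cpu`, `parse_memory` and `requests_within_limits` are hypothetical helpers, and only the `m` CPU suffix and the `Ki`/`Mi`/`Gi` memory suffixes are handled.

```python
from decimal import Decimal

# Assumption: binary memory suffixes only. Kubernetes accepts more
# suffixes (for example k, M, G), which this sketch leaves out.
_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity) -> Decimal:
    """Return CPU cores for a quantity such as '500m' or '2'."""
    text = str(quantity)
    if text.endswith("m"):
        # The 'm' suffix means millicores.
        return Decimal(text[:-1]) / 1000
    return Decimal(text)


def parse_memory(quantity) -> int:
    """Return bytes for a quantity such as '1024Mi' or a plain byte count."""
    text = str(quantity)
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


def requests_within_limits(resources: dict) -> bool:
    """Check that every request is at most the limit of the same name."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    parsers = {"cpu": parse_cpu, "memory": parse_memory}
    return all(
        parse(requests[key]) <= parse(limits[key])
        for key, parse in parsers.items()
        if key in requests and key in limits
    )


# Values taken from the example configuration above.
resources = {
    "limits": {"cpu": "1000m", "memory": "2048Mi"},
    "requests": {"cpu": "500m", "memory": "1024Mi"},
}
print(requests_within_limits(resources))
```

Using `Decimal` for CPU avoids floating-point rounding when comparing millicore values.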

#### Building a custom Bytewax Docker image

@@ -46,6 +46,17 @@ class BytewaxMaterializationEngineConfig(FeastConfigBaseModel):
    These environment variables can be used to reference Kubernetes secrets.
    """

    image_pull_secrets: List[str] = []
    """ (optional) The secrets to use when pulling the image to run for the materialization job """

    resources: dict = {}
    """ (optional) The resource requests and limits for the materialization containers """

    service_account_name: StrictStr = ""
    """ (optional) The service account name to use when running the job """

    annotations: dict = {}
    """ (optional) Annotations to apply to the job container. Useful for linking the service account to IAM roles, operational metadata, etc """
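
All four new options are optional and default to empty values. As a rough illustration of that behavior, here is a minimal sketch using a standard-library dataclass. `EngineOptions` and `from_dict` are hypothetical stand-ins and do not reproduce how `FeastConfigBaseModel` loads `feature_store.yaml`.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class EngineOptions:
    """Hypothetical stand-in for the four options added in this change."""

    image_pull_secrets: List[str] = field(default_factory=list)
    resources: dict = field(default_factory=dict)
    service_account_name: str = ""
    annotations: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict) -> "EngineOptions":
        known = {f.name for f in fields(cls)}
        # Keys this sketch does not model (such as `type` or `namespace`) are skipped.
        return cls(**{k: v for k, v in raw.items() if k in known})


# Only one of the new options is set; the others fall back to empty defaults.
options = EngineOptions.from_dict(
    {"type": "bytewax", "service_account_name": "my-k8s-service-account"}
)
print(options)
```

`default_factory` gives every instance its own list or dict, so changing one configuration object does not affect another.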

class BytewaxMaterializationEngine(BatchMaterializationEngine):
    def __init__(

@@ -248,9 +259,14 @@ def _create_job_definition(self, job_id, namespace, pods, env):
                "parallelism": pods,
                "completionMode": "Indexed",
                "template": {
                    "metadata": {
                        "annotations": self.batch_engine_config.annotations,
                    },
                    "spec": {
                        "restartPolicy": "Never",
                        "subdomain": f"dataflow-{job_id}",
                        "imagePullSecrets": self.batch_engine_config.image_pull_secrets,
                        "serviceAccountName": self.batch_engine_config.service_account_name,
                        "initContainers": [
                            {
                                "env": [

@@ -300,7 +316,7 @@ def _create_job_definition(self, job_id, namespace, pods, env):
                                        "protocol": "TCP",
                                    }
                                ],
                                "resources": {},
                                "resources": self.batch_engine_config.resources,
                                "securityContext": {
                                    "allowPrivilegeEscalation": False,
                                    "capabilities": {
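
In the Kubernetes API, each `imagePullSecrets` entry on a pod spec is an object with a `name` field, so a list of plain secret names may need converting before the job is submitted. The sketch below shows one way to build the optional parts of the pod template under that assumption. `pod_template_overrides` is a hypothetical helper, not code from this PR, and it also leaves out fields that were not configured.

```python
from typing import Any, Dict, List


def pod_template_overrides(
    image_pull_secrets: List[str],
    service_account_name: str,
    annotations: Dict[str, str],
) -> Dict[str, Any]:
    """Build the optional parts of a Job pod template (hypothetical helper)."""
    spec: Dict[str, Any] = {}
    if image_pull_secrets:
        # Each entry becomes a {"name": ...} reference to an existing secret.
        spec["imagePullSecrets"] = [{"name": name} for name in image_pull_secrets]
    if service_account_name:
        spec["serviceAccountName"] = service_account_name

    template: Dict[str, Any] = {"spec": spec}
    if annotations:
        # Copy so later changes to the config do not alter the job definition.
        template["metadata"] = {"annotations": dict(annotations)}
    return template


# Values taken from the example configuration in the docs hunk.
template = pod_template_overrides(
    ["my_container_secret"], "my-k8s-service-account", {}
)
print(template)
```

The resulting dictionary could be merged into the `template` entry of the job definition. Omitting unset keys keeps the submitted manifest free of empty values.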