Subscription "hints" #37

Merged
merged 24 commits into from
Jul 26, 2023

@ryannedolan ryannedolan commented Jun 26, 2023

Added Subscription.spec.hints field, which can be used to tweak sink resources, i.e. any resources created downstream of the SQL job (e.g. an output Kafka topic). Hints are ignored by the SQL engine.

Added SubscriptionEnvironment, which exposes hints to resource templates. While I was here, I also exposed the computed Avro schema and changed name and namespace to pipeline.name and pipeline.namespace. This should be less confusing than having "name" in templates, since "name" is very overloaded in that context.
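As a rough illustration of what "exposing hints to templates" could look like, here is a minimal Python sketch that flattens a Subscription into a key/value environment. It is not the Java SubscriptionEnvironment from this PR: the function name `build_environment`, the flat hint keys, the fallback to the `default` namespace, and the rule that hints cannot shadow `pipeline.*` keys are all assumptions made for the example.

```python
def build_environment(subscription):
    """Flatten a Subscription-like dict into template variables (hypothetical helper)."""
    metadata = subscription.get("metadata", {})
    spec = subscription.get("spec", {})
    env = {
        "pipeline.name": metadata["name"],
        # Assumption: fall back to the "default" namespace when none is given.
        "pipeline.namespace": metadata.get("namespace", "default"),
    }
    # Hints are optional, so a missing or empty "hints" block adds nothing.
    for key, value in (spec.get("hints") or {}).items():
        # Assumption: hints never override the pipeline.* variables.
        env.setdefault(key, str(value))
    return env


subscription = {
    "metadata": {"name": "products"},
    "spec": {"database": "RAWKAFKA", "hints": {"numPartitions": "2"}},
}
env = build_environment(subscription)
```

With the Subscription used later in this description, `env` would carry `pipeline.name`, `pipeline.namespace`, and `numPartitions`.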

Added support for default values in template expressions via {{property : defaultValue}}. This was required in order to support hints, since hints are usually undefined.
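The default-value behavior can be sketched in a few lines of Python. This is only an approximation of the idea, not the template engine in this PR: the regular expression, the `render` function, and the choice to leave unknown placeholders untouched when no default is given (which matches the unexpanded `{{pipeline.name}}` shown under Testing Done) are assumptions.

```python
import re

# Matches {{key}} and {{key : default}}, with optional whitespace around each part.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*(?::\s*([^}]*?))?\s*\}\}")


def render(template, env):
    """Expand placeholders from env, falling back to the inline default (hypothetical helper)."""
    def substitute(match):
        key, default = match.group(1), match.group(2)
        if key in env:
            return str(env[key])
        if default is not None:
            return default
        # No value and no default: leave the placeholder as written.
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template)


template = "name: {{pipeline.name}}-kafka-topic\nnumPartitions: {{numPartitions : null}}"
without_hints = render(template, {})
with_hints = render(template, {"pipeline.name": "products", "numPartitions": "2"})
```

Here `without_hints` keeps `{{pipeline.name}}` and renders `numPartitions: null`, while `with_hints` renders `products-kafka-topic` and `numPartitions: 2`.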

Changed the KafkaTopic template to now use a numPartitions hint. This means you can change the number of partitions in a pipeline by specifying Subscription.spec.hints.numPartitions. This gets carried through to the KafkaTopic controller, which will add partitions if necessary.

Details

The idea behind "hints" is that we need to be able to control low-level aspects of the data plane, but we also don't want to surface this complexity to the SQL layer. This is a unique aspect of Hoptimator -- the SQL is very generic, and purposefully avoids anything that may be considered an "implementation detail". One way this goal manifests is that we do not support a WITH(...) clause, unlike, say, Flink SQL or ksqlDB. Instead, we can give "hints" to the control plane out-of-band in the Subscription object.

As the name implies, hints are not specifications. The underlying controllers may ignore hints, especially if they are impossible to satisfy.

Testing Done

To verify the new default values mechanism, we can have the CLI generate YAML for a pipeline:

> !yaml insert into RAWKAFKA."test-sink" SELECT AGE AS PAYLOAD, NAME AS KEY FROM DATAGEN.PERSON
...

kind: FlinkDeployment
metadata:
  name: {{pipeline.name}}-flink-job
  namespace: {{pipeline.namespace}}
spec:
--->%---
    taskmanager.numberOfTaskSlots: "1"
  serviceAccount: flink
  jobManager:
    resource:
      memory: "2048m"
      cpu: .1
  taskManager:
    resource:
      memory: "2048m"
      cpu: .1
--->%---
apiVersion: hoptimator.linkedin.com/v1alpha1
kind: KafkaTopic
metadata:
  name: {{pipeline.name}}-kafka-topic-9d0eb3f4
  namespace: {{pipeline.namespace}}
spec:
  topicName: test-sink
  numPartitions: null
--->%---

As expected, pipeline.name and pipeline.namespace are not expanded, as they are missing from the environment. (There is no Subscription being created here.)

OTOH, we see the operator fills these in:

--->%---
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: products-flink-job
  namespace: default
--->%---

If we deploy a Subscription with hints:

apiVersion: hoptimator.linkedin.com/v1alpha1
kind: Subscription
metadata:
  name: products
spec:
  sql: SELECT "quantity", "product_id" AS KEY FROM INVENTORY."products_on_hand"
  database: RAWKAFKA
  hints:
    numPartitions: "2"  # <--- added hint here

... we see the operator generates appropriate YAML:

kind: KafkaTopic
metadata:
  name: products-kafka-topic-477cf338
  namespace: default
spec:
  topicName: products
  numPartitions: 2
--->%---

Most importantly, we see the KafkaTopic controller reacts to the change in numPartitions:

Desired partitions 2 > actual partitions 1. Creating additional partitions.
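
The decision behind that log line can be sketched as a small pure function. This is a hypothetical illustration in Python, not the KafkaTopic controller's code: `plan_partition_change` and every message except the first are invented for the example. It relies on the fact that Kafka can add partitions to a topic but cannot remove them, so a hint lower than the current count is one that cannot be satisfied and is ignored.

```python
def plan_partition_change(desired, actual):
    """Return (target_partitions, message) for a numPartitions hint (hypothetical helper)."""
    if desired is None:
        # No hint was given, so there is nothing to reconcile.
        return actual, "No numPartitions hint. Leaving partitions unchanged."
    if desired > actual:
        return desired, (
            f"Desired partitions {desired} > actual partitions {actual}. "
            "Creating additional partitions."
        )
    if desired < actual:
        # Kafka cannot shrink a topic, so the hint is ignored rather than treated as an error.
        return actual, (
            f"Desired partitions {desired} < actual partitions {actual}. Ignoring hint."
        )
    return actual, "Partitions already match the hint."


target, message = plan_partition_change(desired=2, actual=1)
```

For the example above, `target` is 2 and `message` is the log line shown.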

@ryannedolan ryannedolan requested review from hshukla and vmaheshw June 26, 2023 16:23
@ryannedolan ryannedolan requested review from ehoner and jogrogan July 20, 2023 16:49
Base automatically changed from sink-api to main July 21, 2023 21:56
@ryannedolan ryannedolan merged commit 503ba7c into main Jul 26, 2023
@ryannedolan ryannedolan deleted the hints branch July 26, 2023 05:10