RFC: Tekton DSL based on Starlark #185

Closed
imjasonh opened this issue Aug 2, 2019 · 24 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@imjasonh
Member

imjasonh commented Aug 2, 2019

Problems

  • Tekton's API is too complex and requires too many lines of too-deeply-indented YAML to define a complex real-world workflow.
    • Some platforms like JenkinsX provide their own DSL that wraps Tekton's API; in JX's case that DSL is also YAML.
    • The problem of verbose and brittle YAML exists in all of Kubernetes, but Tekton is especially sensitive to it because its object model is so complex and so many resources are needed to define a Pipeline.
    • Tekton's variable interpolation feature, which is used to tie together Tekton resources, is non-standard among Kubernetes technologies.
  • Tekton's YAML-based config complexity hinders adoption, especially among new users with simple workflows.

Proposal

  • Adopt a config language to configure Tekton that's higher-level than YAML.
    • A user should still be able to configure Tekton with YAML, if they want.
  • Support this config language as first-class in the tkn CLI, and in future triggering workflows.
    • A user should be able to define a trigger that interprets and invokes a config file, or YAML.
    • The CLI should be capable of translating the config to YAML, and possibly vice versa.

Caveats

  • This exploration proposes Google's Starlark config language, which has a thorough spec, underlies the https://bazel.build build tool, and has well-supported Go bindings. This config language has been in use at Google in some form for almost a decade.
    • But I'm not prescribing it, if there's a better alternative. Your feedback is welcome!
    • This is really about proposing some improvement over YAML configs, and Starlark is one specific option.
  • This is all a strawman. Many details are elided for illustration purposes.
  • This analysis is incomplete. We'll learn a lot more about the benefits and pitfalls of a config language as we build it and get user feedback. We might decide it's not worth doing after all!

Simple example

Let's see some code!

This config describes a simple TaskRun that executes a single step:

# taskrun.cfg

def main(ctx):
  return [tekton.TaskRun(
    name = 'my-task-run',
    spec = tekton.TaskRunSpec(
      taskSpec = tekton.TaskSpec(
        steps = [tekton.Step(
          image = "ubuntu",
          command = ["echo", "hello", "tekton"],
        )]),
    ))]

CLI usage

$ tkn config apply -f taskrun.cfg

This interprets the config, generates YAML, and kubectl-applies it

  • (...by literally shelling out to kubectl, which it assumes you've installed)
  • This could also make direct API requests, but being able to generate YAML has compat benefits.
$ tkn config export -f taskrun.cfg > taskrun.yaml

This interprets the config and emits generated YAML to stdout

  • This can be useful for generating release YAMLs for users who want to consume YAML, and for debugging config interpreter behavior.
$ tkn config import -f taskrun.yaml > taskrun.cfg

This reverses the export process, converting YAML into the equivalent unoptimized config.

  • This can be useful for migrating an existing YAML config to the DSL.
  • The YAML must only define types that Tekton knows about (Tekton's CRDs, basic K8s resources, etc.)
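To make the export step concrete, here is a minimal sketch in plain Python, standing in for Starlark. It models resources as nested dicts and serializes them as JSON documents, which YAML parsers generally accept. The `task_run` and `export` helpers and the `apiVersion` value are illustrative assumptions; none of them are part of the proposal or of tkn.

```python
import json


def task_run(name, steps):
    """Hypothetical builder: returns a TaskRun-shaped dict."""
    return {
        "apiVersion": "tekton.dev/v1alpha1",  # assumed version, for illustration
        "kind": "TaskRun",
        "metadata": {"name": name},
        "spec": {"taskSpec": {"steps": steps}},
    }


def main(ctx):
    # Mirrors taskrun.cfg above: one TaskRun with a single echo step.
    return [task_run("my-task-run", [
        {"image": "ubuntu", "command": ["echo", "hello", "tekton"]},
    ])]


def export(config_main, ctx=None):
    """Interpret the config and serialize each returned resource as one document."""
    docs = [json.dumps(r, indent=2, sort_keys=True) for r in config_main(ctx)]
    return "\n---\n".join(docs) + "\n"


if __name__ == "__main__":
    print(export(main), end="")
```

An apply step could pipe the same text to kubectl, and an import step would need the reverse mapping from documents back to builder calls.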

What did that buy us?

Not a lot really! The .cfg version isn't any more readable than YAML, and is quite a bit more verbose! :(

Just converting YAML to this DSL has some small benefits:

  • tkn config can perform type checking, but so does the K8s API server (though slower). We can validate Tekton resources, but the webhook controller will do that too (though slower).
  • Error messages can include line-and-column information about the error, instead of the kinds of errors you'd get from K8s (.foo.bar[2].baz should be a string)
  • Existing tooling can auto-format .cfg files easily
    • Existing IDE tooling can syntax-highlight, format-on-save, etc.
    • Future IDE tooling can autocomplete, aid in refactoring, validate-on-save, etc.
  • It's not YAML.
    • YAML has meaningful whitespace, confusing indentation levels, inconsistent styling, can't be easily autoformatted, and more.

But wait, there's more!

Starlark gives us a Python-esque language, with many of the features of a Real Programming Language, like:

Helper Methods

def step(greeting="hello"): # with defaults!
  return tekton.Step(
    image = "ubuntu",
    command = ["echo", greeting],
  )

tekton.TaskRun(
  name = 'my-task-run',
  spec = tekton.TaskRunSpec(
    taskSpec = tekton.TaskSpec(
      steps = [
        step("hello"),
        step("hola"),
        step("bonjour"),
      ],
    ),
  ),
)

This would be ~3x more lines in YAML, and harder to read.

Starlark comes bundled with a number of built-in methods for things like string manipulation (e.g., join, replace, format), list manipulation (any, all, enumerate, sorted), and more.

Users can define and share their own libraries of helper methods, including other shared files in their repository.

Tekton itself could provide a library of helper methods to make common use cases simpler to define.
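As a rough illustration of such a shared helper library, the sketch below uses plain Python dicts in place of the `tekton.*` builders. The names `step` and `echo_task_run` are hypothetical and only show how a common use case could collapse into one call.

```python
def step(image, *command):
    """Hypothetical helper: build one step from an image and its command words."""
    return {"image": image, "command": list(command)}


def echo_task_run(name, greetings, image="ubuntu"):
    """Hypothetical library helper: a TaskRun with one echo step per greeting."""
    return {
        "kind": "TaskRun",
        "metadata": {"name": name},
        "spec": {
            "taskSpec": {
                "steps": [step(image, "echo", g) for g in greetings],
            },
        },
    }


run = echo_task_run("my-task-run", ["hello", "hola", "bonjour"])
```

A user's config would then import the helper and call it, rather than spelling out each step.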

Comprehensions

tekton.TaskRun(
  name = 'my-task-run',
  spec = tekton.TaskRunSpec(
    taskSpec = tekton.TaskSpec(
      steps = [tekton.Step(
        image = "ubuntu",
        command = ["echo", "step", str(x)],
      ) for x in range(10)],
    ),
  ),
)

This would be ~10x more lines in YAML, and harder to read.

Conditional Expressions

tekton.TaskRun(
  name = 'my-task-run',
  spec = tekton.TaskRunSpec(
    taskSpec = tekton.TaskSpec(
      steps = [tekton.Step(
        image = "ubuntu",
        command = ["echo", "step", str(x)],
      ) for x in range(10) if x != 7],
    ),
  ),
)

NB: This would serve a different purpose from the conditionals work being added to Pipelines. Config conditionals would only have the context available at config-interpretation time (possibly including triggering context), while Pipeline conditionals would have runtime context available (e.g., whether the previous step failed).

Variables

namespace = "my-cool-namespace"
step = tekton.Step(
  image = "ubuntu",
  command = ["echo", "hello"],
)

def main(ctx):
  return [tekton.TaskRun(
    name = "my-task-run",
    namespace = namespace,
    spec = tekton.TaskRunSpec(
      taskSpec = tekton.TaskSpec(steps = [step]),
    ),
  ), tekton.TaskRun(
    name = "another-task-run",
    namespace = namespace,
    spec = tekton.TaskRunSpec(
      taskSpec = tekton.TaskSpec(steps = [step]),
    ),
  )]

This reduces lines of code and improves readability by aiding reuse.

This conceivably elides the need for variable interpolation in Tekton resource strings, if we're careful about how we expose Resource information in configs. It's possible that interpolation will still be necessary in YAML, but won't be necessary (and therefore should be disallowed) in the config language. More experimentation is needed.

Execution Context

The top-level main function takes a Context, which Tekton can populate with any information available at the time of interpretation. This could include information about the cluster in which Tekton is running, values passed from the CLI, and in the case of triggered resource creation, information about the event that triggered it.

def main(ctx):
  return [tekton.TaskRun(
    name = "my-task-run",
    spec = tekton.TaskRunSpec(
      taskSpec = tekton.TaskSpec(steps = [tekton.Step(
        image = "ubuntu",
        command = ["echo", "triggered" if ctx.trigger else "manual"],
      )]),
    ),
  )]
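One way to picture the Context is as a small read-only record that the interpreter fills in before calling `main`. The sketch below is plain Python with hypothetical field names (`trigger`, `cli_values`); the proposal does not define the actual shape of the context.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical interpretation-time context passed to main()."""
    trigger: Optional[dict] = None  # event payload when triggered, else None
    cli_values: dict = field(default_factory=dict)  # values passed from the CLI


def main(ctx):
    # Decided at config-interpretation time, not at pipeline run time.
    mode = "triggered" if ctx.trigger else "manual"
    return [{
        "kind": "TaskRun",
        "metadata": {"name": "my-task-run"},
        "spec": {"taskSpec": {"steps": [
            {"image": "ubuntu", "command": ["echo", mode]},
        ]}},
    }]


manual = main(Context())
triggered = main(Context(trigger={"type": "push"}))
```

The same config file then yields different resources depending on how it was invoked, without any interpolation in the generated output.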

Debugging

Starlark includes a built-in print function, which can help users debug config interpretation. Users running tkn config export -f my.cfg will see these print statements on stderr, so they don't alter the output YAML.

For configs that are interpreted during triggering, Tekton will need some way to record and surface those debug logs to users.
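The separation between debug output and generated config can be sketched in plain Python: debug lines go to stderr, the serialized resources go to stdout (or any stream passed in). The `export` function here is a hypothetical stand-in for the proposed command, and JSON stands in for YAML.

```python
import json
import sys


def main(ctx):
    steps = []
    for i in range(3):
        # Debug output goes to stderr, so it never ends up in the generated document.
        print("adding step", i, file=sys.stderr)
        steps.append({"image": "ubuntu", "command": ["echo", "step", str(i)]})
    return [{
        "kind": "TaskRun",
        "metadata": {"name": "my-task-run"},
        "spec": {"taskSpec": {"steps": steps}},
    }]


def export(out=sys.stdout):
    """Write each resource returned by main() to the given stream."""
    for resource in main(None):
        json.dump(resource, out, indent=2)
        out.write("\n")
```

Redirecting stdout to a file would capture only the resources, while the debug lines stay visible in the terminal.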

Applicability Outside Tekton

There's nothing terribly Tekton-specific about anything above. At its core we're just generating built-in Starlark objects from Go structs or protobuf messages, interpreting some scripts, and printing their output as YAML. This same work could be applied to Kubernetes resources of all kinds, or YAML of all kinds, or non-YAML outputs, and so on.

I do think there's a specific need in Tekton for a simpler config format than YAML, so we should start by addressing Tekton's needs first, then see if we can extend the approach for general Kubernetes resources.

@siamaksade

The proposal is valid and definitely required IMHO for Tekton to reach wider adoption. Interacting with Tekton using K8s YAML objects is too verbose and cumbersome. Tooling can assist to an extent but there are limitations to that.

However, I still see benefits in providing a more compact YAML format instead. That's the approach taken by Jenkins X and, frankly, the majority of other CI solutions (GitLab, Travis CI, Circle CI, Azure Pipelines, BitBucket Pipelines, Google Cloud Build, etc.) out there, which has created strong muscle memory for users. Would the benefits you described be strong enough for Tekton users to justify deviating from that muscle memory?

This also reminds me of one of the downsides of Groovy syntax in Jenkins pipelines, based on what we heard from our customers: it made it more difficult for them to reason about what a pipeline does by looking at its definition, due to the higher-level abstractions, loops, etc.

@imjasonh
Member Author

imjasonh commented Aug 9, 2019

Yeah that's a fair question.

First: I believe that Tekton's API syntax today is roughly as simple as it can get for the level of flexibility it provides, and the extent of functionality it lets you describe. If this isn't true today, we should work to make it true, and probably sooner rather than later since this complexity hurts adoption, with or without a DSL. If there are specific examples of unnecessary complexity please file bugs so we can fix them.

If we assume Tekton will have a YAML API (and it will; this is Kubernetes we're talking about), and we consider this the lower-level "plumbing" API, then there's an opportunity for a higher-level "porcelain" API in addition to it that is capable of driving the lower-level layer -- in this case, by generating the lower-level config from the higher-level description.

I think it would be more confusing to users to have two levels of the API, and to have both of them be YAML. Jenkins X avoids this by completely hiding the low-level Tekton API and replacing it with their own higher-level YAML. That's perfectly fine for JX, but Tekton itself can't do this since expert users (and providers like JX!) still need that low-level API for debugging and maximum flexibility. Other providers only have the high-level YAML API, which presumably compiles to some internal format to actually execute it.

I admit I don't know anything about Jenkins' use of Groovy, and I'd love to learn more from that experience. Was the Groovy the "only" API for defining the pipeline? Or did it let you generate some lower-level config that could be cracked open and debugged or manually edited if the user needed to?

Basically, I think there's a nice sweet spot here for providing a higher-level config mode -- in some non-YAML language -- that generates the lower-level YAML one. Users can check in both the high-level and generated low-level configs, with presubmit checks to ensure they're in sync, so they can see diffs in both config flavors.
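The presubmit check mentioned above could be as simple as regenerating the YAML and diffing it against the checked-in copy. Here is a minimal Python sketch; the function name `check_in_sync` and the default file name are hypothetical, and producing the generated text (for example via the proposed export command) is left to the caller.

```python
import difflib


def check_in_sync(generated, checked_in, name="taskrun.yaml"):
    """Return a unified diff as a list of lines; an empty list means in sync."""
    return list(difflib.unified_diff(
        checked_in.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile=name + " (checked in)",
        tofile=name + " (generated)",
    ))
```

A presubmit job would fail when the returned list is non-empty and print the diff so the author can regenerate and commit the YAML.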

Happy to chat more, and thanks for your input!

@vtereso

vtereso commented Sep 9, 2019

# taskrun.cfg

def main(ctx):
  return [tekton.TaskRun(
    name = 'my-task-run',
    spec = tekton.TaskRunSpec(
      taskSpec = tekton.TaskSpec(
        steps = [tekton.Step(
          image = "ubuntu",
          command = ["echo", "hello", "tekton"],
        )]),
    ))]

This and the other examples seem very reminiscent of the test builders. I think your proposal point:

Support this config language as first-class in the tkn CLI

is very interesting. Should some tool be introduced to make working with Tekton easier, what does that mean for the state of the catalog? Would the catalog support both? Only one?

@vtereso

vtereso commented Sep 9, 2019

Tekton's variable interpolation feature, which is used to tie together Tekton resources, is non-standard among Kubernetes technologies

Starlark comes bundled with a number of built-in methods for things like string manipulation (e.g., join, replace, format), list manipulation (any, all, enumerate, sorted), and more.
...
This reduces lines of code and improves readability by aiding reuse.
This conceivably elides the need for variable interpolation in Tekton resource strings

If interpolation were not necessary, this seems to imply that the catalog would likely be a Task only repo?

@imjasonh
Member Author

imjasonh commented Sep 9, 2019

If we supported this DSL as a first-class feature of Tekton, resources in the catalog could be defined as YAML and installed using kubectl apply, or defined in the DSL and installed using tkn task create -f whatever.dsl (which would interpret the DSL into a K8s API request, and apply it)

Ultimately the DSL should just be another way to express the same underlying resources and send them to the cluster, and I think it's critical that it be able to "compile down" the high-level format to the "low-level" YAML, for debuggability, compatibility, etc. Whether the Task/Pipeline/etc. was originally defined using the DSL should be irrelevant once it's installed on the cluster.

I'm not sure I understand your question in #185 (comment) -- the intention of the catalog is to include common Tasks, Pipelines and Resources (once those are extensible)

@vtereso

vtereso commented Sep 9, 2019

I was trying to envision what you meant by "elides the need for variable interpolation in Tekton resource strings" and incorrectly thought that Tasks were the only resource that could omit interpolation, but they currently use it as well. Can you describe this point more?

@imjasonh
Member Author

I haven't thought it all the way through yet, but I was imagining you could do something along the lines of:

# Task that has an image output resource
t1 = Task(
  name='build-image',
  ...
)

t2 = Task(
  name = 'deploy-image',
  inputs = [t1.image],
  steps = [Step(
    command = 'build image ' + t1.image.url,
  )],
  ...
)

This would validate that t1's output resource is of the same type as t2's input resource, then generate YAML that wires them together as you'd expect, including inserting the $(placeholder) string in t2's configuration.

If you only ever dealt with the DSL configuration, you'd never need to learn or know about the interpolation syntax yourself. But since it generates the YAML with placeholders, and that YAML is the source-of-truth, you can always fall back to that lower level of abstraction if you need to.

This is just a strawman, but I think it should be possible if we design it carefully.
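To show the idea is at least mechanically plausible, here is a small Python sketch. A reference object stands for a value only known at run time and renders itself as a placeholder string when the config is serialized. Everything here is hypothetical: the `Ref` and `ImageOutput` classes, the `task` builder, the shape of the output dict, and in particular the placeholder path, which is illustrative and not Tekton's actual interpolation grammar.

```python
class Ref:
    """Hypothetical reference to a value that is only known at run time."""

    def __init__(self, path):
        self.path = path

    def __str__(self):
        # Illustrative placeholder syntax, not Tekton's exact grammar.
        return "$(" + self.path + ")"


class ImageOutput:
    """Hypothetical image output resource produced by a task."""

    def __init__(self, task_name, name="image"):
        self.type = "image"
        self.task_name = task_name
        self.name = name
        self.url = Ref("tasks.%s.outputs.%s.url" % (task_name, name))


def task(name, steps, inputs=()):
    # Config-time validation: reject inputs of a type this sketch does not know.
    for res in inputs:
        if res.type != "image":
            raise TypeError("%s: unsupported input type %r" % (name, res.type))
    return {
        "kind": "Task",
        "metadata": {"name": name},
        "spec": {
            "inputs": [{"name": r.name, "from": r.task_name} for r in inputs],
            # Refs become placeholder strings only when the config is rendered.
            "steps": [{**s, "command": [str(c) for c in s["command"]]} for s in steps],
        },
    }


t1_image = ImageOutput("build-image")
t2 = task(
    "deploy-image",
    inputs=[t1_image],
    steps=[{"image": "ubuntu", "command": ["echo", "deploying", t1_image.url]}],
)
```

The author only ever writes `t1_image.url`; the placeholder string exists solely in the generated output.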

@bparees

bparees commented Sep 11, 2019

As you note, this is a problem shared by k8s. K8s has never addressed it by introducing a first-class DSL or tooling, it has allowed an ecosystem of tools to evolve, as well as people using generic templating tools to generate json/yaml. (Keeping said tools up to date w/ evolving apis is of course a challenge, depending how much logic they include).

Is there a reason a similar solution isn't appropriate/viable here? i.e. external/standalone tools that can generate tekton yaml rather than an embedded first-class choice? (And perhaps if a particular tool evolves to be universally adopted by the community, then bringing it into the tekton cli as a first class capability?)

@imjasonh
Member Author

Yeah that's a fair question. Kubernetes is arguably more complex than Tekton, and it doesn't have a built-in opinionated YAML-generating DSL -- so why should Tekton?

I think Tekton does a few things that are fairly uncommon in "vanilla" Kubernetes, which the lack of a higher-level DSL makes acutely painful. Tekton relies heavily on variable interpolation to be able to pass context from one Task to another, and this is necessary in nearly every real-world use case to be able to accomplish even simple tasks. This data isn't known at the time the pipeline is configured (e.g., test success status, built image digest, etc.), and is only derived once the pipeline is running.

In Kubernetes, I don't think this is the case to nearly the same extent. Users create Deployments, put them behind Services, hook those up to an Ingress, and so on, but ultimately each of those configurations is configurable on its own ahead of time, without much data needing to be passed between them -- a Service doesn't need to be configured with details of its Deployment's Pods derived at runtime; it just relies on label selectors. Kubernetes is actually quite a bit simpler than Tekton in this way.

I think if you used Starlark to define normal Kubernetes resources, you'd probably mostly end up using loops and variables for DRY-ing (which could be helpful...), but you wouldn't use things like those described in #185 (comment) to pass dynamic data around.

Users could use any of Kubernetes' many YAML-templating tools to generate Tekton configs, but they'd lack the facilities to make data passing easier, so they'd ultimately only be solving the easier part of the problem.

It's possible that Tekton shouldn't bundle a DSL, and should rely on the community to build some options, and only then maybe choose one to support as first-class. I don't see any other DSLs solving Tekton's specific problems out in the community though, and I think it would take a fairly deeply involved Tekton user (or contributor) to design one. So it's a bit of a chicken-and-egg problem -- adoption is hurt because of YAML's complex usability, so nobody gains enough experience to build a more usable DSL. I think it will take Tekton itself focusing on this problem to solve it.

@vtereso

vtereso commented Sep 12, 2019

@imjasonh To your point about Tekton using variable interpolation, and with consideration for this issue I raised regarding PipelineResources: it is theoretically possible to remove most (if not all) interpolation from core Tekton. The aforementioned tooling could then be sufficient if the YAML definitions are still cumbersome.

Without a DSL, as things stand today, Task and Pipeline authoring is still necessary, and the CLI could fill the gaps for run creation. In the future, if there are wrappers for creating/handling run Triggers, then Task and Pipeline YAML authoring would remain the only two concerns.

It is my own preference to have as little indirection as possible, so that it is easy to discern what is truly going on, and hopefully Tekton YAML is not what inhibits adoption. Although a DSL does consolidate the resource definitions, I feel that this has shifted the adoption concern from understanding the API format to somewhat understanding the API format and this Python (or any other) DSL. Perhaps this tooling would be used by a subset of people who much prefer it, but in general I don't know if this entirely resolves the issue. However, aside from upkeep, I can only see benefits to providing a DSL as presented above.

@imjasonh
Member Author

I feel that this has shifted the adoption concern from understanding the API format to somewhat understanding the API format and this python (or any other) DSL.

This is a very good point. Ideally the solution is to improve and simplify the core API and object model to such a point that a DSL isn't necessary at all. Indeed, we should strive to simplify the underlying object model no matter what, even if there is a DSL. Removing some of the nouns and removing variable interpolation could help, but I don't know (possibly due to my own narrow-mindedness) how that can be achieved.

@vtereso

vtereso commented Sep 12, 2019

The way I envision removing interpolation is to implicitly pass all parameters (as they exist currently) as arguments directly into the container. I don't know all the use cases, but there are likely only a few other notable areas that would require tailored overrides, like image/image tag, which could be new inputs introduced to steps. Even if PipelineResources were to exist in their current form, there have also been some ideas about implied creation where this should be possible. This could potentially clean up the definitions.

@ahpook

ahpook commented Sep 23, 2019

Just wanted to chime in with support for a higher-order DSL. I think it's best to get the variable interpolation out of the data format and into an appropriately sophisticated language, because it is a very slippery slope: as in the discussion at tektoncd/pipelines#850, once there's interpolation people start to need concatenation, regex, etc.

By way of example, Spinnaker have gone all the way down that slippery slope, via their Pipeline Expressions feature, which provides a large number of helper expressions and indeed the ability to embed arbitrary Java code anywhere you can use a plain text field.

That is great for that project's philosophy, but I think Tekton's Kubernetes lineage leads it towards wanting inputs that don't go through complex transformation steps before being acted on.

$.02 FWIW YMMV etc.

@imjasonh
Member Author

Re: concatenation, regex; I completely agree. Params today are not powerful enough, and making them more powerful invites the need for a "real programming language", and YAML doesn't cut it.

I wonder whether tektoncd/pipeline#1344 (tektoncd/pipeline#781) allays this concern, since it enables steps to invoke simple scripts in some "real programming language" like Bash/Python/JavaScript given the values of interpolated variables, which gives you support for regex, et al.

steps:
- image: python
  env:
  - name: FOO
    value: $(inputs.params.foo)
  script: |
    #!/usr/bin/env python3
    import os
    import re
    foo = os.getenv('FOO', '')
    if re.match('[a-z]+', foo):
      print('yay it matches')

Script-mode has downsides of its own of course, you end up writing Python wrapped in YAML (nested meaningful whitespace!). By the time you split the code out into a separate file you've probably been experiencing the pain of having it inline for a while. But at least we're not inventing new languages embedded in YAML.

Does script-mode count as a DSL? I don't think so. But it improves intra-step programming. Now we just need to improve inter-step programming, and we're golden. :)

@ahpook

ahpook commented Sep 25, 2019

Proving that there are no new problems in computer science, today I came across this bit in the Borg, Omega, and Kubernetes paper from a few years ago...

Of all the problems we have confronted, the ones over which the most brainpower, ink, and code have been spilled are related to managing configurations—the set of values supplied to applications, rather than hard-coded into them. In truth, we could have devoted this entire article to the subject and still have had more to say.
To cope with these kinds of requirements, configuration management systems tend to invent a domain-specific configuration language that (eventually) becomes Turing complete, starting from the desire to perform computation on the data in the configuration (e.g., to adjust the amount of memory to give a server as a function of the number of shards in the service). The result is the kind of inscrutable “configuration is code” that people were trying to avoid by eliminating hard-coded parameters in the application’s source code. It doesn’t reduce operational complexity or make the configurations easier to debug or change; it just moves the computations from a real programming language to a domain-specific one, which typically has weaker development tools such as debuggers and unit test frameworks. We believe the most effective approach is to accept this need, embrace the inevitability of programmatic configuration, and maintain a clean separation between computation and data.

¯\_(ツ)_/¯

@imjasonh
Member Author

Thanks I hadn't seen that before, it's definitely relevant! 🤣

At the risk of spilling even more ink over this well-inked topic, I think this can be interpreted to mean: if you choose to pursue a DSL, learn from our mistakes and make sure it's a "real language" with "real development tools" (debuggers, test frameworks), and acknowledge that you'll end up with a Turing-complete language eventually, so probably just start there.

If you accept that interpretation (you don't have to, it's a stretch!) then I think Starlark is a good option -- or something like it. It has a testing framework of a sort, and is similar to "real" Python, albeit stripped down for simplicity.

Also of note, the goal of this DSL in my mind is not to replace static YAML config, but to make it easier to generate, so that it's entirely optional in the ecosystem -- YAML would remain the source of truth. I believe this is one way to achieve "clean separation between config and code", or at least enable it.

Definitely food for thought though. Now I'm going to go read the rest of that paper. 🤔

@patcito

patcito commented Dec 1, 2019

Is this RFC going to move forward? Looks good to me. I would also like to point out in case you missed it that drone.io is using Starlark for their DSL as an alternative to yaml.

https://docs.drone.io/starlark/overview/

@imjasonh
Member Author

imjasonh commented Dec 7, 2019

There have been a few prototypes demoed in WG meetings, but I don't know of any more official experiments. I think there's broad consensus that some DSL (YAML or otherwise) would be useful. If you are reading this and have ideas feel free to share them. :)

@TristanCacqueray

TristanCacqueray commented Jan 29, 2020

Dhall is another configuration language which already has strong Kubernetes bindings. It actually advertises the absence of Turing completeness: http://www.haskellforall.com/2020/01/why-dhall-advertises-absence-of-turing.html

FWIW I was able to generate the Tekton resource types from the Go source files, and here is a prototype: https://github.com/TristanCacqueray/dhall-tekton

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 13, 2020
@tekton-robot

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 13, 2020
@ChrisJBurns

ChrisJBurns commented Dec 20, 2022

Has there been any update/progress with this? I'm somewhat of the opinion that Tekton won't achieve wide adoption until a simple DSL has been achieved.

Whether it's Starlark or a simpler YAML that gets translated by the Tekton CLI and applied to the cluster (although this means developers have access to the cluster, which I'd argue isn't desirable), unless the developers who use it are able to learn it easily (in combination with the thousand other things they need to learn nowadays), I don't really think it's going to get wide adoption.


9 participants