Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with "$$" in Script blocks #3888

Merged
merged 1 commit into from May 6, 2021
Merged

Fix issue with "$$" in Script blocks #3888

merged 1 commit into from May 6, 2021

Conversation

ghost
Copy link

@ghost ghost commented Apr 15, 2021

Changes

Fix issue with "$$" in Script blocks

Prior to this commit Steps with Scripts that include twodollar signs resulted in only one dollar sign ending upin the final executed script. This is because the scriptis passed into an init container via its "args" field andKubernetes replaces instances of the literal string"$$" with "$". This is a documented behaviour of the"args" field (sort of) but it doesn't make sense to havethose replacements happen for our "script" blocks. "$$"in bash scripts is the current process id so replacing thischaracter sequence can be problematic.

This commit replaces instances of two dollar signs in a scriptwith four dollar signs. Kubernetes then converts these backto two dollar signs. This replacement behaviour only existsfor dollar symbols so hard-coding this replacement feels likean acceptable workaround that will have the least impacton backwards compatibility.

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

Fix a bug where the literal characters "$$" in a Step's script block would be replaced with a single "$".

/kind bug

@tekton-robot tekton-robot added kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Apr 15, 2021
@tekton-robot tekton-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 15, 2021
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 87.4% 87.5% 0.1

@ghost
Copy link
Author

ghost commented Apr 15, 2021

TODO:

  • Add a test specifically for "$$" in script. Might have to be an integration test? Edit: turned out an example works just fine comparing $$ against unicode codepoints in python.
  • Document size limit of annotations and therefore scripts.

@imjasonh
Copy link
Member

First of all:

Further to this change I've also removed the random suffixes
that used to appear on creds-init volumes. These didn't
serve any purpose that couldn't be equally well-served
with an incrementing integer. The benefit of doing it this
way is that our unit tests become less brittle - those
random suffixes were a source of considerable churn whenever
other volumes were added or removed to test data because
the random pool of suffixes was shared with those other
volumes.

👏👏👏 I love it.

There's even already a test that incidentally exercises it reaching "tekton-creds-init-home-1", what's not to love.


Is there a limitation on the max size of an annotation, such that users who previously had a working script might be broken when they upgrade Tekton to pick up this new terrible hack ingenious solution?

kubernetes/kubernetes@b501cc7 (now here) makes me think this is 256KB but I don't see any other official documentation. That seems to be the max size of all annotation keys and values combined, not just one specific annotation.

Granted, it probably wasn't a good idea to have hundreds of lines of bash in your YAML anyway (or anywhere for that matter! 🙃 ), but it's probably worth understanding and documenting and including in release notes just in case.


Can we include a YAML test that exercises this newly functional behavior, that would have failed without this change?

@ghost
Copy link
Author

ghost commented Apr 15, 2021

Is there a limitation on the max size of an annotation, such that users who previously had a working script might be broken when they upgrade Tekton to pick up this new terrible hack ingenious solution?

kubernetes/kubernetes@b501cc7 (now here) makes me think this is 256KB but I don't see any other official documentation. That seems to be the max size of all annotation keys and values combined, not just one specific annotation.

Great find. My guess here is that previously we were limited by ARG_MAX which varies from OS to OS. In an alpine:latest container I get 131072 but on a debian desktop I get a whopping 2097152. I'll document the annotations size limit but I'm also going to sleep on it and see if there's another way to tackle it. I wish Kubernetes exposed a "literal" volume type on to which I could just write base64-encoded strings to appear in an emptyDir. We could emulate something similar to that with short-lived ConfigMaps that hold the scripts but... egh.

Edit: we were building the script with heredoc before so maybe there were no limits? But we were inside an argument to sh -c so... I don't really know at this point. 🤷

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 87.4% 87.5% 0.1

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 87.4% 87.5% 0.1

@ghost
Copy link
Author

ghost commented Apr 16, 2021

I've added to the existing script example and confirmed that it explodes nicely prior to this commit:

λ k create -f ./examples/v1beta1/taskruns/step-script.yaml
taskrun.tekton.dev/step-script-djp6g created
                                   
λ tkn tr logs -f step-script-djp6g
[noshebang] no shebang
[noshebang] + echo no shebang

# ... snip logs ...

[double-dollar-allowed] double dollar signs ($) are not passed through as expected :(                                                 
                  
container step-double-dollar-allowed has failed

Also updated release note to reflect 256kB limit.

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 87.4% 87.5% 0.1

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 87.4% 87.5% 0.1

@ghost ghost removed the kind/bug Categorizes issue or PR as related to a bug. label Apr 16, 2021
@ghost
Copy link
Author

ghost commented Apr 16, 2021

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 16, 2021
@ghost ghost added this to the Pipelines 0.24 milestone Apr 20, 2021
@afrittoli
Copy link
Member

/test check-pr-has-kind-label

@ghost
Copy link
Author

ghost commented Apr 20, 2021

TODO:

  • Get a measure of the size limit on scripts prior to this change when we are passing via sh -c.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2021
@ghost
Copy link
Author

ghost commented Apr 26, 2021

I went back and tested the existing place-scripts solution. It stops working right around 130,867 bytes. When it hits this limit we see an error:

standard_init_linux.go:219: exec user process caused: argument list too long

@ghost
Copy link
Author

ghost commented Apr 26, 2021

Using the annotations approach a single script field can go up to around 260,000 bytes. The trade-off is that this pool of bytes is shared amongst all the script blocks in your Task. And, of course, anything else that wants to write to the Pod's annotations.

Here's what the failure condition looks like for the annotations approach when you use too many bytes:

  conditions:
  - lastTransitionTime: "2021-04-26T19:20:52Z"
    message: 'failed to create task run pod "script-262144-tm26f": Pod "script-262144-tm26f-pod-vtrz7"
      is invalid: metadata.annotations: Too long: must have at most 262144 bytes.
      Maybe invalid TaskSpec'
    reason: CouldntGetTask

@ghost
Copy link
Author

ghost commented Apr 26, 2021

@imjasonh looking at these two error conditions and size limits, do you feel strongly one way or the other about moving ahead on this PR? I think I've convinced myself at this point that it's OK to take the new annotations-based approach. Anyone with scripts this massive should probably bake them in to a container image rather than inflate a Task with them.

@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 26, 2021
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/pod.go 87.4% 87.5% 0.1

Copy link
Collaborator

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we might want to call this out as a potentially backwards incompatible change in our release notes (technically id say fixing any bug is a backwards incompatible change X'D but that's maybe a convo for another day) - specifically b/c some folks may have run into this and worked around it (and i wonder if it's possible that using args the way we were may have been doing some other replacements? or maybe $$ was definitely the only one - sounds like that might be the case...)

or maybe $$ was definitely the only one - sounds like that might be the case...

if this IS really only an issue with $$, what do you think about the alternative where we special case that? i.e. we look for $$ in the script block and replace it with $$$$? that seems kind hacky but if this really is the only sequence that was being replaced, it seems like it avoids adding the complexity of the annotation + downward API solution

runAsUser: 99
script: |
#!/usr/bin/env python3
if '$$' != '\u0024\u0024':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test :D

@ghost
Copy link
Author

ghost commented Apr 28, 2021

i.e. we look for $$ in the script block and replace it with $$$$? that seems kind hacky but if this really is the only sequence that was being replaced, it seems like it avoids adding the complexity of the annotation + downward API solution

Oooh yeah I like that much much more from a backwards-compatibility angle. Great point, will have a go at implementing it that way instead.

@ghost
Copy link
Author

ghost commented Apr 28, 2021

/hold

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2021
@MartinKanters
Copy link

@bobcatfish I'm pretty sure $$ is the only case. I personally thought it was a bug in Kubernetes itself and offered a fix, but they will probably choose not to follow it up because it will break existing clients. But as a result I dove in the code and found the expansion parts. See also the testcases for it.

Replacing all $$'s with $$$$'s might work just fine, it's another workaround, but a simpler one, indeed.

@tekton-robot tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 29, 2021
@ghost
Copy link
Author

ghost commented Apr 29, 2021

/hold cancel

@bobcatfish that solves the immediate problem with much less blast radius. Thanks!

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 29, 2021
@ghost
Copy link
Author

ghost commented Apr 29, 2021

Moved the creds-init volume naming test fix over to its own pr: #3907

Copy link
Collaborator

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than my nit pick about the security context looks great!!

if you DO get a chance it might be worth at least mentioning something about this change in the commit? (just let me know if you did and im missing it....)

/approve

// on these instances in our args and Kubernetes reduces them back down
// to the expected number of dollar signs. This is a workaround for
// https://github.com/kubernetes/kubernetes/issues/101137
script = strings.Replace(script, "$$", "$$$$", -1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wooooot :D

# are able to run scripts regardless of any specific
# Step's UID.
securityContext:
runAsUser: 2000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit pick but i dont wanna block anythign! are these changes related to the $$ change? if not i feel like it would be clearer to have them in another commit (and/or PR)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I've removed these and moved them to a new PR: #3914

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 4, 2021
@ghost
Copy link
Author

ghost commented May 4, 2021

/test tekton-pipeline-unit-tests

Prior to this commit Steps with Scripts that include two
dollar signs resulted in only one dollar sign ending up
in the final executed script. This is because the script
is passed into an init container via its "args" field and
Kubernetes replaces instances of the literal string
"$$" with "$". This is a documented behaviour of the
"args" field (sort of) but it doesn't make sense to have
those replacements happen for our "script" blocks. "$$"
in bash scripts is the current process id so replacing this
character sequence can be problematic.

This commit replaces instances of two dollar signs in a script
with four dollar signs. Kubernetes then converts these back
to two dollar signs. This replacement behaviour only exists
for dollar symbols so hard-coding this replacement feels like
an acceptable workaround that will have the least impact
on backwards compatibility.
@ghost
Copy link
Author

ghost commented May 5, 2021

/test tekton-pipeline-unit-tests

@ghost
Copy link
Author

ghost commented May 5, 2021

/test pull-tekton-pipeline-integration-tests

Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this and nice test!
It's good to fix this asap before too many people starts relying on the broken behaviour :)
/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label May 6, 2021
@tekton-robot tekton-robot merged commit 9a9f896 into tektoncd:main May 6, 2021
tekton-robot pushed a commit that referenced this pull request Jun 14, 2021
Kubernetes replaces instances of "$$" in container args fields with "$". This
can muck up the contents of script fields because scripts are passed into a
TaskRun Pod as an arg to an init container.

Prior to this commit we [tried to prevent the
replacement](#3888) from happening by:

1. putting scripts into annotations on a pod and projecting them using downward
API
    - con: the max size of the annotations map is capped to ~250kB. The
      aggregate size of all scripts in a single Task therefore becomes
      constrained by this. Any other systems using annotations will reduce the
      available headroom. Backwards incompatible.

2. replacing instances of "$$" in scripts with "$$$$" for k8s to then process
back to "$$"
    - con: k8s doesn't actually process _all_ instances of "$$". For example,
      if you write an arg with format "echo $(eval \$$foo)" then k8s will see
the first "$(", assume it's a variable reference, and pass it through verbatim.
So user's scripts with bash variable become broken by tekton's new replacement.
Backwards incompatible.

This commit takes a third approach, proposed by @MartinKanters, encoding
scripts as base64 in the controller and then having them decoded in the init
container. This bypasses Kubernetes' args processing completely because dollar
signs aren't used in base64 encodings. It also doesn't introduce a
backwards-incompatible limit to the aggregate script size. And it doesn't
mangle existing bash scripts with variable replacements.

The most noticeable trade-offs we now make are:

1. Tiny scripts can be up to 300% bigger, but as scripts get longer the max
increase gets closer to 133%.
2. Also the TaskRun's `initContainer` YAML is a bit less human readable:

```
    initContainers:
    - args:
      - -c
      - |
        tmpfile="/tekton/scripts/script-0-f8fmf"
        touch ${tmpfile} && chmod +x ${tmpfile}
        cat > ${tmpfile} << '_EOF_'
        IyEvYmluL3NoCnNldCAteGUKZWNobyAibm8gc2hlYmFuZyI=
        _EOF_
        /tekton/tools/entrypoint decode-script "${tmpfile}"
```

The entrypoint is extended to decode base64 files so that the `shellImage`
(which is used to write scripts to disk for Step containers) is not required
to package a `base64` binary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants