K8s cronjob example potentially problematic #159

Closed
bentasker opened this issue Aug 18, 2024 · 4 comments

Comments

@bentasker
Contributor

bentasker commented Aug 18, 2024

The example k8s cronjob embeds secrets (the token) into the YAML.

That's generally considered poor security hygiene, as there's a non-zero risk of someone committing or uploading that YAML somewhere public.

Instead, a k8s Secret should be created and the token exposed as an environment variable (this also keeps the sensitive value out of anything that might log the command line).

For example, my quick setup looks like this:

Create the secret

kubectl create secret generic fedifetcher \
--from-literal=server_domain=mastodon.bentasker.co.uk \
--from-literal=token="<token>"

Define the cronjob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: fedifetcher
spec:
  schedule: "*/15 * * * *"
  failedJobsHistoryLimit: 5
  successfulJobsHistoryLimit: 5
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: fedifetcher
            image: ghcr.io/nanos/fedifetcher:v7.1.6
            imagePullPolicy: IfNotPresent
            env:
            - name: FF_HOME_TIMELINE_LENGTH
              value: "200"
            - name: FF_MAX_FOLLOWERS
              value: "10"
            # Optional: uncomment this and the volume mounts below
            # if you want to persist state and lockfiles
            # - name: FF_STATE_DIR
            #   value: "/data/"
            - name: FF_SERVER
              valueFrom:
                secretKeyRef:
                  name: fedifetcher
                  key: server_domain
                  optional: false
            - name: FF_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  name: fedifetcher
                  key: token
                  optional: false
            # Uncomment these if you want persistent state
            # (you'll also need to define your PVC)
            # volumeMounts:
            # - mountPath: /data
            #   name: fedifetcher
            #   readOnly: false
          # volumes:
          # - name: fedifetcher
          #   persistentVolumeClaim:
          #     claimName: fedifetcher
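
With the Secret created and the manifest saved (the file name below is just a placeholder), applying it is the usual:

kubectl apply -f fedifetcher-cronjob.yaml
# Optionally trigger a one-off run to verify it works before the schedule fires
kubectl create job --from=cronjob/fedifetcher fedifetcher-manual-run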
@nanos
Owner

nanos commented Aug 18, 2024

Thanks. As I have 0 experience with k8s, I depend on others providing these. So thank you!

quick question: any particular reason you are requesting ghcr.io/nanos/fedifetcher:v7.1.6 rather than latest? Or is it just the general ‘I want to tag a specific version’ sentiment?

also, I’d go so far as to say you absolutely need to have a persistent state directory. Otherwise you are not benefiting from caching or rate limiting and will therefore waste a lot of resources.

@bentasker
Contributor Author

Or is it just the general ‘I want to tag a specific version’ sentiment?

Yep, pretty much - using :latest isn't something you should really do in production, especially with ephemeral pods (like those used for a CronJob) which might end up getting scheduled on any one of multiple different nodes.

With a specific version tag, I know exactly what will have run, so

  • If there's breakage, I know it's not because you've pushed a new version (deliberately or by accident)
  • I can do a canary rollout at upgrade time
  • If something breaks after that upgrade I can roll back much more easily (assuming the software supports it)

If you're really unlucky, you can end up with pods running different versions too: with an imagePullPolicy of IfNotPresent, a newly deployed node (which won't yet have the image) might pull a newer :latest than the image already cached on a node that's been around a while. Which version you run then depends on which node the pod spins up on (or worse, with multiple replicas, you might run both simultaneously).
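
(As an aside, if you want to be even stricter than a version tag, Kubernetes also lets you pin the image by digest; <digest> below is a placeholder, not a real digest for this image:)

containers:
- name: fedifetcher
  # Pinning by digest guarantees the exact image, regardless of where a tag later points
  image: ghcr.io/nanos/fedifetcher@sha256:<digest>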

It's almost certainly overkill for this, but it's a habit that's pretty heavily ingrained from managing infra.

also, I’d go so far as to say you absolutely need to have a persistent state directory. Otherwise you are not benefiting from caching or rate limiting and will therefore waste a lot of resources.

Yep, that was my thinking too - I left it as an option partly because it's optional for the GitHub Actions setup, but also because it means a copy-and-paste of the YAML should just work. Once you start adding PVCs that tends to stop being the case (because people use different backends for their PVs).

@nanos
Owner

nanos commented Sep 2, 2024

@bentasker Does it still need this stuff at the top?

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fedifetcher-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 100Mi
---

@bentasker
Contributor Author

It depends on whether the operator wants to use a PVC to keep persistent state. For example, I quite often use NFS instead, which doesn't require one (a rough sketch is below).

I tend to leave them out of examples on the basis that anyone using them will probably know how to set them up.
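
For reference, an NFS-backed variant just replaces the persistentVolumeClaim reference in the pod spec with an nfs volume along these lines (the server and path are placeholders, not anything from this repo):

volumes:
- name: fedifetcher
  nfs:
    server: nfs.example.internal   # placeholder NFS server
    path: /exports/fedifetcher     # placeholder export path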

nanos pushed a commit that referenced this issue Sep 3, 2024
nanos added a commit that referenced this issue Sep 3, 2024
@nanos nanos closed this as completed Sep 3, 2024