Loki Canary should have a better way to be suspended #1700

Closed
shokada opened this issue Feb 14, 2020 · 3 comments · Fixed by #1891

@shokada
Contributor

shokada commented Feb 14, 2020

For details, please refer to #1695

@slim-bean slim-bean added the keepalive An issue or PR that will be kept alive and never marked as stale. label Feb 21, 2020
@slim-bean
Collaborator

Currently we have a script that iterates over the Kubernetes contexts on your machine and sends a SIGINT to all the canaries. This is both slow and error-prone, since the result depends on who runs it and which contexts they have access to:

#!/bin/bash

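function sigint_all_canaries {
    # "-o name" prints "pod/<name>", so strip the "pod/" prefix.
    for POD in $(kubectl --context="${1}" --namespace=default get pod -l "name=loki-canary" -o name | sed 's/pod\///')
    do
        echo "Suspending $POD in ${1}"
        # Send signal 2 (SIGINT) to PID 1, assumed to be the canary process in the container.
        kubectl --context="${1}" --namespace=default exec "$POD" -- kill -2 1
    done
}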


function usage() {
    echo
    echo "$0: sends a SIGINT to all loki-canaries in a cluster causing them to suspend operations until the pod is restarted"
    echo
    echo "arguments:"
    echo "all              iterate through all locally defined CONTEXTS using the command: kubectl config get-contexts -o name"
    echo "showall          print the output of: kubectl config get-contexts -o name and exit"
    echo "[context_name]   specify an individual context"
}

[ -z "$1" ] && usage && exit


case "${1}" in
all)
  for CONTEXT in $(kubectl config get-contexts -o name)
  do
    echo "Suspending canaries in context: ${CONTEXT}"
    sigint_all_canaries "${CONTEXT}"
  done
  ;;
showall)
  kubectl config get-contexts -o name
  ;;
*)
  echo "Suspending canaries in context: ${1}"
  sigint_all_canaries "${1}"
  ;;
esac

I'm not sure of the best way to handle this. It would be nice to be able to do it in a centralized manner, even from Loki, but the challenge is that you usually want to shut down the canaries precisely because Loki or its supporting infrastructure is in trouble, so it would be hard to guarantee that a Loki-driven mechanism would work.

I still wonder, though, whether both promtail and the canaries should back off their connections when they receive specific HTTP response codes, or even whether the canaries should shut down on a specific HTTP response code.
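To make the idea concrete, here is a minimal sketch of what such a policy could look like. Everything in it is hypothetical: the function names, the choice of 429 and 503 as "back off" codes, and 410 as a "suspend" signal are assumptions for illustration, not behavior that promtail or the canary is known to have.

```shell
#!/bin/bash

# Hypothetical policy: map an HTTP status code to an action.
# The specific codes are assumptions chosen for illustration only.
classify_response() {
    case "$1" in
        429|503) echo backoff ;;   # server overloaded or unavailable: slow down
        410)     echo suspend ;;   # assumed "stop sending" signal
        *)       echo continue ;;  # anything else: keep going as usual
    esac
}

# Double the current delay (in seconds), capped at a maximum.
next_delay() {
    local current=$1 max=$2
    local doubled=$(( current * 2 ))
    if [ "$doubled" -gt "$max" ]; then
        echo "$max"
    else
        echo "$doubled"
    fi
}
```

A client loop would call classify_response on each response and, on "backoff", sleep for the current delay and then update it with next_delay.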

@shokada
Contributor Author

shokada commented Feb 26, 2020

In a Kubernetes environment there is no easy way to deschedule a DaemonSet, and killing a pod will restart it, so that's why this exists.

There seems to be a way to scale down the DaemonSet by setting a nodeSelector with a label that no node has.
https://stackoverflow.com/a/57533340

What do you think of this approach?
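A rough sketch of that approach, assuming the DaemonSet is named loki-canary and lives in the default namespace (the script above only confirms the pod label and namespace, so the DaemonSet name is a guess). The label key "non-existing" is arbitrary; it only needs to be a label that no node carries. The helper function names are made up for this example.

```shell
#!/bin/bash

# Arbitrary label key that no node is expected to carry.
SUSPEND_KEY="non-existing"

# Strategic-merge patch: add a nodeSelector that no node can satisfy,
# so the DaemonSet controller removes all of its pods.
suspend_patch() {
    printf '{"spec":{"template":{"spec":{"nodeSelector":{"%s":"true"}}}}}' "$SUSPEND_KEY"
}

# JSON patch: remove that selector again so the pods are rescheduled.
resume_patch() {
    printf '[{"op":"remove","path":"/spec/template/spec/nodeSelector/%s"}]' "$SUSPEND_KEY"
}

# $1 = kubectl context. DaemonSet name and namespace are assumptions.
suspend_canaries() {
    kubectl --context="$1" --namespace=default patch daemonset loki-canary \
        -p "$(suspend_patch)"
}

resume_canaries() {
    kubectl --context="$1" --namespace=default patch daemonset loki-canary \
        --type json -p "$(resume_patch)"
}
```

This still has to be run once per context, but it is a single API call per cluster instead of one exec per pod, and the pods stay down until the selector is removed.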

@joe-elliott joe-elliott self-assigned this Apr 3, 2020
@joe-elliott
Member

I'm proposing adding an HTTP endpoint, /suspend, that indefinitely stops all canary processes.
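As a sketch of how such an endpoint might be called from a shell, assuming the canary serves HTTP on port 3500 and accepts a plain GET (both are assumptions; only the /suspend path comes from the proposal, and the function names are made up for this example):

```shell
#!/bin/bash

# Build the suspend URL for one canary.
# $1 = host or pod IP, $2 = port (assumed default: 3500).
suspend_url() {
    local host=$1 port=${2:-3500}
    echo "http://${host}:${port}/suspend"
}

# Call the endpoint. --fail makes curl exit non-zero on an HTTP error,
# --max-time keeps an unreachable canary from blocking the caller.
suspend_canary() {
    curl --silent --show-error --fail --max-time 5 "$(suspend_url "$@")"
}
```

For example, `suspend_canary 10.0.0.12` would request http://10.0.0.12:3500/suspend, where the address is a placeholder for a canary pod IP.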
