Mutating webhook times out when EventListener has multiple triggers #356
I was able to reproduce this issue with Tekton Triggers v0.1.0 installed on OpenShift after creating an EventListener with 37 triggers:
I wrote up this quick bash script to reproduce the error if anyone wants to try it out:

#!/bin/bash

# Cleanup
rm eventlistener_*

# Define variables
max=50
if [ ! -z "$1" ]; then
  max="$1"
fi
range=40
if [ ! -z "$2" ]; then
  range="$2"
fi
min=$((max - range))
if [ "$min" -le 0 ]; then
  min=1
fi
echo "Max triggers: $max"
echo "Min triggers: $min"
echo ""

# Generate EventListener files
cat << EOF >> "eventlistener_0.yaml"
apiVersion: tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: listener
spec:
  serviceAccountName: tekton-triggers-example-sa
  triggers:
EOF
for ((i=1; i<="$max"; i++)); do
  prevFile="eventlistener_$((i - 1)).yaml"
  newFile="eventlistener_${i}.yaml"
  cp "$prevFile" "$newFile"
  cat << EOF >> "$newFile"
  - name: foo-trig
    binding:
      name: pipeline-binding
    template:
      name: pipeline-template
EOF
done

# Apply files
for ((i="$min"; i<="$max"; i++)); do
  # Apply each file and record how long it takes
  echo "### $i triggers ###"
  file="eventlistener_${i}.yaml"
  startTime="$(date -u +%s)"
  kubectl apply -f "$file"
  endTime="$(date -u +%s)"
  elapsed="$((endTime - startTime))"
  echo "${elapsed}s"
  echo ""
done
Next I'll see at what point I can reproduce the error with Triggers v0.2.0.

Update:
The validation in v0.2.0 is a lot simpler than in v0.1.0. I think the timeout error in v0.1.0 is probably coming from us querying the Kubernetes clientset a number of times during validation: https://github.com/tektoncd/triggers/blob/v0.1.0/pkg/apis/triggers/v1alpha1/event_listener_validation.go#L54. My test script is not very realistic, because it just creates a number of duplicate triggers, but even so it doesn't look like there's a way to make the v0.2.0 EventListener validation logic time out.
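For intuition, here is a minimal, self-contained simulation of why per-trigger lookups add up. It does not use the real clientset or the actual v0.1.0 validation code; the 200ms round trip per GET is an assumption, and the 13-second timeout comes from the OpenShift numbers reported in this issue:

package main

import (
	"context"
	"fmt"
	"time"
)

// trigger is an illustrative stand-in for one EventListener trigger entry.
type trigger struct{ binding, template string }

// getResource simulates a single Kubernetes API GET, roughly what the v0.1.0
// validation issued for every binding and template referenced by a trigger.
func getResource(ctx context.Context, rtt time.Duration) error {
	select {
	case <-time.After(rtt):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// validateTriggers mirrors the problematic pattern: two sequential API round
// trips per trigger, so latency grows linearly with the number of triggers.
func validateTriggers(ctx context.Context, triggers []trigger, rtt time.Duration) error {
	for range triggers {
		if err := getResource(ctx, rtt); err != nil { // TriggerBinding lookup
			return err
		}
		if err := getResource(ctx, rtt); err != nil { // TriggerTemplate lookup
			return err
		}
	}
	return nil
}

func main() {
	triggers := make([]trigger, 37) // roughly where the timeout was first reproduced
	for i := range triggers {
		triggers[i] = trigger{binding: "pipeline-binding", template: "pipeline-template"}
	}
	// 13s matches the OpenShift webhook timeout reported in this issue;
	// 200ms per GET is an assumed API round-trip time.
	ctx, cancel := context.WithTimeout(context.Background(), 13*time.Second)
	defer cancel()
	start := time.Now()
	err := validateTriggers(ctx, triggers, 200*time.Millisecond)
	fmt.Printf("validation took %v, err = %v\n", time.Since(start), err)
}

With 37 triggers that is 74 sequential GETs, or about 14.8 seconds at 200ms each, which is already past a 13-second webhook timeout.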
Also, we should probably create an e2e test to prevent a regression from occurring. To do this, I think we'll need a "realistic" maximum number of triggers that we think an EventListener should be able to have. I'm not really sure what this semi-arbitrary number should be.
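As a rough starting point, a regression test could generate an EventListener with N duplicate triggers (like the bash script above) and fail if admission takes longer than some budget. This sketch shells out to kubectl instead of using the Triggers e2e framework; the 1000-trigger count, the 30-second budget, and the trigger field names are placeholders and would need to match the Triggers version under test:

package e2e

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"testing"
	"time"
)

// buildEventListener renders an EventListener manifest with n identical
// triggers, mirroring the bash reproduction script above. Adjust the trigger
// fields to match the schema of the Triggers version being tested.
func buildEventListener(n int) string {
	var b strings.Builder
	b.WriteString(`apiVersion: tekton.dev/v1alpha1
kind: EventListener
metadata:
  name: many-triggers-listener
spec:
  serviceAccountName: tekton-triggers-example-sa
  triggers:
`)
	for i := 0; i < n; i++ {
		fmt.Fprintf(&b, `  - name: trig-%d
    binding:
      name: pipeline-binding
    template:
      name: pipeline-template
`, i)
	}
	return b.String()
}

// TestEventListenerWithManyTriggers guards against the v0.1.0 regression where
// admission of a large EventListener timed out. The trigger count and time
// budget are placeholders, not agreed-upon numbers.
func TestEventListenerWithManyTriggers(t *testing.T) {
	const triggerCount = 1000
	const budget = 30 * time.Second

	f, err := os.CreateTemp("", "eventlistener-*.yaml")
	if err != nil {
		t.Fatal(err)
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString(buildEventListener(triggerCount)); err != nil {
		t.Fatal(err)
	}
	f.Close()

	start := time.Now()
	out, err := exec.Command("kubectl", "apply", "-f", f.Name()).CombinedOutput()
	elapsed := time.Since(start)
	if err != nil {
		t.Fatalf("kubectl apply failed after %v: %v\n%s", elapsed, err, out)
	}
	if elapsed > budget {
		t.Fatalf("admission took %v, budget is %v", elapsed, budget)
	}
}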
Issue tektoncd#356 found a bug in Triggers v0.1.0. EventListeners with too many triggers (around 27 in this user's case) could not be created due to an Admission Webhook timeout. This bug makes Triggers v0.1.0 unusable at a large scale. PR tektoncd#263 (which came after v0.1.0) fixes the timeout issue by removing clientset use from the EventListener validation code. So, I have manually re-applied the changes made in PR tektoncd#263 on top of v0.1.0. This should fix the timeout bug seen in v0.1.0.
I think having some performance numbers on the number of Triggers we can support would be nice! Given that we relaxed the validation to not check for the presence of bindings/templates, I don't think creating the EL is where we'll run into issues. We do use the Kubernetes/Triggers clientsets to fetch resources in the EL sink at runtime (e.g. fetch the EL, then iterate over all the triggers; fetch the secret for GH/GL interceptors, etc.). That might be problematic if we have a large number of incoming requests.
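To make that runtime concern concrete, here is a sketch of the per-request pattern with made-up types (the real sink uses the generated Kubernetes/Triggers clientsets, and the names and fields here are illustrative):

package sink

import (
	"context"
	"log"
	"net/http"
)

// store stands in for the Kubernetes/Triggers clientsets the sink queries.
type store interface {
	GetEventListener(ctx context.Context, name string) (eventListener, error)
	GetSecret(ctx context.Context, name string) (string, error)
}

type eventListener struct{ triggers []triggerSpec }

type triggerSpec struct {
	name              string
	interceptorSecret string // e.g. the GitHub/GitLab webhook secret name
}

// Sink handles one incoming event. Every request re-fetches the EventListener
// and walks all of its triggers, fetching the interceptor secret for each one,
// so API traffic grows with triggers multiplied by incoming requests.
type Sink struct{ Store store }

func (s Sink) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	el, err := s.Store.GetEventListener(r.Context(), "listener")
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	for _, t := range el.triggers {
		if t.interceptorSecret != "" {
			if _, err := s.Store.GetSecret(r.Context(), t.interceptorSecret); err != nil {
				log.Printf("trigger %s: %v", t.name, err)
				continue
			}
		}
		// ... run interceptors, apply the binding to the template, create resources ...
	}
	w.WriteHeader(http.StatusAccepted)
}

Caching these lookups (for example, informers/listers for the EventListener and secrets) would be one way to keep the API load from scaling with request volume.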
/kind bug
I'm closing this issue since the bug has been resolved in the latest release of Triggers (v0.2.1). I'm opening a couple of issues for follow-up work:
Fixes #405. Issue #356 found a bug in Triggers v0.1.0 (fixed in v0.2.0) where the admission webhook times out after adding around 35 Triggers to an EventListener. This PR adds a regression test where we create an EventListener with 1000 Triggers. It also moves the cluster-scope resource cleanup code (from 7ba80a7) out of init_test.go and into eventlistener_test.go.
Update scripts for new cmd added upstream
Expected Behavior
I can use the tekton dashboard to create multiple webhooks (greater than 8 or so).
Actual Behavior
When creating the 9th webhook, the mutating webhook triggers-webhook.tekton.dev times out while processing the EventListener that was updated by the tekton dashboard.

Steps to Reproduce the Problem
Additional Info
Multiple users have reported this problem (kabanero-io/kabanero-foundation#150). All are using OpenShift 4.2, openshift-pipelines-operator 0.8 (which includes Triggers 0.1.0), and Tekton Dashboard 0.3.0. It appears that in OpenShift the mutating webhook timeout is 13 seconds. The EventListener has about 24 triggers in it (3 per GitHub webhook) before the mutating webhook starts to time out. Perhaps a single EventListener was not intended to contain so many triggers... in any case it would be good to know whether this is a reasonable configuration.

I re-built the tekton-triggers-webhook container and added an elapsed-time measurement (time.Since(ttStart)) to the mutating admission webhook method in vendor/knative.dev/pkg/webhook/webhook.go. The elapsed time increases by about 1.2 seconds per GitHub webhook that I create on my system (using the dashboard). The container can be found here:
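For anyone who wants to reproduce that measurement without patching the vendored knative webhook code, the same idea can be expressed as generic HTTP middleware. This is only a sketch of the technique, not the actual change (which added time.Since(ttStart) inside vendor/knative.dev/pkg/webhook/webhook.go); the handler, port, and log format here are made up:

package main

import (
	"log"
	"net/http"
	"time"
)

// timed wraps a handler (such as the mutating admission handler) and logs how
// long each request takes, the same measurement as time.Since(ttStart) above.
func timed(name string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("%s handled %s in %v", name, r.URL.Path, time.Since(start))
	})
}

func main() {
	// Placeholder handler; in the real webhook binary this would be the
	// admission review handler that validates the EventListener.
	admit := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	http.Handle("/", timed("triggers-webhook", admit))
	log.Fatal(http.ListenAndServe(":8443", nil))
}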