Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade knative -> 0.18 #3319

Merged
merged 8 commits into from
Oct 6, 2020
Merged

Conversation

eddie4941
Copy link
Contributor

@eddie4941 eddie4941 commented Oct 1, 2020

Changes

The newest knative/pkg release has released some change to how metrics
are generated. In particular, the new code stops generating a metrics
for every resource that is reconciled. This dramatically reduces the
metrics cardinality for the webhook. This ultimately
reduces amount of memory required by the webhook daemons and
greatly improves the usability of the /metrics endpoint as it will not
longer return a metrics for every resource that is reconciled.

fixes: #2872

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

Reviewer Notes

If API changes are included, additive changes must be approved by at least two OWNERS and backwards incompatible changes must be approved by more than 50% of the OWNERS, and they must first be added in a backwards compatible way.

Release Notes

This release will drop the `resource_name` label from `webhook_request_(count, latencies_bucket)` metrics. This this breaking change with no replacement for those relying on these metrics. The omission of these metrics will improve the overall memory usage of the webhook and the stability of the /metrics endpoint.

@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 1, 2020
@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 1, 2020
@tekton-robot
Copy link
Collaborator

Hi @eddie4941. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@eddie4941
Copy link
Contributor Author

I tested this with older clusters to make sure the new changes didn't break anything. The results:

version info:

$ kb version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-16T00:04:31Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}

Tekton off of master (8ecaa29) includes the resource_name label:

$ http :9090/metrics | grep webhook_request_latencies_bucket | wc -l
      85

$ http :9090/metrics | grep webhook_request_latencies_bucket | grep 'resource_name=' | wc -l
      85

Tekton with new changes doesn't include the resource_name label.

$ http :9090/metrics | grep webhook_request_latencies_bucket | wc -l
      85

$ http :9090/metrics | grep webhook_request_latencies_bucket | grep 'resource_name=' | wc -l
       0

@imjasonh
Copy link
Member

imjasonh commented Oct 1, 2020

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 1, 2020
@imjasonh
Copy link
Member

imjasonh commented Oct 1, 2020

/ok-to-test

@tekton-robot tekton-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 1, 2020
@tekton-robot tekton-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 2, 2020
@vdemeester
Copy link
Member

/hold
Let's make sure we wait for 0.17 to be release to get this in

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2020
@eddie4941 eddie4941 force-pushed the update-knative-0.18 branch 2 times, most recently from 477ddcc to 814aa62 Compare October 2, 2020 22:11
@mattmoor
Copy link
Member

mattmoor commented Oct 3, 2020

FWIW:
/lgtm

I was literally just about to start this and my spidey sense prompted me to go look for a PR 🎉

Looking over this I am actually really glad someone else did this as it looks like it required a lot of changes, so thanks for picking up this particularly nasty change 😅

@vdemeester is there a date for 17?

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 3, 2020
@mattmoor
Copy link
Member

mattmoor commented Oct 3, 2020

I bumped triggers here (imports this): tektoncd/triggers#784

@vdemeester
Copy link
Member

@vdemeester is there a date for 17?

Tomorrow 😉

@eddie4941
Copy link
Contributor Author

eddie4941 commented Oct 5, 2020

@vdemeester Let me know when this is all good to go. In the meantime, Ill try my best to keep it rebased and up to date.

@vdemeester
Copy link
Member

It should also fix #3237 if I am not wrong.

@vdemeester
Copy link
Member

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 6, 2020
@mattmoor
Copy link
Member

mattmoor commented Oct 6, 2020

/lgtm

@mattmoor
Copy link
Member

mattmoor commented Oct 6, 2020

If we can land this, then I will try to fix up my triggers and CLI PRs today/tomorrow

@vdemeester
Copy link
Member

/approve

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 6, 2020
@mattmoor
Copy link
Member

mattmoor commented Oct 6, 2020

@eddie4941 looks like a merge conflict

test/workspace_test.go:253:24: cannot use t (variable of type *\"testing\".T) as context.Context value in argument to setup: missing method Deadline /home/prow/go/src/github.com/tektoncd/pipeline/test/workspace_test.go:253:25: too few arguments in call to setup

@afrittoli
Copy link
Member

I thought the fix for this was in knative/pkg v0.17 which is already in tektoncd/pipeline v0.17.... but it looks like it is not.... I just tried on my v0.17 cluster.
So we'll fix this in tektoncd/pipeline v0.18 using knative/pkg v0.18 :D

The newest knative/pkg release has released some change to how metrics
are generated. In particular, the new code stops generating a metrics
for every resource that is reconciled. This dramatically reduces the
metrics cardinality for the webhook. This ultimately
reduces amount of memory required by the webhook daemons and
greatly improves the usability of the /metrics endpoint as it will not
longer return a metrics for every resource that is reconciled.

fixes: tektoncd#2872
knative/pkg v0.18 requires the k8s deps of version v0.18.8 or higher.
Breaking changes were introduced in some of the k8s deps with version
v0.18.8. In particular, client-go v0.18.8 introduced some breaking
changes to its apis, so we need to update the version of k8s deps that
we use so so that we are compatible with knative/pgk v0.18.
client-go  v0.18.8 has different apis which are incompatible with
version v0.1.2 of the go-containerregistry module. This updates the
go-containerregistry module to v0.1.3 which supports the new client-go
apis brought in v0.18.8
client-go v0.18.8 changed all the client functions to take a
context.Context parameter. This updates the code to pass the
context.Context parameter so that it is compatible with client-go
v0.18.8. No functional changes were made with this change.
@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 6, 2020
@mattmoor
Copy link
Member

mattmoor commented Oct 6, 2020

/lgtm

thanks again! 🎉

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 6, 2020
@tekton-robot tekton-robot merged commit 1710b68 into tektoncd:master Oct 6, 2020
@eddie4941 eddie4941 deleted the update-knative-0.18 branch October 6, 2020 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Webhook metrics can have enormous cardinality
6 participants