Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable insertId generation, and update Stackdriver Logging Agent image to 0.5-1.5.36-1-k8s. #68920

Merged
merged 1 commit into from
Oct 11, 2018

Conversation

qingling128
Copy link
Contributor

What this PR does / why we need it:
Enable insertId generation, and update Stackdriver Logging Agent image to 0.5-1.5.36-1-k8s. This help reduce log duplication and guarantee log order.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Enable insertId generation, and update Stackdriver Logging Agent image to 0.5-1.5.36-1-k8s. This help reduce log duplication and guarantee log order.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/gcp needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 21, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 21, 2018
@qingling128
Copy link
Contributor Author

Hi, could someone help with adding a /ok-to-test tag? @x13n @jszczepkowski

@x13n
Copy link
Member

x13n commented Sep 21, 2018

/ok-to-test
/assign

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 21, 2018
@x13n
Copy link
Member

x13n commented Sep 25, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2018
@qingling128
Copy link
Contributor Author

/assign @MaciekPytel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 28, 2018
@qingling128
Copy link
Contributor Author

Just added some tolerations for Metadata Agent to fix the issue where Metadata Agents are not scheduled.

PTAL @x13n @MaciekPytel

@@ -56,6 +56,13 @@ spec:
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
tolerations:
- key: "node.alpha.kubernetes.io/ismaster"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below you are adding a toleration for all keys with NoSchedule effect, so this one doesn't seem to be doing anything.

@x13n
Copy link
Member

x13n commented Oct 2, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 2, 2018
- operator: "Exists"
effect: "NoExecute"
- operator: "Exists"
effect: "NoSchedule"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want such strong tolerations? This will make this able to schedule anywhere, including for example a node that is currently being deleted or a super-expensive node with GPU.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we currently have for fluentd and they should really go together. If this is more restricted, then fluentd should be more restricted, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so this is meant to run on every node? In this case it makes much more sense.

What about metadata-agent-cluster-level below though? If it's one-per-cluster than presumably it should schedule on a node without taints, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Yes, I think the cluster level one should respect taints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do want all cluster level Metadata Agent to be scheduled though. Otherwise we lose insight of the metadata. It's intentional to disregard the taints. Note that this will only apply to customers who enable the Metadata Agent addons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@x13n - Yeah, Heapster does use CriticalAddonsOnly

As I read this guaranteed-scheduling-critical-addon-pods doc to understand a bit more about what it does, and it sounds like CriticalAddonsOnly will be deprecated soon. Instead, it recommends priorityClass.

And I did find a priorityClassName spec defined for Heapster:

We probably should just use priorityClassName: system-cluster-critical for Logging and Metadata Agents then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@x13n @MaciekPytel Friendly ping.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/ says this feature is in beta starting from 1.11. You can add it, but don't cherrypick it to 1.10 branch later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Updated to use priorityClassName: system-cluster-critical instead. PTAL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 8, 2018
@qingling128
Copy link
Contributor Author

PTAL

@qingling128
Copy link
Contributor Author

/retest

@@ -28,6 +28,7 @@ spec:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
serviceAccountName: metadata-agent
priorityClassName: system-cluster-critical
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is per-node, so probably should be system-node-critical (which is what fluentd-gcp uses already). @bsalamat can you help with getting this right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to system-node-critical.

@bsalamat - Does this look right to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as @x13n said, it system-node-critical is the right one to use here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming!

… 0.5-1.5.36-1-k8s and add priorityClassName for Metadata Agent.
@qingling128
Copy link
Contributor Author

/retest

@x13n
Copy link
Member

x13n commented Oct 10, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2018
@qingling128
Copy link
Contributor Author

Thanks for lgtm. @MaciekPytel Does this look good to you too?

@x13n
Copy link
Member

x13n commented Oct 11, 2018

Btw, I understand this change as correcting the existing behavior, so:
/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 11, 2018
@MaciekPytel
Copy link
Contributor

/approve
Cluster-level components (presumably metadata-agent-cluster-level) shouldn't have wide tolerations, because it removes the ability to control where those pods go and we don't want them scheduling on a node with expensive hardware they don't need or a node that is currently being deleted by autoscaler / node upgrade process, etc.

On the other hand we may need such tolerations to per-node metadata agent. Without them the agent won't schedule on a node with GPU or a node that has been explicitly dedicated for a specific job by user. If we need agent on every node than it probably needs the tolerations.

I'd suggest adding back tolerations to per-node metadata agent if we do need it running on every node.

/hold
So this doesn't merge before @qingling128 has a chance to reply to my comment. If I'm wrong and we don't need metadata agent running on tainted nodes feel free to cancel hold and let this merge.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 11, 2018
@x13n
Copy link
Member

x13n commented Oct 11, 2018

/approve
(since lgtm doesn't automatically approve anymore)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel, qingling128, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 11, 2018
@x13n
Copy link
Member

x13n commented Oct 11, 2018

@MaciekPytel According to the documentation, critical pods should be marked system-cluster-critical or system-node-critical when priorities are used. CriticalAddonsOnly taint was used by rescheduler which apparently is deprecated and removed in 1.12.

@MaciekPytel
Copy link
Contributor

@x13n I understand this part, but AFAIK priorityClass just allows the pod to preempt a lower priority pod if there is no space in the cluster. It doesn't allow bypassing other scheduling constraints (such as taints).

@qingling128
Copy link
Contributor Author

For Metadata Agent, we expect the behavior to be consistent between cluster level agent and node level agent. CriticalAddonsOnly is to be deprecated, so we probably don't want to introduce that now.

The current implementation sounds like what we want.

@qingling128
Copy link
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 11, 2018
@k8s-ci-robot k8s-ci-robot merged commit 1aef631 into kubernetes:master Oct 11, 2018
@qingling128
Copy link
Contributor Author

Created cherry-pick PRs to release branches 1.11 and 1.12.

k8s-ci-robot added a commit that referenced this pull request Oct 22, 2018
…68920-upstream-release-1.12

Automated cherry pick of #68920: Enable insertId generation, update Stackdriver Logging Agent
k8s-ci-robot added a commit that referenced this pull request Nov 2, 2018
…68920-upstream-release-1.11

Automated cherry pick of #68920: Enable insertId generation, update Stackdriver Logging Agent
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants