-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix pod template ID inconsistency over master reboots #988
Open
fredcooke
wants to merge
7
commits into
jenkinsci:master
Choose a base branch
from
fredcooke:fix-pod-template-id-situation
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+254
−4
Open
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
ff4ce52
Add optional ID to the pod template step class such that it can be su…
fredcooke b8ba0e0
Add missing spaces to an if condition to aid readability.
fredcooke 3f3178e
No point checking for null, the constructor handles the null the same…
fredcooke 6dacdf5
Expand static imports out so the api usage footprint is visible in di…
fredcooke 0003b17
NUKE this commit before final merge, obviously. Hoping this will allo…
fredcooke a7abe45
First cut of a test that I hope fails - for the right reason. Not 100…
fredcooke f9e5506
Replace the expected message with one that is actually output to get …
fredcooke File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
173 changes: 173 additions & 0 deletions
173
...chez/jenkins/plugins/kubernetes/pipeline/RestartPipelinePodTemplateIdConsistencyTest.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,173 @@ | ||
/* | ||
* The MIT License | ||
* | ||
* Copyright (c) 2017, Red Hat, Inc. | ||
* | ||
* Permission is hereby granted, free of charge, to any person obtaining a copy | ||
* of this software and associated documentation files (the "Software"), to deal | ||
* in the Software without restriction, including without limitation the rights | ||
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
* copies of the Software, and to permit persons to whom the Software is | ||
* furnished to do so, subject to the following conditions: | ||
* | ||
* The above copyright notice and this permission notice shall be included in | ||
* all copies or substantial portions of the Software. | ||
* | ||
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
* THE SOFTWARE. | ||
*/ | ||
|
||
package org.csanchez.jenkins.plugins.kubernetes.pipeline; | ||
|
||
import static java.util.Arrays.asList; | ||
import static org.csanchez.jenkins.plugins.kubernetes.KubernetesTestUtil.setupHost; | ||
import static org.csanchez.jenkins.plugins.kubernetes.KubernetesTestUtil.setupCloud; | ||
import static org.csanchez.jenkins.plugins.kubernetes.KubernetesTestUtil.createSecret; | ||
import static org.csanchez.jenkins.plugins.kubernetes.KubernetesTestUtil.createPipelineJobThenScheduleRun; | ||
import static org.csanchez.jenkins.plugins.kubernetes.KubernetesTestUtil.assumeWindows; | ||
import static org.csanchez.jenkins.plugins.kubernetes.KubernetesTestUtil.assumeKubernetes; | ||
import static org.csanchez.jenkins.plugins.kubernetes.KubernetesTestUtil.getLabels; | ||
import static org.csanchez.jenkins.plugins.kubernetes.KubernetesTestUtil.deletePods; | ||
import static org.junit.Assert.assertTrue; | ||
|
||
import java.io.IOException; | ||
import java.util.Collections; | ||
import java.util.Optional; | ||
import java.util.concurrent.ExecutionException; | ||
import java.util.concurrent.atomic.AtomicReference; | ||
import java.util.logging.Level; | ||
import java.util.logging.Logger; | ||
|
||
import org.apache.commons.compress.utils.IOUtils; | ||
import org.csanchez.jenkins.plugins.kubernetes.ContainerEnvVar; | ||
import org.csanchez.jenkins.plugins.kubernetes.ContainerTemplate; | ||
import org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud; | ||
import org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave; | ||
import org.csanchez.jenkins.plugins.kubernetes.PodTemplate; | ||
import org.csanchez.jenkins.plugins.kubernetes.model.KeyValueEnvVar; | ||
import org.csanchez.jenkins.plugins.kubernetes.model.SecretEnvVar; | ||
import org.csanchez.jenkins.plugins.kubernetes.model.TemplateEnvVar; | ||
import org.jenkinsci.plugins.workflow.job.WorkflowJob; | ||
import org.jenkinsci.plugins.workflow.job.WorkflowRun; | ||
import org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution; | ||
import org.junit.BeforeClass; | ||
import org.junit.ClassRule; | ||
import org.junit.Rule; | ||
import org.junit.Test; | ||
import org.junit.rules.TemporaryFolder; | ||
import org.junit.rules.TestName; | ||
import org.jvnet.hudson.test.BuildWatcher; | ||
import org.jvnet.hudson.test.Issue; | ||
import org.jvnet.hudson.test.JenkinsRule; | ||
import org.jvnet.hudson.test.LoggerRule; | ||
import org.jvnet.hudson.test.RestartableJenkinsNonLocalhostRule; | ||
|
||
import hudson.model.Node; | ||
import hudson.model.Result; | ||
import hudson.slaves.DumbSlave; | ||
import hudson.slaves.JNLPLauncher; | ||
import hudson.slaves.NodeProperty; | ||
import hudson.slaves.RetentionStrategy; | ||
|
||
public class RestartPipelinePodTemplateIdConsistencyTest { | ||
protected static final String CONTAINER_ENV_VAR_VALUE = "container-env-var-value"; | ||
protected static final String POD_ENV_VAR_VALUE = "pod-env-var-value"; | ||
protected static final String SECRET_KEY = "password"; | ||
protected static final String CONTAINER_ENV_VAR_FROM_SECRET_VALUE = "container-pa55w0rd"; | ||
protected static final String POD_ENV_VAR_FROM_SECRET_VALUE = "pod-pa55w0rd"; | ||
protected KubernetesCloud cloud; | ||
|
||
@Rule | ||
public RestartableJenkinsNonLocalhostRule story = new RestartableJenkinsNonLocalhostRule(44000); | ||
@Rule | ||
public TemporaryFolder tmp = new TemporaryFolder(); | ||
|
||
@ClassRule | ||
public static BuildWatcher buildWatcher = new BuildWatcher(); | ||
|
||
@Rule | ||
public LoggerRule logs = new LoggerRule().record(Logger.getLogger(KubernetesCloud.class.getPackage().getName()), | ||
Level.ALL); | ||
//.record("org.jenkinsci.plugins.durabletask", Level.ALL).record("org.jenkinsci.plugins.workflow.support.concurrent", Level.ALL).record("org.csanchez.jenkins.plugins.kubernetes.pipeline", Level.ALL); | ||
|
||
@Rule | ||
public TestName name = new TestName(); | ||
|
||
@BeforeClass | ||
public static void isKubernetesConfigured() throws Exception { | ||
assumeKubernetes(); | ||
} | ||
|
||
private static void setEnvVariables(PodTemplate podTemplate) { | ||
TemplateEnvVar podSecretEnvVar = new SecretEnvVar("POD_ENV_VAR_FROM_SECRET", "pod-secret", SECRET_KEY, false); | ||
TemplateEnvVar podSimpleEnvVar = new KeyValueEnvVar("POD_ENV_VAR", POD_ENV_VAR_VALUE); | ||
podTemplate.setEnvVars(asList(podSecretEnvVar, podSimpleEnvVar)); | ||
TemplateEnvVar containerEnvVariable = new KeyValueEnvVar("CONTAINER_ENV_VAR", CONTAINER_ENV_VAR_VALUE); | ||
TemplateEnvVar containerEnvVariableLegacy = new ContainerEnvVar("CONTAINER_ENV_VAR_LEGACY", | ||
CONTAINER_ENV_VAR_VALUE); | ||
TemplateEnvVar containerSecretEnvVariable = new SecretEnvVar("CONTAINER_ENV_VAR_FROM_SECRET", | ||
"container-secret", SECRET_KEY, false); | ||
podTemplate.getContainers().get(0) | ||
.setEnvVars(asList(containerEnvVariable, containerEnvVariableLegacy, containerSecretEnvVariable)); | ||
} | ||
|
||
public void configureCloud() throws Exception { | ||
cloud = setupCloud(this, name); | ||
createSecret(cloud.connect(), cloud.getNamespace()); | ||
cloud.getTemplates().clear(); | ||
|
||
setupHost(); | ||
|
||
story.j.jenkins.clouds.add(cloud); | ||
} | ||
|
||
public void configureAgentListener() throws IOException { | ||
//Take random port and fix it, to be the same after Jenkins restart | ||
int fixedPort = story.j.jenkins.getTcpSlaveAgentListener().getAdvertisedPort(); | ||
story.j.jenkins.setSlaveAgentPort(fixedPort); | ||
} | ||
|
||
protected String loadPipelineScript(String name) { | ||
try { | ||
return new String(IOUtils.toByteArray(getClass().getResourceAsStream(name))); | ||
} catch (Throwable t) { | ||
throw new RuntimeException("Could not read resource:[" + name + "]."); | ||
} | ||
} | ||
|
||
@Test | ||
public void dynamicPodTemplateStepSupportsRestart() { | ||
story.then(r -> { | ||
// Do this only once: | ||
configureAgentListener(); | ||
configureCloud(); | ||
r.jenkins.setNumExecutors(0); | ||
// Do this stuff here and then again in a new story.then() call to simulate restart | ||
WorkflowRun b = getPipelineJobThenScheduleRun(r); | ||
r.waitForMessage("End of Pipeline", b); | ||
r.assertBuildStatus(Result.SUCCESS, r.waitForCompletion(b)); | ||
}); | ||
story.then(r -> { | ||
// Do what we do in the first one again, except the configureCloud and before bits | ||
WorkflowRun b = getPipelineJobThenScheduleRun(r); | ||
r.waitForMessage("End of Pipeline", b); | ||
r.assertBuildStatus(Result.SUCCESS, r.waitForCompletion(b)); | ||
}); | ||
} | ||
|
||
private PodTemplate waitForTemplate(KubernetesSlave node) throws InterruptedException { | ||
while (node.getTemplateOrNull() == null) { | ||
Thread.sleep(100L); | ||
} | ||
return node.getTemplate(); | ||
} | ||
|
||
private WorkflowRun getPipelineJobThenScheduleRun(JenkinsRule r) throws InterruptedException, ExecutionException, IOException { | ||
return createPipelineJobThenScheduleRun(r, getClass(), name.getMethodName()); | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
57 changes: 57 additions & 0 deletions
57
...csanchez/jenkins/plugins/kubernetes/pipeline/dynamicPodTemplateStepSupportsRestart.groovy
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
def call(body) { | ||
def config = [:] | ||
body.resolveStrategy = Closure.DELEGATE_FIRST | ||
body.delegate = config | ||
milestone() | ||
def yamlPod = ''' | ||
apiVersion: v1 | ||
kind: Pod | ||
metadata: | ||
labels: | ||
jenkins: agent | ||
spec: | ||
containers: | ||
- name: maven-image | ||
command: | ||
- cat | ||
image: busybox | ||
imagePullPolicy: IfNotPresent | ||
resources: | ||
limits: | ||
cpu: 100m | ||
memory: 8Mi | ||
requests: | ||
cpu: 100m | ||
memory: 8Mi | ||
tty: true | ||
dnsPolicy: ClusterFirst | ||
restartPolicy: Never | ||
securityContext: {} | ||
terminationGracePeriodSeconds: 30 | ||
''' | ||
def yamlPodHash = '#{project.artifactId}-#{project.version}-maven-image' | ||
|
||
// one build per job to ensure tag consistency | ||
try { | ||
lock(resource: env.JOB_NAME.split('/')[1], inversePrecedence: true) { | ||
milestone() | ||
podTemplate(label: yamlPodHash, instanceCap: 6, idleMinutes: 60, cloud: 'dynamicPodTemplateStepSupportsRestart', yaml: yamlPod) { | ||
node(yamlPodHash) { | ||
container('maven-image') { | ||
timeout(20) { | ||
println "dynamic pod template step success" | ||
} | ||
} | ||
} | ||
} | ||
} | ||
|
||
println "${JOB_NAME} ${currentBuild.displayName} see ${JOB_URL}" | ||
|
||
} | ||
catch (e) { | ||
println e | ||
println "${JOB_NAME} ${currentBuild.displayName} see ${JOB_URL}" | ||
throw e | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just remove the deprecated
label
parameterThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jesse, thanks for taking a look! I don't understand how that can function without it in a dynamic way (not hard coded into the org setup).
The class used to work by looking up the pod template and thus available pods by the label. Then it changed to looking up by ID but with enforced randomness in the ID - removing the label would be fine if we could control the ID, but we currently can't control that.
The components that are equivalent to the above snippet are reusable libraries of pod-definition (podTemplate, if you will) and are defined in a totally separate and independent package to the organisation config/setup - this is rewritten completely every boot and is always the same and always works because always the same. On top of the pod template reusable blocks ride a bunch of job types with a many to one relationship to a pod spec, ie, three different build styles might use one pod spec, and another 3 additional build styles might use a second different pod spec. Different sets of build styles and pod specs are available in different packages with the final jenkins build including those that are useful to a particular team/project.
Without putting these pod specs in the organisation where IMO they do not belong how can you get reusable pods across N jobs that share M build styles that share 1 pod spec/template/definition that is implemented by P agent pods?
The real version of this has the body passed in from the out side and run as
body()
in place of the println on line 42 of that groovy.How does POD_LABEL provide the same experience for us that we had before this got broken and what is the perceived benefit of these items being IDed randomly at creation without it having any bearing on the yaml or name included? Help! :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid I am unable to follow what you are trying to accomplish or more to the point what you think does not already work. Simple
should work reliably, including across controller restarts, regardless of the number of jobs, concurrent builds, different YAML contents, etc. etc. You should not need to pay any attention to “id”, “label”, etc., or have any global configuration beyond the physical connection to the K8s cluster & namespace (not even that, if the controller itself is running inside K8s and using a service account with appropriate permissions). The above snippet will create a
Pod
with the definition you specify, wait for its agent to connect to the controller, do work, then delete the pod and clean up, with no interference with other builds or jobs or global settings.In the future we could introduce a separate step that bypasses the Jenkins queue altogether and makes things even simpler:
by analogy with https://www.jenkins.io/doc/pipeline/steps/docker-slaves/#dockernode-allocate-a-docker-node but for now the
POD_LABEL
idiom is the glue needed to make dynamic pod configuration work with the stocknode
step.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Jesse, we're somewhat in the same boat here :-) You don't understand what we're trying to do and we don't understand why you're not letting us do it :-D But the above clarifies your angle so hopefully I can build on that and explain what our intent is:
This is 100% not what we want :-D The idea is that we specify the definition in such a way that it matches existing pods and can then just use them. The result of the change Vincent made was that it doesn't match or not match, it just fails.
It seems like you've interpreted that groovy file as having the intent to create a bespoke single use pod, use it, and dump it - is that right? That's not the intent, and this setup worked flawlessly for years before the commit that changed from:
to
Allowing us to set that ID would restore perfect clean functionality for us - but both of you seem pretty heavily against that, so I need to understand what a solution that meets your idea of how it should be used AND meets our idea of an acceptable way to use it is, and if that can be found, rework our stuff and roll it out everywhere. And if not, maybe just fork it, add the one line, and hope no further breakage occurs. Though I really really really don't want to do that for a lot of reasons.
One of the reasons we don't want brand new pods every time is the cache that is held in the pod between builds will be empty every time and every build will be ultra slow, not just those first thing in the working day. Even if/when we do something to improve the cache seeding situation that seeding will be a fairly heavy overhead (largeish data copy) for a new pod and not something we'd want to do hundreds or thousands of times per day.
Our goal is:
I don't understand how any fresh-on-boot system can be consistent and can match the pod template when it runs if it's not possible to create those and specify the ID or have the ID derived from a hash of the contents? That would be fine, too.
Maybe it IS possible but in a different way and not in the build command spec like now where it goes via the PodTemplateStep object (created confusingly by podTemplate() call) and not directly into a PodTemplate object where that ID can already be set.
I hope that clarified a few things for you as your post did for me, and I look forward to your response! :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
kubernetes
plugin is designed to create a pod for every build (or more precisely, everynode
block), and then throw it away once the build (block) exits. Reusing pods is not a supported use case. (There is an option to keep a pod after exiting, but this is solely for diagnostic purposes, and even that is a dubious practice.)You can specify a persistent volume to reuse between pods, assuming the storage class supports concurrent mounts. This is generally a quite fragile setup, though (special requirements on volume drivers, UID mismatches, file locking issues, ad nauseam) so you are generally better off using a mirror (if dependency downloads are the problem) or a distributed-friendly build system with a cache server (if incremental builds are what you are looking for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My take on this, with the given plugin state:
The fact dynamic pod templates (or corresponding pods) are accessible to other jobs is a undesirable side-effect of using the Cloud API. Unfortunately this is what we have today, and as Jesse mentioned in his earlier response this is something we want to eliminate if we have the opportunity.
For your use case, I think it can fit with the current plugin if you manage static pod templates using the jcasc aproach. I'm not in favour to make any plugin change that would facilicate re-use of pods if they have been defined in a Jenkinsfile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC no, the request was to reuse pods, not just pod templates, for performance reasons. Certainly reusing pod templates across jobs is well supported (even if we would be inclined to recommend the
podTemplate
step going forward), but that is still assuming one pod pernode
block.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See
kubernetes-plugin/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesSlave.java
Lines 538 to 544 in 4068c52
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OnceRetentionStrategy
is the normal usage, but the option to useCloudRetentionStrategy
suggests that the long-lived-agent style was deliberately added—albeit without tests, documentation, or the ability to use multiple executors? Bisecting says in #91, whose description seems to match this request.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jesse, Vincent, long time no talk! I did see all of these updates but have been busy with other stuff.
This is still burning everyone, however, and the number of people it's burning keeps growing. Please stop calling it a rare demand. I speak for 50+ people across 7 or so organisations when I comment here. I'd rather not repeat that a third or subsequent time. This is a big deal for a lot of people across two banks and various other companies.
Please, this is your personal perception and not any part of our reality. These aspects are managed properly and never give us any trouble.
It's not "vm like" it's normal for well composed software that is assembled from small reusable pieces.
You have personally, and I commend you for your curiosity and diligence in researching it, proven that that's not the case. It was added to this plugin by the founder of the plugin, and for good reason. Let's call that the design space. There are another dozen people wanting it on the various links you provided, in addition to the multiple dozens I bring to the table.
Our Jenkinsfiles are one or two line, paraphrasing:
And to your other comment:
How are pod templates defined in shared libraries that are static with respect to a thousand builds not static in reality?
I understand that you can't guarantee that, but can you make the assumption that someone neck deep in configuring this has the talent to ensure they don't hang themselves with the rope they're given, and just give them the rope?
All of our pod templates are named and versioned, unique and immutable. If they don't match, you get a new pod. If they do, during a master/controller run, or even after a restart before the fateful change, they get reused.
I'd like to take this moment to remind you that this is ONLY across restarts that there is now an issue. It STILL works as required during a single master node execution cycle. Please do not cripple it any further.
My change just lets me control the ID such that it's not random and is deliberate and meaningful in a controlled and safe way. For existing users, this makes no difference unless they're currently seeing a "field doesn't exist" of the same name by random chance. Only then could behaviour change.
My change simply matches the available-to-me API with the underlying object API, it doesn't add anything new, it just exposes a TINY bit more control that wasn't previous required when the default behaviour made sense and wasn't broken. I understand the desire to make them unique for your throw-away style of builds, but please understand our desire to keep them matching and reusable and look-up-able for our builds that work the same exact way, guaranteed by their own semantics, whether there is cache available, or not, just MUCH slower when not.
Please let us move on and let this PR die in a good way.