Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-52165] Calling Controller.watch API #63

Merged
merged 31 commits into from
Sep 13, 2018

Conversation

jglick
Copy link
Member

@jglick jglick commented Feb 9, 2018

JENKINS-52165 Split out of #21. Downstream of jenkinsci/durable-task-plugin#60. Subsumes #73. Should improve performance by switching durable task logging and exit status checking from being pull (initiated by master at intervals) to push (initiated by agent at close intervals).

@reviewbybees

@jglick jglick changed the title [JENKINS-38381] Calling Controller.watch API [JENKINS-38381] [JENKINS-37575] Calling Controller.watch API Feb 9, 2018
@jglick
Copy link
Member Author

jglick commented Feb 9, 2018

By the way jenkinsci/support-core-plugin#128 should make it easier to see the impact of this on actual Remoting usage. Have not yet tried it myself.

Copy link
Member

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge conflict, but the code itself LGTM

Copy link
Member

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge conflict

@oleg-nenashev
Copy link
Member

Approved once the upstream PRs are integrated && the conflict is fixed

@vivek
Copy link
Contributor

vivek commented Apr 3, 2018

This looks good to go if merge conflicts fixed? ping @jglick.

@svanoort
Copy link
Member

svanoort commented Apr 3, 2018

@vivek The upstream isn't really ready-to-go yet, and the whole thing needs some sort of proper and thorough testing in a simulated environment. Oleg's review found some issues already.

We could of course YOLO-merge all this but I make no promises on the fallout if so.

@jglick
Copy link
Member Author

jglick commented Apr 3, 2018

Well, the upstream “issues” are just things like Javadoc that should be improved, not user-visible problems, but yes this all deserves testing in a realistic environment. Once @svanoort is ready to start that I will polish up all the PRs.

@jglick jglick changed the title [JENKINS-38381] [JENKINS-37575] Calling Controller.watch API [JENKINS-38381] Calling Controller.watch API Jun 25, 2018

private static final long MIN_RECURRENCE_PERIOD = 250; // ¼s
private static final long MAX_RECURRENCE_PERIOD = 15000; // 15s
private static final float RECURRENCE_PERIOD_BACKOFF = 1.2f;
private static final long WATCHING_RECURRENCE_PERIOD = Main.isUnitTest ? /* 5s */5_000: /* 5m */300_000;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be switched to nonfinal constant initialized via system property and overridden in the relevant test only.

this.remote = ws.getRemote();
setupTimer();
try {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should introduce a system property to act as a kill switch and fall back to polling.

@jglick jglick changed the title [JENKINS-38381] Calling Controller.watch API [JENKINS-52165] Calling Controller.watch API Jun 25, 2018
@jglick jglick requested a review from dwnusbaum June 25, 2018 21:38
/** If set to false, disables {@link Execution#watching} mode. */
@SuppressWarnings("FieldMayBeFinal")
private static boolean USE_WATCHING = !"false".equals(System.getProperty(DurableTaskStep.class.getName() + ".USE_WATCHING"));

/**
* Represents one task that is believed to still be running.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will flesh this out with an explanation of the expected state changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)

os = ps;
}
try {
IOUtils.copy(stream, os);
Copy link
Member

@dwnusbaum dwnusbaum Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like all PrintStream#write overloads synchronize on this and null-check out before writing (since it is nulled out once the stream is closed). Do we need to do either of those here as well? This usage should always be from the same thread IIUC, but do other threads have access to listener.getLogger()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that sounds safer.

if (ps.getClass() == PrintStream.class) {
// Try to extract the underlying stream, since swallowing exceptions is undesirable and PrintStream.checkError is useless.
OutputStream os = (OutputStream) printStreamDelegate.get(ps);
synchronized (os) { // like PrintStream.write overloads do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean synchronized (ps)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed that seems to be the lock, will change.

if (ps.getClass() == PrintStream.class) {
// Try to extract the underlying stream, since swallowing exceptions is undesirable and PrintStream.checkError is useless.
synchronized (ps) { // like PrintStream.write overloads do
IOUtils.copy(stream, (OutputStream) printStreamDelegate.get(ps));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for the PrintStream to be closed? If so, we'd get an NPE because printStreamDelegate.get(ps) will be null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be closed. If it is, we expect this method to throw an exception. I can make it throw a specific exception for better diagnostics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess it would be clear to anyone investigating where the NPE came from anyway, so it probably doesn't need a specific exception, but 3f8ced4 won't hurt.


// ditto
@Override public void problem(Exception x) {
Functions.printStackTrace(x, listener().getLogger());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work. Due to an apparent mistake in jenkinsci/pipeline-cloudwatch-logs-plugin#5 affecting agent-side, but not master-side, logging, I saw in the build log:

java.lang.NullPointerException
	at io.jenkins.plugins.pipeline_log_fluentd_cloudwatch.FluentdLogger$FluentdOutputStream.eol(FluentdLogger.java:128)
	at hudson.console.LineTransformationOutputStream.eol(LineTransformationOutputStream.java:60)
	at hudson.console.LineTransformationOutputStream.write(LineTransformationOutputStream.java:56)
	at hudson.console.LineTransformationOutputStream.write(LineTransformationOutputStream.java:74)
	at java.io.PrintStream.write(PrintStream.java:480)
	at org.jenkinsci.plugins.credentialsbinding.impl.BindingStep$Filter$1.eol(BindingStep.java:183)
	at hudson.console.LineTransformationOutputStream.eol(LineTransformationOutputStream.java:60)
	at hudson.console.LineTransformationOutputStream.write(LineTransformationOutputStream.java:56)
	at hudson.console.LineTransformationOutputStream.write(LineTransformationOutputStream.java:74)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1793)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
	at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$HandlerImpl.output(DurableTaskStep.java:571)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$Watcher.run(FileMonitoringTask.java:477)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

with the build proceeding nonetheless.

private static final Field printStreamDelegate;
static {
try {
printStreamDelegate = FilterOutputStream.class.getDeclaredField("out");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, reflection hacks: the universal panacea!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I could not find a better way to do this.

} catch (Exception x) {
// Try to report it to the master.
try {
execution.problem(x);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I predict we'll be seeing a lot of this particular report when there are network issues. Thankfully it won't be that often due to the between recurrence periods.

Copy link
Member

@svanoort svanoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's as good as it's going to get for now -- happy to see us doing something with errors besides letting the PrintStream swallow it

@svanoort svanoort merged commit 3bd3498 into jenkinsci:master Sep 13, 2018
@jglick jglick deleted the watch-JENKINS-38381 branch September 13, 2018 18:31
orgi pushed a commit to orgi/workflow-durable-task-step-plugin that referenced this pull request Nov 13, 2018
[JENKINS-52165] Calling Controller.watch API
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants