[JENKINS-27650] Performance #27

jglick · 2015-04-02T22:13:07Z

Builds on #26.

The current changes do not help much. I tried caching some information from call to call but the performance is still rather poor. The result also seems to often be incorrect; my analysis in #26 is that this is a manifestation of JENKINS-27708.

jenkinsadmin · 2015-04-03T00:02:35Z

Thank you for a pull request! Please check this document for how the Jenkins project handles pull requests

jglick · 2015-04-03T14:29:03Z

Here I have tried to do extensive caching within top-level calls to ThrottleQueueTaskDispatcher. There is still massive overhead in my test case; while it seems better than the master version, I can observe multi-minute load times for the Jenkins home screen.

The root problem is that this plugin just does a lot of work for every call when there are many matching projects. canTake is required to be fast and is in fact called for everything in the queue, every time Queue.maintain is run (every 5s I think). Caching beyond a single top-level call would surely help a lot, but then you would be likely to get race conditions. Ideally a given call to maintain would either increment some visible counter, or ask the QueueTaskDispatcher to create a new stateful object, so that all calls to the dispatcher within this maintenance round would be able to share results.
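A minimal sketch of the "visible counter" variant, to make the idea concrete. It assumes the queue hands the dispatcher a round number that changes on every call to maintain; no such counter exists in core as far as this thread says, and `RoundScopedCache` is a hypothetical name used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper: memoizes an expensive per-key computation for the
 * duration of one queue maintenance round. The round number is assumed to be
 * supplied by the caller; it is not an existing Jenkins API.
 */
final class RoundScopedCache<K, V> {
    private long cachedRound = -1;
    private final Map<K, V> values = new HashMap<>();

    /** Returns the value for this round, computing it at most once per round and key. */
    synchronized V get(long round, K key, Function<K, V> expensive) {
        if (round != cachedRound) {
            // A new maintenance round started, so everything cached so far may be stale.
            values.clear();
            cachedRound = round;
        }
        return values.computeIfAbsent(key, expensive);
    }
}
```

Because the cache is keyed on the round rather than on elapsed time, an extra round triggered by scheduleMaintenance simply starts with an empty cache instead of reusing stale results.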

jglick · 2015-04-03T14:33:05Z

Even though maintain is only called routinely every 5s, scheduleMaintenance can also be called at any time, and in fact is called for example when a task finishes. So just assuming that a Snapshot can be kept for (say) 4s is not safe.

jglick · 2015-04-03T14:38:18Z

BTW while testing I was using a temporary patch

diff --git a/pom.xml b/pom.xml
index 84f54aa..04f6576 100644
--- a/pom.xml
+++ b/pom.xml
@@ -26,7 +26,7 @@ THE SOFTWARE.
   <parent>
     <groupId>org.jenkins-ci.plugins</groupId>
     <artifactId>plugin</artifactId>
-    <version>1.424</version>
+    <version>1.580.2</version>
   </parent>

   <artifactId>throttle-concurrents</artifactId>
@@ -113,6 +113,11 @@ THE SOFTWARE.
             <version>2.0.1</version>
             <type>jar</type>
         </dependency>
+        <dependency>
+            <groupId>org.jenkins-ci.plugins</groupId>
+            <artifactId>matrix-project</artifactId>
+            <version>1.4</version>
+        </dependency>
     </dependencies>
 </project>

since the very old core this plugin currently uses as a baseline behaves quite differently, making it impossible to get a realistic sense of the impact of performance changes. I would recommend pushing up the baseline.

papajulio · 2015-04-13T17:30:43Z

👍 to the last comment. I just upgraded to the latest core and hit all these problems.

oleg-nenashev · 2015-04-15T14:38:14Z

👍 for updating the baseline to 1.554 (?) when we resolve the issue with the scheduling correctness (see the analysis in JENKINS-27708)

The current version decreased the average GC frequency and the CPU usage of canTake(), but the performance is still about 10 times worse than in the test without the TCB plugin. The performance degradation ratio strongly depends on the queue size BTW.

My proposals (based on 1.580 LTS API):

1. Create a global cache of ROUGH statuses for categories and individual jobs.
2. Populate the cache from RunListener; this means the cache cannot accurately track the number of jobs on executors.
3. Use the cache in the first calculation stage:
   - If a task cannot be taken according to the cached counts, canTake() immediately returns false.
   - If a task can be taken, launch the existing calculation procedure with a "guaranteed" result.
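The two-stage check proposed above could look roughly like the following sketch. Everything in it is an assumption made for illustration: `RoughCountCache`, `onStarted`, and `onCompleted` are hypothetical names standing in for whatever a RunListener would call, and the exact check is passed in as a callback rather than being the plugin's real calculation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BooleanSupplier;

/** Hypothetical rough cache of running-build counts per throttle category. */
final class RoughCountCache {
    private final ConcurrentMap<String, Integer> running = new ConcurrentHashMap<>();

    /** Assumed to be called from a listener when a build in the category starts. */
    void onStarted(String category) {
        running.merge(category, 1, Integer::sum);
    }

    /** Assumed to be called from a listener when a build in the category completes. */
    void onCompleted(String category) {
        // Returning null removes the entry once the count drops to zero.
        running.computeIfPresent(category, (k, n) -> n > 1 ? Integer.valueOf(n - 1) : null);
    }

    /** Stage 1 rejects cheaply from cached counts; stage 2 runs the exact check. */
    boolean canTake(String category, int maxConcurrent, BooleanSupplier exactCheck) {
        if (running.getOrDefault(category, 0) >= maxConcurrent) {
            return false; // fast path: the expensive calculation is skipped
        }
        return exactCheck.getAsBoolean();
    }
}
```

Since the counts are only rough, the fast path can be wrong in either direction until the next listener event, which is the accuracy trade-off the proposal already acknowledges.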

@jglick, what do you think?

jglick · 2015-04-16T15:09:48Z

Did you see my idea about ExecutorListener?

oleg-nenashev · 2015-04-16T15:18:16Z

@jglick
Yes, I saw this comment. I would definitely vote for such extension.
Using RunListener is just a workaround to get some improvement on the old cores, but the #28 PoC was not successful enough. ExecutorListener is what would make the caching accurate.

Queue.isPending could also be eliminated by improving QueueListener.

wolfs · 2015-06-21T09:18:54Z

What is the status on this? How can I best help fix the performance problem? Should I build on this pull request or on #28?

oleg-nenashev · 2015-06-21T10:13:03Z

@wolfs
This PR does not improve performance enough to resolve the issue. #28 has not been tested enough, so I would not use it in production now. Cache invalidation is required at least.

I think the best way would be to start integrating a new extension point into Jenkins core. I was going to start working on it in July in order to get the changes integrated by the next LTS line. BTW I have not even started yet, so you can take it.

wolfs · 2015-06-21T15:07:31Z

What exactly should the extension point provide? Would the Queue give all the calculation information to the new extension point?
Would it be feasible to extend ResourceList/Resource to provide all the necessary features to support this plugin?

Even without changes to Jenkins core - should I continue working on #28 in order to have a solution now? I am really having problems on my instance (1000 Jobs, 100 Executors) and I have currently no alternative to this plugin.

jglick · 2015-08-25T21:31:14Z

> What exactly should the extension point provide?

Not an extension point per se, but I suggested earlier that QueueTaskDispatcher be allowed to create a state object which Queue.maintain would thread through all calls to the dispatcher within one round, so that it would only need to perform relatively expensive calculations once per round—about once per second. Whether that is enough to solve this issue is another matter, but it would certainly help.
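As a rough sketch of the state-object idea, under the assumption that the queue would ask each dispatcher for a fresh state at the start of a round and pass it to every call: `StatefulDispatcher`, `ThrottlingDispatcher`, and `MaintenanceRound` are hypothetical names, not the real QueueTaskDispatcher or Queue API, and the hard-coded count stands in for the expensive scan of executors.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dispatcher contract; not the real QueueTaskDispatcher API. */
interface StatefulDispatcher<S> {
    S newRoundState();                      // expensive work happens here, once per round
    boolean canTake(String task, S state);  // must stay cheap; reads only the state
}

/** Allows at most one concurrent build per task name. */
final class ThrottlingDispatcher implements StatefulDispatcher<Map<String, Integer>> {
    int expensiveScans = 0; // exposed only to show how often the scan runs

    @Override public Map<String, Integer> newRoundState() {
        expensiveScans++;
        Map<String, Integer> running = new HashMap<>();
        running.put("deploy", 1); // stand-in for scanning executors
        return running;
    }

    @Override public boolean canTake(String task, Map<String, Integer> running) {
        return running.getOrDefault(task, 0) < 1;
    }
}

/** Stand-in for one maintenance round threading the state through all calls. */
final class MaintenanceRound {
    static <S> int countDispatchable(StatefulDispatcher<S> dispatcher, List<String> queue) {
        S state = dispatcher.newRoundState();
        int dispatchable = 0;
        for (String task : queue) {
            if (dispatcher.canTake(task, state)) {
                dispatchable++;
            }
        }
        return dispatchable;
    }
}
```

The state lives only as long as the round does, so nothing needs to be invalidated between rounds and the race conditions of a longer-lived cache do not arise.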

Not sure if the locking work @stephenc did recently has any bearing on this.

basil · 2019-12-13T22:39:26Z

@jglick The last update to this PR was over 4 years ago. Can this PR be closed to clean up the open pull requests list?

jglick · 2019-12-14T21:53:57Z

Sure, I do not recall much about it now.

ydubreuil and others added 3 commits March 2, 2015 15:46

Avoid looping over and over on Queue.pendings

43d4730

A lot of time is wasted calling Jenkins.getDisplayName…to construct a…

de1533b

… log message which is never printed.

Removing getMatrixOptions, dead code. Making canRun overload private …

a0def69

…to make it clear this is just a helper method.

Experimenting with a Snapshot of current state.

7eec9c1

oleg-nenashev mentioned this pull request Apr 16, 2015

[JENKINS-27650] TCB plugin performance (extra caching) #28

Open

oleg-nenashev mentioned this pull request May 21, 2015

[JENKINS-12092] Block job by category #25

Open

jglick added the work-in-progress label Jan 4, 2016

jglick changed the title ~~[WiP] [JENKINS-27650] Performance~~ [JENKINS-27650] Performance Jan 4, 2016

jglick closed this Dec 14, 2019
