Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-27650] Performance #27

Closed
wants to merge 4 commits into from

Conversation

jglick
Copy link
Member

@jglick jglick commented Apr 2, 2015

JENKINS-27650

Builds on #26.

The current changes do not help much. I tried caching some information from call to call but the performance is still rather poor. The result also seems to often be incorrect; my analysis in #26 is that this is a manifestation of JENKINS-27708.

@jenkinsadmin
Copy link
Member

Thank you for a pull request! Please check this document for how the Jenkins project handles pull requests

@jglick
Copy link
Member Author

jglick commented Apr 3, 2015

Here I have tried to do extensive caching within top-level calls to ThrottleQueueTaskDispatcher. There is still massive overhead in my test case; while it seems better than the master version, I can observe multi-minute load times for the Jenkins home screen.

The root problem is that this plugin just does a lot of work for every call when there are many matching projects. canTake is required to be fast and is in fact called for everything in the queue, every time Queue.maintain is run (every 5s I think). Caching beyond a single top-level call would surely help a lot, but then you would be likely to get race conditions. Ideally a given call to maintain would either increment some visible counter, or ask the QueueTaskDispatcher to create a new stateful object, so that all calls to the dispatcher within this maintenance round would be able to share results.

@jglick
Copy link
Member Author

jglick commented Apr 3, 2015

Even though maintain is only called routinely every 5s, scheduleMaintenance can also be called at any time, and in fact is called for example when a task finishes. So just assuming that a Snapshot can be kept for (say) 4s is not safe.

@jglick
Copy link
Member Author

jglick commented Apr 3, 2015

BTW while testing I was using a temporary patch

diff --git a/pom.xml b/pom.xml
index 84f54aa..04f6576 100644
--- a/pom.xml
+++ b/pom.xml
@@ -26,7 +26,7 @@ THE SOFTWARE.
   <parent>
     <groupId>org.jenkins-ci.plugins</groupId>
     <artifactId>plugin</artifactId>
-    <version>1.424</version>
+    <version>1.580.2</version>
   </parent>

   <artifactId>throttle-concurrents</artifactId>
@@ -113,6 +113,11 @@ THE SOFTWARE.
             <version>2.0.1</version>
             <type>jar</type>
         </dependency>
+        <dependency>
+            <groupId>org.jenkins-ci.plugins</groupId>
+            <artifactId>matrix-project</artifactId>
+            <version>1.4</version>
+        </dependency>
     </dependencies>
 </project>  

since the very old core this plugin currently uses as a baseline behaves quite differently, making it impossible to get a realistic sense of the impact of performance changes. I would recommend pushing up the baseline.

@papajulio
Copy link
Contributor

👍 To the last comment I just upgraded to the latest core and hit all this problems.

@oleg-nenashev
Copy link
Member

👍 for updating the baseline to 1.554 (?) when we resolve the issue with the scheduling correctness (see the analysis in JENKINS-27708)

The current version decreased the average GC frequency and CPU usage by canTake() but the performance stills about 10 times worse compared to the test w/o TCB plugin. The performance degradation ratio strongly depends on the queue size BTW.

My proposals (based on 1.580 LTS API):

  • Create a global cache of category and individual job ROUGH statuses
  • Setup cache values from RunListener, so the cache cannot be accurately calculate the number of jobs on executors
  • Use cache on the first calculation stage
    • If a task cannot be taken according the cached counts, canTake() immediately returns false
    • If a task can be taken, launch the existing calculation procedure with a "guaranteed" result

@jglick, what do you think?

@jglick
Copy link
Member Author

jglick commented Apr 16, 2015

Did you see my idea about ExecutorListener?

@oleg-nenashev
Copy link
Member

@jglick
Yes, I saw this comment. I would definitely vote for such extension.
Usage of RunListener is just a workaround to get some improvements on the old cores, but #28 PoC was not successful enough. ExecutorListener is a thing, which would make the caching accurate.

Queue.isPending could be also eliminated by improving QueueListener

@wolfs
Copy link
Member

wolfs commented Jun 21, 2015

What is the status on this? How can I best help fix the performance problem? Should I build on this pull request or on #28?

@oleg-nenashev
Copy link
Member

@wolfs
This Pr does not provide enough performance to resolve the issue. #28 has not been tested enough, so I would not use it in the production now. Cache invalidation is required at least.

I think the best way would be to start integrating a new Extension point into Jenkins core. I was going to start working on it on July in order to get changes integrated by the next LTS line. BTW I have not even started yet, so you can take it.

@wolfs
Copy link
Member

wolfs commented Jun 21, 2015

What exactly should the extension point provide? Would the Queue give all the calculation information to the new extension point?
Would it be feasible to extend ResourceList/Resource to provide all the necessary features to support this plugin?

Even without changes to Jenkins core - should I continue working on #28 in order to have a solution now? I am really having problems on my instance (1000 Jobs, 100 Executors) and I have currently no alternative to this plugin.

@jglick
Copy link
Member Author

jglick commented Aug 25, 2015

What exactly should the extension point provide?

Not an extension point per se, but I suggested earlier that QueueTaskDispatcher be allowed to create a state object which Queue.maintain would thread through all calls to the dispatcher within one round, so that it would only need to perform relatively expensive calculations once per round—about once per second. Whether that is enough to solve this issue is another matter, but it would certainly help.

Not sure if the locking work @stephenc did recently has any bearing on this.

@jglick jglick changed the title [WiP] [JENKINS-27650] Performance [JENKINS-27650] Performance Jan 4, 2016
@basil
Copy link
Member

basil commented Dec 13, 2019

@jglick The last update to this PR was over 4 years ago. Can this PR be closed to clean up the open pull requests list?

@jglick
Copy link
Member Author

jglick commented Dec 14, 2019

Sure, I do not recall much about it now.

@jglick jglick closed this Dec 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants