@mmitche (Contributor) commented May 16, 2016

This adds synchronization to the maps at the root of the API to avoid duplicate calls to the actual GitHub REST API. It specifically targets the two maps, orgs and users. This fix makes the GHPRB Jenkins plugin behave much better when there are lots of projects that could build for a specific repo (even if only a few are actually triggered).
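As an illustration of the approach described above, here is a minimal sketch of a root-level cache guarded by a lock on the map. `SynchronizedUserCacheSketch`, its nested `User` class, and `fetchFromApi` are hypothetical names standing in for the real API; the remote call is simulated with a counter. Because the lookup and the fetch happen under the same lock, concurrent requests for the same login result in a single simulated call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the library's actual code.
class SynchronizedUserCacheSketch {
    // Stand-in for a user object returned by the remote API.
    static final class User {
        final String login;
        User(String login) { this.login = login; }
    }

    private final Map<String, User> users = new HashMap<>();
    // Counts simulated remote calls so the effect of caching is observable.
    private final AtomicInteger remoteCalls = new AtomicInteger();

    User getUser(String login) {
        // Lookup and fetch share one critical section, so a second thread
        // asking for the same login waits and then finds the cached entry.
        synchronized (users) {
            User cached = users.get(login);
            if (cached == null) {
                cached = fetchFromApi(login);
                users.put(login, cached);
            }
            return cached;
        }
    }

    // Placeholder for the expensive, rate-limited REST call.
    private User fetchFromApi(String login) {
        remoteCalls.incrementAndGet();
        return new User(login);
    }

    int remoteCallCount() { return remoteCalls.get(); }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedUserCacheSketch cache = new SynchronizedUserCacheSketch();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> cache.getUser("octocat"));
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("remote calls: " + cache.remoteCallCount()); // remote calls: 1
    }
}
```

The cost of this design is that every caller, including readers of unrelated keys, queues on the same lock while a fetch is in flight.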

@kohsuke (Collaborator) commented Jun 3, 2016

OTOH this change introduces contention when multiple threads try to access different users and orgs. I wonder if it'd be worthwhile to take the fix one step further and address that.
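One possible way to take the fix a step further, sketched here purely as an assumption about what could work, is to lock per key rather than per map, so that threads fetching different users do not wait on each other. All names (`PerKeyLockCacheSketch`, `User`, `fetchFromApi`) are hypothetical, and the remote call is simulated.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of per-key locking, not the library's actual code.
class PerKeyLockCacheSketch {
    static final class User {
        final String login;
        User(String login) { this.login = login; }
    }

    private final ConcurrentMap<String, User> users = new ConcurrentHashMap<>();
    // One lock object per login; this map grows with the number of distinct keys.
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();
    private final AtomicInteger remoteCalls = new AtomicInteger();

    User getUser(String login) {
        User cached = users.get(login); // fast path, no locking
        if (cached != null) return cached;

        // Obtain the single lock object associated with this login.
        Object candidate = new Object();
        Object lock = locks.putIfAbsent(login, candidate);
        if (lock == null) lock = candidate;

        synchronized (lock) {
            // Re-check: another thread may have populated the entry meanwhile.
            cached = users.get(login);
            if (cached == null) {
                cached = fetchFromApi(login);
                users.put(login, cached);
            }
            return cached;
        }
    }

    // Placeholder for the expensive, rate-limited REST call.
    private User fetchFromApi(String login) {
        remoteCalls.incrementAndGet();
        return new User(login);
    }

    int remoteCallCount() { return remoteCalls.get(); }
}
```

Only threads asking for the same login serialize here; a fetch for one user does not delay a fetch for another.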

@mmitche (Contributor Author) commented Jun 3, 2016

Wouldn't that still cause problems? We should avoid concurrent modification of the maps. What are you thinking?

I'm also thinking that this is an incomplete fix. There are a lot of non-final fields around the API that could be modified by multiple threads if someone passed, say, a pull request object to multiple threads and made concurrent calls to certain APIs. However, the maps at the root appear to be the primary source of concurrency issues.

    // map, the point is to avoid making unnecessary GH API calls, which are expensive from
    // an API rate standpoint
    synchronized (users) {
        GHUser cachedUser = users.get(this.login);
@mmitche (Contributor Author) commented

@kohsuke This is invalid right now (we call getMyself in the constructor). I'm fixing this to check for a valid login, but in the meantime don't merge this.

     */
    private void populate() throws IOException {
        if (merged_by!=null) return; // already populated
        if (mergeable_state != null) return; // already populated by id
@mmitche (Contributor Author) commented

merged_by may be null if the PR is unmerged, which leads to an API call every time populate() is called.
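The effect can be reproduced in a small sketch. `PopulateGuardSketch` and `fetchDetails` are hypothetical names, the remote call is simulated with a counter, and the sketch assumes that the detailed fetch always sets `mergeableState`, which is an assumption rather than something the thread confirms.

```java
// Hypothetical sketch comparing two "already populated" guards.
class PopulateGuardSketch {
    private String mergedBy;        // stays null for an unmerged pull request
    private String mergeableState;  // assumed to be set by every detailed fetch
    private int remoteCalls;
    private final boolean guardOnMergedBy;

    PopulateGuardSketch(boolean guardOnMergedBy) {
        this.guardOnMergedBy = guardOnMergedBy;
    }

    private void populate() {
        if (guardOnMergedBy) {
            if (mergedBy != null) return;       // never true for an unmerged PR
        } else {
            if (mergeableState != null) return; // true after the first fetch
        }
        fetchDetails();
    }

    // Placeholder for the detailed REST call; simulates an unmerged PR.
    private void fetchDetails() {
        remoteCalls++;
        mergedBy = null;
        mergeableState = "clean";
    }

    String getMergedBy() { populate(); return mergedBy; }
    String getMergeableState() { populate(); return mergeableState; }
    int remoteCallCount() { return remoteCalls; }

    public static void main(String[] args) {
        PopulateGuardSketch before = new PopulateGuardSketch(true);
        PopulateGuardSketch after = new PopulateGuardSketch(false);
        for (int i = 0; i < 3; i++) {
            before.getMergedBy();
            after.getMergedBy();
        }
        System.out.println(before.remoteCallCount()); // 3
        System.out.println(after.remoteCallCount());  // 1
    }
}
```

An explicit boolean flag set at the end of the fetch would avoid relying on any particular field being non-null.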

This change also includes a few fixes around GHUser and GHPullRequest:
* GHPullRequest was checking a field that may be null (merged_by) when determining whether to fetch details. An unmerged PR would make a GitHub API call for each property accessed.
* Where GHUser was returned from various objects, we weren't going through the caching mechanism at the root, so calls to APIs on GHUser often resulted in new REST calls. Instead, return from the cache wherever possible.
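
The second point can be sketched as follows, under the assumption that objects deserialized from a payload hold their own lightweight user instance. `CachedUserLookupSketch`, `Root`, `Issue`, `intern`, and `loadDetails` are all hypothetical names, and the detail fetch is simulated with a counter. The sketch is single-threaded apart from the map itself.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of routing user lookups through a root-level cache.
class CachedUserLookupSketch {
    static final class User {
        final String login;
        boolean detailsLoaded; // true once the simulated detail fetch has run
        User(String login) { this.login = login; }
    }

    static final class Root {
        private final ConcurrentMap<String, User> users = new ConcurrentHashMap<>();
        int remoteCalls; // plain counter, adequate for this single-threaded demo

        // Returns the shared instance for this login, registering the given
        // one if nothing is cached yet.
        User intern(User fresh) {
            User existing = users.putIfAbsent(fresh.login, fresh);
            return existing != null ? existing : fresh;
        }

        // Placeholder for the REST call that loads full user details.
        void loadDetails(User user) {
            if (user.detailsLoaded) return;
            remoteCalls++;
            user.detailsLoaded = true;
        }
    }

    static final class Issue {
        private final Root root;
        private final User rawAuthor; // as deserialized from the payload

        Issue(Root root, String authorLogin) {
            this.root = root;
            this.rawAuthor = new User(authorLogin);
        }

        // Hand out the cached instance instead of the per-object copy.
        User getAuthor() { return root.intern(rawAuthor); }
    }

    public static void main(String[] args) {
        Root root = new Root();
        Issue first = new Issue(root, "octocat");
        Issue second = new Issue(root, "octocat");
        root.loadDetails(first.getAuthor());
        root.loadDetails(second.getAuthor());
        System.out.println(root.remoteCalls); // 1
    }
}
```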

@mmitche (Contributor Author) commented Jun 8, 2016

@kohsuke I added a bunch of logging to the API and made a number of additional fixes that should improve overall performance (fewer REST calls). PTAL.

@mmitche (Contributor Author) commented Jun 21, 2016

@kohsuke Have you had a chance to look at the updated version here?

This reduced our GH API calls by around 50% in a large Jenkins installation.

@akostadinov

In a small install we are hitting the rate_limit issue. Any update on this pull request?

@mmitche (Contributor Author) commented Aug 29, 2017

@kohsuke Addressed issues. Changed over to using a ConcurrentHashMap, which has fast reads (the common case) and only locks on write.
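
A minimal sketch of what a ConcurrentHashMap-based cache could look like, assuming a lock-free read followed by `putIfAbsent` on a miss. `ConcurrentUserCacheSketch`, `User`, and `fetchFromApi` are hypothetical names and the remote call is simulated; this is not the code from the pull request.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the library's actual code.
class ConcurrentUserCacheSketch {
    static final class User {
        final String login;
        User(String login) { this.login = login; }
    }

    private final ConcurrentMap<String, User> users = new ConcurrentHashMap<>();
    private final AtomicInteger remoteCalls = new AtomicInteger();

    User getUser(String login) {
        // Common case: a cache hit that does not block other readers.
        User cached = users.get(login);
        if (cached != null) return cached;

        // Miss: fetch outside any lock, then publish atomically. If another
        // thread won the race, keep its instance and drop ours.
        User fetched = fetchFromApi(login);
        User raced = users.putIfAbsent(login, fetched);
        return raced != null ? raced : fetched;
    }

    // Placeholder for the expensive, rate-limited REST call.
    private User fetchFromApi(String login) {
        remoteCalls.incrementAndGet();
        return new User(login);
    }

    int remoteCallCount() { return remoteCalls.get(); }
}
```

With this shape, two threads that miss on the same login at the same moment may both perform the fetch, although all callers end up with the same instance. On Java 8 and later, `computeIfAbsent` runs the fetch at most once per key, at the cost of blocking other updates to that part of the map while the fetch is running.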
