Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get author from blame #60

Merged
merged 12 commits into from
Jan 17, 2024
Merged

Conversation

sidsprasad
Copy link
Contributor

  • Modified Dockerfile to use alpine base
  • Added the new flag
  • Added git command to mark directory as safe for running git cmds
  • parallelly get the assignees for the newly created issues
  • edit the newly created issues by adding assignees after they've been created.
  • Upgraded libs

- Modified Dockerfile to use alpine base
- Added the new flag
- Added git command to mark directory as safe for running git cmds
- Upgraded libs
@sidsprasad sidsprasad force-pushed the get_author_from_blame branch 4 times, most recently from 8c6b137 to 5e828ae Compare January 15, 2024 02:37
@sidsprasad
Copy link
Contributor Author

sidsprasad commented Jan 15, 2024

@ribtoks ready for review!
As discussed, I created a routine to parallely get assignees for new issues from the commit hash.
And then only after the new issues have been created, do we edit those issues by adding assignees.

Copy link
Owner

@ribtoks ribtoks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the wait, just got to this. There're some changes needed

README.md Outdated
@@ -60,6 +60,7 @@ You can use this action together with [parent issue updater](https://github.com/
| `CLOSE_LIMIT` | Upper cap on the number of issues to close (defaults to `0` - unlimited) |
| `COMMENT_ON_ISSUES` | Leave a comment in which commit the issue was closed (defaults to `0` - do not comment) |
| `CONCURRENCY` | How many files to process in parallel (defaults to `128`) |
| `AUTHOR_FROM_BLAME` | Get the author of the todo via git blame if not already specified in the comment (defaults to '0' - do not use) |
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AUTHOR_FROM_BLAME here and in action.yml it's ASSIGN_FROM_BLAME.

main.go Outdated
@@ -270,7 +273,7 @@ func (s *service) createProjectCard(issue *github.Issue) {
log.Printf("Created a project card. issue=%v card=%v", issue.GetID(), card.GetID())
}

func (s *service) openNewIssues(issueMap map[string]*github.Issue, comments []*tdglib.ToDoComment) {
func (s *service) openNewIssues(issueMap map[string]*github.Issue, comments []*tdglib.ToDoComment, newIssueMap *map[string]*github.Issue) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move newIssueMap to be a field of service. Then there will be less juggling around of these pointers.

P.S. There's no need to pass a pointer of map somewhere in Go, because maps are already reference types (like "automatic pointers").

main.go Outdated
@@ -325,6 +331,57 @@ func (s *service) openNewIssues(issueMap map[string]*github.Issue, comments []*t
log.Printf("Created new issues. count=%v", count)
}

func (s *service) assignNewIssues(issueMap map[string]*github.Issue, assigneeMap *map[string]string) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move assigneeMap to become a member of service

main.go Show resolved Hide resolved
main.go Outdated
}
}

func (s *service) getAssigneeFromCommitHash(commitHash string, title string, assigneeMap *map[string]string, successCount *int, assigneeMapMux sync.Locker, wg *sync.WaitGroup) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Usually method getXYZ() returns something. This method does not return anything. I propose to rename it to retrieveCommitAssignees() or something along these lines.
  2. There's no need to store successCount as the size of assigneeMap is equal to successCount

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding point 2, successCount is not incremented if there is an error getting the commit author while calling the github API. So, not always the size of assigneeMap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I understand what you mean now 😛
Fixing this.

main.go Outdated

log.Printf("Waiting for issues management to finish")
svc.wg.Wait()

if env.assignFromBlame && len(newIssuesMap) > 0 {
log.Printf("Adding assignees to newly created issues.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should move this logline to assignNewIssues() for clarity

main.go Show resolved Hide resolved
main.go Outdated
Comment on lines 371 to 372
_, ok := issueMap[c.Title]
if !ok {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these 2 lines can be written more "Go-idiomatically" as:

if _, ok := issueMap[c.Title]; !ok {

main.go Outdated
func (s *service) getAssigneesForNewIssues(issueMap map[string]*github.Issue, comments []*tdglib.ToDoComment, assigneeMap *map[string]string) {
defer s.wg.Done()

var wg sync.WaitGroup
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is the reason we need a new wait group and cannot reuse the one from the service? We're still waiting for that wait group in the main() before we return from here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because I wanted to output the number of issues for which we successfully got assignees. So, for that, we want all the threads to have completed. If we use the other existing wg, then we would have to get this info only after the wait command for that, but that is outside this function and the code would be a bit messy.

main.go Outdated
log.Printf("Error while getting commit from commit hash. err=%v", err)
} else {
assigneeMapMux.Lock()
defer assigneeMapMux.Unlock()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For short code blocks you don't need defer to unlock the mutex. It's more fail-safe, since if you did that in a loop, it would be a deadlock with defer

@sidsprasad
Copy link
Contributor Author

sidsprasad commented Jan 16, 2024

@ribtoks Thanks for the review.
I have left some comments above and made the changes discussed and pushed EXCEPT for the Git API limits discussion and cache.
I will look into them now, but pushed the other changes for you to review in the meantime 😄

@sidsprasad sidsprasad force-pushed the get_author_from_blame branch 2 times, most recently from fff365b to 0ee814a Compare January 16, 2024 04:40
@sidsprasad
Copy link
Contributor Author

@ribtoks okay, so we can look at the rate limits in the RateLimits attribute of the github client.
When the client is authorized, it usually has 1000 requests per hour. But if it is unauthorized, only 60 per hour.

In general, our code used to call the API n + C times, where n is the number of new/closed issues and C is the number of requests it would take to get all the issues (more than 1 if it is paginated).

With this change, it would call the API 3 * n + C times. +2n times because it gets the commit for each new issue and it edits the issues as well.

There is no issue with simultaneous requests as far as I can tell from the github documentation.
There could be an issue with rates if the number of issues is more than one-third of the rate limit.

So, what I propose is that we could calculate the number of new issues and issues to close (n) before making any API calls at all, and then check if 3 * n < Rate Limit + T where T is some threshold we would set. If it's false, then we can cap the total number of isses it would create and only run for those. Then, the next time the pipeline runs, it would add the next batch of issues, and so on... What do you think about this?

Will start to code this logic while waiting for you to get back.

@sidsprasad
Copy link
Contributor Author

Oh sorry, I might be wrong about the concurrent requests thing. Taking a look at that.

@sidsprasad
Copy link
Contributor Author

@ribtoks , made the calls to commitAPI linear 👍

@sidsprasad sidsprasad marked this pull request as draft January 16, 2024 05:49
@sidsprasad sidsprasad marked this pull request as ready for review January 16, 2024 06:30
@sidsprasad
Copy link
Contributor Author

@ribtoks Added cache as well.
Ready for review.
My earlier comment about limiting the number of API calls could be a future ticket, as this is something that would have had to be dealt with even outside of my change. Let me know what you think.

Copy link
Owner

@ribtoks ribtoks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 or 2 safeguarding comments + few very cosmetic beauty improvements (for you to consider).

P.S. Please do not force-push. In such way I can only review the diff from your latest changes, instead of having to review the whole thing again.

action.yml Outdated
@@ -56,6 +56,9 @@ inputs:
COMMENT_ON_ISSUES:
description: "Create commit reference in the comments before closing the issue"
default: "0"
ASSIGN_FROM_BLAME:
description: "Assign issue to the author of the comment via git blame"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Actually now this is a misleading documentation because we assign from GitHub API. I think we should specify this, as this is a not obvious feature. Also then name of the parameter is somewhat mismatched with reality.

main.go Show resolved Hide resolved
main.go Outdated
client: github.NewClient(tc),
env: env,
newIssuesMap: make(map[string]*github.Issue),
assigneeMap: make(map[string]string),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In general would be nice to use a more descriptive name, like commitToAssigneeMap OR have a comment that explains that. Because we will forget this in 1 month. Also applicable to commitAuthorCache (which is much better, just might become commitToAuthor Map/Cache)

main.go Outdated
}

command := "git"
args := []string{"config", "--global", "--add", "safe.directory", root}
Copy link
Owner

@ribtoks ribtoks Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question. I thought this is a responsibility of tdg? Why do we need this here?

UPD. Is it because we're using tdglib and this was done by tdg (executable) before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I can't rememebr exactly why, but before I had thought it wouldn't fit right in tdg. But now I think it would fit better in tdg. But you're saying that tdg is already doing this? I don't think so. Will create an MR for it on Gitlab.

main.go Outdated
@@ -325,6 +333,55 @@ func (s *service) openNewIssues(issueMap map[string]*github.Issue, comments []*t
log.Printf("Created new issues. count=%v", count)
}

func (s *service) assignNewIssues() {
log.Printf("Adding assignees to newly created issues.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

  1. It's a print F (format) without any format modifiers %x inside. Either something is missing from inside or it can become a Println()
  2. Might be useful to actually add something there. e.g. "Adding assignees to %v newly created issues", len(s.assigneeMap)

main.go Outdated
req := &github.IssueRequest{
Assignees: &[]string{assignee},
}
_, _, err := s.client.Issues.Edit(s.ctx, s.env.owner, s.env.repo, issueNumber, req)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just for beauty, this can also become if _, _, err := ....Edit(); err != nil { - merged with next line.

@sidsprasad
Copy link
Contributor Author

@ribtoks I usually only force push to new commits after the one you had reviewed (Usually when I made a mistake and want to fix it in the same commit). I didn't realize this messes it up while reviewing. Okay, will not force push from now on.

Have made all the fixes mentioned above except the one related to marking the directory as safe, As mentioned, will create an MR for that in Gitlab.

@sidsprasad
Copy link
Contributor Author

@ribtoks created the aforementioned MR over here: https://gitlab.com/ribtoks/tdg/-/merge_requests/11
Once that is merged, will remove that section of code from this PR.

@sidsprasad
Copy link
Contributor Author

@ribtoks could you please create a new tag on tdg so that I can update it here and pull?

@sidsprasad
Copy link
Contributor Author

@ribtoks Thank You! Just pushed all the changes here now. Ready for a final review.

@ribtoks ribtoks merged commit 3482957 into ribtoks:master Jan 17, 2024
@ribtoks
Copy link
Owner

ribtoks commented Jan 17, 2024

@sidsprasad Amazing! Thank you for your contribution and navigating the maze of depedencies!

@sidsprasad
Copy link
Contributor Author

@ribtoks and thank you so much for your patience!
Learnt a lot from our interactions and your code reviews.
Will start using this in one of my repositories now!
Look forward to working with you in the future 😄

@sidsprasad sidsprasad deleted the get_author_from_blame branch January 17, 2024 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants