Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tide should merge elafros PRs automatically #543

Closed
krzyzacy opened this issue Mar 29, 2018 · 32 comments
Closed

tide should merge elafros PRs automatically #543

krzyzacy opened this issue Mar 29, 2018 · 32 comments
Assignees
Labels
area/test-and-release It flags unit/e2e/conformance/perf test issues for product features

Comments

@krzyzacy
Copy link

one loop of log:

{"component":"tide","controller":"sync","level":"info","msg":"Building tide pool.","time":"2018-03-29T23:55:19Z"}
{"component":"tide","controller":"sync","level":"info","msg":"Search for query \"is:pr state:open repo:\"GoogleCloudPlatform/k8s-tpu-operator\" repo:\"elafros/build\" repo:\"elafros/elafros\" label:\"lgtm\" label:\"approved\" -label:\"do-not-merge/hold\" -label:\"do-not-merge/work-in-progress\"\" cost 2 point(s). 4966 remaining.","time":"2018-03-29T23:55:19Z"}
{"client":"github","component":"tide","level":"info","msg":"GetRef(elafros, elafros, heads/master)","time":"2018-03-29T23:55:20Z"}
{"base-sha":"5ef09e17d72d5783adc0f4fd0869e93df33974b4","branch":"master","component":"tide","controller":"sync","level":"info","msg":"Syncing subpool: 3 PRs, 16 PJs.","org":"elafros","repo":"elafros","time":"2018-03-29T23:55:20Z"}
{"base-sha":"5ef09e17d72d5783adc0f4fd0869e93df33974b4","batch-passing":null,"batch-pending":null,"branch":"master","component":"tide","controller":"sync","level":"info","msg":"Subpool accumulated.","org":"elafros","prs-missing":[528],"prs-passing":[536,485],"prs-pending":null,"repo":"elafros","time":"2018-03-29T23:55:20Z"}
{"client":"github","component":"tide","level":"info","msg":"Merge(elafros, elafros, 485, {  062d66dc5c441a466248f4b79ef8631bd18ad0b6 squash})","time":"2018-03-29T23:55:20Z"}
{"base-sha":"5ef09e17d72d5783adc0f4fd0869e93df33974b4","branch":"master","component":"tide","controller":"sync","error":"Waiting on code owner review from elafros/elafros-writers.","level":"error","msg":"Merge failed: PR is unmergable. How did it pass tests?!","org":"elafros","pr":485,"repo":"elafros","sha":"062d66dc5c441a466248f4b79ef8631bd18ad0b6","time":"2018-03-29T23:55:20Z"}
{"action":"MERGE","base-sha":"5ef09e17d72d5783adc0f4fd0869e93df33974b4","branch":"master","component":"tide","controller":"sync","level":"info","msg":"Subpool synced.","org":"elafros","repo":"elafros","targets":[485],"time":"2018-03-29T23:55:20Z"}

while #485 is because github context?
but #528 should be in the pool.. not sure why is that?

/assign
/assign @cjwagner
cc @adrcunha

@krzyzacy
Copy link
Author

and looks like https://prow-internal.gcpnode.com/tide points to external k8s repos

@mattmoor mattmoor added the area/test-and-release It flags unit/e2e/conformance/perf test issues for product features label Mar 30, 2018
@cjwagner
Copy link

#528 is in the pool, that is why it is referenced in the Subpool accumulated message. It is in the missing category because the tests results that exist are against a previous base branch commit. This is WAI.

#485 is failing to merge with this message Waiting on code owner review from elafros/elafros-writers. That means that the repo has been configured to require Github CODEOWNER reviews which is not something that Tide supports or will support (it is not expressible with a search query).

CODEOWNERS may still be used to request reviews, but Github must not be configured to require reviews from CODEOWNERS in order for Tide to work properly. Someone with admin access to the repo will need to remove this requirement.

@grantr
Copy link
Contributor

grantr commented Mar 30, 2018

Followup on #485: @evankanderson was incorrectly omitted from @elafros/elafros-writers, so his approval was not counted as approval from that team. Upon adding him to the team, the PR became mergeable and was immediately merged by Tide.

@cjwagner
Copy link

@grantr The Github merge requirements for the repo still need to be changed. Tide will get stuck trying to merge a PR if that PR meets all the merge requirements it know about, but is subject to another CODEOWNER review merge requirement that it doesn't know about.

@grantr
Copy link
Contributor

grantr commented Mar 30, 2018

@cjwagner Can you help me understand what happens when Tide gets stuck? In this case, the observed behavior from the PR side seemed correct: the PR was merged as soon as merging was possible.

@cjwagner
Copy link

Tide is 'stuck' in the sense that it continually tries to merge that PR without success.
Tide thinks the PR is mergeable so it tries to merge, but fails every sync loop iteration (~once a minute). When the PR actually became mergeable Tide was able to successfully merge the PR during the next sync loop, but before that time, no other PRs could merge.

So the observed behavior on that PR is appropriate, the problem is that other PRs were blocked while Tide repeatedly tried and failed to merge that PR.

@grantr
Copy link
Contributor

grantr commented Mar 30, 2018

Ah I see, thanks @cjwagner for explaining. That's definitely an issue.

@grantr
Copy link
Contributor

grantr commented Mar 30, 2018

@cjwagner is it possible for the Tide sync loop to skip PRs that are not able to be merged?

@cjwagner
Copy link

As far as Tide is concerned, that PR is mergeable which is why it continues to attempt to merge.

We could keep a cache of PRs that have failed to be merged and ignore them, but that introduces its own problems:

  • We don't know why the PR wasn't mergeable so we don't have any way to know when it becomes mergeable again. This means we wouldn't know when to remove it from the cache of empirically unmergeable PRs.
  • The merge Github client call can fail on PRs that are actually mergeable. This can occur as a result of a network failure or if Github receives too many merge requests in quick succession. Adding these PRs to the set of empirically unmergeable PRs would be wrong because the PRs are actually mergeable.

It is simpler and more fault tolerant for Tide to understand the merge requirements itself instead of relying on empirically tested mergeability.

@grantr
Copy link
Contributor

grantr commented Mar 30, 2018

We don't know why the PR wasn't mergeable so we don't have any way to know when it becomes mergeable again.

Tide appears capable of detecting this particular case though, since the error from the GitHub API states the reason for the lack of mergeability:
"error":"Waiting on code owner review from elafros/elafros-writers."

This seems distinguishable from transient errors like network failures or rate limiting.

@cjwagner
Copy link

We would need to specifically handle each of the different error messages (which are not documented and could be changed) that could occur from failing to merge a PR in order to properly add them to the cache of unmergeable PRs. This is necessary because some Github errors are improperly classified. For example, if PRs are being merged too quickly we receive a 405 error saying "Base branch was modified" even though this isn't a request error.

Even if we did that, Tide doesn't have any way to know when a PR becomes mergeable again after receiving a merge error like this so if we removed it from the pool it would never be considered for merge again.

@mchmarny mchmarny added this to the M4 milestone Apr 3, 2018
@adrcunha
Copy link
Contributor

adrcunha commented Apr 6, 2018

So, looks like there's no action to be taken here?

@cjwagner
Copy link

cjwagner commented Apr 6, 2018

Was the Github CODEOWNERS review requirement disabled?

@bobcatfish
Copy link
Contributor

It looks like tide is currently stuck. It's trying to merge #597 which GitHub considers not ready for merging but Prow does. This is preventing other PRs (e.g. #599) from merging.

image

@adrcunha
Copy link
Contributor

adrcunha commented Apr 7, 2018

Tide is doing what it's supposed to do. This is exactly the CODEOWNERS x OWNERS mixup that is discussed in this thread.

@grantr what's the decision about CODEOWNERS? I see 3 options here:

  1. keep OWNERS and CODEOWNERS, and document the "double approval" requirement;
  2. remove the CODEOWNERS review requirement;
  3. disable Tide and approval/lgtm plugins for Elafros org;

For (1) we can hopefully prioritize the work on feature request 75956445 and have prow use only CODEOWNERS.

@cjwagner
Copy link

cjwagner commented Apr 9, 2018

@adrcunha Prow uses OWNERS files instead of CODEOWNERS because CODEOWNERS files cannot be specified per directory, there is only one CODEOWNERS file per org/repo:branch. For large repos with many contributors it is necessary to split up code ownership config instead of using a single monolithic config file in order to avoid a review bottle neck and support more granular permissions on the ownership config itself.

@adrcunha
Copy link
Contributor

adrcunha commented Apr 9, 2018

Thanks, Cole. Actually my last sentence missed an important scope definition: I actually meant "and have prow use only CODEOWNERS for Elafros". Feature request 75956445 is about adding such feature, not replacing the current OWNERS mechanism (which would be a pretty bad breaking change for existing projects).

@grantr
Copy link
Contributor

grantr commented Apr 9, 2018

@adrcunha it doesn't sound like 1 is really workable without fixing the bug you mention, since any PR with "single approval" blocks the auto-merge of all other "double approval" PRs.

Option 2 seems to actually be "Remove the CODEOWNERS review requirement and remove merge access from humans," since without the review requirement it would be possible to merge unreviewed code. We had a discussion about options 2 and 3 in an internal chat that I'll summarize here:

There is no correctness difference between auto-merge and the merge button as long as GitHub is configured to require PRs to be up to date before merging. This configuration means that whenever master is updated, any outstanding PRs must first merge master before they can be merged (i.e. it must be a fast-forward merge). Merging master into a PR invalidates any existing status checks (same as a push), so those status checks must pass again before the merge button is enabled.

The consequence is that test runtime sets an upper bound on the rate of PR merges: if tests take 5 minutes to run, only one PR can be merged every 5 minutes. In my own experience, this sets up a constant struggle to keep test runtime low. This can be a good thing, since fast tests are easier to run locally with a faster debug loop, but it's a significant time investment as both contributor count and test surface grow.

Tide allows decoupling test runtime from merge rate. When Tide controls all merge decisions, it can use optimizations like only running tests for the PR at the front of the queue, or batching up multiple PRs into a single test run.

My thoughts

Kubernetes, with its massive test surface and extremely large community, requires the Tide decoupling of test runtime from PR merge rate. I doubt Elafros' need for Tide's optimizations will ever be as great, but it might still be convenient to worry less about long test runtimes.

I prefer the GitHub review flow to Prow's because it's better integrated into the GitHub UI, generates less email (fewer command comments), and is more discoverable for new users. I prefer Prow's reviewer suggestions to GitHub's code owners suggestions - GitHub sends an email to every owner which generates a lot of noise.

I prefer the single CODEOWNERS file to the per-directory OWNERS files because the single file offers a complete overview of the project's ownership areas at a glance. As @cjwagner mentions, the overview convenience of the single file is ultimately outweighed by the bottleneck of managing it for a large project like Kubernetes. Plus, the top-level overview could be auto-generated from OWNERS files.

Possible compromise?

My preference would be option 2 with some (hopefully minor) changes to Prow and Tide to adapt to the GitHub review flow. Here's one way to do that, though certainly not the only way.

First, make the Prow robot the only code owner of every file in .github/CODEOWNERS. This makes Prow a required reviewer of every pull request (the CODEOWNERS review requirement is enabled). It could search for pull requests it hasn't reviewed with the query is:pr is:open review-requested: elafros-ci-robot.

Instead of using /lgtm comments, Prow detects the presence of an approving review from an approver (as expressed in OWNERS files). Authors can't approve their own PRs on GitHub, so in practice each PR must be approved by at least one non-author approver (this might be different than the current Prow behavior). Once a PR has completed its review requirements, Prow submits an approving review.

Tide searches for ready-to-merge PRs using the query is:pr reviewed-by:elafros-ci-robot (plus other clauses for things like hold labels). Since the robot only approves, this will return only the open PRs that have been approved as ready to merge by Prow.

@adrcunha
Copy link
Contributor

Sorry, this fell off my radar. Thanks for the detailed consideration, @grantr. My only concern about your proposal is that it might not be doable soon. @cjwagner any comments on that?

@cjwagner
Copy link

cjwagner commented May 8, 2018

Oops this fell off my radar as well. I can't tell exactly what is being suggested, these sentences seem to conflict:

Instead of using /lgtm comments, Prow detects the presence of an approving review from an approver (as expressed in OWNERS files). Authors can't approve their own PRs on GitHub, so in practice each PR must be approved by at least one non-author approver (this might be different than the current Prow behavior).

@grantr Are you suggesting using OWNERS file data about approvers or just requiring an approved review from at least one non-author?

@grantr
Copy link
Contributor

grantr commented May 8, 2018

@cjwagner the former. GitHub allows anyone with access to a repo to review a PR (I think), so the existence of any approving review isn't enough for Prow to consider the PR approved. However, Prow can use the Reviews API to cross-reference the list of approving reviewers with the list of approvers in OWNERS files, just like it does with comment authors today.

The "Authors can't approve their own PRs" sentence is explaining that the effect of the above strategy is that a PR author cannot single-handedly cause Prow to consider a PR approved.

The parenthetical is pointing out that I'm not sure what the current Prow behavior is: if an approver (as defined by an OWNERS file) authors a PR, does Prow consider that PR immediately approved, or does it wait for approval from another person? What qualifications does that person require, i.e. must they be a reviewer or approver in an OWNERS file?

@mattmoor
Copy link
Member

mattmoor commented May 8, 2018

Can we just flip OFF the bit (in Github) that requires review by code owners, and flip ON the bit (in Github) that requires certain presubmits (e.g. "tide")?

I feel like this small change, which we could literally make today, would go a long way towards alleviating some of the common confusion here.

@grantr
Copy link
Contributor

grantr commented May 8, 2018

👍 I think that makes sense given that the code owners feature currently doesn't give us the segmentation (#297) or hierarchy (#811) that we want.

@grantr
Copy link
Contributor

grantr commented May 8, 2018

Cross-posting from #834 (comment) as another example of how GitHub codeowner behavior isn't optimal for us:

Before #811, @josephburnett was the only owner of this file, so GitHub's approval behavior was: the change doesn't need codeowner approval, only reviewer approval.

After #811, this file has multiple owners, which apparently changes the approval behavior to: a codeowner must approve, even if the PR author is also a codeowner.

My hunch is that the behavior we actually want is that codeowner approval is only required from a single codeowner, regardless of authorship. AFAIK this is the behavior Prow implements.

@cjwagner
Copy link

cjwagner commented May 8, 2018

if an approver (as defined by an OWNERS file) authors a PR, does Prow consider that PR immediately approved, or does it wait for approval from another person?

This is configurable via the implicit_self_approve option.

What qualifications does that person require, i.e. must they be a reviewer or approver in an OWNERS file?

The approval process uses the approvers field of OWNERS files. reviewers is used for assigning PR reviews.

My hunch is that the behavior we actually want is that codeowner approval is only required from a single codeowner, regardless of authorship. AFAIK this is the behavior Prow implements.

Prow doesn't implement any behavior related to CODEOWNERS files.

I don't see why the proposed solution of making Prow a CODEOWNER of all files and reacting to PR reviews is better than the normal approval mechanism. Is this all so that approvers can leave an "Approved" Github review instead of typing /approve? If so I think that could be easily solved by optionally allowing "Approved" Github reviews to be treated as /approve commands.

@adrcunha
Copy link
Contributor

adrcunha commented May 8, 2018

If so I think that could be easily solved by optionally allowing "Approved" Github reviews to be
treated as /approve commands.

Does kubernetes/test-infra#6738 solve this?

@cjwagner
Copy link

cjwagner commented May 8, 2018

No, that applies the lgtm label when an "Approved" review is made. I'm suggesting changing the behavior of the approval process, not the LGTM review process. It is a related change though.

@grantr
Copy link
Contributor

grantr commented May 17, 2018

I don't see why the proposed solution of making Prow a CODEOWNER of all files and reacting to PR reviews is better than the normal approval mechanism.

The Prow as CODEOWNER solution decouples Prow automerge (which is essentially a test performance optimization) from Prow approval. That makes it possible to enable the merge button if repository owners so choose.

@grantr
Copy link
Contributor

grantr commented May 17, 2018

Instead of making Prow a codeowner, an equivalent solution that requires only repo config changes is making the Tide status check required. Then we can drop the CODEOWNERS requirement. I still like the idea of Prow reviewing PRs to point out specific issues that are blocking merge, but that seems like a better fit for the upcoming checks API (if it ever becomes available outside the GitHub marketplace).

So I recommend the following config changes for now:

  • Set Tide status check to required
  • Unset "code owner review required"

/cc @elafros/elafros-admin for repo config access.

@cjwagner points out that Tide runs merge and status check update concurrently, so the merge could fail if it's attempted before the status check update completes. In that case the merge will be retried 1 minute later. That seems acceptable to me.

@adrcunha
Copy link
Contributor

CODEOWNERS review were lifted, and checks are now enforced for elafros/build and elafros/elafros, which means:

  • a PR can be merged only when all checks pass (CLA, tests, Tide)
  • Tide check passes only when proper /lgtm and /approve (as specified by OWNERS files) are given
  • at this point (or 1 minute later at most) Tide will automatically merge a PR

@adrcunha adrcunha changed the title tide doesn't merge elafros PR properly? tide should merge elafros PRs automatically May 18, 2018
@adrcunha adrcunha self-assigned this May 18, 2018
@adrcunha
Copy link
Contributor

Everything is now working smoothly, closing this issue, finally! :D Thanks all!

@grantr
Copy link
Contributor

grantr commented May 18, 2018

I'll link here some Prow PRs that should improve our workflow when they're merged:

kubernetes/test-infra#8086 makes an approving GitHub review equivalent to the /approve command (and a request changes review equivalent to /approve cancel). If we also remove the requirement to /lgtm (since the consensus seems to be that it isn't useful for this repo), we can exclusively use GitHub reviews to approve PRs.

kubernetes/test-infra#7742 allows reviewers to be chosen from the approvers list in OWNERS files if the reviewers list is insufficient. This better matches our lack of a separate reviewer role.

skonto pushed a commit to skonto/serving that referenced this issue Nov 27, 2023
Provisioning a volume might take longer than default 120s.

Co-authored-by: Martin Gencur <mgencur@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test-and-release It flags unit/e2e/conformance/perf test issues for product features
Projects
None yet
Development

No branches or pull requests

7 participants