Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cherrypicker to prow deployment bundle #7393

Closed

Conversation

jennybuckley
Copy link

Follow up to #7345

Fixes up cherrypicker's build to look more like needs-rebase and adds it to the prow deployment bundle.

Fixes #7332
/area prow

@k8s-ci-robot k8s-ci-robot added area/prow Issues or PRs related to prow size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 23, 2018
@jennybuckley jennybuckley changed the title Add cherrypicker to deployment bundle [WIP] Add cherrypicker to deployment bundle Mar 23, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 23, 2018
@jennybuckley
Copy link
Author

jennybuckley commented Mar 23, 2018

Having issues with prow/bump.sh...

INFO: Running command line: bazel-bin/prow/release-push
E0323 13:56:20.441786  216948 docker_session_.py:335] Error during upload of: gcr.io/k8s-prow/cherrypicker:v20180323-7a14d0f59
F0323 13:56:20.442425  216948 fast_pusher_.py:162] Error publishing gcr.io/k8s-prow/cherrypicker:v20180323-7a14d0f59: response: {'content-length': '57', 'x-google-shellfish-status': 'CA0gBEBG', 'x-google-gfe-response-code-details-trace': 'response_code_set_by_backend', 'x-google-netmon-label': '/bns/ph/borg/ph/bns/cloud-containers-registry/registry.server/5', 'x-xss-protection': '1; mode=block', 'cache-control': 'private', 'alt-svc': 'hq=":443"; ma=2592000; quic=51303432; quic=51303431; quic=51303339; quic=51303335,quic=":443"; ma=2592000; v="42,41,39,35"', 'x-google-gfe-service-trace': 'cloud-containers-registry', 'status': '403', 'transfer-encoding': 'chunked', 'x-google-backends': '/bns/ph/borg/ph/bns/cloud-containers-registry/registry.server/5,pcaj77-v6:443', '-content-encoding': 'gzip', 'date': 'Fri, 23 Mar 2018 20:56:17 GMT', 'x-google-gfe-request-trace': 'pcaj77-v6:443,/bns/ph/borg/ph/bns/cloud-containers-registry/registry.server/5,pcaj77-v6:443', 'x-google-dos-service-trace': 'main:cloud-containers-registry', 'x-google-service': 'cloud-containers-registry', 'server': 'Docker Registry', 'x-google-gfe-response-body-transformations': 'chunked', 'docker-distribution-api-version': 'registry/2.0', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json'}
Access denied.: None

@BenTheElder
Copy link
Member

prow/bump.sh publishes images etc., please don't do that in this PR. we'll bump prow in a different PR.

@jennybuckley
Copy link
Author

jennybuckley commented Mar 23, 2018

@BenTheElder
Okay, but how can I write the deployment.yaml without knowing what the image will be tagged as?

@BenTheElder
Copy link
Member

@jennybuckley you can put any image tag in the initial deployment.yaml, the script will update the tag.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 23, 2018
@jennybuckley jennybuckley changed the title [WIP] Add cherrypicker to deployment bundle [WIP] Add cherrypicker to prow deployment bundle Mar 23, 2018
@jennybuckley jennybuckley changed the title [WIP] Add cherrypicker to prow deployment bundle Add cherrypicker to prow deployment bundle Mar 23, 2018
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 23, 2018
secretName: hmac-token
- name: oauth
secret:
secretName: oauth-token
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using a bot different from k8s-ci-robot. All it needs is org membership in order for its PRs to get automatically tested but other than that it shouldn't need any write permissions in repos.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fejta-bot perhaps? 🙃
thoughts @fejta?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do decide to use that bot do we just replace "oauth-token" with "fejta-bot-token" ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we need a new bot / token

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fejta-bot sgtm assuming it needs no permissions to anything aside from a github account

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to set up a new token for it, or is there an existing one?

I can't find any mention of fejta-bot in this repo, aside from a few comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @fejta

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search prow/config.yaml for prior art:

test-infra/prow/config.yaml

Lines 13705 to 13716 in 5b2902d

- --token-file=/etc/token/bot-github-token
- --test-owners-csv=/test_owners.csv
- --triage-window=1
- --triage-count=10
- --flakyjob-count=3
volumeMounts:
- name: token
mountPath: /etc/token
volumes:
- name: token
secret:
secretName: fejta-bot-token

@spiffxp
Copy link
Member

spiffxp commented Apr 2, 2018

If we're enabling this for k/k, is it going to do battle with the cherrypick mungegithub instance and related mungers?

sig-contribex is attempting to better document how the mungegithub-based cherry picker works, are there docs for how this works? kubernetes/community#1980

@cjwagner
Copy link
Member

cjwagner commented Apr 2, 2018

@spiffxp The cherrypicker Prow external plugin doesn't overlap with the cherrypick mungers AFAIK. The plugin just provides a command to automatically create a cherrypick PR on a different branch from the current PR. The mungers don't actually create cherrypick PRs, they just control process after the cherrypick PR exists.

@spiffxp
Copy link
Member

spiffxp commented Apr 2, 2018

@cjwagner ok, I hadn't really paged in the full cherrypick process, now that I've reviewed that PR, it looks like this is sort of a replacement for hack/cherry_pick_pull.sh in kubernetes?

@cjwagner
Copy link
Member

cjwagner commented Apr 3, 2018

@spiffxp Yeah, that seems accurate. It looks like the hack/cherry_pick_pull.sh script can do some stuff related to generated docs as well, but the primary purpose of both is to create a new cherrypick PR from an existing PR.

@spiffxp
Copy link
Member

spiffxp commented May 9, 2018

What remains to be done here? It'd be good to have this in and well understood prior to 1.11.x code freeze, so we could at least tell people to use a /cherrypick command instead of an arcane bash script

@stevekuznetsov
Copy link
Contributor

We have @openshift-cherrypick-robot for this as the bot needs fewer rights than the others, unclear if you guys want to do similarly.

@fejta
Copy link
Contributor

fejta commented Jun 8, 2018

/assign @fejta @BenTheElder

What remains to be done here?

☝️ Very cool, let's get this in (is this still alive?)

@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 6, 2018
@spiffxp
Copy link
Member

spiffxp commented Aug 10, 2018

/retest
/remove-lifecycle rotten

Are there reasons we're holding off on this?

/cc @cblecker
I think you had concerns with this approach

I'm interested in this as a path to reduce the complexity of cherry picking (see issues in #1795)

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 10, 2018
@k8s-ci-robot k8s-ci-robot requested a review from cblecker August 10, 2018 16:49
@cblecker
Copy link
Member

Yeah, I feel ways about having bots open PRs:

  • difficult/impossible to rebase with conflicts (unprivileged users wouldn't have access to branches)
  • creates a situation where a single person can generate a PR, LGTM it, approve it, and have it merge (against our principals of code review)

@Kargakis @stevekuznetsov How do you feel about/handle these in openshift? I'm really not sure this is a workflow we want in Kubernetes.

@stevekuznetsov
Copy link
Contributor

difficult/impossible to rebase with conflicts (unprivileged users wouldn't have access to branches)

How long do cherry-picks stay alive? We don't create the PR if it cannot be made cleanly in the first place and have not seen cases where a cherrypick lands first and ruins the possibility of another open cherrypick. In those cases, drop back to the normal workflows we all know and love. In the happy case, just use the bots.

creates a situation where a single person can generate a PR, LGTM it, approve it, and have it merge (against our principals of code review)

If the PR already merged into master clearly the code is good, so I don't really see why a clean cherry-pick shouldn't start with LGTM, even, or why the author should not be allowed to add it. Plus, in k/k won't you need the cherry-pick to be approved before tide merges anything anyway, thereby not allowing one dev to make this series of changes alone?

@guineveresaenger
Copy link
Contributor

Please excuse my butting in!
We talked about cherrypicks this morning on the 1.12 release team meeting and we all agree that the process needs to be simplified and policies need to be clarified in some cases.

I have further improvements to the cherrypick docs sitting in kubernetes/community#2408 and have also (somewhat successfully but laboriously) shepherded a coworker's cherrypicks to completion.

With those experiences in mind, I for one would love to see a /cherrypick bot command.

All cherrypick PRs are subject to final approval (the manual adding of cherrypick-approved) by the branch manager, so there is no danger of a single person sneaking in a cherrypick sight unseen. Unless that person is themselves the branch manager? We'd need to only choose branch managers who do not have unrestricted write access to k/k to avoid this...it seems a bit farfetched to worry.

@cblecker I don't entirely understand the concern about rebase conflicts and unprivileged users.

@cblecker
Copy link
Member

Improving the cherrypick process.. 100% all for it! Still have major, major concerns about the bot opening a PR route.

The code review concern, yes, is mitigated by the requirement of the branch manager's approval. I hadn't thought about that, and that makes me much more comfortable on that point.

Let me try to explain further on the rebase concern..

Release branches are all point in time snapshots of what was on master. As time goes on, with some commits cherry picked back and others not, the code will drift. If I make a PR to master right now, there's a okay chance it will pick back to 1.11 without a merge conflict. The same can't be said for 1.10 and 1.9.

Even if it does merge cleanly, there may be tests or other parts of the code that break because of the cherry picked code.

The cherry picks can also be held for awhile between generation and merge, such as when a branch manager is preparing for a branch cut, or because there is a security fix pending.

If there is any modifications required (merge conflict on PR generation, rebase needed after PR generation, or test failures requiring code changes), then a bot generated PR is basically a waste and needs to be discarded and recreated using the cherry pick pull script. These aren't hypotheticals.. these are actual situations I see all the time in k/k.

To summarize: Having a bot generate a PR may work a small percentage of the time and make the process super quick.. but I think a majority of the time one of the above scenarios will impact the PR, and then they have to fall back to the alternative process anyways. I'd rather improve and work on the alternative process, making it the one true process that is the best we can make it, then have two different paths.. especially when one of those paths may require discarding entire PRs because of circumstances outside of the contributor's control.

@0xmichalis
Copy link
Contributor

If there is any modifications required (merge conflict on PR generation, rebase needed after PR generation, or test failures requiring code changes), then a bot generated PR is basically a waste and needs to be discarded and recreated using the cherry pick pull script. These aren't hypotheticals.. these are actual situations I see all the time in k/k.

This is mostly not true with the cherrypick bot, you can push changes to open PRs if you will but that should be very rare. Only if there is a conflict in the first place, the bot will not open the PR and inform with a message about the conflict.

To summarize: Having a bot generate a PR may work a small percentage of the time and make the process super quick.. but I think a majority of the time one of the above scenarios will impact the PR, and then they have to fall back to the alternative process anyways. I'd rather improve and work on the alternative process, making it the one true process that is the best we can make it, then have two different paths.. especially when one of those paths may require discarding entire PRs because of circumstances outside of the contributor's control.

We actually see the opposite in openshift repos: https://github.com/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Aopenshift-cherrypick-robot+archived%3Afalse+is%3Aclosed+

@nikhita
Copy link
Member

nikhita commented Aug 29, 2018

Some points/questions:

More than half of the cherry-picks I have opened in k/k have had conflicts. If the cherrypick bot fails with a message that it can't open a PR and wants me to use the script, it sgtm to me.

If there is any modifications required (merge conflict on PR generation, rebase needed after PR generation, or test failures requiring code changes), then a bot generated PR is basically a waste and needs to be discarded and recreated using the cherry pick pull script. These aren't hypotheticals.. these are actual situations I see all the time in k/k.

Have seen this too. There are cases where cherry-pick PRs either fail tests due to changed code or a discussion arises on the cherry-pick PR which ends up with removing some part in the cherry-picked PR and selecting only some code out of it.

Basically, with the bot it means that all cherry-picks created without conflicts can only have two future states: they are either ready to be merged or to be closed. This would make "reviews" on a cherry-pick PR redundant. Are we ok about that? 🤔

To be clear, I am all in for a bot but just want to understand the process and semantics better. :)

I'd rather improve and work on the alternative process, making it the one true process that is the best we can make it, then have two different paths.. especially when one of those paths may require discarding entire PRs because of circumstances outside of the contributor's control.

Personally, the script has always been reliable. Though I agree that the cherrypick process should be made easier. Is there an issue with the list of problems or suggestions for UX improvements for the script? Even if we decide to choose a bot-focused way, we should definitely fix those problems.

@nikhita
Copy link
Member

nikhita commented Aug 29, 2018

/sig release

@k8s-ci-robot k8s-ci-robot added the sig/release Categorizes an issue or PR as relevant to SIG Release. label Aug 29, 2018
@nikhita
Copy link
Member

nikhita commented Aug 29, 2018

creates a situation where a single person can generate a PR, LGTM it, approve it, and have it merge (against our principals of code review)

Also, I am not sure if this is really a concern but this allows someone listed in OWNERS of the respective change to create a cherry-pick PR and lgtm+approve it.

It's not always necessary that the branch manager will be familiar with the nitty-gritties of the change and may cherrypick-approve the PR if it looks good as a whole and has the lgtm+approved labels. This does not catch scenarios where the change might look good as a whole but have unintended side effects of being merged into a release branch.

If the approver had the power to "approve" the PR it means that they should be familiar with the change and understand the unintended side effects but this also essentially means that an approver listed in the OWNERS file is allowed to self-lgtm their PRs. Again, are we ok with that? 🤔

There are examples of PRs with cherrypick-approved which have been closed later by an OWNER in k/k today.

@cblecker
Copy link
Member

cblecker commented Aug 29, 2018

This is mostly not true with the cherrypick bot, you can push changes to open PRs if you will

@Kargakis How would you do this without write access?

@0xmichalis
Copy link
Contributor

0xmichalis commented Aug 29, 2018 via email

@stevekuznetsov
Copy link
Contributor

@cblecker your comments are 100% valid, but I guess if we just look at the query @Kargakis posted we see 2500+ (!!!!!) PRs that cherry-picked and merged cleanly with the bot only. I don't think cherry-picks to OpenShift release branches are necessarily less complicated than those to k8s release branches. I'm not sure that the fact that automatic cherry-picks might fail in some cases should be an argument against letting the flow help where it can, best-effort.

@cblecker
Copy link
Member

@stevekuznetsov Can you compare the workflow of openshift/openshift-docs vs openshift/origin or k/k? Of the ~2500 PRs there, ~1800 of them are in openshift/openshift-docs. It looks like there is only travis CI there, and the PRs are merged by humans. There are only 81 PRs in that query that are actually for openshift/origin, but again, I know very little about the openshift workflow.

@stevekuznetsov
Copy link
Contributor

Even without the docs repo -- which does have a simplistic test workflow -- we still have 700+ PRs made to more complex repositories where code is quickly changing and release branches are maintained long-term. We do more cherry-picks to a different repository for our long-term releases for OpenShift which unfortunately are not public.

Is there some way that you would want to quantify the cost (in a potentially more complex workflow if the automated /cherry-pick doesn't work) versus the benefit (for any best-effort successes)?

@cblecker
Copy link
Member

cblecker commented Aug 29, 2018

That's what I'm kind of thinking about right now. And there's the two different perspectives.. prow as a software product, vs how we use it in kubernetes. I think the ~2500 number absolutely justifies it as a part of prow the software product.

I'm just.. grappling with what I see in k/k (which currently has over 50 open cherry picks dating as far back as April -- https://github.com/kubernetes/kubernetes/pulls?q=is%3Apr+is%3Aopen+-base%3Amaster) and what kind of contributor experience people are having. I just see this potentially causing a whole lot more headache for both contributors and reviewers then the current system we have.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 27, 2018
@fejta
Copy link
Contributor

fejta commented Nov 28, 2018

Is this still alive? Please rebase and/or close

@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 28, 2018
@fejta fejta closed this Jan 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/prow Issues or PRs related to prow cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/release Categorizes an issue or PR as relevant to SIG Release. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.