
Migration strategy going forward #662

Open
h-vetinari opened this issue Jan 13, 2023 · 46 comments

@h-vetinari
Member

h-vetinari commented Jan 13, 2023

Hey all

It seems that pushing aws-sdk-cpp to 1.9.x (after being stuck on 1.8 for ages) did not cause the world to go up in flames for arrow¹, which is great. 🥳

But now the world has moved on to 1.10, and upstream arrow is also thinking about upgrading to that.

Since the pyarrow feedstock CI is extremely costly (4 maintenance branches, each needing several manual restarts due to timeouts), we cannot migrate this feedstock for every release; there are far too many versions for that.

But I think we could do something every 20, 25 or 50 patch versions. I'd then like to create a corresponding abi_migration_branch for that version here (like I did for 1.9), so that this version keeps getting updates from the rest of the aws-c-* ecosystem while we hold it stable for a while.

Since we've now reached aws-sdk-cpp 1.10.50, I thought it's perhaps a good time to discuss this.

CC @conda-forge/aws-sdk-cpp @conda-forge/arrow-cpp

Also tagging @conda-forge/core, because the pinning repo is full of PRs for aws_sdk_cpp version bumps, and I guess we should agree on a strategy / frequency.

Footnotes

  1. effectively the only ABI-affected consumer in conda-forge

@kkraus14

Are there any ABI/API stability guarantees for the aws-sdk-cpp package? Should we be migrating on minor versions as opposed to patch versions?

@beckermr
Member

I have been thinking about this problem for a while, and I actually think the right answer is to specify the fraction of versions you want and then randomly choose, at that rate, whether or not to ship a given version.

Version numbers are highly non-random. For example, a 2.0.0 release will typically be more buggy than a 2.0.1 release. If we ship every Nth version, we could systematically miss some versions over others, depending on N. A random fraction will not have this issue.

@h-vetinari
Member Author

h-vetinari commented Jan 13, 2023

Are there any ABI/API stability guarantees for the aws-sdk-cpp package? Should we be migrating on minor versions as opposed to patch versions?

I think this question needs to be addressed mostly to @xhochy, who did much (all?) of the initial heavy lifting for this stack. I think AWS' standpoint is that they do whatever they want with their components (ABI-wise), and as a user you just use the (bundled) SDK at a specific version. Don't hold me to that though...

I have been thinking on this problem a while and I actually think the right answer is to specify the fraction of versions you want and then randomly choose whether or not to ship a given version with that fraction.

Not sure I agree with "randomly", but I could live with "fraction of versions".

Version numbers are highly non-random. For example, a 2.0.0 release will typically be more buggy than a 2.0.1 release. If we ship every N we could systematically miss some versions over others depending on N. A random fraction will not have this issue.

Agreed, though I'm not proposing to fully automate this away (which would be hard anyway, with branch creation etc.). AFAIK there aren't many other libraries that (potentially) break ABI in patch releases and also release this frequently. My proposal would be to (for example) do migrations for:

  • 1.9.379 (last release of the 1.9 series)
  • 1.10.50
  • 1.10.100
  • etc.
  • 1.10.x (last release of the 1.10 series)
  • 1.11.50

Aside from the "last release in a series", this would also be an easily implementable filter for the bot opening PRs to the pinning repo, e.g.

# special-casing for aws-sdk-cpp due to high volume of ABI-relevant releases
major, minor, patch = version.split(".")
if int(patch) % 50 == 0:
    # create PR for pinning repo with aws_sdk_cpp migrator for `version`
    ...

@pkgw

pkgw commented Jan 13, 2023

Is there a way to make it some kind of admin request to issue a batch of migrations, or even just have a script that does so semi-manually? It sounds to me like there needs to be an element of human judgment in deciding if/when to migrate, and in that case I think the goal should be to keep a human in the loop for the decision but then automate the execution.

@beckermr
Member

We issue PRs to the pinnings repo and then a human decides on whether or not to merge them. So there is definitely always a human in the loop. The issue here is that there are a lot of patch releases with ABI breaks and so we get a lot of PRs to decide on.

@pkgw

pkgw commented Jan 13, 2023

Yeah, the noise from those automatic PRs is definitely noticeable. So I guess I'm wondering if there's a clean way to move the human-in-the-loop earlier, without having to adopt a heuristic like "every 50th" or "every 2%".

@beckermr
Member

Right. We can turn the automatic PRs off and simply let humans make them.

@h-vetinari
Member Author

So I guess I'm wondering if there's a clean way to move the human-in-the-loop earlier, without having to adopt a heuristic like "every 50th" or "every 2%".

To be honest, I help keep the lights on here, but I couldn't tell you why to choose any patch version over another. It's a blind guess already, so might as well make it a memorable number (IMO).

@beckermr
Member

We can add both to the bot. I took a quick look this morning. The right place to add the code is here: https://github.com/regro/cf-scripts/blob/33554e668c4bf27ed70b42757d81e89d7ed6c2d5/conda_forge_tick/migrators/version.py#L477

The code needs to pull some key from the conda-forge.yml config file (key still to be defined) that holds the every-N or random-fraction setting, test the new version against that key, return True if the version should be skipped and False otherwise, and then "or" that boolean with the final return statement.
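For illustration, a minimal sketch of what such a skip predicate could look like; the function name and the config key skip_versions_every_nth_patch are assumptions here, not actual cf-scripts code (the key that was eventually implemented is random_fraction_to_keep, see below):

# illustration only: skip every version whose patch number is not a multiple of N
def skip_this_version(version: str, conda_forge_yml: dict) -> bool:
    """Return True if the bot should NOT open a version PR for `version`."""
    updates = conda_forge_yml.get("bot", {}).get("version_updates", {})
    every_nth = updates.get("skip_versions_every_nth_patch")
    if not every_nth:
        return False  # no filtering configured: never skip
    patch = int(version.split(".")[-1])
    return patch % every_nth != 0

Inside the bot's existing version filter, this boolean would then be "or"-ed with the final return statement, as described above.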

@h-vetinari
Member Author

Ah, looks like we have a new minor version, so now we can definitely migrate the last 1.10.x. :)

@xhochy
Member

xhochy commented Jan 26, 2023

Are there any ABI/API stability guarantees for the aws-sdk-cpp package? Should we be migrating on minor versions as opposed to patch versions?

They just do what they want 🤷

So I guess I'm wondering if there's a clean way to move the human-in-the-loop earlier, without having to adopt a heuristic like "every 50th" or "every 2%".

To be honest, I help keep the lights on here, but I couldn't tell you why to choose any patch version over another. It's a blind guess already, so might as well make it a memorable number (IMO).

It is a similar blind guess on my side. I would be happy with every release where the patch version is divisible by some number. For the aws-c-* packages, on the other hand, I would prefer if we could move to an auto-migrations-merge workflow. Migrating them instantly would probably save us a lot of the conflicts that happen, but auto-auto-auto-… migrations have gotten quite some pushback in past core meetings, and thus I have refrained from pursuing that path any further until now.

@h-vetinari
Member Author

For the aws-c-* packages, on the other hand, I would prefer if we could move to an auto-migrations-merge workflow

For some context, the aws-sdk-cpp package ABI-depends on a lot of these aws-c-* libraries (which @xhochy has disentangled), and (at least judging by the compilation times) adds a substantial amount of code on top.

However, these aws-c-* migrators have been completely on autopilot (except where there were race-conditions between migrators), and most importantly: they do not cause rebuilds of packages depending on aws-sdk-cpp.

This is how things worked all throughout the 1.9.x series here - the ABI branch for that was continuously migrated for newer aws-c-* libs, and I haven't heard of any issues caused by those rebuilds (which get pulled into consumers' environments without rebuilding arrow).

Where this becomes relevant is that the aws-c-* libraries also add to the volume of migrators, and to keep them from colliding with each other, it's necessary either to chain them using wait_for_migrators:, or, as @xhochy suggests, to just let them run automatically.

@jakirkham asked me in conda-forge/conda-forge-pinning-feedstock#3991 to consolidate this discussion here.

Final point: As of 1.10, arrow apparently also has a dependency on aws-crt-cpp (so nominally another trigger for arrow-rebuilds), but there the situation is broadly the same - we don't need to migrate it nearly as often as the component libs.

@jaimergp
Member

I think the next step is to add the changes suggested by @beckermr in #662 (comment). Then the maintainer(s) can decide how often migrator PRs should be created.

@beckermr
Member

I forget what we discussed. Did we make an issue with the results?

@jaimergp
Member

It feels like this one here is the issue :D

@beckermr
Member

OK. What did we decide should be implemented in the bot?

@jaimergp
Member

From your comment above:

The right place to add the code is here: https://github.com/regro/cf-scripts/blob/33554e668c4bf27ed70b42757d81e89d7ed6c2d5/conda_forge_tick/migrators/version.py#L477

The code needs to pull some key from the conda-forge.yml config file (key still to be defined) that holds the every-N or random-fraction setting, test the new version against that key, return True if the version should be skipped and False otherwise, and then "or" that boolean with the final return statement.

@beckermr
Member

FYI: regro/cf-scripts#1657

@beckermr
Member

I went with a random fraction for now since that was easy.

@beckermr
Member

The syntax in the conda-forge.yml is

bot:
  version_updates:
    random_fraction_to_keep: 0.1  # keeps 10% of versions at random

@h-vetinari
Member Author

I went with a random fraction for now since that was easy.

Fraction relative to what? What's 100%? How do (or might) those 100% change over time? I'm glad we got something, but I still think a fixed only_every_xth_{minor,patch}_version=... would be much more predictable/desirable.

@beckermr
Member

Relative to 100%. The every-Nth approach is a pain to implement, so I did what I had time for.

@h-vetinari
Member Author

h-vetinari commented Apr 19, 2023

OK, I still didn't get it, so I looked at the implementation. I had naïvely thought you had made a relative ratio, but it's effectively a random draw: a number in [0,1] (seeded by the package version) is checked against a cut-off.

It means we'll be getting roughly (not exactly) 10x fewer migrators if that fraction is 0.1, and it has nothing to do with the major/minor/patch levels of the version, meaning we'll get a selection of version numbers with no pattern.

FWIW I think that's an OK trade-off. aws-sdk-cpp could probably use a ratio of 1/40 or 1/50.
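For reference, a minimal sketch of such a version-seeded draw, purely as an illustration of the mechanism (not the actual cf-scripts implementation; the function name and the sha256-based seeding are assumptions):

import hashlib
import random

def keep_version(version: str, fraction_to_keep: float) -> bool:
    # seed the RNG with the version string so the decision is deterministic:
    # the same version always produces the same draw, on every bot run
    seed = int(hashlib.sha256(version.encode()).hexdigest(), 16)
    draw = random.Random(seed).random()  # uniform in [0, 1)
    return draw < fraction_to_keep

# with random_fraction_to_keep: 0.1, roughly 1 in 10 versions is kept
print(keep_version("1.10.50", 0.1))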

@beckermr
Member

One other feature here is that the decision is stable with respect to the input version. This means that if there are feedstocks with correlated versions, we should be able to specify the same random fraction for each, and as long as they start at the same version, they should all get bumped at the same time.

@h-vetinari
Member Author

I think I may have misunderstood the mechanism initially (aside from the whole relative-to-what thing). I thought we would still build all versions here, and just not open migrators for them, but IIUC, it just means we're not opening version PRs on this repo anymore.

As such, I'm going to change my estimate from above...

FWIW I think that's an OK trade-off. aws-sdk-cpp could probably use a ratio of 1/40 or 1/50.

to: we should build 1 in 10 at random. That will reduce the migrators on the pinning repo 10-fold, but we won't be flying blind for quite as long w.r.t. what versions are coming in (and we still have a separate choice of whether to merge any given migration PR).

I opened a PR to do so: #763

@beckermr
Member

I think you had it right. This rate-limits version PRs, which should automatically rate-limit migrations too.

@h-vetinari
Member Author

@beckermr, I think we have to consider the possibility that the rate-limit mechanism is broken. This feedstock is at 1.11.68, while upstream is at 1.11.116. Over the almost 50 new versions since then, the 1-in-10 from #763 should realistically have opened at least one PR, with probability 99.36% (== (1 - (1 - 0.1) ** 48) * 100).
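(A quick way to reproduce that figure:)

# probability of at least one kept version over 48 independent 1-in-10 draws
p_keep = 0.1
n_versions = 48
p_at_least_one = 1 - (1 - p_keep) ** n_versions
print(f"{p_at_least_one:.2%}")  # ≈ 99.36%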

@beckermr
Member

Hmmmmm ok. I'll take a look later.

@beckermr
Member

Ouch. I think I see the bugs. Idk what I was thinking when I coded this up.

@beckermr
Member

OK I put in a fix. Let's see what happens!

@h-vetinari
Member Author

@beckermr, meanwhile there have been another 20 upstream releases, and unfortunately still no new PR to the pinning repo.

@beckermr
Member

beckermr commented Aug 8, 2023

Oh no. Will look again. Sorry.

@beckermr
Member

beckermr commented Aug 8, 2023

I think I found it. Will push some changes to a few spots and then we can wait. Sorry again.

@beckermr
Member

beckermr commented Aug 8, 2023

OK, the bug here was that the code was pulling the next version in numerical order, rather than the latest git tag. I added a custom hook to force the bot to only look at GitHub for versions. Hopefully this does the trick!
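For illustration, a minimal sketch of asking GitHub for the newest upstream release instead of guessing the next number in sequence (this is not the actual hook; the repo slug and the tag-name handling are assumptions):

import requests

def latest_github_release(owner: str, repo: str) -> str:
    # the "latest release" endpoint of the GitHub REST API
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/releases/latest",
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["tag_name"].lstrip("v")

print(latest_github_release("aws", "aws-sdk-cpp"))  # e.g. "1.11.149"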

@h-vetinari
Member Author

Thanks a lot Matt! Fingers crossed that third time's the charm!

@h-vetinari
Member Author

Another 13 releases (up to 1.11.149 now) without a PR... Perhaps we should increase the probability to 0.99 or so until we've confirmed that this works? 🤔

@beckermr
Member

I've been watching the log statements and it appears to be working correctly. Happy to increase to be sure. That'll be very helpful!

@h-vetinari
Member Author

SUCCESS! 🥳

The bot just opened a new PR: #795

Thanks a lot for the help & iterations here @beckermr! ❤️

(I had bumped the fraction from 0.1 to 0.99 to accelerate the validation process; that means we can turn it down again now...)

@beckermr
Member

beckermr commented Sep 4, 2023

That's great!

@h-vetinari
Member Author

Now we seem to have the opposite problem, as in: the bot is opening too many rather than too few PRs. Since I switched the fraction back to 0.1 in #795, the bot has opened #796 & #799, which is a bit unrealistic for a 1-in-10 chance per draw.

@beckermr
Member

beckermr commented Sep 8, 2023

The bot takes time to register changes in the parameters. Give it a chance and see what happens.

@beckermr
Member

Two more releases came by and no PRs. Sometimes when doing things at random, you get two in a row by chance. I think we are good to close this issue.

@h-vetinari
Member Author

Yeah, I removed the automerge for such coincidences. But basically this looks good now - thanks so much for your work on this!

@jakirkham
Member

Thanks all! 🙏

@h-vetinari
Member Author

@beckermr, it seems that something broke the random_fraction_to_keep logic. We've now had consecutive PRs for 1.11.{193,194,195}, and the last one was immediately reopened after a conflict with something else arose.

Since all this has a less than 1-in-1000 probability with our current fraction (0.1), I'm pretty confident it's broken.
