VERSION: Changing master to v3.2.0 #4401
There are currently no known binary incompatible changes on master that would require a first digit change. Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
I think the intermittent Cray connectivity issue struck again. Bad Jenkins.
bot:ompi:retest
bot:ompi:retest
Not sure why Jenkins got angry there...
Not sure what's going on with the pull request checker; the build was successful. It looks like the API call from Jenkins to GitHub just didn't set the status properly. @gpaulsen, please go ahead and merge this.
bot:retest
I'd love to get this into master soon, before we forget that our intent is to release a v3.2 and not a v4.0.
Are we sure that the next version will be 3.2 instead of 4.0? I believe we wanted to remove the mxm mtl, since no one is supporting or testing it, but that would require bumping the version to 4.0. We should have removed it for 3.0, but apparently I sucked. So we can make it 3.2 and then bump it to 4 if we do the removal, but I'm not sure how that plays into your internal release.
Both @jsquyres and I seem to recall that we needed to go to 4.0 next time. I don't believe we can do a 3.2.
Well, we should have some way to tell users whether the API has changed and they need to recompile, versus whether we simply no longer support a certain interconnect. IBM very much wants to maintain "forwards compatibility" for users' compiled applications (on our platforms), but is also in favor of culling older interconnect support that is no longer tested or used. If we can distinguish between the two, we can help ensure the former while not precluding the latter.
I think what we need is a better understanding of why IBM seems so concerned about staying at 3.x instead of moving to 4.0. I get maintaining ABI for SpectrumMPI users, but that surely is just a corporate decision on what level to base your code on, and not a general OMPI community issue. Are there things in master that you want in a 3.x release, but aren't scheduled for 3.1 inclusion? If so, why doesn't IBM just backport them in SpectrumMPI? We know you are maintaining your own patches (bug fix and feature) - isn't this just another set?
I believe this is an issue for the Open MPI community as a whole. Every time the MPI library .so name changes, end users must relink (and, to be safe, probably recompile) their applications, which in turn generally requires a re-validation of their entire software stack. In production environments, even if it's trivial to rebuild/relink an end user's application, policies can require days or weeks of validation after a rebuild. Policies usually allow for minor version updates whose changes don't alter .so version numbers, thereby allowing customers to upgrade to the next minor version at a much lower cost (in terms of validation testing). Therefore I'm advocating a "let's not rev the major version numbers (or the user MPI library .so versions) unless it's absolutely needed and planned for" strategy. Even in cases where we thought we needed to break ABI, we've found creative solutions to prevent that breakage or to delay the ABI break until a planned release.
Now I am further confused. We specifically did deliberately plan to make a 4.0 break in 2018 - it was openly discussed in the last devel meeting. The feeling was that enough changes were occurring to justify it, and that the 3.x series was completing its life with the 3.1.x releases (which are expected to continue throughout 2018). Since many of us come from the national lab environment, we fully grok the validation issue, though I think you overstate it here 😄 Production codes never picked up the latest x.0 release right away, but stay back one from there. So in this case, the labs will likely stay in the 3.0 series for at least 2018, and then move to 3.1 in 2019, going to 4.x (x > 0) in 2019/2020. Note that we (at least while I was there) always posted the newer releases so those wanting/needing access to the new features could use them. So what precisely is your point of concern? You were one of the orgs pushing for a time-based release schedule - why is 2+ years of 3.x not adequate? Why would we want to distort the code base with workarounds simply to avoid a major release? And why is the lab's strategy not adequate for the customers you are concerned about?
I think I agree with everyone on this thread, which means my head has exploded :). A couple of notes / thoughts...
So I'm not sure what we want to do with the release that follows 3.1.x. It seems like we're going around and around here; perhaps we should bring this topic up at the next telecon and see if we can make more progress there?
Yeah, I think that makes sense. IIRC, the rationale here was that we planned to remove some things (e.g., the sm btl, mxm mtl) and add new options. One could argue that these could be delayed, but I'm trying to understand why, since the historical way of dealing with these version changes has seemed adequate and acceptable.
I was still out of the office yesterday; I don't know if you had the Tuesday webex this week or not to discuss this stuff. Here's my $0.02:
Meaning: as @rhc54 pointed out, if we remove those components and/or change CLI/MCA params, then the next series needs to be v4.0.x. If we delay all those things (and no other backwards-incompatible changes occur relative to v3.0.x and v3.1.x), then the next series needs to be v3.2.x.
We did not meet this week, so this will get discussed next week (and likely run into the devel meeting before getting resolved). We all are in violent agreement over what you said. The issue is whether or not there should be a backwards-incompatible release in the first half of 2018. I think IBM is advocating for "no", but I still fail to grok the reasoning behind that request.
@bwbarrett suggests that it's possible to rev to a new major version to incorporate backwards-incompatible changes (like mpirun command line changes, or removal of components), but NOT rev the user lib .so versions. This would support pre-built MPI apps, and would more accurately convey that the change in Open MPI did not affect our ABI. It seems a somewhat confusing message, but perhaps this is a solution. As @jsquyres said: If we do not make backwards-incompatible changes, we (really really) should not change the major version. But how strong should that "really really" be? The beauty of Open MPI's component architecture is that there is a lot of flexibility to change the internals of Open MPI without affecting the layer above.
Again, I "really, really" want to understand what problem you are trying to solve. The user community has had a way of dealing with this that was considered acceptable and adequate for nearly 14 years. What precisely is the issue now driving us to modify our methods? People argued (rather loudly) that our feature/stable release methods should be replaced by time-based releases, and that we would let the major version indicate breaks in compatibility. This was defined as broader than what is now being suggested - specifically, it included changes in command line options and behavioral mods that would be apparent to a user. Revving the library is a totally different question - there are strict libtool rules that govern those versions, and they have absolutely nothing to do with the release versioning. So I don't understand why this conversation is even bringing those into the thread.
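To make the separation between release numbers and library versions concrete, here is a minimal sketch of libtool's `-version-info current:revision:age` update rules as described in the GNU libtool manual, plus the Linux soname they produce (`lib<name>.so.(current - age)`). The helper names `bump` and `linux_soname` and the starting triple are hypothetical and chosen for illustration; they are not Open MPI's actual build logic or `libmpi` values.

```python
"""Sketch of libtool's -version-info update rules (current:revision:age).

Assumption: this mirrors the rules in the GNU libtool manual; the helper
names and the numbers used below are hypothetical, not Open MPI's values.
"""


def bump(version, source_changed, added, removed_or_changed):
    """Return the next (current, revision, age) triple for a public release."""
    current, revision, age = version
    if source_changed:
        # Any source change since the last release bumps the revision.
        revision += 1
    if added or removed_or_changed:
        # Any interface change bumps current and resets revision.
        current += 1
        revision = 0
    if added:
        # Added interfaces stay compatible: age grows along with current.
        age += 1
    if removed_or_changed:
        # Removed or changed interfaces break old binaries: reset age.
        age = 0
    return (current, revision, age)


def linux_soname(name, version):
    """On Linux, libtool names the shared library lib<name>.so.(current - age)."""
    current, _revision, age = version
    return f"lib{name}.so.{current - age}"


if __name__ == "__main__":
    start = (5, 3, 3)  # hypothetical starting version-info
    for label, flags in [
        ("bug fix only", (True, False, False)),
        ("interfaces added", (True, True, False)),
        ("interfaces removed", (True, False, True)),
    ]:
        nxt = bump(start, *flags)
        print(f"{label}: {nxt} -> {linux_soname('mpi', nxt)}")
```

In this sketch only the "interfaces removed" case changes the soname (from `libmpi.so.2` to `libmpi.so.6`) and would force users to relink; whether the release is called v3.2 or v4.0 is not an input to the calculation at all.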
We discussed this at our weekly meeting: https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20180109 The decision was to keep master/next release at v4.0, but not to break .so versioning unless an audit determines that it's needed, on a library-by-library basis.