
document some std library evolution policies #18479

Closed

Conversation

@arnetheduck (Contributor)

In wanting to improve the standard library, it's helpful to have a set
of principles and guidelines to fall back on that show how to introduce
such improvements while at the same time considering existing users of
the language.

A key idea behind documentation of this sort is that it should highlight
a path forwards in making changes rather than trying to prevent them,
and although the current snippet does include some language for what one
shouldn't do, it also shows a few options for what one can do.

This is a riff on #18468, based mainly on what helps enable the use of
Nim productively in environments and teams where the priority and focus
is not always on the tool (Nim in this case) but rather on the codebase
itself and its use by end users.

We use similar guidance documentation to help coordinate the code of the
teams using Nim in https://status-im.github.io/nim-style-guide/ where it
is applied not as law, but as recommendations for the default approach
that should be considered first.

@Varriount (Contributor) commented Jul 13, 2021

When can breaking changes be made? There will inevitably be scenarios where a non-breaking alternative isn't feasible or is extremely undesirable compared to a breaking change.

While I understand that breaking changes are a source of frustration, it's not like they're uncommon in programming languages either. Most (if not all) commonly used languages - from Python to Java - introduce breaking changes in their standard libraries with each language version, excluding patch versions that generally only fix severe bugs in non-breaking ways. (And the point of this paragraph isn't to blindly assert that "oh, these other languages do it, we should do it too", but to say that breaking changes are expected and accepted, at least to a certain degree.)

https://rubyreferences.github.io/rubychanges/2.7.html#standard-library-contents-change
http://cr.openjdk.java.net/~iris/se/10/latestSpec/apidiffs/overview-summary.html
https://docs.python.org/3/whatsnew/3.7.html#changes-in-python-behavior

@arnetheduck (Contributor, Author)

Key to this whole process of maturing Nim is that changes are introduced in such a way that old stuff by and large keeps working. It's not even that complicated to do, if only you adopt the mindset and learn the skills necessary to introduce things in such a way. The "oh, but I can't break things any more" kind of comment is more a matter of habit and education than anything else: it's a little uncomfortable at first because you need to approach problems differently than you're used to, but really, it's not that hard once you've learned it.

from Python to Java

Notable about these examples is that, in spite of having massive standard libraries (even compared to Nim), the breaking-change lists are fairly small and contained. Java, for example, even has tooling to identify breaking changes, which is easier because the language itself is a lot less complex; as a consequence, the rules governing what counts as a breaking change are simpler as well, and you can see the API report detailing what broke. To take a relevant example: can you imagine Java changing the semantics of ArrayList.remove in a minor release?

These languages were created when the internet was being popularized and getting access to packages was tricky, which has contributed to their large standard libraries. More modern approaches include splitting things up into packages with a package manager, where each package can have its own change and breakage policy as well as "maturity level", and this is what the text promotes as a next step in the evolution of Nim, in the case of immature std libs. Of course, impossible situations still arise, but the problem is then localized to those packages, and they can be upgraded or not, independently.

If we're going to make these kinds of comparisons, also take a look at C, C++, Rust and Go instead; these more closely relate to Nim as "systems programming languages". They all come with fairly strict breakage policies, which has enabled third parties to build up large ecosystems around them, trusting that the language by and large will remain usable from one release to the next, without having to reexamine large parts of the codebase and deal with mutually incompatible changes.

The standard library is special in that upgrading it is an all-or-nothing proposition, and the problems are somewhat exacerbated compared to simpler languages by Nim's semantics (such as the global namespace, loose typing around generics and so on), which are particularly sensitive to silent runtime breakage that is difficult to understand and work around.

That said, the guideline is there to establish a culture around a non-breaking-approach-first mentality that typically accompanies a language used in production. It doesn't say that no breaking changes should ever be made, merely that there exist ways to introduce change that have empathy for current users - for example spreading the upgrade over multiple versions where the final breaking version gets a breaking version number, using the deprecation features, etc.
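The "spread the upgrade over multiple versions, using the deprecation features" path can be sketched with Nim's `{.deprecated.}` pragma. The proc names below are hypothetical, invented for illustration; this is a sketch of the technique, not stdlib code:

```nim
import std/strutils

# Hypothetical evolution of an API without a hard break:
# step 1, introduce the new behavior under a new name;
# step 2, keep the old name working behind a deprecation pragma so
# existing callers get a compile-time warning instead of breakage,
# and only remove it in a later release with a breaking version number.
proc parseAge2*(s: string): int =
  ## New API: raises `ValueError` on malformed input.
  parseInt(s.strip())

proc parseAge*(s: string): int {.deprecated: "use parseAge2 instead".} =
  ## Old API, kept for compatibility: returns -1 on malformed input.
  try:
    parseAge2(s)
  except ValueError:
    -1

echo parseAge2(" 42 ")  # 42
echo parseAge("oops")   # -1, with a deprecation warning at compile time
```

Callers of `parseAge` keep compiling and running unchanged; the warning gives them a whole deprecation window to migrate on their own schedule.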

Recent standard library efforts have gone all-out in the other direction, however, where small imperfections are used to motivate breaking changes, and the model used to introduce them, -d:nimLegacy, is fundamentally flawed both for being global (one of the most basic no-nos in any sane evolution strategy) and for being opt-out instead of opt-in. Above all, this is what is different with this guideline in place: such changes should not be happening in the majority of cases, because they unnecessarily make life more difficult for those of us that actually have codebases in Nim deployed.

This kind of policy is also here because it establishes a norm around which users of Nim can plan how they use Nim (or not). As things stand right now, the norm is moving towards a break-things mentality where every user with Nim code must constantly scour the PR flow to point out how things will break, and argue the points over and over. At that stage, a guideline serves as a coordination point so that potential users of Nim can take a look at it and decide if the language is for them. I'd encourage those that downvote this and the other PR to write an alternative set of guidelines that they're willing to live up to, so that people like us that want to use Nim can decide if we should continue to do so, and if so, what safeguards we need to put in place.

@Araq (Member) commented Jul 13, 2021

I just read https://rust-lang.github.io/rfcs/1122-language-semver.html. And while the rules for Nim seem to be remarkably similar and I'll accept these for the sake of moving forward, I have to point out that Rust's way of doing it is in practice quite empirical:

What precisely constitutes "small" impact? This RFC does not attempt to define when the impact of a patch is "small" or "not small". We will have to develop guidelines over time based on precedent. One of the big unknowns is how indicative the breakage we observe on crates.io will be of the total breakage that will occur: it is certainly possible that all crates on crates.io work fine, but the change still breaks a large body of code we do not have access to.

So an empirical approach to the problem via "important packages" would equally be justified.

@Varriount (Contributor)

So when, and how often, would Nim release "major" versions that could contain breaking changes, compared to minor versions?

@timotheecour (Member) commented Jul 13, 2021

"breaking change" is simply the wrong criterion to decide whether a change is acceptable, because most changes (bugfixes, features) are a potential breaking change. For example, this PR considers adding a function with a new name as non-breaking, but that's not even true, as I've shown in #18468 (comment): simply adding a new API proc foo*(a: int): int = a to some stdlib module can, depending on the case, lead either to a CT error or to a silent behavior change (with no warning), and it's not even a contrived example (as evidenced by the introduction of {.since.} to support --useVersion).
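A minimal single-file sketch of that hazard (the `describe` procs are hypothetical; in real code the definitions would live in a library module, and the new overload would arrive with a library upgrade rather than later in the same file):

```nim
# Only a catch-all generic exists at first, so this call resolves to it:
proc describe[T](x: T): string = "generic"

echo describe(1)  # generic

# The library later "adds a new API": a more specific overload.
proc describe(x: int): string = "int-specific"

# The same call expression now silently resolves to the new proc,
# with no warning: a behavior change without any code change by the user.
echo describe(1)  # int-specific
```

The concrete `int` overload beats the generic one in Nim's overload resolution, which is exactly why adding a proc with an existing name can change the meaning of code that never imported anything new.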

Plenty of other changes that seem innocuous (fixing a bug, adding an inline pragma which makes std/random 10x faster (it does), making a proc generic to generalize an API, adding a new API, changing hashing to avoid major slowdowns, changing std/random to avoid non-random behavior, etc.) are in fact breaking changes for some use case (whether the use case predates the change or not). Under this policy, a ton of PRs and bug fixes would have to be reverted (see some examples in #18468 (comment) and #18468 (comment)), and a ton of changes between 1.0 and 1.2 or between 1.2 and 1.4 would violate this policy, not to mention most RFCs. At this point, we might as well not issue any new minor point releases, and just start shipping Nim 2.0, then 3.0, etc., without intermediate point releases.

Simply deprecating APIs and keeping behavior immutable isn't the solution either, as shown in #18468 (comment); a change to hashes, for example (as was done since 1.0 to fix performance and other issues), would require introducing std/hashes2, which in turn (to avoid changing tables behavior) would require introducing std/tables2 and std/json2, and soon enough all of the stdlib would be deprecated and duplicated. Ditto with a change to $ or os./. Any client of any of those dependencies would have to be updated to use the new modules (or suffer from the original performance or other issues) and in the process cut support for prior Nim versions, creating a mess of incompatible duplicate modules that need to be maintained and that break the nimble ecosystem.

This is the 1 vs N churn: you either fix the bug in 1 place and make users of 1 package that relies on the old behavior unhappy, or you pass the problem on to all (transitive) clients of the API and require all of them to change their code, making everybody unhappy and causing balkanization of APIs and a maintenance nightmare.

Simply deferring breaking changes to Nim 2.0 isn't a practical solution either, because Nim 2.0 will also have bugs, and waiting for 3.0 to have those bugs fixed isn't practical unless we decide to forgo minor point releases.

A better, more practical criterion for breaking changes is to assess impact, as done in Rust (https://rust-lang.github.io/rfcs/1122-language-semver.html); impact can be assessed as follows:

  • how much code is affected, for which the best proxy today is important_packages (as crates.io is for Rust)
  • how easy is it to identify the lines of code that need attention (this is achievable via a warning or other compiler help)
  • how difficult is it to fix the code (this is achievable via a changelog entry + good warning msgs)
  • whether the code can be fixed in a forward and backward compatible way

As to when those changes should be made available, I argue in #18486:

  • in devel: as soon as the bugfix is merged
  • in stable: possibly in next stable release or in a later TBD minor or major point release, on a case-by-case basis

Most languages frequently introduce necessary breaking changes; they improve not in spite of but thanks to the judicious introduction of breaking changes, whether it's Python, Java, D, Rust or Swift. The product they offer improves release after release, and despite some unavoidable complaints, the majority of people welcome those changes.

@Araq (Member) commented Jul 14, 2021

"breaking change" is simply the wrong criterion to decide whether a change is acceptable, because most changes (bugfixes, features) are a potential breaking change. For example, this PR considers adding a function with a new name as non-breaking, but that's not even true, as I've shown in #18468 (comment): simply adding a new API proc foo*(a: int): int = a to some stdlib module can, depending on the case, lead either to a CT error or to a silent behavior change (with no warning), and it's not even a contrived example (as evidenced by the introduction of {.since.} to support --useVersion).

In fact, you can argue that introducing new overloads for an already overloaded proc is an "implementation detail" (we already trusted the overload resolution process before to get it right) whereas entirely new procs are "breaking" changes.

a ton of changes between 1.0 and 1.2 or between 1.2 and 1.4 would violate this policy, not to mention most RFCs.

Well, the policy would be adhered to for new releases; not a big deal if the policy actually works.

At this point, might as well not issue any new minor point releases, and just start shipping nim 2.0, then 3.0 etc, without intermediate point releases.

Semver is just broken and empiricism is superior. Also, versions are used for marketing, Nim 2.0 should be a significant release, not just yet-another release because of Semver.

@arnetheduck (Contributor, Author)

This is the 1 vs N churn: you either fix the bug in 1 place and make users of 1 package that relies on the old behavior unhappy, or you pass the problem on to all (transitive) clients of the API and require all of them to change their code, making everybody unhappy and causing balkanization of APIs and a maintenance nightmare.

I'm not sure about that; i.e., the maintenance nightmare is when a base package changes and now all your dependencies must follow suit. This is the coordination problem we're trying to avoid by constraining the cowboy approach, in particular because most changes can be introduced in an additive manner without said disruption, if you merely give it some thought. It is true that the result is not as pretty as it could be if Nim were a greenfield project; the way out here is to roll the changes out as opt-in for a few releases and then make a breaking change.

Above all though, this is a problem that is only going to grow worse with every module added to the standard library - that's why the default reaction to a broken API is to move it out of the std lib and evolve it outside. This way, we split the problem into pieces where each piece moves independently, and none of these problems exists any more. This is a much more powerful way of addressing things because it scales: it's not artificially held up by coordination overhead with the rest of the language and library.

Again, maintaining a codebase of any size on top of a constantly moving target, such as we have in the real world, is a real and concrete maintenance nightmare, while the "balkanization" nightmare remains a theoretical construct. Everything sits on top of the standard library, hence everybody is affected when it changes; that's a fact. Theoretical future users are easy to inspire fear with, but the reality is that people are smart enough to figure things out and move on, incredible as this may sound.

In fact, you can argue that introducing new overloads for an already overloaded proc

Adding overloads is usually fine, except when they're in the wrong module and a match-all generic overload exists.

Semver is just broken and empiricism is superior.

Er, semver is a tool for communicating the result of that empiricism to your users in an asynchronous, best-effort manner.

@Araq (Member) commented Jul 14, 2021

Er, semver is a tool for communicating the result of that empiricism to your users in an asynchronous, best-effort manner.

Well surely you can simply use empiricism ("we follow these rules and test releases against an ever growing set of real world code") without following semver. There is no proof that semver significantly improves the "best-effort"-ness that everybody ends up doing and plenty of successful software projects do not follow semver (yet remain backwards compatible).

In fact, semver assumes that bugfixes are much less harmful than new features and that's just wrong, esp. for a compiler. (Compiler bugfixes make the compiler stricter and do break code, a new feature can use new syntax that was previously a syntax error so by construction you know the new feature doesn't break code.)

@arnetheduck (Contributor, Author)

In fact, semver assumes that bugfixes are much less harmful than new features and that's just wrong, esp. for a compiler.

I dunno, I just see it as a way for the author of a library to communicate their best knowledge of the changes to the users of the library: i.e., "I fixed this bug" vs "I knowingly made a mess for you and you need to work to upgrade". There's no real value statement about harm or importance in that, merely pragmatism.

@arnetheduck (Contributor, Author)

"breaking change" is simply the wrong criterion to decide whether a change is acceptable,

Like any policy, one can get hung up on strict interpretations, or focus on the intent of the policy; in the former case, one makes absurd examples, while in the latter, one makes a common-sense judgement. The intent with this policy in particular is that it's of the latter kind, applied judiciously. For example, if it's applied to random: it is the case that a random function can return random results, and making it more random is not a breaking change; neither is speeding up a hash function. However, adding a new hash function in such a way that existing code which didn't explicitly import a module now gets different results depending on which modules are imported is a breaking change, because there is no sane way to diagnose such a problem reliably.

Likewise, changing the semantics of a function which previously explicitly guaranteed non-nil values such that it suddenly starts handing out nils is also a breaking change, by the same common-sense argument.

@Araq (Member) commented Jul 14, 2021

there's no real value statement about harm or importance in that, merely pragmatism.

The order is <breaking change>.<new features>.<bugfixes> and it's enforced by the tooling, so yes, the assumption that bugfixes are much more harmless than feature additions is hardcoded in semver. Which is why backporting features like new compiler warning messages into the 1.2.x line would be useful for quite a few people...

@timotheecour (Member) commented Jul 14, 2021

Above all though, this is a problem that is only going to be growing worse for every module added to the standard library - that's why the default reaction to a broken API is to move it out of the std lib and evolve it outside - this way, we split up the problem into pieces where each piece moves independently, and none of these problems exist any more. This is a much more powerful way of addressing things because it scales: it's not artificially held up by coordination overhead with the rest of the langauge and library.

Moving things around or splitting stdlib into separate repos isn't magically going to fix problems. Take a look at Fusion: 68 commits since it started (in April 2020); in the same timespan, Nim saw 2518 commits, or 37x more, and even if you restrict that to the lib sub-directory, it's still 1201 commits vs 68.

Monorepos are easier to maintain and grow because there are fewer moving parts and fewer dependencies; it's easier to maintain consistency, and you don't have synchronization issues when a change involves two repos. See also the background story around nim-lang/RFCs#310.

This means that even though Fusion is supposed to be a staging area for the stdlib, contributing to Nim's stdlib is easier than contributing to Fusion - clearly against Fusion's design.

You might say, let's use a decentralized stdlib, maybe even pkg/os, pkg/sequtils, but that'll only make things worse: fewer maintainers, fewer reviewers, a lower quality bar, less trust when using those packages and, yes, increased chances of dependency conflicts.

Even LLVM realized that and transitioned to a monorepo containing both compiler and standard library; why do you think that is?

@Araq (Member) commented Jul 14, 2021

Moving things around or splitting stdlib into separate repos isn't magically going to fix problems.

Indeed it does not, but what arnetheduck needs is a way to pin down dependencies in a more fine-grained manner than "requires Nim version X", where X combines 10 very good bugfixes with 3 runtime-behavior changes that are quite risky.

@dom96 (Contributor) commented Jul 14, 2021

Indeed it does not, but what arnetheduck needs is a way to pin down dependencies in a more fine-grained manner than "requires Nim version X", where X combines 10 very good bugfixes with 3 runtime-behavior changes that are quite risky.

This will always be the case. Even if stdlib isn't a factor, a new compiler version will combine 5 different bug fixes with 10 different features. Unfortunately, if you need one of those bug fixes then you'll need to swallow the other changes and test that your software still works with them; if you can prove that it does not, you can ask for a special backported version that fixes the bug that you need fixed.

@Araq (Member) commented Jul 15, 2021

This will always be the case. Even if stdlib isn't a factor, a new compiler version will combine 5 different bug fixes with 10 different features.

It can be mitigated, though, by not making these risky runtime-behavior changes (instead deprecating the offending proc) and/or by having a fixed number of them per release. And the fact that there is no agreement on some of these changes implies that "just use common sense" is too limited and we should have more policies. (And I cannot believe that I wrote this last sentence, as it's against my beliefs...)

@arnetheduck (Contributor, Author)

I think above all we have lots of software and libraries that we want to write in Nim and would like to focus on that, instead of chasing aesthetic changes.

Crucial in that process is having a stable base to build on; that means that the core libraries of the language don't change frivolously. Semantically changing core libraries breaks idioms and introduces new failure modes, and this applies both to the parts that are covered by test suites and to the informal parts that rest on principles and precedent.

When a module doesn't work out the way it should, it's easy to create a new one that does, assuming that it is truly that important, and it can easily be done without changing the existing one. Many in the Nim community have done so already, creating successful libraries and improvements over things in the standard library, to the point that they get used in spite of the significant convenience subsidy that the standard library enjoys. This is the power that we should be encouraging and unleashing instead of entrenching an unscalable model further.

The amount of work needed to write a new module is directly related to the amount of work you're breaking when you change the existing one. Is introducing hash causing lots of follow-up effects? That's a signal that changing hash is most likely not a good idea, unless the underlying cause is truly damning and not just a trivial preference for a subjectively more elegant world.

The baseline for the precedent was set with the 1.0 release - for the sake of argument, let's even say it's 1.4 - and we will obviously not be going back from there. The question is what to do next: keep breaking things, thus discouraging people from investing their time in using Nim for production projects, or draw a line and start applying a higher standard to some standard library modules, moving the rest to packages where they can be independently improved at an appropriate pace?

Let's face it: some modules have outlived their time in such a way that they can't be fixed without breaking them and significantly changing the way they work. The right place for such radical changes is not together with the compiler and the core standard libraries without which the language becomes meaningless.

You might say, let's use a decentralized stdlib, maybe even pkg/os, pkg/sequtils, but that'll only make things worse: fewer maintainers, fewer reviewers, a lower quality bar, less trust when using those packages and, yes, increased chances of dependency conflicts.

I'm curious, from where did you draw all these conclusions? There's nothing in this proposal that says that the separate packages have to live in a separate repo, be maintained by other people or be held to a lower quality standard. Of course, some libraries are so out of place that the only obvious and correct choice is to move them to a separate repo, where they will likely fall out of use since better alternatives exist, but that doesn't necessarily apply to everything.

On the contrary, the pace and difficulty of making a Nim release speak to the opposite: instead of being able to release a highly critical bugfix to components like the json parser or http server, the community must wait for months while the maintainers of the distribution meticulously go through all the completely unrelated changes, coordinate with all packages that broke as a result of changes in those unrelated packages, fix all new bugs introduced by new features and crusades, and then release.

@Araq (Member) commented Jul 15, 2021

I'm curious, from where did you draw all these conclusions?

That's because I keep bringing them up, I think. cough :-/

@arnetheduck (Contributor, Author)

This will always be the case. Even if stdlib isn't a factor, a new compiler version will combine 5 different bug fixes with 10 different features.

Ergo the need for a smaller standard library and separately versioned components thereof. This is a win-win for everyone: upgrades to the compiler are easier, upgrades to applications are easier, and all these releases can be made in a more timely fashion instead of one thing blocking the other.

@Araq (Member) commented Jul 21, 2021

Merged my PR instead (based on yours).

@Araq Araq closed this Jul 21, 2021