Suggestion: Include compiler ranges in package metadata #255

thomashoneyman opened this issue Oct 18, 2021 · 18 comments · May be fixed by #669
Labels: alpha, enhancement (something that would be good to have but it's not a priority)

Comments

@thomashoneyman
Member

thomashoneyman commented Oct 18, 2021

A PureScript package is normally compatible with some, but not all, compiler versions. I suggest that we allow package authors to specify this information in their package manifest the same way that dependencies are specified (as described in #43). In other words, a purs.json file would include:

{ "name": "my-package"
, "version": "1.2.3"
, "compiler": ">=12.0.0 <14.0.0"
}

Field names could be purs, purs-version, compiler, compiler-version, or something like that.

Why?

A manifest including supported compiler versions opens up a few new possibilities:

  • We can verify that a package compiles when added to the registry (or package set) by ensuring it compiles with a sample of compiler versions in the specified range (at least 1 per major version specified, perhaps).
  • We can show what compiler versions a package supports in the Pursuit documentation.
  • We can add a filter to Pursuit that allows you to look at packages only associated with particular compiler versions. Folks regularly ask to see just 0.14-compatible packages, but it would also be nice to be able to go "back in time" when working on an old project and see only 0.12-compatible packages (for example).
@f-f
Member

f-f commented Oct 20, 2021

I think we considered it in the past - the big issue with allowing users to specify compatibility ranges is that they are often incorrect in two ways:

  1. overly restrictive bounds: a package version might be compatible with a compiler version but cannot be used because the bound prevents it
  2. overly lax bounds: the package author specifies a bound that includes future versions of the compiler that might end up breaking it - in this way users are allowed to use the package but it would break their build.

To sum up: we wouldn't be able to trust such bounds, which might be worse than not having them in the first place!

I think a more useful solution for this issue is something like the Hackage Matrix Builder, where all the packages that are uploaded to Hackage are built with various versions of the compiler to figure out the compatible versions (here's e.g. the build matrix for the purescript package).
We could set up something similar, and store all this info in the Metadata file for every package, here in this repo.

@thomashoneyman
Member Author

I agree that compatibility ranges are likely to be incorrect for the reasons you've specified, but I also think they're still useful to include somewhere that tools like Pursuit can access. So long as it's possible for Pursuit to access the Metadata information, it seems sensible to me that the information lives there.

I agree that something like the Hackage Matrix Builder is a good way to determine actual compatibility bounds for a package upon publication. Would a process like this make sense?

Package Update

When a package is added or updated, we test it across compiler versions:

  1. We attempt to compile the package with every compiler version we choose to support (everything from 0.11 onward?)
  2. For each major version that compiles, we switch to minor versions and compile with those.
  3. For each minor version that compiles, we switch to patch versions and compile with those.
  4. The resulting list of successful compilations is stored in the package metadata as an array of versions, i.e. [ "0.11.0", "0.11.1" ]
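
A rough Python sketch of this narrowing strategy (illustrative only; compiles_with stands in for "build the package with this compiler" and is not a real registry function). Note that, as discussed below, probing one release per series assumes compatibility is consistent within a series, which is not always true:

from itertools import groupby

def supported_versions(package, compiler_releases, compiles_with):
    # compiler_releases is a list of version strings in release order,
    # e.g. ["0.11.0", "0.11.1", ..., "0.14.2"]; group them into "major"
    # series (0.11, 0.12, ...) and only explore a series further if a
    # probe release in it compiles.
    successes = []
    series = groupby(compiler_releases, key=lambda v: v.rsplit(".", 1)[0])
    for _prefix, releases in series:
        releases = list(releases)
        if not compiles_with(package, releases[0]):
            continue  # per the proposal, skip the rest of this series
        successes.append(releases[0])
        successes.extend(v for v in releases[1:] if compiles_with(package, v))
    return successes  # e.g. ["0.11.0", "0.11.1"], stored in the metadata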

Compiler Update

When a new compiler version is released, we walk through all packages and check if they compile with the new version. (We could narrow this by only choosing packages that already compiled with at least one version in the compiler series, i.e. when 0.14.3 comes out we only re-check packages that compiled with at least one of 0.14.0, 0.14.1, or 0.14.2.)

  1. We add support for the new compiler version by making the compiler usable in our pipeline
  2. We produce a list of all packages that should be re-checked
  3. One by one we compile the packages, adding the compiler version to the supported versions in their metadata files if compilation succeeds, and doing nothing otherwise.
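
A hypothetical sketch of that re-check pass (again with illustrative names; here metadata maps each package to its list of supported compiler versions):

def recheck_on_compiler_release(new_version, metadata, compiles_with):
    # Only re-check packages that already compiled with some version in
    # the same series, e.g. when 0.14.3 comes out, packages that compiled
    # with 0.14.0, 0.14.1, or 0.14.2.
    series = new_version.rsplit(".", 1)[0]          # "0.14.3" -> "0.14"
    candidates = [
        pkg for pkg, versions in metadata.items()
        if any(v.startswith(series + ".") for v in versions)
    ]
    for pkg in candidates:
        if compiles_with(pkg, new_version):
            metadata[pkg].append(new_version)       # record the new success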

@thomashoneyman thomashoneyman changed the title Suggestion: Include compiler ranges in purs.json Suggestion: Include compiler ranges in package metadata Oct 20, 2021
@thomashoneyman
Member Author

I've updated the name of this issue to reflect storing this information in the package metadata instead of in the purs.json file.

@natefaubion
Contributor

For each major version that compiles, we switch to minor versions and compile with those.

This assumes that a library which compiles on something like 0.14.4 will also compile on 0.14.0, which isn't always the case. For example, nameless instances were introduced in a "patch" release, though since it's still 0.x versioning, it's dubious to call it a patch release.

@thomashoneyman
Member Author

This assumes that a library which compiles on something like 0.14.4 will also compile on 0.14.0, which isn't always the case.

Alternately, we could compile with all compiler versions from a certain start version up to now; it's a lot of compilation, but not over an enormous number of packages, so perhaps it's ok.

@thomashoneyman
Member Author

I'd like to propose a specific solution for the build matrix. This assumes we have access to all compilers from the earliest library releases that have been registered (say, purs-0.6.0 or something).

Package Build Matrix Process

If we're doing this in bulk, first topologically sort all packages by their dependencies so that no package is processed before its dependencies have been processed, then do the rest of this one-by-one.
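
A minimal sketch of that ordering step (illustrative only; deps(pkg) stands in for looking up a package's registry dependencies):

from graphlib import TopologicalSorter  # Python 3.9+

def processing_order(packages, deps):
    # Order packages so that each one appears after all of its
    # dependencies; deps(pkg) returns the direct dependencies of pkg.
    graph = {pkg: set(deps(pkg)) for pkg in packages}
    return list(TopologicalSorter(graph).static_order())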

Try to compile the package with all compilers, according to the following rules:

  • No dependencies: attempt compilation with every compiler version, recording the first and last versions that work in a continuous range. For example: if slug@1.0.0 works with purs-0.13.0 and stops working with purs-0.14.0 then its range is >=0.13.0 <0.14.0.
  • Has dependencies: solve the dependencies, then take the intersection of compiler ranges supported by each dependency. This is the set of compilers that will possibly work to build the package with its dependencies. (See Issues below for a reason why this may not be good enough, and a different approach we could take.) Then, try every compiler in that range, producing a range of compilers that worked.

Then, write the range to the compiler field in the package metadata.
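
A sketch of the range computation for the no-dependencies case, assuming compilers are listed in release order and a hypothetical compiles_with check; the dependency case would run the same loop over only the compilers in the intersection of the dependencies' ranges:

def compiler_range(package, compilers, compiles_with):
    # compilers is every compiler version in release order, e.g.
    # ["0.13.0", ..., "0.15.8"]; returns a range string such as
    # ">=0.13.0 <0.14.0", or None if nothing works.
    results = [compiles_with(package, v) for v in compilers]
    if True not in results:
        return None
    lo = results.index(True)
    hi = lo
    while hi + 1 < len(compilers) and results[hi + 1]:
        hi += 1
    if hi + 1 < len(compilers):
        upper = compilers[hi + 1]          # the first failing version becomes the bound
    else:
        upper = bump_patch(compilers[hi])  # latest compiler works, e.g. produce <0.15.9
    return ">=" + compilers[lo] + " <" + upper

def bump_patch(version):
    # Illustrative helper: "0.15.8" -> "0.15.9"
    major, minor, patch = version.split(".")
    return major + "." + minor + "." + str(int(patch) + 1)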

New Compiler Added

When a new compiler version is added to the build matrix, all packages that could have admitted that version should be retried.

For example, if slug@3.0.0 worked with compilers >=0.15.0 and the latest compiler was 0.15.8, then we'd produce the range >=0.15.0 <0.15.9. Then, 0.15.9 comes out. We should retry all packages that had the upper bound <0.15.9 and, if they work, update their bounds — for example, slug@3.0.0 would become >=0.15.0 <0.15.10.

If a new major compiler version comes out we should do the same thing. We would retry all packages with the upper bound <0.15.9 using the 0.16.0 compiler. If slug@3.0.0 worked then it would become >=0.15.0 <0.16.1.

If the package fails with the new compiler then its bounds are unchanged.
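
A sketch of that retry step, following the slug@3.0.0 example above (ranges maps each package version to its current compiler range string; the bound strings are illustrative):

def retry_with_new_compiler(new_version, stale_upper, new_upper, ranges, compiles_with):
    # e.g. when 0.15.9 is released after 0.15.8: stale_upper = "<0.15.9",
    # new_upper = "<0.15.10"; for a 0.16.0 release, new_upper = "<0.16.1".
    for package_version, rng in ranges.items():
        if not rng.endswith(stale_upper):
            continue  # this package version could not have admitted the new compiler
        if compiles_with(package_version, new_version):
            ranges[package_version] = rng.replace(stale_upper, new_upper)
        # on failure, the bounds are left unchanged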

Issues

1. Overly-restrictive bounds
What happens with a package like convertible-options that has a dependency like prelude: >=3.0.0 <5.0.0, where prelude@3.0.0 supports compilers >=0.13.0 <0.14.0 and prelude@4.0.0 supports compilers >=0.14.0 <0.15.0? Technically, depending on how you solve the dependencies, convertible-options is compatible with compilers >=0.13.0 <0.15.0, but it will probably end up being listed as compatible with just >=0.14.0 <0.15.0.

Is it reasonable for a solver to take a compiler version into account, too, so that when solving convertible-options with compiler 0.13.0 it chooses prelude@3.0.0 and when solving with compiler 0.14.0 it chooses prelude@4.0.0? Is this a pie-in-the-sky dream, or would it actually make the solver work a bit better by trimming out paths?

2. Delayed metadata updates
This requires that, when we update the PureScript compiler and make a new release, we also update the build matrix at the same time so bounds can be retried. Otherwise, packages will have stale, overly-restrictive compiler bounds until the build matrix is updated and re-run for all eligible packages.

@MonoidMusician
Contributor

Is it reasonable for a solver to take a compiler version into account, too, so that when solving convertible-options with compiler 0.13.0 it chooses prelude@3.0.0 and when solving with compiler 0.14.0 it chooses prelude@4.0.0? Is this a pie-in-the-sky dream, or would it actually make the solver work a bit better by trimming out paths?

We can implement it as a fake package dependency that represents the compiler version bounds. I don't think there'd be any additional benefit to special-casing it in the solver, since a fake dep would be used to prune possibilities just the same, and this way we get to see it in the errors for free.
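
One way to picture that encoding (purely illustrative, not the registry solver's actual data structures): the package index maps package -> version -> dependency ranges, and we add a synthetic purs package whose "versions" are the compiler releases, so the solver prunes compiler-incompatible choices the same way it prunes any other conflict:

def add_compiler_pseudo_package(index, compiler_releases, compiler_ranges):
    # index: package -> version -> {dependency: range}
    # compiler_ranges: (package, version) -> compiler range, e.g. ">=0.14.0 <0.15.0"
    index["purs"] = {v: {} for v in compiler_releases}  # fake package, no deps of its own
    for (pkg, version), rng in compiler_ranges.items():
        index[pkg][version]["purs"] = rng
    return index

# Solving "for compiler 0.14.9" is then just an extra top-level constraint,
# e.g. {"purs": ">=0.14.9 <0.14.10"}, and any conflict surfaces through the
# solver's normal error reporting.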

@colinwahl
Collaborator

In #639 I added a script which computes the supported compiler versions (from 0.13.0 onward) for every Manifest currently uploaded to the registry. @thomashoneyman was kind enough to loan his computer for a few days to run the whole thing (there were almost 11k manifests to check, and 27 compiler releases since 0.13.0).

I'd like to discuss the results of that script run, and discuss what our next steps could be.

Results

You can view the script I used to derive these results here.

Sanity Check

My first check was to glance at how many manifests we were able to verify a compiler version for:

Total Manifests: 10863
Total Manifests published after 0.13.0 was released (May 19, 2019): 3242
Manifests we found a compiler version for: 4358

At first I was a bit concerned by this result - only 4358, when there are 10863 manifests currently in the registry? Then I realized that only 3242 manifests have been published since the date 0.13.0 was released. With that in mind, this seems pretty good!

Correctness

My second check was for correctness. I went through every package set from 0.13.0 to now, and checked if we had recorded that package set's compiler version for each package in the set.

Total unique package releases in package sets: 2003
Total unique package releases for which we correctly computed the compiler versions: 1947
Total unique package releases for which we did NOT correctly compute the compiler versions, based off their inclusion in a package set: 56
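
The cross-check can be pictured roughly like this (hypothetical data shapes: package_sets is a list of (compiler, {package: version}) pairs, and computed maps each (package, version) to the compiler versions the script found for it):

def cross_check(package_sets, computed):
    seen, missed = set(), set()
    for compiler, packages in package_sets:
        for pkg, version in packages.items():
            seen.add((pkg, version))
            if compiler not in computed.get((pkg, version), []):
                missed.add((pkg, version))   # that set's compiler was not recorded
    correct = seen - missed
    return len(seen), len(correct), len(missed)  # e.g. 2003, 1947, 56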

Package releases that were in a package set, but where we didn't compute that package set's compiler version for that release:
- array-search@0.5.6
- assert-multiple@0.3.4
- barbies@1.0.1
- basic-auth@1.0.1
- bolson@0.0.6
- bolson@0.0.9
- bolson@0.1.1
- bytestrings@8.0.0
- chameleon-react-basic@0.1.0
- chameleon-styled@0.1.0
- chameleon-transformers@0.1.0
- concur-react@0.5.0
- deku@0.4.13
- deku@0.5.2
- deku@0.6.0
- deku@0.6.1
- deku@0.9.13
- deku@0.9.14
- deku@0.9.15
- deku@0.9.16
- deku@0.9.18
- env-names@0.3.1
- env-names@0.3.4
- freedom-transition@1.0.0
- halogen-bootstrap4@0.1.3
- halogen-bootstrap4@0.1.4
- hyper@0.10.0
- hyper@0.10.1
- int-53@4.0.0
- interactive-data@0.3.0
- maps-eager@0.4.1
- mdast-util-from-markdown@0.2.1
- ocarina@1.2.0
- ocarina@1.2.1
- ocarina@1.3.0
- phoenix@4.0.0
- ps-cst@1.2.0
- quotient@3.0.0
- react-basic@9.0.0
- redis-client@1.0.1
- redis-hotqueue@0.2.1
- refined@1.0.0
- remotedata@4.2.0
- rito@0.0.0
- rito@0.0.2
- rito@0.0.3
- rito@0.1.0
- simple-jwt@1.0.0
- simple-jwt@1.0.1
- simple-jwt@1.0.2
- snabbdom@1.0.1
- subcategory@0.2.0
- svg-parser-halogen@1.0.0
- tecton-halogen@0.1.0
- trout-client@0.11.0
- webaudio@0.1.2

I took a glance through a few of the entries in the following list, and in each case I could easily identify an overly restrictive dependency bound on a core library as the reason we wouldn't have been able to compute the "correct" supported compiler version. I am pretty happy with that!

Complete list of post-0.13.0 packages without supported compiler versions


arraybuffer: 11.0.0
arrays-zipper: 1.1.0
assert-multiple: 0.2.1
aws-sdk-basic: 0.15.1, 0.15.0
bolson: 0.1.1, 0.1.0, 0.0.9, 0.0.6
boxes: 1.0.1, 1.0.0
causal-graphs: 0.4.1, 0.4.0, 0.3.0, 0.2.0, 0.1.0
chameleon-react-basic: 0.1.0
chameleon-styled: 0.1.0
chameleon-transformers: 0.1.0
cirru-edn: 0.0.8, 0.0.7, 0.0.6, 0.0.5, 0.0.4, 0.0.3
concur-react: 0.5.0
decision-theory: 0.1.2, 0.1.1
deku: 0.9.18, 0.9.16, 0.9.15, 0.9.14, 0.9.13, 0.6.1, 0.6.0, 0.5.2, 0.4.13
difference-containers: 1.0.1
dom: 5.0.0
elmish: 0.8.0, 0.5.0, 0.1.0, 0.0.5
elmish-enzyme: 0.1.0
elmish-hooks: 0.8.0
emo8: 0.7.1, 0.7.0
erlps-core: 0.0.1
eth-core: 4.0.0
fast-vect: 0.5.0
flame: 0.5.0, 0.4.5, 0.4.4, 0.4.3, 0.4.2, 0.4.1, 0.4.0, 0.3.1, 0.3.0, 0.2.0
foreign-datetime: 2.0.1
form-decoder: 0.0.3
generics: 4.0.1
graphql-client: 2.0.6, 2.0.4, 2.0.3, 2.0.2, 2.0.1, 2.0.0, 1.7.27, 1.7.24, 1.7.23, 1.7.21, 1.7.19, 1.7.17, 1.7.14, 1.7.13, 1.7.11, 1.7.10, 1.7.9, 1.7.8, 1.7.7, 1.7.6, 1.7.5, 1.7.3, 1.7.1, 1.7.0, 1.6.13, 1.6.11, 1.6.9, 1.6.7, 1.6.6, 1.6.5, 1.6.4, 1.6.1, 0.1.1, 0.1.0
graphviz: 1.1.0
halogen-bootstrap4: 0.1.4
halogen-pure: 0.0.1
homogeneous: 0.2.0
httpurple: 0.11.0
huffman: 0.2.0
hyper: 0.10.1, 0.10.0
hypertrout: 0.10.0
inject: 4.0.1
interactive-data: 0.3.0, 0.2.0, 0.1.0
interpolate: 2.0.0
intertwine: 0.4.2, 0.4.1, 0.4.0
jack: 3.0.0
jarilo: 1.0.0
jelly: 0.8.0
jelly-hooks: 0.2.0, 0.1.0
jelly-router: 0.1.0
kishimen: 1.0.0
lit-html: 0.2.0
maps: 3.6.1
mason-prelude: 0.7.0
mdast-util-from-markdown: 0.2.1, 0.2.0
metajelo: 3.1.0, 3.0.1, 3.0.0, 2.0.0, 1.1.0, 1.0.1
metajelo-web: 2.0.0, 1.0.2
mol-draw: 1.0.15, 1.0.14, 1.0.13
multiset-hashed: 0.0.1
oak-debug: 0.6.10
ocarina: 1.3.0, 1.2.1, 1.2.0, 0.0.1
ocelot: 0.19.1, 0.19.0, 0.18.3, 0.18.2, 0.18.1, 0.18.0, 0.17.0, 0.16.10, 0.16.9, 0.16.8, 0.16.7, 0.16.6, 0.16.5, 0.16.4
p5: 0.11.0, 0.10.0
parsing-repetition: 0.0.7
pha: 0.7.3, 0.7.2, 0.7.1, 0.5.5, 0.5.0, 0.4.0, 0.0.7
postgresql-client: 3.1.1, 3.1.0
presto: 0.4.1
protobuf: 0.9.1, 0.9.0
pseudo-random: 0.2.2
react-basic-hooks: 6.1.0
record-prefix: 2.0.0
redis-client: 1.0.1
redis-hotqueue: 0.2.1
repr: 0.3.0
rito: 0.1.0, 0.0.3, 0.0.2, 0.0.0
roman: 0.2.0
sets: 3.2.1
simple-jwt: 1.0.2, 1.0.1, 1.0.0
slug: 3.0.7
son-of-a-j: 0.2.2, 0.2.1, 0.2.0, 0.1.1, 0.1.0
soundfonts: 3.0.2
specular: 0.11.0, 0.10.2, 0.10.1, 0.10.0, 0.9.1, 0.9.0
stac: 1.0.1, 1.0.0
strongcheck-laws: 3.2.1
struct: 1.0.1, 1.0.0, 0.1.0
stylesheet: 0.0.1
symbols: 3.0.1
tables-parse: 0.1.2, 0.1.1
tecton-halogen: 0.1.1, 0.1.0
template-literals: 0.2.0
tolerant-argonaut: 1.0.1, 1.0.0, 0.1.0
trout-client: 0.11.0
twoset: 0.1.0
typelevel-lists: 0.2.0
typelevel-prelude: 4.0.2
uk-modulo: 6.12.0, 6.0.0, 5.90.1, 5.90.0, 5.80.0
untagged-union: 0.1.4
url-validator: 2.1.0, 2.0.1, 2.0.0, 1.2.0
veither: 1.0.0
web-workers: 0.1.2, 0.1.1, 0.1.0
webaudio: 0.2.0, 0.2.1
xpath-like: 2.0.0, 1.0.1, 1.0.0, 3.0.0
yoga-json: 4.0.0

In particular, I can't find any core or contrib packages in this list - which I take as a good sign.

Next Steps

I am pretty confident that the results of the script are correct, in the sense that we've computed the proper supported compiler versions for each manifest while respecting its dependency bounds.

I'd like to move forward with the results of this script and actually extend Metadata with a list of supported compilers for the entry. I have a few questions:

  • What do we do with entries for which we couldn't figure out a supported compiler version? I was thinking we'd drop them from the registry, but that means that tons of pre-0.13.0 code will be dropped - is that an issue?
  • What do we do about the entries for which the dependency ranges are overly restrictive, but we have evidence that they work with a specific compiler because of their inclusion in a package set? My gut instinct is to drop those as well, so that the registry is a self-consistent set of entries.

@JordanMartinez
Contributor

I'd go for a clean slate and drop whatever is necessary to do that. 0.13.0 was published 4 years ago.

@thomashoneyman
Member Author

In #577 we agreed to drop pre-0.13 packages (though I'm surprised at how many versions are affected!)

Personally I don’t think we should consider inclusion in a package set as evidence a package version compiled with the set’s compiler version, if the set breaks the package version’s bounds.

The registry guarantees all package versions in the registry solve. With compiler versions in metadata, we also guarantee each package version compiles with at least one compiler. I think this is where we should stop; there's no need to go hunting through package sets to try and find more compiler versions that work (or, on publication of a new package set, go hunting for packages that may now work with the given compiler version).

@f-f
Member

f-f commented Aug 4, 2023

Agreed with Thomas - I'd also like to remark that adding compiler versions retroactively is a "nice-to-have": it's not critical to have this information, so no need to drop packages where it's not straightforward to figure it out.
From here onwards we can possibly be more careful about including this information, but there's no need to be strict about including it at this point, unless we have a critical use case for it.

@thomashoneyman
Member Author

thomashoneyman commented Aug 4, 2023

I don’t think we should keep any package versions for which we couldn’t determine any viable compiler version. That means that the version either doesn’t solve, or that the way it solves is not usable with any compiler (you’ll download dependencies at versions that themselves can’t be compiled).

We didn't historically check if packages solved or compiled; we just imported them wholesale from Bower and GitHub tags. Later we determined that we would require packages to be solvable and that any new packages must compile.

We’ve always had gaps and inconsistencies, though, because the legacy importer disables some checks and allows package versions into the registry we wouldn’t otherwise let in. And it runs every day!

The work done here simply finishes the job by extending the solve & compile rule to mean every package version must solve & compile with at least one compiler version supported by the registry. In turn, that means we can make the legacy importer work the same as the normal publish pipeline and get rid of the “Legacy” vs “Current” distinction.

Why would we want to include package versions that don’t solve and compile with any supported compiler?

The only reason I can think of is because they could technically compile in a package set where you’re explicitly ignoring their bounds. But I feel like this is a weak motivation; the package version still ought to be solvable & buildable with at least a single compiler version.

That concern can also be resolved for any existing package versions that would be dropped here. We could follow Colin's guidance and grandfather in existing package versions by adjusting their bounds based on their package set inclusion as of the day we do the big update, hence keeping packages that would otherwise be removed. But future packages would have to work with at least one compiler.

@f-f I may be missing your point somewhat; are you seeing cases that make this solve/compile requirement untenable? For example, we’re generating (quite lenient) bounds for folks who don’t have bowerfiles or spago.yaml files so there’s a chance our solutions are wrong. Still, if our solutions are wrong, then it seems we should treat that as ‘unsolvable’.

@f-f
Member

f-f commented Aug 4, 2023

I'm ok with making this a strict check (i.e. dropping packages for which we don't find at least one compiler version), if the amount of post-0.13 packages that we'd drop is very low (ideally zero).

I can't figure out what this number is from Colin's post above, but looking at the list it seems to be roughly 100 versions? We should go through these and make sure that we're not missing anything straightforward that would make them solve.

@thomashoneyman
Member Author

That sounds like a good idea. I know there were 56 package versions that existed in package sets but had no workable compiler version. We reviewed a bunch of these and it was always some overly-restrictive dependency. So we can fix those.

We’ll try to get as many as possible to work, and once we’re at a number we all think is reasonable we can move forwards

@colinwahl
Collaborator

Thomas is right that there were 56 package versions that existed in a package set but for which we didn't compute that package set's compiler version in the package's supported compilers array - however, that doesn't mean we didn't find any supported compilers for those package versions.

The number of package versions that were in a package set but for which we computed no supported compilers is lower, at 42. Do we want to update the compiler versions for just these 42 packages, or update all 56, so that we report every compiler version we have evidence for via inclusion in a package set? Adding compiler versions for the ones we know worked in a package set should be straightforward (it should be pretty easy to just bump their dependencies based on the other packages in the set).

In total there were 246 manifests published after 0.13.0 for which we didn't compute any supported compilers.

I can look through the 246 to see if there's something obvious missing, but that's a lot to go through, frankly. (In particular, at a glance, it is mostly old versions of things).

Are you all proposing literally updating the manifests to get things to solve so that we can admit a larger set of things to the registry?

@thomashoneyman
Member Author

thomashoneyman commented Aug 4, 2023

Are you all proposing literally updating the manifests to get things to solve so that we can admit a larger set of things to the registry?

I'm not proposing any manual process. But we do have some automated possibilities:

  • We could reimport package versions that clearly work with a given package set but didn't end up with valid compiler versions, because that will automatically widen their bounds to ranges we know work. Or we could use some other method to widen bounds to a range we think is better.
  • We could mark package versions that fail but have other versions in the same minor/patch range that succeed, and if their dependency bounds are different just adjust the bounds such that they match the successful package and retry.

The second point is a bit iffy and probably not worth the effort. But we could at least go with the first option. I agree that having to look through 246 manifests and figure out how to fix them all is a bit much. That said, if something jumps out as an obvious reason why many fail but could have been admitted, it would be good to fix it in an automated fashion if possible.

@thomashoneyman
Member Author

I haven't gotten around to discovering all packages that would have to be reimported (as well as their dependents), but I was thinking about it: we already plan to reimport the whole registry once this build matrix change is in, so why not just have this be part of the wholesale reimport of the registry?

I quite frankly don't have much time, and just bundling this into the existing full reimport plan would be a better use of it. cc: @f-f

@f-f
Member

f-f commented Aug 29, 2023

@thomashoneyman agreed - I think "trying to compile a package to figure out if it can be compiled with that compiler version" is a good plan in general, and maybe we should just rely on that since the script approach still gives us over 200 packages without a compatible compiler, and the other bounds that it comes up with might still be overly restrictive.
