Stack init should try at least one snapshot for each major ghc version #1628

Closed
harendra-kumar opened this issue Jan 9, 2016 · 18 comments

Comments

@harendra-kumar
Collaborator

Many packages require ghc-7.8.x to build. But with the current logic we do not even try lts-2.x. For example, I just tried to init Haxl and it resulted in this:

None of the following snapshots provides a compiler matching your package(s):
    - lts-4.0
    - lts-3.19
    - nightly-2016-01-03
    - nightly-2015-11-17
    - lts-3.21
    - nightly-2016-01-08

Though it works with --resolver lts-2.22. We should try at least one snapshot for each major version of ghc. Here every snapshot tried uses ghc 7.10.x, and none uses 7.8.x.

@mgsloan
Contributor

mgsloan commented Jan 9, 2016

Agreed! The current approach prioritizes "already has packages built" too much. I think the logic came from before we had package sharing among snapshot dependencies. So, nowadays the cost of choosing a new snapshot is much lower.

I think people will just want the latest lts for a given lts major version, as it's likely to have bugfixes / enhancements. How about for now we try the latest of every lts major version, and the latest nightly? This won't scale forever, but once it becomes a problem that there are too many lts major versions, we can address the issue then.
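
For illustration, the "latest of every lts major version, plus the latest nightly" rule could be sketched roughly as below. This is a minimal Python sketch and not Stack's actual code; the function name `candidate_snapshots` and the idea that the known snapshots arrive as plain name strings are assumptions made for the example.

```python
import re

def candidate_snapshots(names):
    """Pick the newest minor release of each LTS major version (newest
    major first), followed by the most recent nightly, if there is one.

    `names` is assumed to be a list of strings such as "lts-3.21" or
    "nightly-2016-01-08"; anything else is ignored.
    """
    latest_minor = {}   # LTS major version -> highest minor version seen
    nightlies = []
    for name in names:
        m = re.fullmatch(r"lts-(\d+)\.(\d+)", name)
        if m:
            major, minor = int(m.group(1)), int(m.group(2))
            latest_minor[major] = max(minor, latest_minor.get(major, -1))
        elif re.fullmatch(r"nightly-\d{4}-\d{2}-\d{2}", name):
            nightlies.append(name)

    picks = [f"lts-{major}.{latest_minor[major]}"
             for major in sorted(latest_minor, reverse=True)]
    if nightlies:
        # Nightly names carry ISO dates, so string order is date order.
        picks.append(max(nightlies))
    return picks
```

With the snapshot names from the Haxl example plus lts-2.22, this would try lts-4.0, lts-3.21, lts-2.22 and nightly-2016-01-08, so the 7.8.x snapshot is no longer skipped.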

@harendra-kumar
Collaborator Author

Here is a bit more complex plan:

  1. From all locally available lts select unique ghc majors
  2. From all locally available nightlies select unique ghc majors not yet tried
  3. From all lts select unique ghc majors not yet tried
  4. All lts (irrespective of ghc version) - exclude the ones already tried.
  5. latest nightly
  • Stop trying lts if n lts have been tried
  • Stop trying nightlies if n nightlies have been tried
  • Make sure to always try latest lts and latest nightly

where n ~ 3
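
One possible reading of this plan, sketched in Python. The `Snapshot` record, its fields, and the function name `plan` are hypothetical. The sketch also assumes that the cap n applies to steps 1 to 4 only, and that the latest lts and latest nightly are appended afterwards if they were not already picked; the plan itself does not say how the cap and the "always try latest" rule interact.

```python
from collections import namedtuple

# Hypothetical record: kind is "lts" or "nightly", ghc_major is e.g. "7.10",
# local says whether the snapshot is already installed.
Snapshot = namedtuple("Snapshot", "name kind ghc_major local")

def plan(snapshots, n=3):
    """Return snapshot names in the order they would be tried.

    `snapshots` is assumed to be sorted newest first within each kind.
    """
    lts = [s for s in snapshots if s.kind == "lts"]
    nightlies = [s for s in snapshots if s.kind == "nightly"]
    chosen, seen_ghc = [], set()

    def take(pool, new_ghc_only):
        for s in pool:
            tried_of_kind = sum(1 for c in chosen if c.kind == s.kind)
            if s in chosen or tried_of_kind >= n:
                continue            # already tried, or cap for this kind reached
            if new_ghc_only and s.ghc_major in seen_ghc:
                continue            # this ghc major is already covered
            chosen.append(s)
            seen_ghc.add(s.ghc_major)

    take([s for s in lts if s.local], True)         # step 1
    take([s for s in nightlies if s.local], True)   # step 2
    take(lts, True)                                 # step 3
    take(lts, False)                                # step 4
    for pool in (lts, nightlies):                   # step 5 and "always try latest"
        if pool and pool[0] not in chosen:
            chosen.append(pool[0])
    return [s.name for s in chosen]
```

Under these assumptions, a machine with lts-4.0 and lts-3.21 installed would still reach a 7.8 snapshot such as lts-2.22 in step 3, before falling back to the remaining lts releases.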

@mgsloan
Contributor

mgsloan commented Jan 9, 2016

Hmm, interesting ideas! That strategy would certainly handle most use cases well. However, I think we can do something simpler that works just as well, perhaps better.

Selecting which snapshots to try based on the GHC version is an interesting way to avoid trying all different lts-N.latest. It makes a decent amount of sense, as if there are constraints on ghc or base, some snapshots will simply be incompatible with some packages. On the other hand, some of the skipped lts versions might provide a better set of dependencies.

So, I'm still sticking with the "try the latest of every lts". Once there are so many lts releases that it's a problem, we can figure out what to do about it then. Maybe we'll just set a lower bound, once ghc-7.8 starts being considered an archaic version of ghc.

@mgsloan
Contributor

mgsloan commented Jan 9, 2016

I'm leaning towards ignoring the local availability of snapshots, because:

  1. I think the decision to have this preference came from before we had package sharing. It's not such a big deal to use a new snapshot now.

  2. This way, the state of .stack/ doesn't influence stack init's behavior (more deterministic)

  3. This will make it easier to shorten the snapshot paths on Windows, to avoid MAX_PATH - see #1173 (Store "stack-path" files in directories that have been shortened with SHAs)

@snoyberg @borsboom Thoughts? Is stack init's affinity for installed snapshots still sensible?

@borsboom
Contributor

I think removing the affinity for installed snapshots makes sense. It's no longer needed for space efficiency/performance, and is potentially confusing.

@harendra-kumar
Collaborator Author

Certainly, trying all major lts versions is simple and will work well for now. It may keep working in the future too, if we apply a limit that covers all the major GHC versions that are sensible to support without leaving too many to try. I am OK with that solution as well.

To explain the thinking behind the more complex implementation, my goal was to achieve two important objectives:

  • limit the choices to the minimum possible.
  • be able to build everything that the supported snapshots can build.

GHC version is the most important factor if we want to fulfill both the goals. GHC version affects the crucial ability to build or not build whereas the choice of snapshots only makes a difference of a few more or a few less extra dependencies.

Limiting the choices to a minimum is very important for a good user experience. Too many choices with very little extra benefit make it harder to choose and leave users with a poor impression.

Look at this output, anyone doing an init may have to go through something like this:

cueball:/vol/hosts/cueball/workspace/diagrams-ci/build-tmp$ stack init 

Selecting the best among 6 snapshots...

* Partially matches lts-4.0
    cubicbezier not found
        - diagrams-contrib requires >=0.4.0.1 && <0.5
    gtk version 0.14.2 found
        - diagrams-gtk requires >=0.12.0 && <0.14
    lens version 4.13 found
        - package-ops requires >=4.11 && <4.13
    linear version 1.20.3 found
        - diagrams-input requires >=1.11.3 && <1.19
    texrunner not found
        - diagrams-pgf requires <=0.0.2
    tuple not found
        - SVGFonts requires -any

* Partially matches lts-3.19
    cubicbezier not found
        - diagrams-contrib requires >=0.4.0.1 && <0.5
    graphviz not found
        - diagrams-graphviz requires >=2999.17 && <2999.19
    linear version 1.19.1.3 found
        - diagrams-input requires >=1.11.3 && <1.19
        - diagrams-lib requires >=1.20.1 && <1.21
    texrunner not found
        - diagrams-pgf requires <=0.0.2

* Partially matches nightly-2016-01-03
    cubicbezier not found
        - diagrams-contrib requires >=0.4.0.1 && <0.5
    gtk version 0.14.2 found
        - diagrams-gtk requires >=0.12.0 && <0.14
    lens version 4.13 found
        - package-ops requires >=4.11 && <4.13
    linear version 1.20.3 found
        - diagrams-input requires >=1.11.3 && <1.19
    texrunner not found
        - diagrams-pgf requires <=0.0.2
    tuple not found
        - SVGFonts requires -any

* Partially matches nightly-2015-11-17
    cubicbezier not found
        - diagrams-contrib requires >=0.4.0.1 && <0.5
    linear version 1.19.1.3 found
        - diagrams-input requires >=1.11.3 && <1.19
        - diagrams-lib requires >=1.20.1 && <1.21
    texrunner not found
        - diagrams-pgf requires <=0.0.2

* Partially matches lts-3.21
    cubicbezier not found
        - diagrams-contrib requires >=0.4.0.1 && <0.5
    graphviz not found
        - diagrams-graphviz requires >=2999.17 && <2999.19
    linear version 1.19.1.3 found
        - diagrams-input requires >=1.11.3 && <1.19
        - diagrams-lib requires >=1.20.1 && <1.21
    texrunner not found
        - diagrams-pgf requires <=0.0.2

* Partially matches nightly-2016-01-10
    cubicbezier not found
        - diagrams-contrib requires >=0.4.0.1 && <0.5
    gtk version 0.14.2 found
        - diagrams-gtk requires >=0.12.0 && <0.14
    lens version 4.13 found
        - package-ops requires >=4.11 && <4.13
    linear version 1.20.3 found
        - diagrams-input requires >=1.11.3 && <1.19
    texrunner not found
        - diagrams-pgf requires <=0.0.2
    tuple not found
        - SVGFonts requires -any

Selected resolver 'nightly-2015-11-17' does not have all the packages to match your requirements.
    cubicbezier not found
        - diagrams-contrib requires >=0.4.0.1 && <0.5
    linear version 1.19.1.3 found
        - diagrams-input requires >=1.11.3 && <1.19
        - diagrams-lib requires >=1.20.1 && <1.21
    texrunner not found
        - diagrams-pgf requires <=0.0.2

However, you can try '--solver' to use external packages.

This overwhelming output is presenting the user 6 choices and the only difference between them is just a few extra dependencies, if at all. I would be happy to present only one out of these even if that is the worst of these. Going through and evaluating the six options is not worth the benefit and is only going to confuse and possibly turn users away.

Whereas the difference between 'can build' and 'cannot build' is huge, and that choice is a no-brainer. My point is that limiting the burden of choice on the user, or the effort involved in making a choice, is crucial, even if that comes with some implementation complexity behind the scenes.

@harendra-kumar
Collaborator Author

A modification to accommodate the user experience would be to try all major lts internally and present the output corresponding to only the best 3. That will require the implementation to keep the output until the end, when we know the best three.

@harendra-kumar
Collaborator Author

It need not even be the best three; it could be just one if the difference is not large. Of course, you can see the details about all that were tried using a verbose flag.

@harendra-kumar
Collaborator Author

Another consideration is to prefer snapshots corresponding to installed compiler versions and especially so for system-ghc when --system-ghc is specified. For some reason one may want to stick to a given compiler version and use a snapshot which satisfies that constraint. We can afford the users the flexibility of that choice.

The reason could be anything e.g.

  • avoid some specific bug in a particular compiler version
  • one may have confidence in Debian stable's choice of compiler version or the patches they choose to apply
  • one may just want to use only the system compiler and not want to install or use any other compiler at all.
  • scarcity of space to install more compiler versions

@mgsloan
Contributor

mgsloan commented Jan 13, 2016

Doh, sorry for not getting back on this sooner!

> A modification to accommodate the user experience would be to try all major lts internally and present the output corresponding to only the best 3. That will require the implementation to keep the output until the end, when we know the best three.

Yup, this sounds great! We should let the user know when a particular snapshot is being tried, and then present the top 3 results.
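
A rough sketch of "try everything, buffer the output, then show only the top 3", in Python. The ranking criterion used here (fewest packages with unmet constraints, ties broken by trial order) is an assumption for the example, as is the function name `best_matches`; the thread does not define what "best" means.

```python
def best_matches(results, limit=3):
    """Rank buffered results and keep the `limit` best.

    `results` is a list of (snapshot_name, problem_packages) pairs in the
    order the snapshots were tried. Assumed ranking: fewer problem packages
    is better. sorted() is stable, so ties keep their trial order.
    """
    ranked = sorted(results, key=lambda r: len(r[1]))
    return ranked[:limit]

# Problem packages per snapshot, copied from the diagrams output earlier
# in the thread.
tried = [
    ("lts-4.0", ["cubicbezier", "gtk", "lens", "linear", "texrunner", "tuple"]),
    ("lts-3.19", ["cubicbezier", "graphviz", "linear", "texrunner"]),
    ("nightly-2016-01-03", ["cubicbezier", "gtk", "lens", "linear", "texrunner", "tuple"]),
    ("nightly-2015-11-17", ["cubicbezier", "linear", "texrunner"]),
    ("lts-3.21", ["cubicbezier", "graphviz", "linear", "texrunner"]),
    ("nightly-2016-01-10", ["cubicbezier", "gtk", "lens", "linear", "texrunner", "tuple"]),
]

for name, problems in best_matches(tried):
    print(f"{name}: {len(problems)} package(s) with unmet constraints")
```

On that data the sketch keeps nightly-2015-11-17, lts-3.19 and lts-3.21, which cuts the six reports down to three while still including the resolver that stack init selected in the run above.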

> Another consideration is to prefer snapshots corresponding to installed compiler versions and especially so for system-ghc when --system-ghc is specified. For some reason one may want to stick to a given compiler version and use a snapshot which satisfies that constraint. We can afford the users the flexibility of that choice.

I'd say that this case is adequately covered by the --resolver argument.

@harendra-kumar
Collaborator Author

> I'd say that this case is adequately covered by the --resolver argument.

My bad. I keep forgetting about the compiler resolver and the proposed init-resolver stuff. The right way to address this use case would be to set the default init-resolver as the compiler resolver for your preferred compiler.

@harendra-kumar
Collaborator Author

Huh, once again. I guess my original point was: how do you find snapshots which work with a given compiler version? The compiler resolver, as it's implemented today, does not use a snapshot; it just uses a given compiler and all extra deps.

We need a way to automatically pick the best snapshot matching a given compiler version. One way would be to change the compiler resolver to do just that.

@mgsloan
Contributor

mgsloan commented Jan 13, 2016

> how do you find snapshots which work with a given compiler version? The compiler resolver, as it's implemented today, does not use a snapshot; it just uses a given compiler and all extra deps.

Hmm, right. So the idea is that if packages have constraints on ghc wired in packages, we can optimize the snapshot search process?

I don't think we should add code that does something special for snapshot selection based on ghc version. It's not a bad idea, I'm just trying to make sure we only introduce complexity where it makes a sizable difference to the user. By pushing back on the more complicated ideas, I'm trying to make implementation and maintenance easier! I think something can be done here that's rather simple and effective.

I could be wrong, though. I just think we should do the simple thing first.

> We need a way to automatically pick the best snapshot matching a given compiler version. One way would be to change the compiler resolver to do just that.

I don't think changing the meaning of compiler resolvers is a good idea. Having resolvers that only consist of the wired-in packages is quite handy. It would be possible to introduce a new variety of resolver for this. However, an issue with this is that it means the resolver could produce a different build plan at a later date (due to newer snapshots). It's best if resolvers are as deterministic as possible.

@harendra-kumar
Collaborator Author

> we can optimize the snapshot search process?

No, I wasn't thinking about optimization. Consider a use case where a user wants to compile his package using ghc-x.y.z. He wants to find out which snapshots provide that particular version of the compiler. Once he knows that, he can use --resolver to pick the specific snapshot.

One way is to grep the build plans for the given compiler version. This is the only way as of now.

Alternatively, we can provide a simpler way to directly specify a compiler version as a constraint when choosing the resolver. Without an explicit constraint, init is free to choose any compiler version that satisfies the dependency constraints, so it will not work for this use case.

Another possible way is to provide a list/show command to print info about all snapshots and one of the fields in that would be compiler version. That will help the user in choosing a resolver matching the compiler.

> However, an issue with this is that it means the resolver could produce a different build plan at a later date (due to newer snapshots).

Yeah, compiler resolvers are inherently non-deterministic, so they can't be used as a real deterministic resolver. I wasn't proposing to really change the meaning of the compiler resolver, only the solver algorithm that chooses the packages when a compiler resolver is used. Instead of choosing freely from the Hackage index, it could prefer packages from a matching snapshot, which could be a better option.

@harendra-kumar
Collaborator Author

I was going to fix this and noticed this. If we are trying all major lts and the latest nightly and select the best - what do we do with [--prefer-lts] | [--prefer-nightly]? I suggest we drop these two to keep things simple. They don't seem very useful to me.

If someone has a preference over which resolver to use for init, they can use the proposed init-resolvers (see #1590), which is a much more powerful way to express the same thing. Too many choices are confusing and not very useful unless there is a strong need for multiple ways of doing the same thing.

@mgsloan
Contributor

mgsloan commented Jan 15, 2016

Agreed, dropping --prefer-lts and --prefer-nightly makes sense to me.

> No, I wasn't thinking about optimization. Consider a use case where a user wants to compile his package using ghc-x.y.z. He wants to find out which snapshots provide that particular version of the compiler. Once he knows that, he can use --resolver to pick the specific snapshot.

Ah, I see! I think I'd prefer a generalization of this feature. What if init took extra constraints? So, to choose snapshots with ghc-7.10.3, you'd specify --constraint ghc==7.10.3
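
To make the idea concrete, here is a minimal Python sketch of filtering candidate snapshots by such constraints. It handles only the exact `name==version` form, whereas real cabal constraint syntax is much richer. The function names and the snapshot-to-version table are hypothetical, and the ghc versions in the table are illustrative values assumed for the example.

```python
def parse_constraint(text):
    """Parse an exact constraint of the form "name==version".

    Only this one form is supported in the sketch; ranges such as ">=" are not.
    """
    name, sep, version = text.partition("==")
    if not sep or not name.strip() or not version.strip():
        raise ValueError(f"unsupported constraint: {text!r}")
    return name.strip(), version.strip()

def matching_snapshots(snapshot_versions, constraints):
    """Return the snapshots whose package versions satisfy every constraint.

    `snapshot_versions` maps snapshot name -> {package name: version}.
    """
    wanted = dict(parse_constraint(c) for c in constraints)
    return [name for name, versions in snapshot_versions.items()
            if all(versions.get(pkg) == ver for pkg, ver in wanted.items())]

# Illustrative table (assumed values, not read from real build plans).
snapshot_versions = {
    "lts-2.22": {"ghc": "7.8.4"},
    "lts-3.21": {"ghc": "7.10.2"},
    "lts-4.0": {"ghc": "7.10.3"},
}

print(matching_snapshots(snapshot_versions, ["ghc==7.10.3"]))
```

A design note: treating the compiler as just another constrained package keeps a single filtering path, and a constraint that no snapshot satisfies simply yields an empty candidate list.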

> Another possible way is to provide a list/show command to print info about all snapshots and one of the fields in that would be compiler version. That will help the user in choosing a resolver matching the compiler.

Having list/show commands for snapshot info makes sense to me. One tricky part of this for this application is choosing which snapshots to list. I suppose there could be a way to list the most recent version of each major lts + latest nightly.

@harendra-kumar
Collaborator Author

> So, to choose snapshots with ghc-7.10.3, you'd specify --constraint ghc==7.10.3

That makes sense.

One small inconsistency may be that we encode the snapshot constraint in --resolver today. For example, --resolver lts-2 means the latest lts-2. But that need not be seen as an inconsistency, as we can view --resolver as specifying and constraining the snapshot namespace and --constraint as constraining package versions, including ghc.

So in general we could perhaps specify --constraint package-name==x.y.z --constraint package-name2==x.y.z.

We can perhaps call it --package instead to be more specific? Though it also includes the compiler.

@mgsloan
Contributor

mgsloan commented Jan 19, 2016

> One small inconsistency may be that we encode the snapshot constraint in --resolver today. For example, --resolver lts-2 means the latest lts-2. But that need not be seen as an inconsistency, as we can view --resolver as specifying and constraining the snapshot namespace and --constraint as constraining package versions, including ghc.

Yup, I see both as ways of constraining snapshot selection, so I don't see any inconsistency. This does allow the user to ask for impossible things, though (lts-2 with ghc-7.10).

> We can perhaps call it --package instead to be more specific? Though it also includes the compiler.

We could, but if we call it constraint, we can be consistent with cabal install. It should take all valid cabal constraint syntax.
