Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Begin using component-based builds #4745

Closed
snoyberg opened this issue Apr 16, 2019 · 30 comments
Closed

Begin using component-based builds #4745

snoyberg opened this issue Apr 16, 2019 · 30 comments
Milestone

Comments

@snoyberg
Copy link
Contributor

Originally discussed in #834, we shelved work on this until Cabal added support. Cabal added support for component based builds long enough ago that we can drop support for older versions and fully embrace a component based system (see #4475). Not only will this help address some bugs, but will also open the door for Backpack support (see #2540).

@qrilka is planning on taking a look at this to estimate how long an implementation will take. This is currently planned for Stack 2.0, but may be deferred if it turns out this is more complex than it seems.

Additional issues which may be affected by this (incomplete list):

@snoyberg
Copy link
Contributor Author

I've given this a bit more thought, and given the number of changes already planned for Stack 2.0, and the goal of releasing relatively soon, I'd rather hold off on this until Stack 2.2. I'm going to leave this in the P1 milestone for the Stack 2.2 release.

@Ericson2314
Copy link

What's the progress on this? 2.2 is the next non-patch release right? Correct me if I'm wrong, but the linked issues all are reverse dependencies on this. Are there other issues/PRs that are working towards this?

@snoyberg
Copy link
Contributor Author

snoyberg commented Nov 7, 2019

I believe there is currently no progress on this, unless someone is working on it that I'm not aware of. This would definitely be a great addition to get in, and I'd be happy to provide feedback/mentoring if someone is interested in tackling it.

@Ericson2314
Copy link

Ericson2314 commented Nov 12, 2019

@snoyberg Thanks for the update! I came here from (surprise) the backpack issue (#2540), as this depends on this. Obsidian is potentially interested in working on this. I'll need to make estimates and roadmap.

@theobat
Copy link
Contributor

theobat commented Aug 16, 2020

So as @qrilka stated he won't be able to work on this, and this is needed for backpack support (and likely needed for additional cabal3 features such as public sublibraries). I started to look at the code a bit, I feel like this change is very pervasive in the code base such as changing all mentions of Package in Build to Component ? Is there type creation involved or is every notion of cabal component already in there (with a few additions to NamedComponent ?)

Also I'm not sure to what extent this is impacting the build plan ? Or is it only impacting the Task level ?

Maybe a more precise description of what you mean by "component-based builds" would help @snoyberg ?

tl;dr I'm interested in doing this, but I wouldn't mind a little help to assemble a "todo" list.

@theobat
Copy link
Contributor

theobat commented Aug 16, 2020

So, as far as I understand, the main structural change is to switch

-- extracted from Stack.Build.ConstructPlan
Task { taskProvides = PackageIdentifier
                    (packageName package)
                    (packageVersion package)
-- ... other stuff

Into something with an additional NamedComponent instead (and pass it from the cabal file representation, which, I havn't figured out yet).

This is however, (still, as far as I understand), not enough for a real component-based build; we should build all the transitive dependencies of a component with an appropriate NamedComponent specifier (e.g build lib or internal-lib xyz only if it's something with more than one option). SO quite a bit of change in the addPackageDeps function of Stack.Build.ConstructPlan.

@snoyberg
Copy link
Contributor Author

@theobat thank you for volunteering to step up on this one, we'd definitely appreciate the help! I think you generally speaking have the right idea. But I'm not the person with the most up-to-date information on this. @qrilka has more details, but is currently on vacation. When he's back, he should be able to provide more concrete feedback.

@qrilka
Copy link
Contributor

qrilka commented Sep 1, 2020

Thanks @theobat for your interest in this ticket, finally I'm back from my vacation and hopefully could help a bit here.
When I looked into this topic I had an initial plan like the following: find out how to build components using Setup.hs and then create a proper build plans with components which should look like a graph. The current plan generation in Stack.Build.ConstructPlan is a bit ad-hoc and that results for example in problems with handling dependency cycles.
With that in place I think fixing execution of build plans shouldn't be a problem.
I know it's not a very detailed plan but I suppose that further details and directions will be known after starting the tasks I listed.
I'd be happy to answer any further questions if I can.

@theobat
Copy link
Contributor

theobat commented Sep 14, 2020

hi @qrilka, thanks for the answer. I'm getting into Setup.hs in the cabal doc/code. If you have any pointer to a good ressource for learning about it, please let me know.

@qrilka
Copy link
Contributor

qrilka commented Sep 14, 2020

Unfortunately I'm not sure if there is a good resource describing the interface Cabal provides with Setup.hs but at least the main details could be seen in its CLI help, e.g. by using stack runhaskell -- Setup.hs --help (or just use runhaskell if there is a proper GHC on your $PATH)

@theobat
Copy link
Contributor

theobat commented Sep 23, 2020

So as far as I understand, stack already use Setup.hs to build packages. The real question is how to use setup.hs for components (pretty obvious with hindsight, but I was not aware of the former...).

Here's what I gathered:

In Cabal 2.0, support for a single positional argument was added to runhaskell Setup.hs configure This makes Cabal configure the specific component to be configured. Specified names can be qualified with lib: or exe: in case just a name is ambiguous (as would be the case for a package named p which has a library and an executable named p.) This has the following effects:

  • Subsequent invocations of cabal build, register, etc. operate only on the configured component.
  • Cabal requires all “internal” dependencies (e.g., an executable depending on a library defined in the same package) must be found in the set of databases via --package-db (and related flags): these dependencies are assumed to be up-to-date. A dependency can be explicitly specified using --dependency simply by giving the name of the internal library; e.g., the dependency for an internal library named foo is given as --dependency=pkg-internal=pkg-1.0-internal-abcd.
  • Only the dependencies needed for the requested component are required. Similarly, when --exact-configuration is specified, it’s only necessary to specify --dependency for the component. (As mentioned previously, you must specify internal dependencies as well.)
  • Internal build-tool-depends and build-tools dependencies are expected to be in the PATH upon subsequent invocations of setup.

I should probably propose an example for this (examples are great when you're the reader...) in cabal.
Here's how I understand it, building components is not different from building packages, you just have to use the correct positional argument sepcifying what to configure first and then build as it's built today.

So my idea of the "shortest path" to have component based builds:

  • add the component required for every lib (so lib:name-of-the-package) in Stack.Build.Execution at the config stage.
  • pass down to Stack.Build.Execution the needed component for the actual stack command target (say, you are executing stack build somehting:lib, pass down this somehting:lib to the build plan).

Now this is a simple and ad-hoc addition, and I don't see why Stack.Build.ConstructPlan should be involved, but I'm certainly missing something. Besides, my goal is to set up proper foundations for supporting backpack, so I suppose I should check how backpack works at the runhaskell cli level...

@qrilka
Copy link
Contributor

qrilka commented Sep 23, 2020

If I understand correctly you're planning just to make components explicit when they were set implicitly as "build complete package"/"build complete package with tests/benchmarks". I think this could help with some issues where components appear only as targets as it will not be able to use components as dependencies (which requires ConstructPlan changes) and thus e.g. cycle from #4891 (comment) won't be treated properly. As I'm not very familiar how Backpack works under the hood it's not clear if your "shortest path" proposal could be enough for Backpack.

@theobat
Copy link
Contributor

theobat commented Sep 24, 2020

Fine, let's talk about the grand plan then. It seems that Plan is PackageName's adressed:

-- | A complete plan of what needs to be built and how to do it
data Plan = Plan
    { planTasks :: !(Map PackageName Task)
    , planFinals :: !(Map PackageName Task)
    -- ^ Final actions to be taken (test, benchmark, etc)
    , planUnregisterLocal :: !(Map GhcPkgId (PackageIdentifier, Text))
    -- ^ Text is reason we're unregistering, for display only
    , planInstallExes :: !(Map Text InstallLocation)
    -- ^ Executables that should be installed after successful building
    }
    deriving Show

At the very least, planTasks should have the full component name + package name. Now it's the first problem there is no such type. Something like:

data PackageAndComponentName = Plan
    { packageName :: !PackageName
    , componentName :: !NamedComponent
    }
    deriving (Show, Eq)
-- with NamedComponent as:
-- | A single, fully resolved component of a package
data NamedComponent
    = CLib
    | CInternalLib !Text
    | CExe !Text
    | CTest !Text
    | CBench !Text
    deriving (Show, Eq, Ord)

And then we could use that to address the planTasks, which to my understanding would probably solve the circular dependency problem.
Then a bunch of Plan consumers would have to change.
The other part of this problem, as I see it, is to update Task datatype which has mostly two attributes of interest:

-- | A task to perform when building
data Task = Task
    { taskProvides        :: !PackageIdentifier -- FIXME turn this into a function on taskType?
    -- ^ the package/version to be built
    , taskType            :: !TaskType
    -- ^ the task type, telling us how to build this
-- ... etc ...
}

-- task type:

-- | The type of a task, either building local code or something from the
-- package index (upstream)
data TaskType
  = TTLocalMutable LocalPackage
  | TTRemotePackage IsMutable Package PackageLocationImmutable
    deriving Show

Nothinf currently points in the Packagetype what component we should build (it's a descriptive type) and nothing in the Task type either. So, something like componentToBuild :: !NamedComponent should probably be added to task.

From this point, we mostly have to change addDep in ConstructPlan to use the freshly created PackageAndComponentName.

There are a bunch of problems though since the Installed packages only suppport :

data Installed
    = Library PackageIdentifier GhcPkgId (Maybe (Either SPDX.License License))
    | Executable PackageIdentifier
    deriving (Show, Eq)

Which is not what we want (we want the full power of installed component that is the constructor list provided by NamedComponent).
Overall everything being PackageName addressed is a deep problem, and a real PackageAndComponentName should be used, pretty much everywhere...

Seems like a pretty drastic change everywhere on a hard to navigate codebase ( for various reasons ; Execute.hs is huge and not modular, TemplateHaskell and CPP usage prevent the excellent haskell language server from functionning properly).
Anyway I'll tinker a bit to see what can be done, please let me know if I'm mistakening somewhere or missing something.

@qrilka
Copy link
Contributor

qrilka commented Sep 24, 2020

I don't see any particular comments to what you have written with an exception of 1 question: what are the "modules" you have highlighted as bold? Did you mean components maybe?

@theobat
Copy link
Contributor

theobat commented Sep 25, 2020

Yes It's a mistake I fixed it, thanks.

@qrilka
Copy link
Contributor

qrilka commented Sep 25, 2020

Sure, no problem

@qrilka
Copy link
Contributor

qrilka commented Oct 10, 2020

Hello @theobat
I wanted to ask about your progress with this issue and if I could be of any help.
Also during previous weeks I was trying to get a correct solution for #5381 but it looks like that such a proper solution could require component-based builds. And at the same time it looks like the current implementation of "implicit snapshots" which was the main reason for Stack 2 has some issues with component based builds. The main point is that Stack assumes 1 snapshot as a source of all dependencies of the project. And for a normal project stack build and stack test dependencies are different and thus give different snapshots. Some way around this is that Stack assumes that a library built with a wider "target" as e.g. stack test works a more narrow stack build and because of that stack build after stack test will not rebuild anything.
A proper way around this would be to have dependency tracking per component.
I'm not sure if anything of this is needed for Backpack but building new hacks around current hacks in Stack could get result in a fragile construction.
Hopefully this gives you some more information and not just scares you away :)

@theobat
Copy link
Contributor

theobat commented Oct 12, 2020

Hi @qrilka,

I've been busy the past ffew weeks, but I did get a better understanding of the whole stack build function calls. What seems to matter is the 3 last bullet points in the following list:

  • main in src/main/Main.hs
  • commandLineHandler in src/main/Main.hs
  • buildCmd in src/main/Main.hs
  • Stack.Build.build in src/Stack/Build.hs
  • gather localDumpPkgs sourceMap installedMap in build in src/Stack/Build.hs
  • constructPlan in Stack.Build.ConstructPlan
  • executePlan in Stack.Build.Execution

Now my main concern is that, what I planned to do above only represent half of the equation (constructing the plan and executing it). I'm stuck on the source map changes needed for now. I don't really understand how to gather the components dependencies from stack.yaml itself (or the cabal file for that matter).
It definitely seems absent in the current SourceMap, and the issue is that the dependencies are distributed by locations such as:

  • dependency package
  • project package
  • global package

So maybe you can help me with that ? Otherwise it'll need a bit more digging in the whole SourceMap thing.

PS: as a side note, why is template haskell so pervasive in the code base ? What's his purpose ? It prevents the haskell-language-server to function properly, so maybe we should do something about this, it definitely hampers any effort on the codebase.
At the very least it should be isolated enough so that commenting out one import and providing a mock settles the problem.

@qrilka
Copy link
Contributor

qrilka commented Oct 12, 2020

Actually I have given a part of a response to your question in my reply above - SourceMap as a base of implicit snapshots it not quite compatible with components thus needs to be changed. And I guess dependency tracking also needs to be updated. The current dependency information is mostly Cabal 1.0 - i.e. tracking only dependencies between packages (and package ~ library) with some extra hacks on top (e.g. supporting sublibraries to some extent).
In principle implementation could be done in 2 steps:

  • enable component-based builds of project packages (sometimes called "local" in the source code) while keeping dependency (or "remote") packages as they are right now
  • enable building components of dependency packages (i.e. libary as usual, sublibrariesm, executables and also multiple public libraries) - in that case it looks like we will need to split dependency packages into their components. One question here could be if it could be more optimal to build multiple components at once.
    I'm not sure if this gives you a good answer about source maps - please ask more concrete questions so I'll try to answer them better.
    As for TH I don't have much advice or additional information here - maybe @snoyberg could tell something about this.

@theobat
Copy link
Contributor

theobat commented Dec 12, 2020

A stratospherically high overview

This is the set of actions that are required for having full blown component based builds.
First of all, at a high level, what's "just" missing is:

  • The component addressed dependencies as stated in the cabal files.
  • The component set to build in each task of the construct plan.
  • The component set already installed when a package has already been installed.
  • The systematic component registering before doing anything with setup.hs

Now the reality is much worse, because there are a bunch of datatypes which are still heavily based on old cabal versions such as Package in Package.hs.

As stated in the snoyberg comment of the file, the whole mess between Package, LocalPackage etc should be simplified, which is probably the best first step to envision.

The grand master plan

For now the grand master plan, has 2 groups : the descriptive changes (1 and 2) and the actionnable changes (3 and 4). Descriptive changes should not change stack's behavior. actionnable ones should change stack behavior. It should roughly follow this:

    1. Replace the Package datatype with PackageDescription from cabal and, with a number of fields on top to be determined.
      This is a pretty tough endeavor (but probably not the worst in this list) : Package is explicitely used 14 times in 7 different files. But I believe this is the way to go.
      This should also make clear whether NamedComponents or the Cabal rep for components should be used.
      Note that as far as backpack is concerned, this should be the most significant shift, because we can introduce support for mixins by fixing the Package mess.
    1. The InstalledMap misses the installed components for a package, which should be added. It comes from the sourceMap so this has to be changed as well. I believe the datatype for this should be a double map:
      Map PackageName (Map NamedComponent installationStuff)
      Or maybe a tuple, package level info and the map of
      component specific things.
      AFAIK, the Installed datatype is only produced in the realBuild function in singleBuild in Execute.hs
      Note that this is not affecting the InstallMap type, which is merely pointing at installation info, which should not change (a component location should be the same as its package location right ? Actually not sure about this anymore)
      The missing bit from ghc-pkg is (supposedly) in installedComponentId which should be added to DumpPackage (and probably all the way down to InstalledMa)

    1. Define a datatype for the ConstructPlanMonadState, with the ability to find circular deps at the component level. Should probably keep the PackageName map and just add the set of components within it. Thus keeping a single task for all components of a package is easier (and we don't have to reunite afterwards). We can lookup the package name and then refine the lookup at the component level with the result.
    1. Define the action to be taken at the execution stage (singleBuild function) where we actually register, build and copy using Setup.hs. It should probaly, systematically use components, which is not the case right now. This requires, of course that we pass the required component set to build. Adding this component set to the Task datatype will probably be required.

The plan has to be refined a bit more to be sure this (significant) effort is the right one.
A pull request for documenting current behaviors will stay open while I make the descriptive changes (it can be merged after that).

Request for comments @qrilka :)

@qrilka
Copy link
Contributor

qrilka commented Dec 13, 2020

@theobat I didn't get what you mean in your item ii when you say that "InstalledMap misses the installed components" and also "this is not affecting the InstallMap type". If you say that we'll only need actually installed components - unfortunately I didn't check what does Cabal (or probably ghc-pkg) allow to find out (or regsiter): at least previously the situation was that only libraries could be registered and properly checked for being already registered.
As for iii - I don't think I understand your plan about circular dependencies. One way out is of course is just to ignore them at the beginning.
Otherwise I don't see any particular objections to you plan. Thanks for sharing.

@theobat
Copy link
Contributor

theobat commented Dec 13, 2020

I didn't get what you mean in your item ii when you say that "InstalledMap misses the installed components" and also "this is not affecting the InstallMap type". If you say that we'll only need actually installed components - unfortunately I didn't check what does Cabal (or probably ghc-pkg) allow to find out (or regsiter): at least previously the situation was that only libraries could be registered and properly checked for being already registered.

Maybe I don't understand what InstalledMap is all about, my understanding is that, if something has been installed, we track this in some kind of SQLite database which is updated whenever we build something new. I thought the question of recording the set of installed components was completely "controlled" and didn't think it could be a ghc/cabal thing.
I had something like this in mind:

type InstalledMap = Map PackageName (Map NamedComponent (InstallLocation, Installed))

The toInstallMap and getInstalled (just below, same file) functions could use better comments though so feel free to give a better description so that I can improve their haddocks.

As for iii - I don't think I understand your plan about circular dependencies. One way out is of course is just to ignore them at the beginning.

With the Package datatype changes, dependencies will be tracked at the component level (old & default behavior, being the main lib component), which means the check for circular dependencies should support that, whereas today it's enforced using this:

type ConstructPlanState = Map PackageName (Either ConstructPlanException AddDepRes) 

As above I think something like

type ConstructPlanState = Map PackageName (Map NamedComponent (Either ConstructPlanException AddDepRes))

Will be required. But this means we should have a single task per component, while a bunch of things are common between components so I thought of something like:

type ConstructPlanState = Map PackageName (Map NamedComponent (Maybe ConstructPlanException), Maybe AddDepRes)

Which would enable a set of components to trigger a single task. It would probably trigger less changes to the task datatype.
This is still at the idea stage though, maybe all this belongs in the task itself (that is, the Task datatype).

Maybe each step of the plan should be tracked in a dedicated issue, don't you think ?

@qrilka
Copy link
Contributor

qrilka commented Dec 14, 2020

Ouch, it looks like I was not seeing the difference between InstallMap and InstalledMap. One note - Stack doesn't track installed packages by itself, it relies on package database for that.
As for splitting this into separate tickets - I don't see such a need but if this would help you - go ahead. Just don't forget to interlink all the subparts properly.

@theobat
Copy link
Contributor

theobat commented Dec 14, 2020

One note - Stack doesn't track installed packages by itself, it relies on package database for that.

Could you elaborate on that ?

As for splitting this into separate tickets - I don't see such a need but if this would help you - go ahead. Just don't forget to interlink all the subparts properly.

On second thought I'm fine with keeping everything in this one, we can separate the PR's themselves, it should be enough.

@qrilka
Copy link
Contributor

qrilka commented Dec 14, 2020

Could you elaborate on that ?

See loadDatabase - it loads packages from a ghc-pkg dump

@theobat
Copy link
Contributor

theobat commented Dec 14, 2020

So what I'm talking about is likely this:
https://downloads.haskell.org/ghc/latest/docs/html/libraries/Cabal-3.2.0.0/Distribution-InstalledPackageInfo.html#v:installedComponentId
Which roughly means changing all the Map PackageName DumpPackage and their subsequent use to something like Map PackageName (Map NamedComponent DumpPackage)

EDIT:
This is quite undocumented in ghc-pkg but there is a ----ipid option to get the lib-name key:

➜  okok ghc-pkg --ipid describe kk-0.1.0.0-e5f5040f -f ~/.cabal/store/ghc-8.8.4/package.db
name:               z-okok-z-band
version:            0.1.0.0
package-name:       okok
lib-name:           band
id:                 kk-0.1.0.0-e5f5040f
key:                kk-0.1.0.0-e5f5040f
maintainer:         
author:             
abi:                0c91a172260cad5da7d5ed91dbf6e936
data-dir:
    ~/.cabal/store/ghc-8.8.4/kk-0.1.0.0-e5f5040f/share

depends:            base-4.13.0.0

This is necessary for DumpPackage rework (and InstalledMap etc)

@simonmichael
Copy link
Contributor

simonmichael commented Dec 16, 2021

This sounds like a good thing, and I really appreciate the work done on it. Were showstoppers found, or is the plan still viable ?
[Found this after trying to make hls+stack work on a package with an internal library.]

@theobat
Copy link
Contributor

theobat commented Dec 21, 2021

hey @simonmichael thanks for your interest, I lack the time to implement the plan drawn here, but most of it is still valid as far as I know.
I think my initial PRs focused too much on refactoring existing things in Stack, which consumed a lot of time and hindered the underlying goal.
This PR #5504 already implement most of the plan though, so maybe a good first step is to understand why the tests are failing in it (Which I failed to understand, but I'll admit I didn't look much because I ran out of time).

@mpilgrem
Copy link
Member

mpilgrem commented Dec 1, 2023

@theobat, given your #6343, I wondered if this issue could be sensibly closed in favour, perhaps, of opening more specific/granular ones. I would welcome your view.

@theobat
Copy link
Contributor

theobat commented Dec 1, 2023

Yes this should be closed, I'll rephrase what's still valid in another dedicated issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants