Refactoring proposition about Package/LocalPackage #5920

theobat · 2022-10-30T16:23:12Z

While trying to renew some effort on #5460, #5504, I envisoned a much needed refactoring first and a more iterative approach. The current merge request is old and too ambitious to land in any reasonable fashion.

So my proposal this time is much more limited and yet I believe very important. My assessment about the Package and LocalPackages type is that they are Cabal 1.0 oriented and they have very ad-hoc code for named components (tests, executables, benchmarks, libraries). This has very bad impacts on the codebase in many places and it should be fixed before trying any component based builds.

Here are some examples of what I mean :

-- in the definition of LocalPackage in ./src/Stack/Types/Package.hs
data LocalPackage = LocalPackage
    { lpPackage       :: !Package
    -- ^ The @Package@ info itself, after resolution with package flags,
    -- with tests and benchmarks disabled
    , lpComponents    :: !(Set NamedComponent)
    -- ^ Components to build, not including the library component.
    , lpUnbuildable   :: !(Set NamedComponent)
    -- ^ Components explicitly requested for build, that are marked
    -- "buildable: false".
    , lpWanted        :: !Bool -- FIXME Should completely drop this "wanted" terminology, it's unclear
    -- ^ Whether this package is wanted as a target.
    , lpTestDeps      :: !(Map PackageName VersionRange)
    -- ^ Used for determining if we can use --enable-tests in a normal build.
    , lpBenchDeps     :: !(Map PackageName VersionRange)
    -- ^ Used for determining if we can use --enable-benchmarks in a normal
    -- build.
    , lpTestBench     :: !(Maybe Package)
    -- ^ This stores the 'Package' with tests and benchmarks enabled, if
    -- either is asked for by the user.

And the main creation of such type is enlightened by an edifying comment :

        -- We resolve the package in 4 different configurations:
        --
        -- - pkg doesn't have tests or benchmarks enabled.
        --
        -- - btpkg has them enabled if they are present.
        --
        -- - testpkg has tests enabled, but not benchmarks.
        --
        -- - benchpkg has benchmarks enabled, but not tests.
        --
        -- The latter two configurations are used to compute the deps
        -- when --enable-benchmarks or --enable-tests are configured.
        -- This allows us to do an optimization where these are passed
        -- if the deps are present. This can avoid doing later
        -- unnecessary reconfigures.
        pkg = resolvePackage config gpkg
        btpkg
            | Set.null tests && Set.null benches = Nothing
            | otherwise = Just (resolvePackage btconfig gpkg)
        testpkg = resolvePackage testconfig gpkg
        benchpkg = resolvePackage benchconfig gpkg

I mean this is pretty crazy, see, enabling tests and benchmarks in a local package (that is, enabling component centric aaspects) is leading to the duplication of an entire Package type (with very package level information being duplicated) in lpTestBench. And this is only one example but many things are extremely unordered and feel bad and ad-hoc for the reader. That we have to resolvePackage in 4 different manners is very telling, again Package is keeping the very same information 4 times there (e.g. packageName, packageVersion, packageLicense ...)

And then we have the very 2017 explanation of what I'm saying here in the code comment :

-- | A pair of package descriptions: one which modified the buildable
-- values of test suites and benchmarks depending on whether they are
-- enabled, and one which does not.
--
-- Fields are intentionally lazy, we may only need one or the other
-- value.
--
-- MSS 2017-08-29: The very presence of this data type is terribly
-- ugly, it represents the fact that the Cabal 2.0 upgrade did _not_
-- go well. Specifically, we used to have a field to indicate whether
-- a component was enabled in addition to buildable, but that's gone
-- now, and this is an ugly proxy. We should at some point clean up
-- the mess of Package, LocalPackage, etc, and probably pull in the
-- definition of PackageDescription from Cabal with our additionally
-- needed metadata. But this is a good enough hack for the
-- moment. Odds are, you're reading this in the year 2024 and thinking
-- "wtf?"
data PackageDescriptionPair = PackageDescriptionPair
  { pdpOrigBuildable :: PackageDescription
  , pdpModifiedBuildable :: PackageDescription
  }

The reason for this seems that the Package datatype has not been revised correctly since the introduction of proper named components in Cabal 2.0. I think addressing this issue is a top priority for the viability of the stack codebase in the future. I see two potential (broad) approach here, either we try to get closer to the Cabal datatypes and in particular to GenericPackageDescription which roughly is the Cabal counterpart of Package in stack. Or we try to build something way more factorized [than the current Package definition] with proper named components architecture in mind, something like :

data PackageOrComponentInfo = PackageOrComponentInfo {
 -- flags, dependencies, whatever makes sense for both components and libraries.
}
data Package = Package {
  -- package level only info such as name, version, license, etc
  packageInfo :: PackageOrComponentInfo,
  componentInfoMap :: Map NamedComponent PackageOrComponentInfo
}

-- Or maybe even better/more cabal like : 
data Package = Package {
  -- package level only info such as name, version, license, etc
  packageInfo :: PackageOrComponentInfo,
  testInfo :: Map UnqualComponentName PackageOrComponentInfo,
  executableInfo :: Map UnqualComponentName PackageOrComponentInfo,
  benchmarkInfo :: Map UnqualComponentName PackageOrComponentInfo,
  libInfo :: Map LibraryName PackageOrComponentInfo
}

For what it's worth, @snoyberg was suggesting in 2017 that we pursue the first scenario here (pull in as much cabal-original- types as possible and relevant and add required info on top). This requires a digging in the Cabal provided datatypes to explore to what extent they are compatible with this or not.
Both scenarios are major refactorings.
Tell me what you think (besides, #5504 can be considered abandonned for now, I don't see how we can properly support components without having this refactoring first).

The text was updated successfully, but these errors were encountered:

theobat · 2022-10-30T21:47:57Z

After digging a bit into the cabal types, it seems bringing in the PackageDescription from cabal as a (mostly) replacement for the Package datatype from stack (and the LocalPackage datatype) should work. Maybe we'll have to bring additional data such as "named components to build" (or maybe not maybe cabal already has a field for that, but it seems a lot of what we do in stack is redundant with the info/functions related to PackageDescription.

Another possibility, which might bring the best of both world is to create a very similar structure to the PackageDescription from cabal with potentially additional data as well, something like :

-- for instance for the subLibraries field in cabal's PackageDescription : 

subLibraries :: [[Library](https://hackage.haskell.org/package/Cabal-syntax-3.8.1.0/docs/Distribution-Types-Library.html#t:Library)]

-- we have a stack counterpart like :
data StackLibrary =  StackLibrary {
  cabalLibrary :: Library,
  shouldBeBuilt :: Boolean,
  -- etc, some more fields 
}
-- and then in the equivalent of PackageDescription for Stack (something like StackPackageDescription) we get :
subLibraries :: [StackLibrary]

mpilgrem · 2022-10-31T22:13:14Z

@theobat, thanks for thinking about this. It is going to take me a little while to get up to speed with the history but anything that puts Stack on a firmer foundation for future improvements and chimes with suggestions by @snoyberg is likely good by me.

theobat · 2022-11-14T11:33:38Z

Here's a more formal proposition of what I envison (I already began to work on this but It can change so any counter-proposition, request for change is welcome). I'll link this post into the stack-collaborators channel in slack and potentially post a summary of the comments/reactions (if any).

Overall description and example

Overall the idea is to bring as much of the component level information into cabal-like structured component datatypes.
I'll try to illustrate this plan with an example : the packageLibraries field of the Package type.

Here's the definition of the packageLibraries field of the Package type :

          packageLibraries :: !PackageLibraries          -- ^ does the package have a buildable library stanza?

And here's the (simplified) place where we define it :

packageLibraries =
        let mlib = do
              lib <- library pkg
              guard $ buildable $ libBuildInfo lib
              Just lib
         in
          case mlib of
            Nothing -> NoLibraries
            Just _ -> HasLibraries foreignLibNames

We can see here this field group different notions into one :

Do we have a buildable main library
Give me the set of foreign library names

In many places we only pattern match on the ADT to know whether we have a buildable main library.
I argue this can be simplified through a dedicated type for Library, ForeignLibrary and the collections of each of them in the Package datatype like this :

// in the Package datatype :
          ,packageLibrary :: Maybe StackLibrary
          ,packageSubLibraries :: CompCollection StackLibrary
          ,packageForeignLibraries :: CompCollection StackForeignLibrary
          ,packageTestSuites :: CompCollection StackTest
          ,packageBenchmarkSuites :: CompCollection StackBenchmark
          ,packageExecutables :: CompCollection StackExecutable

These would be Stack equivalent to the Cabal related types with only the necesary information for Slack :

-- Stack foreign libraries.
--
-- [ForeignLib](https://hackage.haskell.org/package/Cabal-syntax/docs/Distribution-Types-Foreign-Libraries.html) is the Cabal equivalent.
--
data StackForeignLibrary =
    StackForeignLibrary
    { name :: StackUnqualCompName
    , buildInfo :: !StackBuildInfo
    }
    deriving (Show, Typeable)

-- Stack executable.
--
-- [Executable](https://hackage.haskell.org/package/Cabal-syntax/docs/Distribution-Types-Executable.html) is the Cabal equivalent.
--
data StackExecutable =
    StackExecutable
    { name :: StackUnqualCompName
    , buildInfo :: !StackBuildInfo
    , modulePath :: FilePath
    }
    deriving (Show, Typeable)

-- etc

Envisoned list of refactorings

Provide all Stack equivalent to Cabal component types (Library, ForeignLibrary, Executable, TestSuite, Benchmark)
Provide a dedicated Collection for those components (to facilitate migration for the underlying struture depending on interface requirements, now we need to be able to lookup by component names, but we don't know what the best and most efficient struture for this is).
Attach those component collections to the Package type
Remove/refactor previous definitions (packageDeps, packageAllDeps, packageLibraries, packageTest, etc) into actual and explicit component collections.
Refactor package files and options to use component level functions

I dont believe this is exhaustive, but it's the gist of it.

Why do I believe this is good for stack ?

Current design has a very blurry separation layer between cabal types and stack types, this has, and will, make any migration to next cabal versions harder and harder. An obvious example of that is that, for gathering package files, we traverse all the (cabal structured) components again after we already traversed for gathering their dependencies. We do this by keeping a reference to the PackageDescription type which is huge so I don't think this is particularly good.
Having a more cabal like structure for packages will potentially enable stack to more easily factorise code with Cabal through smple newtype wrappers (UnqualifiedComponentName, PackageName cpome to mind). This will contrast to the current situation where we either use unwrapped cabal types (which is potentially dangerous for migrations) or stack dedicated types with non negligible translation costs.
Finally there are Cabal features since Cabal 2.2 (Backpack, sublibrary dependency, etc), which will be complex to add to Stack as-is (and without this refactoring, bringing these features will bring redundancy in the Package type and even more denormalization).

This was referenced Nov 1, 2022

Refactoring proposal on Library name haskell/cabal#8567

Open

Simple refactoring #5926

Merged

theobat mentioned this issue Nov 13, 2022

refactoring: bring package file gathering logic into dedicated module… #5942

Closed

theobat mentioned this issue Nov 14, 2023

Package components followup #6335

Merged

theobat closed this as completed Nov 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactoring proposition about Package/LocalPackage #5920

Refactoring proposition about Package/LocalPackage #5920

theobat commented Oct 30, 2022 •

edited

Loading

theobat commented Oct 30, 2022 •

edited

Loading

mpilgrem commented Oct 31, 2022

theobat commented Nov 14, 2022 •

edited

Loading

Refactoring proposition about Package/LocalPackage #5920

Refactoring proposition about Package/LocalPackage #5920

Comments

theobat commented Oct 30, 2022 • edited Loading

theobat commented Oct 30, 2022 • edited Loading

mpilgrem commented Oct 31, 2022

theobat commented Nov 14, 2022 • edited Loading

Overall description and example

Envisoned list of refactorings

theobat commented Oct 30, 2022 •

edited

Loading

theobat commented Oct 30, 2022 •

edited

Loading

theobat commented Nov 14, 2022 •

edited

Loading