Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring proposition about Package/LocalPackage #5920

Closed
theobat opened this issue Oct 30, 2022 · 3 comments
Closed

Refactoring proposition about Package/LocalPackage #5920

theobat opened this issue Oct 30, 2022 · 3 comments

Comments

@theobat
Copy link
Contributor

theobat commented Oct 30, 2022

Hello @mpilgrem,

While trying to renew some effort on #5460, #5504, I envisoned a much needed refactoring first and a more iterative approach. The current merge request is old and too ambitious to land in any reasonable fashion.

So my proposal this time is much more limited and yet I believe very important. My assessment about the Package and LocalPackages type is that they are Cabal 1.0 oriented and they have very ad-hoc code for named components (tests, executables, benchmarks, libraries). This has very bad impacts on the codebase in many places and it should be fixed before trying any component based builds.

Here are some examples of what I mean :

-- in the definition of LocalPackage in ./src/Stack/Types/Package.hs
data LocalPackage = LocalPackage
    { lpPackage       :: !Package
    -- ^ The @Package@ info itself, after resolution with package flags,
    -- with tests and benchmarks disabled
    , lpComponents    :: !(Set NamedComponent)
    -- ^ Components to build, not including the library component.
    , lpUnbuildable   :: !(Set NamedComponent)
    -- ^ Components explicitly requested for build, that are marked
    -- "buildable: false".
    , lpWanted        :: !Bool -- FIXME Should completely drop this "wanted" terminology, it's unclear
    -- ^ Whether this package is wanted as a target.
    , lpTestDeps      :: !(Map PackageName VersionRange)
    -- ^ Used for determining if we can use --enable-tests in a normal build.
    , lpBenchDeps     :: !(Map PackageName VersionRange)
    -- ^ Used for determining if we can use --enable-benchmarks in a normal
    -- build.
    , lpTestBench     :: !(Maybe Package)
    -- ^ This stores the 'Package' with tests and benchmarks enabled, if
    -- either is asked for by the user.

And the main creation of such type is enlightened by an edifying comment :

        -- We resolve the package in 4 different configurations:
        --
        -- - pkg doesn't have tests or benchmarks enabled.
        --
        -- - btpkg has them enabled if they are present.
        --
        -- - testpkg has tests enabled, but not benchmarks.
        --
        -- - benchpkg has benchmarks enabled, but not tests.
        --
        -- The latter two configurations are used to compute the deps
        -- when --enable-benchmarks or --enable-tests are configured.
        -- This allows us to do an optimization where these are passed
        -- if the deps are present. This can avoid doing later
        -- unnecessary reconfigures.
        pkg = resolvePackage config gpkg
        btpkg
            | Set.null tests && Set.null benches = Nothing
            | otherwise = Just (resolvePackage btconfig gpkg)
        testpkg = resolvePackage testconfig gpkg
        benchpkg = resolvePackage benchconfig gpkg

I mean this is pretty crazy, see, enabling tests and benchmarks in a local package (that is, enabling component centric aaspects) is leading to the duplication of an entire Package type (with very package level information being duplicated) in lpTestBench. And this is only one example but many things are extremely unordered and feel bad and ad-hoc for the reader. That we have to resolvePackage in 4 different manners is very telling, again Package is keeping the very same information 4 times there (e.g. packageName, packageVersion, packageLicense ...)

And then we have the very 2017 explanation of what I'm saying here in the code comment :

-- | A pair of package descriptions: one which modified the buildable
-- values of test suites and benchmarks depending on whether they are
-- enabled, and one which does not.
--
-- Fields are intentionally lazy, we may only need one or the other
-- value.
--
-- MSS 2017-08-29: The very presence of this data type is terribly
-- ugly, it represents the fact that the Cabal 2.0 upgrade did _not_
-- go well. Specifically, we used to have a field to indicate whether
-- a component was enabled in addition to buildable, but that's gone
-- now, and this is an ugly proxy. We should at some point clean up
-- the mess of Package, LocalPackage, etc, and probably pull in the
-- definition of PackageDescription from Cabal with our additionally
-- needed metadata. But this is a good enough hack for the
-- moment. Odds are, you're reading this in the year 2024 and thinking
-- "wtf?"
data PackageDescriptionPair = PackageDescriptionPair
  { pdpOrigBuildable :: PackageDescription
  , pdpModifiedBuildable :: PackageDescription
  }

The reason for this seems that the Package datatype has not been revised correctly since the introduction of proper named components in Cabal 2.0. I think addressing this issue is a top priority for the viability of the stack codebase in the future. I see two potential (broad) approach here, either we try to get closer to the Cabal datatypes and in particular to GenericPackageDescription which roughly is the Cabal counterpart of Package in stack. Or we try to build something way more factorized [than the current Package definition] with proper named components architecture in mind, something like :

data PackageOrComponentInfo = PackageOrComponentInfo {
 -- flags, dependencies, whatever makes sense for both components and libraries.
}
data Package = Package {
  -- package level only info such as name, version, license, etc
  packageInfo :: PackageOrComponentInfo,
  componentInfoMap :: Map NamedComponent PackageOrComponentInfo
}

-- Or maybe even better/more cabal like : 
data Package = Package {
  -- package level only info such as name, version, license, etc
  packageInfo :: PackageOrComponentInfo,
  testInfo :: Map UnqualComponentName PackageOrComponentInfo,
  executableInfo :: Map UnqualComponentName PackageOrComponentInfo,
  benchmarkInfo :: Map UnqualComponentName PackageOrComponentInfo,
  libInfo :: Map LibraryName PackageOrComponentInfo
}

For what it's worth, @snoyberg was suggesting in 2017 that we pursue the first scenario here (pull in as much cabal-original- types as possible and relevant and add required info on top). This requires a digging in the Cabal provided datatypes to explore to what extent they are compatible with this or not.
Both scenarios are major refactorings.
Tell me what you think (besides, #5504 can be considered abandonned for now, I don't see how we can properly support components without having this refactoring first).

@theobat
Copy link
Contributor Author

theobat commented Oct 30, 2022

After digging a bit into the cabal types, it seems bringing in the PackageDescription from cabal as a (mostly) replacement for the Package datatype from stack (and the LocalPackage datatype) should work. Maybe we'll have to bring additional data such as "named components to build" (or maybe not maybe cabal already has a field for that, but it seems a lot of what we do in stack is redundant with the info/functions related to PackageDescription.

Another possibility, which might bring the best of both world is to create a very similar structure to the PackageDescription from cabal with potentially additional data as well, something like :

-- for instance for the subLibraries field in cabal's PackageDescription : 

subLibraries :: [[Library](https://hackage.haskell.org/package/Cabal-syntax-3.8.1.0/docs/Distribution-Types-Library.html#t:Library)]

-- we have a stack counterpart like :
data StackLibrary =  StackLibrary {
  cabalLibrary :: Library,
  shouldBeBuilt :: Boolean,
  -- etc, some more fields 
}
-- and then in the equivalent of PackageDescription for Stack (something like StackPackageDescription) we get :
subLibraries :: [StackLibrary]

@mpilgrem
Copy link
Member

@theobat, thanks for thinking about this. It is going to take me a little while to get up to speed with the history but anything that puts Stack on a firmer foundation for future improvements and chimes with suggestions by @snoyberg is likely good by me.

@theobat
Copy link
Contributor Author

theobat commented Nov 14, 2022

Here's a more formal proposition of what I envison (I already began to work on this but It can change so any counter-proposition, request for change is welcome). I'll link this post into the stack-collaborators channel in slack and potentially post a summary of the comments/reactions (if any).

Overall description and example

Overall the idea is to bring as much of the component level information into cabal-like structured component datatypes.
I'll try to illustrate this plan with an example : the packageLibraries field of the Package type.

Here's the definition of the packageLibraries field of the Package type :

          packageLibraries :: !PackageLibraries          -- ^ does the package have a buildable library stanza?

And here's the (simplified) place where we define it :

packageLibraries =
        let mlib = do
              lib <- library pkg
              guard $ buildable $ libBuildInfo lib
              Just lib
         in
          case mlib of
            Nothing -> NoLibraries
            Just _ -> HasLibraries foreignLibNames

We can see here this field group different notions into one :

  • Do we have a buildable main library
  • Give me the set of foreign library names

In many places we only pattern match on the ADT to know whether we have a buildable main library.
I argue this can be simplified through a dedicated type for Library, ForeignLibrary and the collections of each of them in the Package datatype like this :

// in the Package datatype :
          ,packageLibrary :: Maybe StackLibrary
          ,packageSubLibraries :: CompCollection StackLibrary
          ,packageForeignLibraries :: CompCollection StackForeignLibrary
          ,packageTestSuites :: CompCollection StackTest
          ,packageBenchmarkSuites :: CompCollection StackBenchmark
          ,packageExecutables :: CompCollection StackExecutable

These would be Stack equivalent to the Cabal related types with only the necesary information for Slack :

-- Stack foreign libraries.
--
-- [ForeignLib](https://hackage.haskell.org/package/Cabal-syntax/docs/Distribution-Types-Foreign-Libraries.html) is the Cabal equivalent.
--
data StackForeignLibrary =
    StackForeignLibrary
    { name :: StackUnqualCompName
    , buildInfo :: !StackBuildInfo
    }
    deriving (Show, Typeable)

-- Stack executable.
--
-- [Executable](https://hackage.haskell.org/package/Cabal-syntax/docs/Distribution-Types-Executable.html) is the Cabal equivalent.
--
data StackExecutable =
    StackExecutable
    { name :: StackUnqualCompName
    , buildInfo :: !StackBuildInfo
    , modulePath :: FilePath
    }
    deriving (Show, Typeable)

-- etc

Envisoned list of refactorings

  • Provide all Stack equivalent to Cabal component types (Library, ForeignLibrary, Executable, TestSuite, Benchmark)
  • Provide a dedicated Collection for those components (to facilitate migration for the underlying struture depending on interface requirements, now we need to be able to lookup by component names, but we don't know what the best and most efficient struture for this is).
  • Attach those component collections to the Package type
  • Remove/refactor previous definitions (packageDeps, packageAllDeps, packageLibraries, packageTest, etc) into actual and explicit component collections.
  • Refactor package files and options to use component level functions

I dont believe this is exhaustive, but it's the gist of it.

Why do I believe this is good for stack ?

  • Current design has a very blurry separation layer between cabal types and stack types, this has, and will, make any migration to next cabal versions harder and harder. An obvious example of that is that, for gathering package files, we traverse all the (cabal structured) components again after we already traversed for gathering their dependencies. We do this by keeping a reference to the PackageDescription type which is huge so I don't think this is particularly good.
  • Having a more cabal like structure for packages will potentially enable stack to more easily factorise code with Cabal through smple newtype wrappers (UnqualifiedComponentName, PackageName cpome to mind). This will contrast to the current situation where we either use unwrapped cabal types (which is potentially dangerous for migrations) or stack dedicated types with non negligible translation costs.
  • Finally there are Cabal features since Cabal 2.2 (Backpack, sublibrary dependency, etc), which will be complex to add to Stack as-is (and without this refactoring, bringing these features will bring redundancy in the Package type and even more denormalization).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants