Liam's Notes on Cabal Codebase (Draft, will remove name on finalization)
('23, Oct 4)
Hi folks, I hope people don't mind that I'm using my repo access to set up a guide to the current codebase and help with navigation.
Right now, my goals are to:
- Provide a space for my own notes as I traverse the codebase.
- Provide a basis for a future codebase guide to help onboard people who wish to contribute.
- Provide a public page where people on the project can easily correct me when I'm wrong.
This wiki page is intended to be transient either way: if I give up, I'll delete it; if I finish, these uncollected notes will be reformatted into a broader guide.
As for the codebase itself, this is the central repository for Cabal, the standard GHC Haskell build tool.
The codebase dates back to 2003 or 2004, I believe, and while it is venerable, it is also a project that gained an enormous amount of capability in a short period of time, seems to be somewhat understaffed, and has had to keep up with breaking changes in GHC.
Beyond its history, cabal is also a more ambitious project than it may appear. It is intended to be a unified build tool supporting any Haskell compiler, and this remains a goal. Likewise, because it is a fundamental build tool, it cannot make use of more recent amenities in the Haskell ecosystem, and must often duplicate library code it would like to use.
As of this writing, the total size, if you git clone it, is about 132k SLOC, and by the time you read this, it might have grown to 140k SLOC or shrunk to something closer to 110k SLOC as the provisions for the older "v1" commands are removed.
Of the non-code parts of the repository, CONTRIBUTING.md is the official guide to the codebase, containing:
- Build Instructions
- Advice on Tests
- Quality Assurance
- Code Style, plus constraints on libraries (only dependencies shipped with GHC versions up to 5 years old, which at the time of this writing includes 8.6.2) and on language extensions (everything but Template Haskell)
README.md contains the official notes on the project, including contact information and a link to the official manual.
As for the repository itself, its four main parts are, in order of package size:
- cabal-install (executable: cabal), the actual CLI tool that Haskell programmers use to build Haskell libraries and applications.
This is the largest package, clocking in at around 60-70k SLOC, partly because it still carries backward compatibility for the "v1" commands.
- Cabal, the "core" of the project. This is what actually executes the build instructions and so on.
Notably, Cabal can be used without cabal-install (it was originally driven through Setup.hs files), and it is usually Cabal, not cabal-install, that Stack and Nix call.
Moreover, as the oldest and most stable part of the project, it also has the highest code quality.
Unfortunately, it seems Cabal can't access Hackage on its own, or solve for packages that aren't already present.
- Cabal-syntax. This package contains the parser for at least the .cabal manifest (I believe, but cannot confirm quickly, that the .project file parser is in the cabal-install package).
- cabal-install-solver. This is the dependency solver for cabal.
Most of the other parts of the repo exist for development assistance only and are not published.
Beyond those, bootstrap makes it possible to build cabal-install without having it installed beforehand.
Cabal uses its own custom Prelude. But since Cabal is intended to support Haskell implementations other than GHC, it cannot rely on GHC's -XNoImplicitPrelude to disable the default Prelude. The Haskell Report allows import Prelude () for most of this purpose, though that still brings the Prelude's instances into scope (an import list can hide names, but never instances).
Many of the individual packages define their own Prelude module, but these seem to commonly re-export Distribution.Compat.Prelude from Cabal-syntax in the end.
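As a minimal single-file sketch of the import Prelude () approach (not code from Cabal): the file below hides every Prelude name and pulls the few it needs from other base modules, which is roughly the role Distribution.Compat.Prelude plays in Cabal. The helper total is hypothetical. The Num and Show instances for Int are still usable, illustrating that an empty Prelude import list hides names but not instances.

```haskell
-- Hide every name the Prelude exports without relying on GHC's
-- NoImplicitPrelude extension; class instances stay in scope regardless.
import Prelude ()

-- In Cabal, names like these would come from a custom prelude such as
-- Distribution.Compat.Prelude; here they are imported from base directly.
import Data.Function (($))
import Data.Int (Int)
import Data.List (sum)
import System.IO (IO, print)

-- | Hypothetical helper. It needs the Num Int instance, which is still
-- available even though no Prelude names are in scope.
total :: [Int] -> Int
total xs = sum xs

main :: IO ()
main = print $ total [1, 2, 3]
```

Running it prints 6.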
Source Guide is your best bet right now. I also recommend starting with Cabal core, as it is more self-contained than cabal-install, the part most users are familiar with.
If you're unfamiliar with the interface provided by Cabal core, see the Setup.hs section of the Cabal documentation.
Although Cabal core is intended to be used as a library, the documentation also describes Setup.hs, which can be treated as an alternate entry point.
Here, Setup.hs simply calls defaultMain in Distribution.Simple.
defaultMain calls getArgs, then passes the result to defaultMainHelper along with simpleUserHooks (the default hooks).
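For reference, the conventional Setup.hs for a package with build-type: Simple is just the few lines below; this is the entry point the walkthrough above starts from.

```haskell
-- Setup.hs: the conventional build script for a build-type: Simple package.
import Distribution.Simple (defaultMain)

main :: IO ()
main = defaultMain
```

Replacing defaultMain with defaultMainWithHooks lets a package supply its own UserHooks in place of simpleUserHooks.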
defaultMainHelper calls expandResponse to expand response files (also known as arg files: an argument of the form @file is replaced by the arguments stored in that file, a standard way to get around command-line length limits), then runs a case on the result of commandsRun (globalCommand commands) commands args' (with args' being the arguments after response-file expansion).
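To make the response-file idea concrete, here is a small hypothetical expander; expandResponseSketch is an illustrative name, not Cabal's expandResponse. It assumes the simplest possible file format (whitespace-separated words, no quoting or escaping) and keeps an @path argument unchanged when the file cannot be read, which is a choice made for this sketch only.

```haskell
import Control.Exception (IOException, try)

-- | Hypothetical, simplified response-file expansion (not Cabal's code).
-- Each argument of the form "@path" is replaced by the whitespace-separated
-- words of that file; if the file cannot be read, the argument is kept as-is.
-- Quoting and escaping inside the file are deliberately not handled.
expandResponseSketch :: [String] -> IO [String]
expandResponseSketch args = concat <$> mapM expandOne args
  where
    expandOne :: String -> IO [String]
    expandOne ('@' : path) = do
      result <- try (readFile path) :: IO (Either IOException String)
      pure $ case result of
        Left _ -> ['@' : path]
        Right contents -> words contents
    expandOne arg = pure [arg]

main :: IO ()
main = do
  -- Put part of a command line into a response file, then expand it.
  writeFile "args.rsp" "configure --ghc\n-v2\n"
  expanded <- expandResponseSketch ["@args.rsp", "--enable-tests"]
  print expanded
```

With the file written by main, this prints ["configure","--ghc","-v2","--enable-tests"].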
Here, commandsRun is the argument parser from Distribution.Simple.Command (which itself delegates to a lower-level option parser), and globalCommand, from Distribution.Simple.Setup.Global, produces the global command: a CommandUI value bundling together things like the command's name, usage text, and option descriptions.
Contents of defaultMainHelper's where clause:
Most of the where clause consists of helpers, i.e., a variety of putStr-style actions that print something and then terminate the program.
More interesting is the "prog" definition, as well as the command list.
The prog definition is passed as an argument to part of the command list; using Distribution.Simple.Program.Db and Distribution.Simple.Program.Builtin, it loads up a default list of known programs.
The command list uses commandAddAction from Distribution.Simple.Command to build a list of commands from IO functions in the Distribution.Simple module.
Let's take a brief detour into the commandsRun parser in Distribution.Simple.Command.
commandsRun is essentially a splitter and dispatcher.
The globalCommand argument is first passed to commandParseArgs, which in turn calls getOpt', which, as hinted above, is a clone of getOpt' from base.
This parses the argument strings against the option descriptions generated from the CommandUI, then returns the result as a structured CommandParse value describing what was requested.
The commands list is used twice. First, a help command is added to it, the result is filtered down to the "normal" commands, and their names are appended to the opts returned in the CommandList case.
The second use is when the result is CommandReadyToGo with a non-help first argument: the command is then looked up by name in the command list, and if the lookup fails, an error is passed on.
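The dispatch that commandsRun performs can be modelled in a few lines. This is a hypothetical, heavily simplified sketch: Command, Parse, dispatch, and the --help and --list-options flags are stand-ins chosen for illustration, not Cabal's CommandUI or CommandParse, and the real function parses global options through getOpt' rather than by pattern matching.

```haskell
-- Hypothetical, heavily simplified model of commandsRun-style dispatch;
-- none of these types are Cabal's real CommandUI or CommandParse.

-- | A named subcommand paired with the action that runs it.
data Command = Command
  { cmdName :: String
  , cmdAction :: [String] -> IO ()
  }

-- | Simplified parse result, loosely mirroring the shape of CommandParse.
data Parse
  = ParseHelp
  | ParseList [String]
  | ParseErrors [String]
  | ReadyToGo (IO ())

-- | Handle (toy) global flags first, then look the subcommand up by name.
dispatch :: [Command] -> [String] -> Parse
dispatch cmds args = case args of
  ("--help" : _) -> ParseHelp
  ("--list-options" : _) -> ParseList (map cmdName cmds)
  (flag@('-' : _) : _) -> ParseErrors ["unrecognised global flag: " ++ flag]
  (name : rest) -> case filter ((== name) . cmdName) cmds of
    [cmd] -> ReadyToGo (cmdAction cmd rest)
    _ -> ParseErrors ["unrecognised command: " ++ name]
  [] -> ParseErrors ["no command given"]

-- | Example command list: each name is paired with an IO action, much as
-- commandAddAction pairs a command description with an action.
commands :: [Command]
commands =
  [ Command "configure" (\rest -> putStrLn ("configuring with " ++ unwords rest))
  , Command "build" (\_ -> putStrLn "building")
  ]

main :: IO ()
main = case dispatch commands ["configure", "--ghc"] of
  ReadyToGo action -> action
  ParseErrors errs -> mapM_ putStrLn errs
  ParseList names -> putStrLn (unwords names)
  ParseHelp -> putStrLn "usage: COMMAND [ARGS]"
```

Running main prints "configuring with --ghc"; an unknown name such as frobnicate yields ParseErrors instead, mirroring the failed look-up described above.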
Cabal (core) has a lot of commands in the Simple module, and one logical order in which to go through them is the one used by the Setup.hs documentation.
As a starting example, let's follow the runhaskell Setup.hs configure --ghc, runhaskell Setup.hs build, runhaskell Setup.hs install sequence.
Something very interesting comes up here: a substantial portion of the IO actions taken are not hardcoded into the functions, but derived from the hooks argument to the configure action in Distribution.Simple (in the Cabal package).
UserHooks is a record that lets you supply a whole slew of custom functions, IO and otherwise, which can be fed directly into the library via the defaultMainWithHooks family of functions.
Since the functions in a UserHooks value can do anything, let's set defaultMainWithHooks aside and follow the default hooks instead.
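The idea behind UserHooks (a record of functions that the driver calls through, with a default value you can partially override) can be sketched without Cabal. Everything below is hypothetical: Hooks, defaultHooks, runConfigure and the three step fields only loosely echo preConf, confHook and postConf, and their types are far simpler than Cabal's.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Hypothetical, simplified stand-in for UserHooks: a record whose fields
-- are the functions the configure driver calls. Not Cabal's real type.
data Hooks = Hooks
  { preConfStep :: [String] -> IO ()
  , confStep :: [String] -> IO String -- returns a toy "build info" string
  , postConfStep :: String -> IO ()
  }

-- | Default hooks, playing the role simpleUserHooks plays in Cabal.
defaultHooks :: Hooks
defaultHooks =
  Hooks
    { preConfStep = \_ -> pure ()
    , confStep = \args -> pure ("configured with " ++ unwords args)
    , postConfStep = \_ -> pure ()
    }

-- | The driver only calls through the record, so any step can be replaced.
runConfigure :: Hooks -> [String] -> IO String
runConfigure hooks args = do
  preConfStep hooks args
  info <- confStep hooks args
  postConfStep hooks info
  pure info

main :: IO ()
main = do
  logRef <- newIORef []
  -- Override a single field with record-update syntax; the rest stay default.
  let hooks = defaultHooks {postConfStep = \info -> modifyIORef logRef (info :)}
  _ <- runConfigure hooks ["--ghc"]
  readIORef logRef >>= print
```

main overrides only the post-configure step and prints ["configured with --ghc"]. The same record-update pattern is the usual way Setup.hs files customise simpleUserHooks before handing it to defaultMainWithHooks.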
{Draft
findDistPrefOrDefault (configDistPref flags) -> this, in order, produces either the flag from the arguments given, the CABAL_BUILDDIR environment variable, or the default "dist" prefix.
preConf hooks Args
confPkgDescr
confHook hooks epkg_descr flags' -> produces localBuildInfo0 (start), which then gets some defaults slapped on. Afterwards, it's just reporting.
writePersistBuildConfig
postConf
}
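The precedence noted above for findDistPrefOrDefault (explicit flag, then CABAL_BUILDDIR, then "dist") can be sketched as follows. chooseDistPref and findDistPrefSketch are hypothetical names, and Maybe stands in for the flag type the real function works with.

```haskell
import System.Environment (lookupEnv)

-- | Hypothetical pure core of the lookup: an explicit flag wins, then the
-- value of the CABAL_BUILDDIR environment variable, then "dist".
chooseDistPref :: Maybe FilePath -> Maybe String -> FilePath
chooseDistPref (Just fromFlag) _ = fromFlag
chooseDistPref Nothing (Just fromEnv) = fromEnv
chooseDistPref Nothing Nothing = "dist"

-- | IO wrapper that reads CABAL_BUILDDIR, in the spirit of
-- findDistPrefOrDefault (which uses Cabal's own flag type, not Maybe).
findDistPrefSketch :: Maybe FilePath -> IO FilePath
findDistPrefSketch flag = chooseDistPref flag <$> lookupEnv "CABAL_BUILDDIR"

main :: IO ()
main = do
  findDistPrefSketch (Just "my-dist") >>= putStrLn -- always "my-dist"
  findDistPrefSketch Nothing >>= putStrLn -- CABAL_BUILDDIR if set, else "dist"
```

Keeping the precedence logic pure makes it easy to check without touching the environment.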
{Draft
findDistPrefOrDefault
getBuildConfig
reconfigurePrograms
hookedAction
}
tbc
The first interesting idiosyncrasy you might find in cabal-install is the ultra-sparse main. This is common in Haskell applications: the executable is just a very thin wrapper around a library, which makes existing code easy to reuse.
It runs getArgs, then passes the args to another main in the library section of the package.
Here, we have a bit more complexity.
The main, to begin with, is not actually a main, but rather an initialization function.
Most of the initialization calls are self-explanatory, but one point to note is the response file support.
The args given to the main function are split, then processed via expandResponse, which adds support for response files, a way to get around command-line length limits.
The processed result is then passed to mainWorker, which, with topHandler as its exception handler, runs commandsRun over the commands list defined in mainWorker's where clause. The data this produces is then pattern-matched into IO actions for actual execution.
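The topHandler/mainWorker arrangement follows a common pattern: wrap the whole command in a handler that reports uncaught exceptions and exits with a failure code. The sketch below is hypothetical (topHandlerSketch and mainWorkerSketch are illustrative names, and cabal-install's real topHandler may behave differently); it re-throws ExitCode so that deliberate exits pass through unchanged.

```haskell
import Control.Exception (SomeException, catch, displayException, fromException, throwIO)
import System.Exit (ExitCode (..), exitWith)
import System.IO (hPutStrLn, stderr)

-- | Hypothetical top-level handler: report any uncaught exception on
-- stderr and exit with status 1. ExitCode exceptions (thrown by exitWith
-- and friends) are re-thrown so deliberate exits pass through.
topHandlerSketch :: IO a -> IO a
topHandlerSketch action = action `catch` handler
  where
    handler :: SomeException -> IO a
    handler e = case fromException e of
      Just code -> throwIO (code :: ExitCode)
      Nothing -> do
        hPutStrLn stderr ("error: " ++ displayException e)
        exitWith (ExitFailure 1)

-- | Hypothetical worker: map the parsed arguments to an IO action and run
-- it under the handler, loosely following the shape of mainWorker.
mainWorkerSketch :: [String] -> IO ()
mainWorkerSketch args = topHandlerSketch $ case args of
  ["build"] -> putStrLn "building"
  _ -> ioError (userError ("unknown command: " ++ unwords args))

main :: IO ()
main = mainWorkerSketch ["build"]
```

main prints building; calling mainWorkerSketch ["frobnicate"] instead would report the error on stderr and exit with status 1.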